Decomposition and Denoising for Moment Sequences Using Convex Optimization
TRANSCRIPT
Badri Narayan Bhaskar
High Dimensional Inference (n ≪ p)
Statistics: microarray data, hyperspectral imaging, predicting ratings
Sparse Vector Estimation: Gene Expressions
[Diagram: the response (disease susceptibility) equals the n x p gene-expression matrix X (n samples, p genes) times a k-sparse coefficient vector; only a few genes are active]
Low Rank Matrices: The Netflix Problem
[Diagram: the movie-ratings matrix (millions of subscribers by thousands of movies) decomposes into a low-rank product of user profiles and movie profiles: "Action" plus a handful of other features]
Low Rank Matrices: Topic Models
[Diagram: the term-document matrix (millions of words by thousands of documents) decomposes into degrees of topicality times topic profiles: "Politics" plus a handful of other features]
Simple objects

Objects             | Notion of Simplicity | Atoms
Vectors             | Sparsity             | Canonical unit vectors
Matrices            | Rank                 | Rank-1 matrices
Bandlimited signals | No. of frequencies   | Complex sinusoids
Linear systems      | Number of poles      | Single-pole systems
x = \sum_{a \in \mathcal{A}} c_a a
where \mathcal{A} is the atomic set, the atoms a are the "features", and the weights c_a are nonnegative.
Universal Regularizer: Chandrasekaran, Recht, Parrilo, Willsky (2010)
Objects             | Notion of Simplicity | Atomic Norm
Vectors             | Sparsity             | \ell_1 norm
Matrices            | Rank                 | nuclear norm
Bandlimited signals | No. of frequencies   |
Linear systems      | McMillan degree      |

\|x\|_{\mathcal{A}} = \inf \{ t \ge 0 : x \in t \, \mathrm{conv}(\mathcal{A}) \}
                   = \inf \Big\{ \sum_a c_a : x = \sum_{a \in \mathcal{A}} c_a a, \ c_a > 0 \Big\}
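The definition is easiest to see in the finite case. Here is a minimal sketch (my own illustration, not from the talk): over a finite atomic set the infimum above is a linear program, and with atoms equal to the signed canonical unit vectors it recovers the \ell_1 norm. The function name and example vector are made up.

```python
# Sketch: the atomic norm over a finite atomic set, computed as a linear program
#   ||x||_A = min sum_a c_a  s.t.  x = sum_a c_a * a,  c_a >= 0.
# With atoms {+/- e_i} this is exactly the l1 norm.
import numpy as np
from scipy.optimize import linprog

def atomic_norm_finite(atoms, x):
    """atoms: d x m array whose columns are the atoms; x: length-d vector."""
    d, m = atoms.shape
    res = linprog(c=np.ones(m), A_eq=atoms, b_eq=x, bounds=[(0, None)] * m)
    return res.fun

d = 5
atoms = np.hstack([np.eye(d), -np.eye(d)])            # signed canonical unit vectors
x = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
print(atomic_norm_finite(atoms, x), np.abs(x).sum())  # both equal 3.5
```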
Moment Problems
[Diagram: x is a nonnegative combination of atoms from \mathcal{A}; viewing the atoms as points of a measure space, x collects the moments of a discrete measure, i.e. atomic moments]
Examples: System Identification, Line Spectral Estimation
[Figures: pole locations in the unit disk with the corresponding magnitude response, and an example line spectrum with its time samples]
Decomposition
How many atoms do you need to represent x?
If the composing atoms lie on an exposed face, the decomposition is easy to find: exhibit a supporting hyperplane to certify it.
We concentrate on recovering "good" objects whose atoms lie on exposed faces.
Dual Certificate
[Figure: a convex function with its gradient and subgradients]
The dual optimal value is the atomic norm, and the dual solutions are subgradients at x.
Dual Characterization of the atomic norm (a semi-infinite problem):
maximize_q \langle q, x \rangle subject to \|q\|_{\mathcal{A}}^* \le 1,
where the dual atomic norm is \|q\|_{\mathcal{A}}^* = \sup_{a \in \mathcal{A}} \langle q, a \rangle (under mild conditions; Bonsall).
At the optimum, \langle q, a \rangle = 1 for atoms a in the support and \langle q, a \rangle < 1 otherwise.
Denoising
Observe y = x^\star + w \in \mathbb{C}^n with noise w \sim \mathcal{N}(0, \sigma^2 I_n).
AST (Atomic norm Soft Thresholding):
minimize_x \ \tfrac{1}{2}\|y - x\|_2^2 + \tau \|x\|_{\mathcal{A}}, with tradeoff parameter \tau.
Dual AST:
maximize_q \ \langle q, y \rangle - \tfrac{1}{2}\|q\|_2^2 subject to \|q\|_{\mathcal{A}}^* \le \tau.
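For the sparse-vector atomic set, AST has a closed form, which is where the name "atomic soft thresholding" comes from. A minimal sketch, assuming the atoms are the signed canonical basis vectors so that the atomic norm is the \ell_1 norm:

```python
# Sketch: AST with the l1 atomic norm is coordinate-wise soft thresholding,
#   argmin_x 0.5*||y - x||_2^2 + tau*||x||_1  =>  x_i = sign(y_i) * max(|y_i| - tau, 0).
import numpy as np

def ast_l1(y, tau):
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

print(ast_l1(np.array([3.0, -0.2, 1.5, 0.05]), tau=0.5))   # [ 2.5 -0.   1.   0. ]
```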
AST Results: Optimality Conditions
\hat{q} + \hat{x} = y, \qquad \|\hat{q}\|_{\mathcal{A}}^* \le \tau, \qquad \langle \hat{q}, \hat{x} \rangle = \tau \|\hat{x}\|_{\mathcal{A}}
The first condition suggests \hat{q} \approx w, so choose \tau \ge \mathbb{E}\|w\|_{\mathcal{A}}^*.
The last two mean \hat{q} \in \tau\,\partial\|\hat{x}\|_{\mathcal{A}}: the dual solution can find the composing atoms.
Tradeoff = Gaussian Width
No cross validation is needed: estimate the Gaussian width (that is, \mathbb{E}\|w\|_{\mathcal{A}}^*) instead.
The Gaussian width is also important in characterizing convergence rates.
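The "estimate the Gaussian width" step can be done by simulation. A sketch (my own illustration) for the line-spectral atoms introduced later: the dual atomic norm \|w\|_{\mathcal{A}}^* = \sup_f |\langle w, a(f)\rangle| is approximated by the peak of a zero-padded FFT, and \tau is set to its Monte Carlo average. The trial count and oversampling factor are arbitrary choices.

```python
# Sketch: Monte Carlo estimate of tau ~ E||w||_A^* for line-spectral atoms.
# ||w||_A^* = sup_f |<w, a(f)>| is approximated on a fine frequency grid via FFT.
import numpy as np

def estimate_tau(n, sigma, trials=200, oversample=16):
    peaks = []
    for _ in range(trials):
        w = sigma * (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
        peaks.append(np.abs(np.fft.fft(w, oversample * n)).max())
    return float(np.mean(peaks))

print(estimate_tau(n=128, sigma=1.0))   # roughly sigma * sqrt(n log n), about 25 here
```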
AST Convergence Rates
For comparison, the Lasso with an s-sparse signal:
Fast rate: \frac{1}{n}\|X\hat\theta - X\theta^\star\|_2^2 \le C\,\frac{\sigma^2 s \log p}{n}
Slow rate: \frac{1}{n}\|X\hat\theta - X\theta^\star\|_2^2 \lesssim \sigma \|\theta^\star\|_1 \sqrt{\frac{\log p}{n}}
Choose the appropriate tradeoff \tau = \mathbb{E}\|w\|_{\mathcal{A}}^*.
Universal but slow rate, under minimal assumptions on the atoms:
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \le \frac{\tau}{n}\,\|x^\star\|_{\mathcal{A}}
With a "cone condition" (weaker than RIP, coherence, RSC, RE), faster rates are possible:
\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \le C\,\frac{\tau^2}{\gamma^2 n} for \Gamma large enough (\gamma and \Gamma are defined under "Cone Condition" in the appendix).
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
Time samples of a line spectrum:
x_k = \sum_{j=1}^{s} c_j e^{i 2\pi f_j k}, \quad k = 0, \dots, n-1
Goal: extrapolate the remaining moments of the underlying measure \mu = \sum_j c_j \delta_{f_j}.
Super resolution is the time-frequency dual: extrapolate high frequency information from low frequency samples. Just exchange time and frequency; it is the same problem as line spectral estimation.
Classical Line Spectrum Estimation (a crash course)
Nonlinear parameter estimation: x_k = \sum_{j=1}^{s} c_j e^{i 2\pi f_j k}.
For any vector x, define the Toeplitz matrix
T_n(x) = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ \bar{x}_2 & x_1 & \cdots & x_{n-1} \\ \vdots & & \ddots & \vdots \\ \bar{x}_n & \bar{x}_{n-1} & \cdots & x_1 \end{bmatrix}
FACT (low rank and Toeplitz):
T_n(x) = \sum_{j=1}^{s} c_j \, a(f_j) a(f_j)^*, \qquad a(f_j) = \begin{bmatrix} 1 & e^{i 2\pi f_j} & \cdots & e^{i 2\pi (n-1) f_j} \end{bmatrix}^T
Classical techniques:
PRONY: estimate s, then root finding. CADZOW: alternating projections. MUSIC: low-rank subspace, plot the pseudospectrum.
Drawbacks: sensitive to model order, no rigorous theory, only optimal asymptotically.
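For concreteness, here is a rough single-snapshot MUSIC sketch (one of several variants, my own illustration): form a Hankel matrix of the samples, split signal and noise subspaces with an SVD, and read frequencies off the peaks of the pseudospectrum. The frequencies, noise level, and grid size are made up, and, as the slide notes, the true model order s must be supplied.

```python
# Sketch: single-snapshot MUSIC via a Hankel data matrix (model order s assumed known).
import numpy as np

n, s, sigma = 64, 3, 0.05
freqs = np.array([0.10, 0.18, 0.43])
k = np.arange(n)
y = sum(np.exp(1j * 2 * np.pi * f * k) for f in freqs)
y = y + sigma * (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)

m = n // 2
H = np.array([y[i:i + n - m + 1] for i in range(m)])        # m x (n-m+1) Hankel matrix
U, _, _ = np.linalg.svd(H)
noise_basis = U[:, s:]                                       # noise subspace

grid = np.linspace(0, 1, 2048, endpoint=False)
steering = np.exp(1j * 2 * np.pi * np.outer(np.arange(m), grid))
pseudo = 1.0 / np.linalg.norm(noise_basis.conj().T @ steering, axis=0) ** 2
is_peak = (pseudo > np.roll(pseudo, 1)) & (pseudo > np.roll(pseudo, -1))
peaks = np.where(is_peak)[0]
print(np.sort(grid[peaks[np.argsort(pseudo[peaks])[-s:]]]))  # approx [0.10 0.18 0.43]
```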
Convex Perspective
[Figure: noisy time samples of a line-spectral signal]
x = \sum_{j=1}^{s} c_j \, a(f_j), \qquad a(f) = \begin{bmatrix} 1 & e^{i 2\pi f} & \cdots & e^{i 2\pi f (n-1)} \end{bmatrix}^T
with amplitudes c_j and frequencies f_j.
Atoms for the line spectrum:
\mathcal{A}_+ = \{ a(f) : f \in [0,1] \} (positive amplitudes)
\mathcal{A} = \{ a(f) e^{i\phi} : f \in [0,1], \ \phi \in [0, 2\pi) \} (complex amplitudes)
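A small sketch of generating data from this model; the frequencies, amplitudes, and noise level below are illustrative values, not those used in the experiments later in the talk.

```python
# Sketch: build line-spectral atoms a(f) and a noisy observation y = x + w.
import numpy as np

def atom(f, n):
    return np.exp(1j * 2 * np.pi * f * np.arange(n))

n, sigma = 64, 0.1
freqs = np.array([0.10, 0.18, 0.43])
amps = np.array([1.0, 0.8 * np.exp(1j * 0.7), 0.5])           # complex amplitudes c_j
x = sum(c * atom(f, n) for c, f in zip(amps, freqs))
y = x + sigma * (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)
```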
Convergence rates
For a signal with n samples of the form y_k = \sum_{j=1}^{s} c_j e^{i 2\pi f_j k} + w_k, choose the tradeoff as \tau \approx \sigma\sqrt{n \log n}: the upper and lower bounds on \mathbb{E}\|w\|_{\mathcal{A}}^*, obtained via Dudley's inequality and Bernstein's polynomial theorem, are both of this order up to log-log factors.
Using the global AST guarantee, with high probability (for n > 4):
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \le C\,\sigma\,\sqrt{\frac{\log n}{n}}\ \sum_{j=1}^{s}|c_j|
Fast Rates
If every two frequencies are far enough apart (the minimum separation condition)
\min_{p \ne q} d(f_p, f_q) \ge \frac{4}{n},
then for s frequencies and n samples the MSE is
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \le C\,\frac{\sigma^2 s \log n}{n},
provided the tradeoff parameter is chosen correctly: \tau = \eta\,\sigma\sqrt{n \log n} with the constant \eta large enough.
This is not a global condition on the dictionary; it is local to the signal.
The proof uses many properties of trigonometric polynomials and Fejér kernels derived by Candès and Fernandez-Granda.
Can't do much better: this nearly matches the minimax bound on the best prediction rate for signals with 4/n minimum separation,
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \gtrsim C\,\frac{\sigma^2 s \log(n/(4s))}{n},
and our rate
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \le C\,\frac{\sigma^2 s \log n}{n}
is only a logarithmic factor from the "oracle rate"
\mathbb{E}\,\frac{1}{n}\|\hat{x} - x^\star\|_2^2 \approx C\,\frac{\sigma^2 s}{n}.
Minimum Separation and Exact Recovery (result of Candès and Fernandez-Granda)
A 4/n separation implies one can construct an explicit dual certificate of exact recovery: convex hulls of sufficiently separated vertices are exposed faces.
Prony does not have this limitation. (Then why bother? Because of the robustness guarantee.)
Minimum separation is necessary, except in the positive-amplitude case: there, the convex hull of every collection of n/2 vertices is an exposed face (an infinite dimensional version of the cyclic polytope, which is neighbourly), so no separation condition is needed.
Localizing Frequencies
[Figure: dual polynomial magnitude vs. frequency; it reaches 1 exactly at the recovered frequencies]
The AST solution splits the data as y = \hat{x} + \hat{q}. Form the dual polynomial
Q(f) = \sum_{j=0}^{n-1} \hat{q}_j e^{i 2\pi j f}.
Then \langle \hat{q}, a \rangle = 1 for atoms a in the support and \langle \hat{q}, a \rangle < 1 otherwise, so the frequencies are localized where |Q(f)| touches 1.
Localization Guarantees
[Figure: dual polynomial magnitude with the true frequencies, estimated frequencies, a spurious frequency, and near regions of radius 0.16/n marked]
Partition the estimated frequencies into a far region F (far from every true frequency) and near regions N_j around each true frequency f_j. With k frequencies and n samples, each of the following error measures is controlled:
Spurious amplitudes: \sum_{l : \hat{f}_l \in F} |\hat{c}_l| \le C_1\,\sigma\sqrt{k^2 \log(n)/n}
Weighted frequency deviation: \sum_{l : \hat{f}_l \in N_j} |\hat{c}_l| \, \min_{f_j \in T} d(f_j, \hat{f}_l)^2 \le C_2\,\sigma\sqrt{k^2 \log(n)/n}
Near region approximation: \big| c_j - \sum_{l : \hat{f}_l \in N_j} \hat{c}_l \big| \le C_3\,\sigma\sqrt{k^2 \log(n)/n}
How to actually solve AST? Semidefinite Characterizations
minimize_x \ \tfrac{1}{2}\|y - x\|_2^2 + \tau \|x\|_{\mathcal{A}}
requires a computable description of \mathrm{conv}(\mathcal{A}_+) and \mathrm{conv}(\mathcal{A}).
Positive amplitudes (Carathéodory-Toeplitz):
\|x\|_{\mathcal{A}_+} = \begin{cases} x_0 & T_n(x) \succeq 0 \\ +\infty & \text{otherwise} \end{cases}
Complex amplitudes (Bounded Real Lemma):
\|x\|_{\mathcal{A}} = \inf_{u, t} \left\{ \frac{1}{2n}\,\mathrm{tr}(T_n(u)) + \frac{t}{2} \ :\ \begin{bmatrix} T_n(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}
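A sketch of this SDP in cvxpy, assuming cvxpy and the SCS solver are installed; this is my own translation of the characterization, not the code used for the experiments. The Toeplitz structure is imposed with entrywise equality constraints, which is fine for small n; the ADMM solver on the next slide is what scales.

```python
# Sketch: AST via its semidefinite characterization,
#   min 0.5*||y - x||^2 + (tau/2)*(u_1 + t)  s.t.  [[T(u), x], [x^*, t]] >= 0.
import numpy as np
import cvxpy as cp

def ast_sdp(y, tau):
    n = len(y)
    Z = cp.Variable((n + 1, n + 1), hermitian=True)          # [[T(u), x], [x^*, t]]
    constraints = [Z >> 0]
    for i in range(n - 1):                                   # force the top-left block
        for j in range(n - 1):                               # to be Toeplitz
            constraints.append(Z[i, j] == Z[i + 1, j + 1])
    x = Z[:n, n]
    obj = 0.5 * cp.sum_squares(y - x) + (tau / 2) * cp.real(Z[0, 0] + Z[n, n])
    cp.Problem(cp.Minimize(obj), constraints).solve(solver=cp.SCS)
    return np.array(Z.value[:n, n]).flatten()                # the denoised signal x-hat
```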
Alternating Direction Method of Multipliers
AST SDP formulation:
minimize_{t, u, x, Z} \ \tfrac{1}{2}\|x - y\|_2^2 + \frac{\tau}{2}(t + u_1)
subject to Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \quad Z \succeq 0
Form the augmented Lagrangian, with dual variable \Lambda and momentum parameter \rho:
\mathcal{L}_\rho(t, u, x, Z, \Lambda) = \tfrac{1}{2}\|x - y\|_2^2 + \frac{\tau}{2}(t + u_1) + \Big\langle \Lambda,\ Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\rangle + \frac{\rho}{2}\,\Big\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\|_F^2
Alternating direction iterations:
(t^{l+1}, u^{l+1}, x^{l+1}) \leftarrow \arg\min_{t, u, x} \mathcal{L}_\rho(t, u, x, Z^l, \Lambda^l)  (minimizing a quadratic: a linear solve)
Z^{l+1} \leftarrow \arg\min_{Z \succeq 0} \mathcal{L}_\rho(t^{l+1}, u^{l+1}, x^{l+1}, Z, \Lambda^l)  (projection onto the psd cone)
\Lambda^{l+1} \leftarrow \Lambda^l + \rho\Big( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ x^{l+1\,*} & t^{l+1} \end{bmatrix} \Big)  (linear dual update)
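The only nontrivial piece of the Z-update is the projection onto the positive semidefinite cone. A small sketch of that step (standard eigenvalue clipping):

```python
# Sketch: Euclidean projection of a Hermitian matrix onto the psd cone
# (the Z-update in the ADMM iterations above).
import numpy as np

def project_psd(M):
    M = (M + M.conj().T) / 2                  # guard against round-off asymmetry
    w, V = np.linalg.eigh(M)
    return (V * np.maximum(w, 0)) @ V.conj().T
```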
Alternative Discretization Method
Can use a grid: \mathcal{A}_N = \{ a(f) : f = j/N, \ j = 1, \dots, N \}.
We show \big(1 - \tfrac{2\pi n}{N}\big)\|x\|_{\mathcal{A}_N} \le \|x\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}_N}, via Bernstein's polynomial theorem (or Fritz John); the argument works for general epsilon nets.
So AST is approximated by a Lasso: y = \Phi c + w, where \Phi is the n x N matrix whose columns are atoms at the N grid frequencies and c is the vector of amplitudes.
Fast shrinkage algorithms apply, the Fourier projections are cheap, and coherence is irrelevant.
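A sketch of the discretized approach (my own illustration): the grid atoms form an oversampled DFT, so both dictionary products in an ISTA/shrinkage iteration are zero-padded FFTs. The iteration count is arbitrary; the step size comes from \|\Phi\|^2 = N for this dictionary.

```python
# Sketch: Lasso over an N-point frequency grid solved with ISTA;
# Phi c and Phi^H r are cheap Fourier projections (zero-padded FFTs).
import numpy as np

def ista_grid(y, N, mu, iters=300):
    n = len(y)
    c = np.zeros(N, dtype=complex)
    step = 1.0 / N                                  # 1/||Phi||^2, since Phi Phi^H = N*I
    for _ in range(iters):
        resid = N * np.fft.ifft(c)[:n] - y          # Phi c - y
        grad = np.fft.fft(resid, N)                 # Phi^H (Phi c - y)
        z = c - step * grad
        mag = np.abs(z)
        c = z * np.maximum(1 - step * mu / np.maximum(mag, 1e-12), 0.0)  # complex soft threshold
    return c                                        # sparse amplitudes on the grid j/N
```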
Experimental setup
n = 64, 128, or 256 samples
Number of frequencies: n/16, n/8, n/4
SNR: -5 dB, 0 dB, 20 dB
(Root) MUSIC, matrix pencil, and Cadzow are all fed the true model order.
AST uses an empirically estimated (not fed) noise variance.
Compared mean-squared-error and localization error metrics, averaged over 20 trials.
Mean Squared Error: MSE vs SNR
[Figure: MSE (dB) vs SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies]
Mean Squared Error: Performance Profiles
[Figure: performance profiles P_s(beta) for AST, Lasso (grid size 2^15), MUSIC, Cadzow, and matrix pencil]
P(beta) = fraction of experiments with MSE less than beta times the minimum MSE.
Average MSE in 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = -5, 0, 20 dB.
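For reference, the performance-profile statistic used in these plots can be computed as below (a small sketch; the array layout is my assumption).

```python
# Sketch: performance profile P(beta) = fraction of experiments in which a method's
# MSE is within a factor beta of the best MSE achieved on that experiment.
import numpy as np

def performance_profile(mse, betas):
    """mse: (n_methods, n_experiments) array; returns (n_methods, len(betas))."""
    ratios = mse / mse.min(axis=0)                  # ratio to the per-experiment best
    return np.array([[np.mean(r <= b) for b in betas] for r in ratios])
```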
Mean Squared Error: Effect of Gridding
[Figure: performance profiles for Lasso with grid sizes 2^10 through 2^15]
Frequency Localization
[Figures: performance profiles and the localization metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near region approximation) vs SNR (dB) at k = 32, for AST, MUSIC, and Cadzow]
Simple LTI Systems
Input (vibration) u(t), response y(t). State space model:
x[t+1] = A x[t] + B u[t]
y[t] = C x[t] + D u[t]
Goal: find a simple linear system from time series data.
Existing approaches: nonlinear Kalman-filter-based estimation; parametric methods based on Hankel low-rank formulations.
Subspace Methods
Stack the data into block Hankel matrices:
\underbrace{\begin{bmatrix} y_1 & y_2 & \cdots & y_k \\ y_2 & y_3 & \cdots & y_{k+1} \\ y_3 & y_4 & \cdots & y_{k+2} \\ \vdots & & & \vdots \\ y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1} \end{bmatrix}}_{Y}
= \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{\mathcal{O}}
\underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X}
+ \underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & 0 & \cdots \\ CAB & CB & D & \cdots \\ \vdots & & & \ddots \\ CA^{\ell-2}B & \cdots & CB & D \end{bmatrix}}_{T}
\underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_k \\ u_2 & u_3 & \cdots & u_{k+1} \\ u_3 & u_4 & \cdots & u_{k+2} \\ \vdots & & & \vdots \\ u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1} \end{bmatrix}}_{U}
Here Y is the output, X the state, U the input, T the system (Markov parameter) matrix, and \mathcal{O} the observability matrix.
With a little computation: Y = \mathcal{O} X + T U, so Y \Pi_{U^\perp} = \mathcal{O} X \Pi_{U^\perp} after projecting out the input.
The projected output matrix is low rank: use the SVD (subspace ID, n4sid) or the nuclear norm (Vandenberghe et al.).
SISO
[Diagram: past inputs pass through the Hankel operator of G to produce future outputs]
The Hankel operator of G maps past inputs \ell_2(-\infty, 0) to future outputs \ell_2[0, \infty).
Fact: the rank of the Hankel operator equals the McMillan degree. Relax and use the Hankel nuclear norm, but it is not clear how to compute it.
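A quick numerical check of the rank fact, using a finite Hankel matrix of the impulse response; the example system is the three-pole G(z) on the next slide, and the sizes and tolerance are arbitrary.

```python
# Sketch: the Hankel matrix of the impulse response has rank = McMillan degree.
import numpy as np
from scipy.linalg import hankel

poles = np.array([0.7, 0.3 + 0.4j, 0.3 - 0.4j])
residues = np.array([2 * (1 - 0.7 ** 2), 5 * (1 - 0.5 ** 2), 5 * (1 - 0.5 ** 2)])
k = np.arange(1, 41)
h = np.real(sum(r * p ** (k - 1) for r, p in zip(residues, poles)))   # impulse response h[k]
H = hankel(h[:20], h[19:])                                            # 20 x 21 Hankel matrix
print(np.linalg.matrix_rank(H, tol=1e-8))                             # 3 = number of poles
```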
Convex Perspective: SISO Systems
Example:
G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}
[Figures: pole locations in the unit disk and the magnitude of the frequency response]
Atoms: single-pole systems
\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} : w \in \mathbb{D} \right\}
with poles w in the open unit disk \mathbb{D}.
Measurements: y = L(G(z)) + w, a linear measurement operator applied to G, plus noise.
Atomic Norm Regularization:
minimize_G \ \|y - L(G)\|_2^2 + \mu \|G\|_{\mathcal{A}}
Theorem: for all G, the Hankel nuclear norm \|G\|_1 is equivalent to the atomic norm, c_1 \|G\|_{\mathcal{A}} \le \|G\|_1 \le c_2 \|G\|_{\mathcal{A}}.
How to solve this? It is equivalent to
minimize_{c, x} \ \|y - x\|_2^2 + \mu \sum_{w \in \mathbb{D}} \frac{|c_w|}{1 - |w|^2} \quad \text{subject to} \quad x = \sum_{w \in \mathbb{D}} c_w \, L\!\left(\frac{1}{z - w}\right).
Discretize the poles to a grid \{w_\ell\} and set \Phi_{k\ell} = L_k\!\left(\frac{1 - |w_\ell|^2}{z - w_\ell}\right); the problem becomes a Lasso:
minimize_c \ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu \|c\|_1.
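A sketch of the discretized problem end to end (assuming cvxpy is available; the grid spacing, regularization weight mu, and noise level are illustrative): lay a polar net of candidate poles on the disk, form \Phi from uniformly spaced frequency-response samples of the example G(z) from the earlier SISO slide, and solve the Lasso.

```python
# Sketch: pole dictionary on a net in the unit disk + Lasso fit to frequency-response samples.
import numpy as np
import cvxpy as cp

n = 30
zs = np.exp(2j * np.pi * np.arange(n) / n)                   # uniformly spaced samples of G
true_poles = [0.7, 0.3 + 0.4j, 0.3 - 0.4j]
true_res = [2 * (1 - 0.7 ** 2), 5 * (1 - 0.5 ** 2), 5 * (1 - 0.5 ** 2)]
y = sum(r / (zs - p) for r, p in zip(true_res, true_poles))
y = y + 0.1 * (np.random.randn(n) + 1j * np.random.randn(n)) / np.sqrt(2)

radii = np.linspace(0.05, 0.95, 19)                          # polar net on the disk
angles = np.linspace(0, 2 * np.pi, 40, endpoint=False)
grid = np.array([r * np.exp(1j * a) for r in radii for a in angles])
Phi = (1 - np.abs(grid) ** 2) / (zs[:, None] - grid[None, :])  # Phi[k,l] = (1-|w_l|^2)/(z_k - w_l)

c = cp.Variable(len(grid), complex=True)
mu = 1.0
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(y - Phi @ c) + mu * cp.norm1(c))).solve(solver=cp.SCS)
support = np.abs(c.value) > 1e-3                             # recovered poles ~ grid[support]
```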
DAST Analysis
Measure uniformly spaced samples of the frequency response: set
\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{i 2\pi k / n} - w_\ell}
over a net \{w_\ell\} on the disk, solve
\hat{x} = \arg\min_x \ \tfrac{1}{2}\|y - x\|_2^2 + \mu \|x\|_{\mathcal{A}},
and then set \hat{G}(z) = \sum_\ell \hat{c}_\ell \frac{1 - |w_\ell|^2}{z - w_\ell}, where the coefficients correspond to the support of \hat{x}.
Guarantee (with \gamma_0 a constant, \|G\|_1 the Hankel nuclear norm, \rho the stability radius, and n the number of measurements):
\|\hat{G} - G\|_{H_2}^2 \lesssim \frac{\gamma_0 \|G\|_1}{(1 - \rho)\sqrt{n}}
Example: 6 poles, \rho = 0.95, \sigma = 10^{-1}, n = 30, H_2 error about 2e-2.
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.
Future Work
• Better convergence results for discretization
• Why does the weighted mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees
Referenced Publications
B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.
B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.
Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.
Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.
Other Publications
Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.
Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, B., Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.
Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.
Cone Condition
\gamma = \inf_{z \ne 0,\ z \in C_\Gamma(x)} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}}, \qquad C_\Gamma(x) = \mathrm{cone}\{ z : \|x + z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \Gamma \|z\|_{\mathcal{A}} \}, \quad \Gamma \in (0, 1)
C_\Gamma(x) is the inflated descent cone: unlike the descent cone C_0, no direction in C_\Gamma has zero atomic norm.
Minimum separation is necessary
Using two sinusoids:
\|a(f_1) - a(f_2)\|_{\mathcal{A}} \le \sqrt{n}\, \|a(f_1) - a(f_2)\|_2 \approx 2\pi n^2 |f_1 - f_2|,
since \|a(f_1) - a(f_2)\|_2 \approx 2\pi n^{3/2} |f_1 - f_2|.
So when the separation is lower than about 1/(4n^2), the atomic norm can't recover the two-atom decomposition.
Can empirically show failure with less than 1/n separation and an adversarial signal.
High Dimensional Inference
n p
Statistics
Microarray Data
HyperspectralImaging
Predict Ratings
Sparse Vector Estimation Gene Expressions
X
n
p
k sparse
Samples
Genes
Few Active Genes
Disease Susceptibility =
Response
Low Rank Matrices The Netflix Problem
Movie Ratings
thousands ofmovies
millions ofsubscribers
movie profile
user profile
Action
+ a handful of other features
Decomposition
Low Rank Matrices Topic Models
Term DocumentMatrix
thousands ofdocuments
millions ofwords
degree of topicality
topic profile
Politics
+ a handful of other features
Decomposition
Simple objects
Objects Notion of Simplicity Atoms
Vectors Sparsity Canonical unit vectors
Matrices Rank Rank-1 matrices
Bandlimited signals No of frequencies Complex sinusoids
Linear systems Number of poles Single pole systems
x =X
a2Acaax =
X
a2Acaa
atomic set
atomsldquofeaturesrdquo
nonnegative weights
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Sparse Vector Estimation Gene Expressions
X
n
p
k sparse
Samples
Genes
Few Active Genes
Disease Susceptibility =
Response
Low Rank Matrices The Netflix Problem
Movie Ratings
thousands ofmovies
millions ofsubscribers
movie profile
user profile
Action
+ a handful of other features
Decomposition
Low Rank Matrices Topic Models
Term DocumentMatrix
thousands ofdocuments
millions ofwords
degree of topicality
topic profile
Politics
+ a handful of other features
Decomposition
Simple objects
Objects Notion of Simplicity Atoms
Vectors Sparsity Canonical unit vectors
Matrices Rank Rank-1 matrices
Bandlimited signals No of frequencies Complex sinusoids
Linear systems Number of poles Single pole systems
x =X
a2Acaax =
X
a2Acaa
atomic set
atomsldquofeaturesrdquo
nonnegative weights
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Low Rank Matrices The Netflix Problem
Movie Ratings
thousands ofmovies
millions ofsubscribers
movie profile
user profile
Action
+ a handful of other features
Decomposition
Low Rank Matrices Topic Models
Term DocumentMatrix
thousands ofdocuments
millions ofwords
degree of topicality
topic profile
Politics
+ a handful of other features
Decomposition
Simple objects
Objects Notion of Simplicity Atoms
Vectors Sparsity Canonical unit vectors
Matrices Rank Rank-1 matrices
Bandlimited signals No of frequencies Complex sinusoids
Linear systems Number of poles Single pole systems
x =X
a2Acaax =
X
a2Acaa
atomic set
atomsldquofeaturesrdquo
nonnegative weights
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Low Rank Matrices Topic Models
Term DocumentMatrix
thousands ofdocuments
millions ofwords
degree of topicality
topic profile
Politics
+ a handful of other features
Decomposition
Simple objects
Objects Notion of Simplicity Atoms
Vectors Sparsity Canonical unit vectors
Matrices Rank Rank-1 matrices
Bandlimited signals No of frequencies Complex sinusoids
Linear systems Number of poles Single pole systems
x =X
a2Acaax =
X
a2Acaa
atomic set
atomsldquofeaturesrdquo
nonnegative weights
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method

Can use a grid: $A_N = \{ a(f)e^{i\phi} : f = j/N,\ j = 1, \dots, N,\ \phi \in [0, 2\pi) \}$. We show

$$\Big(1 - \frac{2\pi n}{N}\Big)\|x\|_{A_N} \le \|x\|_A \le \|x\|_{A_N}$$

via Bernstein's polynomial theorem (or Fritz John); this works for general epsilon nets. AST is then approximated by the Lasso

$$\text{minimize}_c\ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \tau\|c\|_1,$$

where the columns of the n × N matrix Φ are atoms with frequencies on an N-point grid, c holds the amplitudes, and y = Φc + w is the vector of atomic moments. Fast shrinkage algorithms apply, Fourier projections are cheap, and coherence is irrelevant.
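Because the gridded atoms are sinusoids, Φ and its adjoint are zero-padded FFTs and ‖Φ‖² = N, so plain ISTA is cheap per iteration. A minimal sketch (complex soft thresholding; the grid size N and iteration count are illustrative):

```python
import numpy as np

def lasso_grid(y, N, mu, iters=500):
    """ISTA for min_c 0.5||y - Phi c||^2 + mu||c||_1, Phi[k, j] = exp(2 pi i kj/N)."""
    n = len(y)
    phi = lambda c: N * np.fft.ifft(c)[:n]     # Phi c
    phiH = lambda r: np.fft.fft(r, N)          # Phi^* r (zero-padded FFT)
    c = np.zeros(N, dtype=complex)
    step = 1.0 / N                             # 1 / ||Phi||^2 (rows are orthogonal)
    for _ in range(iters):
        z = c - step * phiH(phi(c) - y)
        mag = np.abs(z)
        c = np.where(mag > 0,
                     z * np.maximum(1 - step * mu / np.maximum(mag, 1e-15), 0), 0)
    return c
```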
Experimental setup
n = 64, 128, or 256 samples
number of frequencies s = n/16, n/8, n/4
SNR: -5 dB, 0 dB, 20 dB
(Root) MUSIC, Matrix Pencil, Cadzow: all fed the true model order
AST: empirically estimated (not fed) noise variance
Compared mean-squared-error and localization error metrics, averaged over 20 trials
Mean Squared Error: MSE vs. SNR

[Plot: MSE (dB) vs. SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies]
Mean Squared Error: Performance Profiles

[Plot: P_s(β) vs. β for AST, Lasso (2^15-point grid), MUSIC, Cadzow, and Matrix Pencil]

P(β) = fraction of experiments with MSE less than β times the minimum MSE. Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = -5, 0, 20 dB.
Mean Squared Error: Effect of Gridding

[Plot: performance profiles P_s(β) for the Lasso with grid sizes 2^10 through 2^15]
Frequency Localization

[Plots: performance profiles P_s(β) for AST, MUSIC, and Cadzow under each of the three localization metrics]

[Plots: localization metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near-region approximation) vs. SNR (dB) at k = 32, for AST, MUSIC, and Cadzow]
Simple LTI Systems

Vibration u(t) → Response y(t)

State space:
$$x[t+1] = Ax[t] + Bu[t], \qquad y[t] = Cx[t] + Du[t]$$
Goal: find a simple linear system from time-series data. Approaches: nonlinear Kalman-filter-based estimation; parametric methods; Hankel low-rank formulations; subspace methods.
$$
\begin{bmatrix}
y_1 & y_2 & \cdots & y_k \\
y_2 & y_3 & \cdots & y_{k+1} \\
y_3 & y_4 & \cdots & y_{k+2} \\
\vdots & & & \vdots \\
y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1}
\end{bmatrix}
=
\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}
\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}
+
\begin{bmatrix}
D & 0 & \cdots & & 0 \\
CB & D & 0 & \cdots & 0 \\
CAB & CB & D & \cdots & 0 \\
\vdots & & & \ddots & \\
CA^{\ell-2}B & \cdots & & CB & D
\end{bmatrix}
\begin{bmatrix}
u_1 & u_2 & \cdots & u_k \\
u_2 & u_3 & \cdots & u_{k+1} \\
u_3 & u_4 & \cdots & u_{k+2} \\
\vdots & & & \vdots \\
u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1}
\end{bmatrix}
$$

In short, Y = OX + TU, with output Y, observability matrix O, state X, system matrix T, and input U. With a little computation (projecting onto the orthogonal complement of the input), Y U^⊥ = O X U^⊥.

SVD → Subspace ID (n4sid); nuclear norm (Vandenberghe et al.) → low-rank formulations.
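In the noiseless SISO case driven by an impulse, this pipeline reduces to the classical Ho–Kalman procedure, which makes the "SVD → subspace ID" step concrete. A sketch (the model order r is assumed known here, which is precisely what the convex relaxations below avoid):

```python
import numpy as np
from scipy.linalg import hankel, pinv

def ho_kalman(h, r, m):
    """Recover (A, B, C, D) of order r from impulse response h (needs len(h) >= 2*m)."""
    H = hankel(h[1:m + 1], h[m:2 * m])        # H[i, j] = h[1+i+j] = C A^(i+j) B
    U, s, Vh = np.linalg.svd(H)
    O = U[:, :r] * np.sqrt(s[:r])             # observability factor: rows ~ C A^i
    Ctl = np.sqrt(s[:r])[:, None] * Vh[:r]    # controllability factor: cols ~ A^j B
    C, B = O[0], Ctl[:, 0]
    A = pinv(O[:-1]) @ O[1:]                  # shift invariance of O
    return A, B, C, h[0]                      # D = h[0]
```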
SISO

The Hankel operator of G maps past inputs to future outputs: $\Gamma_G : \ell_2(-\infty, 0) \to \ell_2[0, \infty)$.

Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.
Convex Perspective: SISO Systems

$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$
[Plot: example pole locations and the magnitudes of single-pole atoms over the unit disk]

Atoms: single-pole systems over the unit disk D,

$$A = \left\{ \frac{1 - |w|^2}{z - w} : w \in D \right\}.$$

Measurements: y = L(G(z)) + w.

[Plot: measured frequency-response samples]

Regularize with atomic norms:

$$\text{minimize}_G\ \|y - L(G)\|_2^2 + \mu\|G\|_A$$
Theorem (Atomic Norm Regularization): for all G,

$$c\,\|G\|_A \le \|G\|_1 \le \|G\|_A,$$

i.e., the Hankel nuclear norm $\|G\|_1$ and the atomic norm are equivalent up to a constant c.
How to solve this? The atomic-norm problem

$$\text{minimize}_G\ \|y - L(G)\|_2^2 + \mu\|G\|_A$$

is equivalent to

$$\begin{array}{ll} \text{minimize}_{c,x} & \|y - x\|_2^2 + \mu \displaystyle\sum_{w \in D} \frac{|c_w|}{1 - |w|^2} \\ \text{subject to} & x = \displaystyle\sum_{w \in D} c_w\, L\!\left(\frac{1}{z - w}\right) \end{array}$$

Discretize and set

$$\Phi_{k\ell} = L_k\!\left(\frac{1 - |w_\ell|^2}{z - w_\ell}\right);$$

then solve the Lasso

$$\text{minimize}_c\ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu\|c\|_1.$$
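A sketch of this discretization (the rings of candidate poles form a crude illustrative net; here L_k samples the transfer function uniformly on the unit circle, matching the measurement model analyzed next):

```python
import numpy as np

def disk_net(radii=(0.5, 0.8, 0.95), per_ring=(16, 32, 64)):
    """A crude net on the unit disk: rings of candidate poles."""
    return np.concatenate([r * np.exp(2j * np.pi * np.arange(m) / m)
                           for r, m in zip(radii, per_ring)])

def dast_matrix(n, poles):
    """Phi[k, l] = (1 - |w_l|^2) / (exp(2 pi i k/n) - w_l)."""
    z = np.exp(2j * np.pi * np.arange(n) / n)[:, None]
    return (1 - np.abs(poles) ** 2)[None, :] / (z - poles[None, :])
```

The resulting Φ plugs into the same ℓ₁ shrinkage solver as in the line-spectrum case.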
DAST Analysis

Measure uniformly spaced samples of the frequency response,

$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell}.$$

Solve

$$\hat{x} = \arg\min_x\ \tfrac{1}{2}\|y - x\|_2^2 + \mu\|x\|_A$$

over a net on the disk, and set

$$\hat{G}(z) = \sum_\ell \hat{c}_\ell\,\frac{1 - |w_\ell|^2}{z - w_\ell},$$

where the coefficients correspond to the support of $\hat{x}$. Then we have

$$\|\hat{G} - G\|_{H_2}^2 \le \frac{C_0\,\|G\|_1}{(1 - \rho)\sqrt{n}},$$

with constant $C_0$, Hankel nuclear norm $\|G\|_1$, stability radius $\rho$, and number of measurements $n$.

Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30 gives H₂ error = 2e-2.
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht
Future Work
• Better convergence results for discretization
• Why does the weighted-mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees
Referenced Publications
B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.
B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.
Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.
Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.
Other Publications
Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.
Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, B., Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.
Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.
Cone Condition

Inflated descent cone ($C_0$ is the descent cone):

$$C_\gamma(x) = \mathrm{cone}\{ z : \|x + z\|_A \le \|x\|_A + \gamma\|z\|_A \}, \qquad \gamma \in (0, 1)$$

$$\bar\gamma = \inf\left\{ \frac{\|z\|_2}{\|z\|_A} : z \neq 0,\ z \in C_\gamma(x) \right\}$$

No direction in $C_\gamma$ has zero atomic norm.
Minimum separation is necessary. Using two sinusoids,

$$\|a(f_1) - a(f_2)\|_A \le n^{1/2}\,\|a(f_1) - a(f_2)\|_2 \le 2\pi n^2 |f_1 - f_2|,$$

since

$$\|a(f_1) - a(f_2)\|_2 \approx \frac{2\pi}{\sqrt{3}}\, n^{3/2} |f_1 - f_2|.$$

So when the separation is lower than 1/(4n²), the difference of the two atoms has atomic norm strictly less than 2, and the atomic norm can't recover the two-sinusoid decomposition. One can empirically show failure with less than 1/n separation and an adversarial signal.
Simple objects
Objects Notion of Simplicity Atoms
Vectors Sparsity Canonical unit vectors
Matrices Rank Rank-1 matrices
Bandlimited signals No of frequencies Complex sinusoids
Linear systems Number of poles Single pole systems
x =X
a2Acaax =
X
a2Acaa
atomic set
atomsldquofeaturesrdquo
nonnegative weights
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Universal RegularizerChandrasekaran Recht Parillo Wilsky (2010)
Objects Notion of Simplicity Atomic Norm
Vectors Sparsity
Matrices Rank nuclear norm
Bandlimited signals No of frequencies
Linear systems McMillan degree
1 QRUP
xA = LQI t 0 x t FRQY(A)
= LQI
a
ca x =
aAcaa ca gt 0
t
xxA = LQI t 0 x t FRQY(A)
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Moment Problems
A
atoms
+ +
x
measure space
moments of a discrete measure
atomic moments
System Identification
minus1
0
1
minus1
0
10
2
4
6
minus05 0 052
3
4
5
6
7
Line Spectral Estimation
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Decomposition
How many atoms do you need to represent x
composing atoms are on an exposed face decomposition is easy to find
find a supporting hyperplane to certify
concentrate on recovering ldquogoodrdquo objects with atoms on exposed faces
t
x
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Dual Certificate
1 2 3 4 5 6 7
0
1
2
3
4
5
6
X Axis
Y A
xis
Gradient
Subgradients
optimum value is atomic norm
solutions are subgradients at x
semi-infinite problem
maximize
qhq xi
subject to kqkA 1
Dual Characterization of atomic norm
kqkA = supa2A
hq ai
(under mild conditions Bonsall)
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers

AST SDP formulation:
$$\underset{t,u,x,Z}{\text{minimize}}\ \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\lambda}{2}(t + u_1) \quad \text{subject to } Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix},\ Z \succeq 0$$

Form the augmented Lagrangian — the Lagrangian term plus a quadratic penalty with momentum parameter $\rho$:
$$\mathcal{L}_\rho(t,u,x,Z,\Lambda) = \tfrac{1}{2}\|x - y\|_2^2 + \tfrac{\lambda}{2}(t + u_1) + \underbrace{\left\langle \Lambda,\ Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\rangle}_{\text{Lagrangian term}} + \underbrace{\tfrac{\rho}{2}\left\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \right\|_F^2}_{\text{augmented Lagrangian}}$$

Alternating direction iterations:
$$(t^{l+1}, u^{l+1}, x^{l+1}) \leftarrow \underset{t,u,x}{\arg\min}\ \mathcal{L}_\rho(t,u,x,Z^l,\Lambda^l) \qquad \text{(quadratic objective — linear update)}$$
$$Z^{l+1} \leftarrow \underset{Z \succeq 0}{\arg\min}\ \mathcal{L}_\rho(t^{l+1},u^{l+1},x^{l+1},Z,\Lambda^l) \qquad \text{(projection onto the psd cone)}$$
$$\Lambda^{l+1} \leftarrow \Lambda^l + \rho\left( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ (x^{l+1})^* & t^{l+1} \end{bmatrix} \right) \qquad \text{(linear dual update)}$$
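These updates fit in plain numpy. A sketch: the closed forms below follow from minimizing the augmented Lagrangian block by block (diagonal averaging for u, a shift for t, a convex combination for x), but ρ, the iteration count, and the final peak threshold are untuned guesses, not the talk's settings.

```python
import numpy as np

def T_op(u):
    """Hermitian Toeplitz matrix whose first column is u."""
    n = len(u)
    idx = np.subtract.outer(np.arange(n), np.arange(n))
    return np.where(idx >= 0, u[np.abs(idx)], np.conj(u[np.abs(idx)]))

def ast_admm(y, lam, rho=2.0, iters=1000):
    n = len(y)
    Z = np.zeros((n + 1, n + 1), complex)
    Lam = np.zeros((n + 1, n + 1), complex)
    for _ in range(iters):
        Zh = Z + Lam / rho
        x = (y + 2 * rho * Zh[:n, n]) / (1 + 2 * rho)          # x-update
        t = np.real(Zh[n, n]) - lam / (2 * rho)                # t-update
        u = np.array([np.mean(np.diag(Zh[:n, :n], -d)) for d in range(n)])
        u[0] -= lam / (2 * rho * n)                            # u-update
        W = np.zeros((n + 1, n + 1), complex)
        W[:n, :n] = T_op(u)
        W[:n, n], W[n, :n], W[n, n] = x, np.conj(x), t
        # Z-update: project W - Lam/rho onto the psd cone by clipping eigenvalues.
        e, V = np.linalg.eigh(W - Lam / rho)
        Z = (V * np.clip(e, 0, None)) @ V.conj().T
        Lam = Lam + rho * (Z - W)                              # dual update
    return x

rng = np.random.default_rng(0)
n, sigma = 64, 0.2
k = np.arange(n)
y = np.exp(2j * np.pi * np.outer(k, [0.2, 0.5])) @ np.array([1.0, 0.8])
y = y + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

lam = sigma * np.sqrt(n * np.log(n))
xhat = ast_admm(y, lam)

# Localize frequencies from the dual polynomial |Q(f)| = |<y - xhat, a(f)>| / lam,
# which approaches 1 exactly on the recovered support (0.9 is an ad hoc threshold).
Q = np.abs(np.fft.fft(y - xhat, 16 * n)) / lam
is_peak = (Q > np.roll(Q, 1)) & (Q > np.roll(Q, -1)) & (Q > 0.9)
print(np.flatnonzero(is_peak) / (16 * n))   # ~ [0.2, 0.5]
```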
Alternative: Discretization Method

Can use a grid: $\mathcal{A}_N = \left\{ a(f)e^{i\phi} : f = j/N,\ j = 1,\dots,N,\ \phi \in [0,2\pi) \right\}$

We show (via Bernstein's polynomial theorem, or Fritz John — and this works for general epsilon nets):
$$\left(1 - \tfrac{2\pi n}{N}\right)\|x\|_{\mathcal{A}_N} \;\le\; \|x\|_{\mathcal{A}} \;\le\; \|x\|_{\mathcal{A}_N}$$

So AST is approximated by a Lasso,
$$\underbrace{y}_{\text{atomic moments}} = \Phi_N\,\underbrace{c}_{\text{amplitudes}} + w,$$
where the columns of $\Phi_N$ are the atoms at the frequencies on an N-point grid.

Fast shrinkage algorithms apply, Fourier projections are cheap, and coherence is irrelevant.
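A sketch of the discretized approach (the grid size, step count, and tradeoff below are illustrative choices): ISTA minimizes the Lasso objective, and both the forward operator and its adjoint are single FFTs, which is what makes gridding cheap.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, sigma = 64, 4 * 64, 0.1                  # N-point frequency grid
k = np.arange(n)
y = (np.exp(2j * np.pi * np.outer(k, [0.20, 0.33])) @ np.array([1.0, 0.7])
     + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2))

def soft(z, tau):
    """Complex soft thresholding."""
    return np.exp(1j * np.angle(z)) * np.maximum(np.abs(z) - tau, 0.0)

lam = sigma * np.sqrt(n * np.log(n))
c = np.zeros(N, complex)
eta = 1.0 / N                                  # 1 / ||A||^2, since A A^H = N I
for _ in range(2000):
    r = N * np.fft.ifft(c)[:n] - y             # forward: A c, one FFT
    g = np.fft.fft(np.concatenate([r, np.zeros(N - n)]))   # adjoint: A^H r
    c = soft(c - eta * g, eta * lam)

support = np.flatnonzero(np.abs(c) > 1e-2)
print(support / N)   # clusters near 0.20 and 0.33 (off-grid energy smears)
```

The smeared clusters around each off-grid frequency are what the weighted-mean heuristic mentioned under Future Work is meant to resolve.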
Experimental setup
n = 64, 128, or 256 samples
number of frequencies: n/16, n/8, n/4
SNR: −5 dB, 0 dB, 20 dB
(Root) MUSIC, matrix pencil, and Cadzow are all fed the true model order
AST uses an empirically estimated (not fed) noise variance
Compared mean-squared-error and localization error metrics, averaged over 20 trials
Mean Squared Error: MSE vs SNR

[Figure: MSE (dB) vs SNR (dB) for AST, Cadzow, MUSIC, and Lasso; 128 samples, 8 frequencies]
Mean Squared Error: Performance Profiles

[Figure: performance profiles $P_s(\beta)$ for AST, Lasso ($2^{15}$ grid points), MUSIC, Cadzow, and matrix pencil]

$P(\beta)$ = fraction of experiments with MSE less than $\beta \times$ the minimum MSE. Average MSE over 20 trials; n = 64, 128, 256; s = n/16, n/8, n/4; SNR = −5, 0, 20 dB.
Mean Squared Error: Effect of Gridding

[Figure: performance profiles for Lasso with grid sizes $2^{10}$ through $2^{15}$]
Frequency Localization

[Figure: performance profiles $P_s(\beta)$ for AST, MUSIC, and Cadzow under the three localization metrics]

[Figure: spurious amplitudes ($m_1$), weighted frequency deviation ($m_2$), and near-region approximation ($m_3$) vs SNR (dB) at k = 32, for AST, MUSIC, and Cadzow]
Simple LTI Systems

Vibration u(t) → response y(t).

State space:
$$x[t+1] = A\,x[t] + B\,u[t], \qquad y[t] = C\,x[t] + D\,u[t]$$

Goal: find a simple linear system from time-series data. Existing approaches: nonlinear Kalman-filter-based estimation; parametric methods based on Hankel low-rank formulations; subspace methods.
Subspace Methods

$$\underbrace{\begin{bmatrix} y_1 & y_2 & \cdots & y_k \\ y_2 & y_3 & \cdots & y_{k+1} \\ y_3 & y_4 & \cdots & y_{k+2} \\ \vdots & & & \vdots \\ y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1} \end{bmatrix}}_{Y\ \text{(output)}} = \underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{\mathcal{O}\ \text{(observability)}} \underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X\ \text{(state)}} + \underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & 0 & \cdots \\ CAB & CB & D & \cdots \\ \vdots & & \ddots & \\ CA^{\ell-2}B & \cdots & & D \end{bmatrix}}_{T\ \text{(system matrix)}} \underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_k \\ u_2 & u_3 & \cdots & u_{k+1} \\ u_3 & u_4 & \cdots & u_{k+2} \\ \vdots & & & \vdots \\ u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1} \end{bmatrix}}_{U\ \text{(input)}}$$

$Y = \mathcal{O}X + TU$; with a little computation, $Y U^\perp = \mathcal{O} X U^\perp$.

From here: SVD → subspace ID (n4sid); nuclear norm (Vandenberghe et al.) → low-rank formulations.
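A toy version of the SVD route — a Ho-Kalman-style sketch on noiseless impulse-response data with a made-up system, not n4sid itself: the Hankel matrix factors as observability times controllability, its numerical rank reveals the McMillan degree, and a shift of the observability factor recovers A up to similarity.

```python
import numpy as np

# Hypothetical 2-state SISO system.
A = np.array([[0.7, 0.2], [0.0, 0.5]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

h = [C @ np.linalg.matrix_power(A, t) @ B for t in range(20)]   # Markov parameters
H = np.block([[h[i + j] for j in range(10)] for i in range(10)])  # Hankel matrix

U, s, Vt = np.linalg.svd(H)
print(np.round(s[:4], 4))          # two dominant singular values -> McMillan degree 2

r = 2
O = U[:, :r] * np.sqrt(s[:r])      # observability factor, rows C A^i
# Shift trick: the map from O's first rows to its shifted rows is similar to A.
A_hat = np.linalg.pinv(O[:-1]) @ O[1:]
print(np.round(np.sort(np.linalg.eigvals(A_hat)), 3))  # eigenvalues ~ {0.5, 0.7}
```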
SISO

Past inputs → [Hankel operator $\Gamma_G$] → future outputs: $\Gamma_G : \ell_2(-\infty, 0) \to \ell_2[0, \infty)$.

Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm — but it is not clear how to compute it.
Convex Perspective: SISO Systems

Example:
$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$

Atoms are single-pole systems over the unit disk $\mathbb{D}$:
$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}$$

[Figure: pole locations in the unit disk and the sampled frequency response]

Measurements: $y = L(G(z)) + w$ for a linear measurement map L. Regularize with atomic norms:
$$\underset{G}{\text{minimize}}\ \|y - L(G)\|_2^2 + \mu\|G\|_{\mathcal{A}}$$

Theorem: for every G, the Hankel nuclear norm and the atomic norm $\|G\|_{\mathcal{A}}$ are equivalent up to universal constant factors.
Atomic Norm Regularization — how to solve this?

$$\underset{G}{\text{minimize}}\ \|y - L(G)\|_2^2 + \mu\|G\|_{\mathcal{A}}$$

is equivalent to

$$\underset{c,x}{\text{minimize}}\ \|y - x\|_2^2 + \mu \sum_{w \in \mathbb{D}} \frac{|c_w|}{1 - |w|^2} \quad \text{subject to } x = \sum_{w \in \mathbb{D}} c_w\, L\!\left(\frac{1}{z - w}\right)$$

Discretize and set $\Phi_{k\ell} = L_k\!\left(\dfrac{1 - |w_\ell|^2}{z - w_\ell}\right)$ to get a Lasso:

$$\underset{c}{\text{minimize}}\ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu\|c\|_1$$
DAST Analysis

Measure uniformly spaced samples of the frequency response. Set
$$G(z) = \sum_\ell c_\ell\, \frac{1 - |w_\ell|^2}{z - w_\ell};$$
then we have
$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell}.$$

Solve, over a net on the disk,
$$\hat{x} = \underset{x}{\arg\min}\ \tfrac{1}{2}\|y - x\|_2^2 + \mu\|x\|_{\mathcal{A}},$$
and read the poles off the coefficients corresponding to the support of $\hat{x}$.

This yields a guarantee of the form
$$\|\hat{G} - G\|_{H_2}^2 \;\lesssim\; \frac{\sigma\,\|G\|_1}{(1 - \rho)\sqrt{n}},$$
with constants governed by the Hankel nuclear norm $\|G\|_1$, the stability radius $\rho$, and the number of measurements n.

Example: 6 poles, ρ = 0.95, σ = 1e-1, n = 30 gives $H_2$ error ≈ 2e-2.
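A DAST sketch under stated assumptions — the poles and coefficients mirror the example above, the polar net is deliberately crude, ISTA stands in for the Lasso solver, and the support threshold is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 64, 0.05
poles = np.array([0.7, 0.3 + 0.4j, 0.3 - 0.4j])
coeffs = np.array([2.0, 5.0, 5.0])
z = np.exp(2j * np.pi * np.arange(n) / n)       # uniform frequency-response samples
G = sum(c * (1 - abs(w) ** 2) / (z - w) for c, w in zip(coeffs, poles))
y = G + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Net on the disk: radii x angles.
rad = np.linspace(0.05, 0.95, 19)
ang = np.linspace(0, 2 * np.pi, 64, endpoint=False)
net = (rad[:, None] * np.exp(1j * ang[None, :])).ravel()
Phi = (1 - np.abs(net) ** 2) / (z[:, None] - net[None, :])   # atom responses

def soft(v, tau):
    return np.exp(1j * np.angle(v)) * np.maximum(np.abs(v) - tau, 0.0)

mu = sigma * np.sqrt(n * np.log(n))
eta = 1.0 / np.linalg.norm(Phi, 2) ** 2         # ISTA step size
c = np.zeros(net.size, complex)
for _ in range(500):
    c = soft(c - eta * (Phi.conj().T @ (Phi @ c - y)), eta * mu)

print(net[np.abs(c) > 1e-2])    # recovered poles cluster near 0.7 and 0.3 +/- 0.4i
```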
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.
Future Work
• Better convergence results for discretization
• Why does the weighted mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees
Referenced Publications
B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Communication, Control, and Computing, 2011.
B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.
Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.
Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," IEEE Conference on Decision and Control, 2012.
Other Publications
Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Communication, Control, and Computing, 2012.
Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, B., Lau, "ei RxC: Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.
Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.
Cone Condition

Inflated descent cone ($C_0$ is the descent cone; $\gamma \in (0,1)$):
$$C_\gamma(x) = \text{cone}\left\{ z \;\middle|\; \|x + z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \gamma\|z\|_{\mathcal{A}} \right\}$$

The cone condition asks that no direction in $C_\gamma$ have zero atomic norm:
$$\epsilon = \inf_{z \neq 0,\ z \in C_\gamma(x)} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}} > 0$$
Minimum separation is necessary

Using two sinusoids: since $\|a(f_1) - a(f_2)\|_2 \approx 2\pi n^{3/2}|f_1 - f_2|$ for close frequencies,
$$\|a(f_1) - a(f_2)\|_{\mathcal{A}} \;\le\; n^{1/2}\,\|a(f_1) - a(f_2)\|_2 \;\approx\; 2\pi n^2 |f_1 - f_2|.$$

So when the separation is lower than $1/(4n^2)$, $\|a(f_1) - a(f_2)\|_{\mathcal{A}} < 2$ — strictly less than the sum of the coefficient magnitudes — and the atomic norm can't recover the two-atom decomposition. Can empirically show failure with less than 1/n separation and an adversarial signal.
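The ℓ2 computation behind this slide is a two-line check (sizes arbitrary; the slide's approximation suppresses a √3 constant):

```python
import numpy as np

n, df = 256, 1e-4
k = np.arange(n)
a = lambda f: np.exp(2j * np.pi * f * k)
print(np.linalg.norm(a(0.3) - a(0.3 + df)),    # exact ell-2 distance
      2 * np.pi * n ** 1.5 * df / np.sqrt(3))  # ~ 2 pi n^{3/2} |f1 - f2| / sqrt(3)
```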
Denoising
minimizex
1
2ky xk22 + kxkA
Tradeoff parameter
AST (Atomic Soft Thresholding)
Dual AST
maximize
qhq yi kqk22
subject to kqkA 1
y = x + wLQ Cn
QRLVH N (0 2In)
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
AST Results
q + x = y
kqkA 1
hq xi = kxkA
Optimality Conditions
suggests q w ) EkwkA
means
q 2 kxkACan find composing atoms
Tradeoff = Gaussian Width
No cross validation needed - estimate Gaussian width
Gaussian width also important to characterize convergence rates
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
AST Convergence Rates
Fast Rate
s sparse
1
nX X2
2 C2 s log(p)
n
Slow RateLasso
n
1
nX X2
2
log(p)
n1
Choose appropriate tradeoff
= E (wA) 1
Universal but slow rateMinimal assumptions on atoms
1
nx x2
2
nxA
With a ldquocone conditionrdquo faster rates are possible
Weaker than RIP Coherence RSC RE 1
nx x2
2 C2
12n
ĪIRUODUJHHQRXJK ī
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Summary
Notion of simple models
Atomic norm unifies treatment of many models
Recovers tight published bounds
Bounds depend on Gaussian width
Faster rates with a local cone condition
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Line Spectral Estimation
x0
xk
xn1
=
s
j=1
cjei2fjk
Line Spectrum Time Samples
Extrapolate the remaining moments
micro =
j
cjfj
Super resolution Time frequency dual
Extrapolate high frequency information from low frequency samples
Just exchange time and frequency same as line spectral estimation
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Classical Line Spectrum Estimation (a crash course)
Tn(x) =
2
6664
x1 x2 xn
x
2 x1 xn1
x
n x
n1 x1
3
7775Define for any vector x
Nonlinear Parameter Estimation xk =
s
j=1
cje2ifjk
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Classical techniques
PRONY estimate s root findingCADZOW alternating projectionsMUSIC low rank plot pseudospectrum
Sensitive to model orderNo rigorous theoryOnly optimal asymptotically
Low rank and ToeplitzFACT
Tn(x) =s
j=1
cj
1ei2fj
ei2(n1)fj
1 ei2fj ei2(n1)fj
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Convex Perspective
0 2 4 6 8 10 12 14minus10
minus5
0
5
10
15
x =s
j=1
cja(fj)
amplitudes
frequencies
x0
xk
xn1
=
s
j=1
cjei2fjk
1
ei2fjk
ei2fj(n1)
AtomsLine Spectrum
A+ = a(f) | f [0 1]Positive Amplitudes
A =a(f)ei
f [0 1] [0 2]
Complex
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Convergence rates
With high probability for ngt4
pn log(n)Choose tradeoff as
Dudleyrsquos inequalityBernsteinrsquos polynomial theorem
For a signal with n samplesof the form
Using the global AST guarantee we get
E
1
nx x2
2
C
log(n)
n
s
j=1
|cj |
yk =s
j=1
cje2ifjk + wk
n log(n) n2 log(4 log(n)) w
A
1 +
1
log(n)
n log(n) + n log(1632 log(n))
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Fast Rates
E1
n
kx x
k22
C
2s
log(n)
n
minp 6=q
d(up uq) 4
n
If every two frequencies are far enough apart (minimum separation condition)
For s frequencies and n samples MSE is
Not a global condition on the dictionary local to the signal
Proof uses many properties of trigonometric polynomials and Fejer kernels derived by Candes and Fernandez Granda
= pn log(n) 2 (11) large enough
and the tradeoff parameter is chosen correctly
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Canrsquot do much better
Also nearly matches minimax bound on best prediction rate for signals with 4n minimum separation
E1
n
kx x
k22
C
2 s log(n4s)
n
Only logarithmic factor from ldquooracle raterdquo
E1
n
kx x
k22
C
2s
log(n)
n
Our rate
E
1
nx x2
2
C2 s
n
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
[Figure: performance profiles P_s(β) of AST, MUSIC, and Cadzow for each of the three localization metrics.]
[Figure: localization metrics m1 (spurious amplitudes), m2 (weighted frequency deviation), and m3 (near region approximation) vs. SNR (dB) at k = 32, for AST, MUSIC, and Cadzow.]
Simple LTI Systems
[Diagram: vibration input u(t) drives a structure whose response is y(t).]
State space: x[t+1] = A x[t] + B u[t], y[t] = C x[t] + D u[t].
Goal: find a simple linear system from time-series data.
Existing approaches: nonlinear Kalman-filter-based estimation; parametric methods based on Hankel low-rank formulations; subspace methods.
$$\underbrace{\begin{bmatrix}
y_1 & y_2 & \cdots & y_k \\
y_2 & y_3 & \cdots & y_{k+1} \\
y_3 & y_4 & \cdots & y_{k+2} \\
\vdots & & & \vdots \\
y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1}
\end{bmatrix}}_{Y \text{ (output)}}
=
\underbrace{\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{O \text{ (observability)}}
\underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X \text{ (state)}}
+
\underbrace{\begin{bmatrix}
D & 0 & \cdots & 0 \\
CB & D & \cdots & 0 \\
CAB & CB & D & 0 \\
\vdots & & & \vdots \\
CA^{\ell-2}B & \cdots & & D
\end{bmatrix}}_{T \text{ (system matrix)}}
\underbrace{\begin{bmatrix}
u_1 & u_2 & \cdots & u_k \\
u_2 & u_3 & \cdots & u_{k+1} \\
u_3 & u_4 & \cdots & u_{k+2} \\
\vdots & & & \vdots \\
u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1}
\end{bmatrix}}_{U \text{ (input)}}$$

With a little computation: $Y = OX + TU$, so $Y U^\perp = O X U^\perp$. From here, an SVD gives subspace identification (n4sid), and a nuclear norm relaxation of the low-rank structure gives a convex formulation (Vandenberghe et al.); see the sketch below.
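A small numpy sketch of the subspace step (hypothetical helpers; no weighting or regularization, unlike production n4sid): form the block-Hankel matrices, project out the input rows, and read the order off the SVD.

```python
import numpy as np

def hankel_blocks(v, ell, k):
    """ell-by-k Hankel matrix from a series v of length >= ell + k - 1."""
    return np.array([v[i:i + k] for i in range(ell)])

def subspace_order(y, u, ell, k, tol=1e-6):
    """Estimate the McMillan degree as the numerical rank of Y U_perp."""
    Y, U = hankel_blocks(y, ell, k), hankel_blocks(u, ell, k)
    P = np.eye(k) - U.T @ np.linalg.pinv(U.T)   # projector onto complement of row(U)
    s = np.linalg.svd(Y @ P, compute_uv=False)
    return int((s > tol * s[0]).sum())
```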
SISO
[Diagram: past inputs (times ..., -2, -1) enter G; future outputs (times 0, 1, 2, ...) leave it.]
The Hankel operator $\Gamma_G : \ell_2(-\infty, 0) \to \ell_2[0, \infty)$ maps past inputs to future outputs.
Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.
Convex Perspective: SISO Systems
Example transfer function:
$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$
[Figure: magnitude of G over the unit disk, with peaks near the three poles.]
Atoms are normalized single-pole systems indexed by the open unit disk $\mathbb{D}$:
$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}$$
Measurements: $y = \mathcal{L}(G(z)) + w$, noisy linear measurements of the transfer function.
[Figure: measured frequency-response samples.]
Regularize with atomic norms:
$$\text{minimize}_G \ \|y - \mathcal{L}(G)\|_2^2 + \mu \|G\|_{\mathcal{A}}$$
Theorem: $\tfrac{1}{8}\|G\|_{\mathcal{A}} \leq \|\Gamma_G\|_1 \leq \|G\|_{\mathcal{A}}$, so the atomic norm is equivalent to the Hankel nuclear norm up to a constant.
Atomic Norm Regularization
How to solve this? Discretize: choose a net $\{w_\ell\} \subset \mathbb{D}$ and set
$$\Phi_{k\ell} = \mathcal{L}_k\!\left(\frac{1 - |w_\ell|^2}{z - w_\ell}\right).$$
Then the atomic norm problem
$$\text{minimize}_G \ \|y - \mathcal{L}(G)\|_2^2 + \mu \|G\|_{\mathcal{A}}$$
is equivalent to
$$\begin{aligned}
\text{minimize}_{c,x} \quad & \|y - x\|_2^2 + \mu \sum_{w \in \mathbb{D}} \frac{|c_w|}{1 - |w|^2} \\
\text{subject to} \quad & x = \sum_{w \in \mathbb{D}} c_w \, \mathcal{L}\!\left(\frac{1}{z - w}\right),
\end{aligned}$$
which, after rescaling the coefficients, is a Lasso:
$$\text{minimize}_c \ \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu \|c\|_1.$$
DAST Analysis
Measure uniformly spaced samples of the frequency response: write $G(z) = \sum_\ell c_\ell \frac{1 - |w_\ell|^2}{z - w_\ell}$ and set
$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{i 2\pi k / n} - w_\ell}.$$
Solve
$$\hat{x} = \arg\min_x \ \tfrac{1}{2}\|y - x\|_2^2 + \mu \|x\|_{\mathcal{A}}$$
over a net on the disk, reading $\hat{G}$ off the coefficients corresponding to the support of $\hat{x}$. Then we have
$$\|\hat{G} - G\|_{H_2}^2 \lesssim \frac{\sigma \|\Gamma_G\|_1}{(1 - \rho)\sqrt{n}},$$
with $\|\Gamma_G\|_1$ the Hankel nuclear norm (the constant), $\rho$ the stability radius, and $n$ the number of measurements.
Example: 6 poles, $\rho = 0.95$, $\sigma = 10^{-1}$, $n = 30$ gives $H_2$ error $\approx 2 \times 10^{-2}$.
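A hypothetical construction of the net of candidate poles and the induced measurement columns $\Phi_{k\ell}$ used above (names and spacing rule are mine):

```python
import numpy as np

def disk_net(rho, eps):
    """Polar epsilon-net of poles w with |w| <= rho."""
    poles = [0.0 + 0.0j]
    for r in np.arange(eps, rho + 1e-12, eps):
        m = max(int(np.ceil(2 * np.pi * r / eps)), 1)
        poles += list(r * np.exp(2j * np.pi * np.arange(m) / m))
    return np.array(poles)

def measurement_matrix(n, poles):
    """phi[k, l] = (1 - |w_l|^2) / (exp(2*pi*i*k/n) - w_l)."""
    z = np.exp(2j * np.pi * np.arange(n) / n)[:, None]
    return (1 - np.abs(poles) ** 2)[None, :] / (z - poles[None, :])
```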
Summary
Optimal bounds for denoising using convex methods.
Better empirical performance than classical approaches.
Local conditions on signals instead of global dictionary conditions.
Discretization works.
Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.
Future Work
• Better convergence results for discretization
• Why does the weighted-mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees
Referenced Publications
Bhaskar and Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Communication, Control, and Computing, 2011.
Bhaskar, Tang, and Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.
Tang, Bhaskar, and Recht, "Near Minimax Line Spectral Estimation," to be submitted.
Shah, Bhaskar, Tang, and Recht, "Linear System Identification using Atomic Norm Minimization," IEEE Conference on Decision and Control, 2012.
Other Publications
Tang, Bhaskar, Shah, and Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, Bhaskar, and Nowak, "Covariance Sketching," 50th Allerton Conference on Communication, Control, and Computing, 2012.
Gubner, Bhaskar, and Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, Bhaskar, and Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.
Tang, Bhaskar, and Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to the Asilomar Conference, 2013.
Cone Condition
Define the inflated descent cone for $\gamma \in (0, 1)$ (taking $\gamma = 0$ recovers the descent cone $C_0$):
$$C_\gamma(x) = \mathrm{cone}\left\{ z \;:\; \|x + z\|_{\mathcal{A}} \leq \|x\|_{\mathcal{A}} + \gamma \|z\|_{\mathcal{A}} \right\}$$
The cone condition requires that no direction in $C_\gamma$ have zero atomic norm:
$$\inf_{z \neq 0,\ z \in C_\gamma(x)} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}} > 0.$$
Minimum separation is necessary
Using two sinusoids: since $\|a(f_1) - a(f_2)\|_2 \approx 2n^{3/2}|f_1 - f_2|$, we get
$$\|a(f_1) - a(f_2)\|_{\mathcal{A}} \leq n^{1/2}\|a(f_1) - a(f_2)\|_2 \approx 2n^2 |f_1 - f_2|.$$
So when the separation is below $1/(4n^2)$, the atomic norm cannot recover the pair.
One can empirically show failure with less than $1/n$ separation and an adversarial signal.
Minimum Separation and Exact Recovery Result of Candes and Fernandez Granda
4n separation = can construct explicit dual certificate of exact recovery
Convex hull of far enough vertices are exposed faces
Prony does not have this limitation (Then why bother Robustness guarantee)
Minimum separation is necessary
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
except for the positive case
Convex hull of every collection of n2 vertices is an exposed face
Infinite dimensional version of cyclic polytope which is neighbourly
No separation condition needed
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Localizing Frequencies
0 02 04 06 08 10
02
04
06
08
1
Frequency
Dua
l Pol
ynom
ial V
alue
034 036 038 04 042 044 046085
09
095
1
Frequency
Dua
l Pol
ynom
ial V
alueDual Polynomial
y = x+ q
Q(f) =n1
j=0
qjei2jf
hq ai = 1 a 2 support
hq ai lt 1 otherwise
x
t
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Localization Guarantees
Spurious Amplitudes
Weighted Frequency Deviation
Near region approximation
10 01 02 03 04 05 06 07 08 09
1
0
Frequency
Du
al P
oly
no
mia
l M
agn
itu
de
True FrequencyEstimated Frequency
016nSpurious Frequency
lflF
|cl| C1
k2 ORJ(n)
n
cj
lflNj
cl
C3
k2 ORJ(n)
n
lflNj
|cl|PLQfjT
d(fj fl)
2
C2
k2 ORJ(n)
n
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
How to actually solve AST Semidefinite Characterizations
minimizex
1
2ky xk22 + kxkA
conv(A+) conv(A)
kxkA+ =
(x0 Tn(x) 0
+1 otherwise
Bounded Real LemmaCaratheodory Toeplitz
A+ = a(f) | f [0 1]Positive Amplitudes A =a(f)ei
f [0 1] [0 2]
Complex
xA = inf
1
2ntr(Tn(u)) +
1
2t
Tn(u) x
x t
0
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Alternating Direction Method of Multipliers
minimize
tuxZ
12kx yk22 +
2 (t+ u1)
subject to Z =
T (u) x
x
t
Z 0
AST SDP Formulation
(tl+1 u
l+1 x
l+1) arg mintux
L
(t u x Zl
l)
Z
l+1 argminZ0
L
(tl+1 u
l+1 x
l+1 Zl)
l+1 l +
Z
l+1 T (ul+1) x
l+1
x
l+1t
l+1
Alternating Direction Iterations
quadratic objective linear
projection on psd cone
linear dual update
L(t u x Z) =1
2kx yk22 +
2(t+ u1) +
Z
T (u) x
x
t
+
2
Z T (u) x
x
t
2
F
Form Augmented Lagrangian
Lagrangian Term Augmented Lagrangian
Momentum Parameter
| z | z
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Alternative Discretization Method
Can use a grid AN = a(f) f = jN j = 1 N 2 [0 2)
We show (1 2n
N
)kxkAN kxkA kxkAN
AST is approximated by Lasso
n
frequencies on a N grid
amplitudes
c
+ w
atomic moments
Via Bernsteinrsquos Polynomial Theorem or Fritz John
Works for general epsilon nets
Fast shrinkage algorithms
Fourier projections are cheap
Coherence is irrelevant
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Experimental setup
n=64 or 128 or 256 samples
no of frequencies n16 n8 n4
SNR -5 dB 0 dB 20 dB
(Root) MUSIC Matrix pencil Cadzow All fed true model order
Empirical estimated (not fed) noise variance for AST
Compared mean-squared-error and localization error metrics averaged over 20 trials
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error Effect of Gridding
1 2 3 4 5 6 7 8 9 100
02
04
06
08
1
β
Ps(β
)
Lasso 210
Lasso 211
Lasso 212
Lasso 213
Lasso 214
Lasso 215
Frequency Localization
20 40 60 80 1000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
100 200 300 400 5000
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
5 10 15 200
02
04
06
08
1
β
Ps(β
)
AST
MUSIC
Cadzow
minus10 minus5 0 5 10 15 200
20
40
60
80
100
k = 32
SNR (dB)
m1
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
001
002
003
004
k = 32
SNR (dB)
m2
ASTMUSICCadzow
minus10 minus5 0 5 10 15 200
1
2
3
4
5
k = 32
SNR (dB)m
3
ASTMUSICCadzow
Spurious Amplitudes Weighted Frequency Deviation Near region approximation
Simple LTI SystemsVibration u(t)
Response y(t)
u(t) y(t)
x[t+ 1] = Ax[t] +Bu[t]
y[t] = Cx[t] +Du[t]State Space
Find a simple linear system from time series data
Nonlinear Kalman filter based estimation
Parametric methods based on Hankel Low rank formulations
Subspace Methods
2
666664
y1 y2 middot middot middot yk
y2 y3 middot middot middot yk+1
y3 y4 middot middot middot yk+2
y` y`+1 middot middot middot y`+k1
3
777775=
2
666664
C
CA
CA
2
CA
`1
3
777775
x1 x2 xk
+
2
666664
D 0 0CB D 0 0CAB CB D 0
CA
`2B D
3
777775
2
666664
u1 u2 middot middot middot uk
u2 u3 middot middot middot uk+1
u3 u4 middot middot middot uk+2
u` u`+1 middot middot middot u`+k1
3
777775
With a little computation
| z | z
| z
Y
X
U
Y = OX + TU Y U = OXU
Output
State
Input
System Matrix T
Observability
SVD Subspace ID (n4sid)
Nuclear Norm (Vandenberghe et al)
Low Rank
SISO
2 1
u[t]G
0 1 2
Past inputs Hankel Operator Future Outputs
G 2( 0) 2[0)
Fact Rank of Hankel Operator = McMillan DegreeRelax and use Hankel nuclear normNot clear how to compute
Convex Perspective SISO Systems
G(z) =2(1 072)
z 07+
5(1 052)
z (03 + 04i)+
5(1 052)
z (03 04i)
minus1
0
1
minus1
0
10
2
4
6
A =
1 |w|2
z w
w 2 D
D
y = L (G(z)) + w
minus05 0 052
3
4
5
6
7
minimizeG
ky L(G)k22 + microkGkA
Regularize with atomic norms
8kGkA kGk1 kGkA
Theorem
Atomic Norm Regularization
How to solve this
Equivalent to
Discretize and Set
k` = Lk
1 |w|2
z w`
minimizec
12ky ck22 + microkck1 Lasso
minimizeG
ky L(G)k22 + microkGkA
minimize
cx
ky xk22 + micro
Pw2D
|cw|1|w|2
subject to x =
Pw2D
c
w
L
1zw
DAST Analysis
constant Hankel nuclear norm
stability radius
number of measurements
Measure uniform spaced samples of frequency response
kG Gk2H2 0kGk1
(1 )pn
G(z) =X
`
c`1 |w`|2
z w`Set
Then we have
k` =1 |w`|2
e2ikn w`
Solvex = argmin
x
12ky xk22 + microkxkA
over a net on the disk
where the coefficients correspond to the support of
x
6 polesρ = 095σ = 1e-1
n=30
H2 err = 2e-2
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements Results derived in collaboration with Tang Shah and Prof Recht
Future Work
bull Better convergence results for discretization
bull Why does weighted mean heuristic work so well for discretization
bull Better algorithms for System Identification
bull Greedy techniques
bull More atomic sets General localization guarantees
Referenced Publications
B Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo in 49th Allerton Conference on Controls and Systems 2011
B Tang Recht ldquoAtomic Norm Denoising for Line Spectral Estimationrdquo submitted to IEEE Transactions on Signal Processing
Tang B Recht ldquoNear Minimax Line Spectral Estimationrdquo To be submitted
Shah B Tang Recht ldquoLinear System Identification using Atomic Norm Minimizationrdquo Control and Decision Conference 2012
Other Publications
Tang B Shah Recht ldquoCompressed Sensing off the gridrdquo submitted to IEEE Transactions on Information Theory
Dasarathy Shah B Nowak ldquoCovariance Sketchingrdquo 50th Allerton Conference on Controls and Systems 2012
Gubner B Hao ldquoMultipath-Cluster Channel Modelsrdquo IEEE Conference on Ultrawideband Systems 2012
Wittenberg Alimadhi B Lau ldquoei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Modelrdquo in Zelig Everyonersquos Statistical Software
Tang B Recht ldquoSparse recovery over continuous dictionaries Just discretizerdquo Submitted to Asilomar Conference 2013
Cone Condition
= LQIz =0
z2
zA
z C(x)
C(x) = FRQH z | x + zA xA + zA
Inflated Descent Cone
No direction in Cᵧ with zero atomic normHVFHQWampRQH C0
(0 1)
Minimum separation is necessary
Using two sinusoids
ka(f1) a(f2)kA n12ka(f1) a(f2)k2 2n2|f1 f2|
So
When separation is lower than 14n2 atomic norm canrsquot recover it
ka(f1) a(f2)k2 = 2n32|f1 f2|
Can empirically show failure with less than 1n separation and adversarial signal
Mean Squared Error MSE vs SNR
iuml10 iuml5 0 5 10 15 20iuml30
iuml20
iuml10
0
10
SNR(dB)
MSE
(dB)
ASTCadzowMUSICLasso
128 samples 8 frequencies
Mean Squared Error Performance Profiles
2 4 6 8 10 12 140
02
04
06
08
1
β
Ps(β
)
ASTLasso 215
MUSICCadzowMatrix Pencil
P () = Fraction of experiments with MSE less than minimum MSE
Average MSE in 20 trialsn=64128256s=n16n8n4SNR=-5 0 20 dB
Mean Squared Error: Effect of Gridding
[Figure: performance profiles $P_s(\beta)$ for Lasso on grids of size $2^{10}$ through $2^{15}$.]
Frequency Localization
[Figure: performance profiles $P_s(\beta)$ for AST, MUSIC, and Cadzow under three frequency-localization metrics.]
[Figure: spurious amplitudes ($m_1$), weighted frequency deviation ($m_2$), and near-region approximation ($m_3$) vs SNR (dB) for AST, MUSIC, and Cadzow at $k = 32$.]
Simple LTI Systems
Vibration input $u(t)$, response $y(t)$. State space:
$$x[t+1] = Ax[t] + Bu[t], \qquad y[t] = Cx[t] + Du[t]$$
Goal: find a simple linear system from time-series data. Classical approaches: nonlinear Kalman-filter-based estimation, and parametric methods based on Hankel low-rank formulations.
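As a concrete reference point, a minimal simulation of such a state-space model; the matrices $A, B, C, D$ here are illustrative choices, not values from the talk.

```python
import numpy as np

# Illustrative stable 2-state system (eigenvalues 0.6 +/- 0.4i, modulus ~0.72).
A = np.array([[0.6, -0.4], [0.4, 0.6]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])
D = np.array([[0.0]])

def simulate(u):
    """Run x[t+1] = A x[t] + B u[t], y[t] = C x[t] + D u[t] from x[0] = 0."""
    x = np.zeros((2, 1))
    ys = []
    for ut in u:
        ys.append((C @ x + D * ut).item())
        x = A @ x + B * ut
    return np.array(ys)

rng = np.random.default_rng(0)
u = rng.standard_normal(400)   # white-noise excitation
y = simulate(u)                # time-series data for identification
```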
Subspace Methods
Stack shifted outputs and inputs into block Hankel matrices:
$$\underbrace{\begin{bmatrix} y_1 & y_2 & \cdots & y_k \\ y_2 & y_3 & \cdots & y_{k+1} \\ \vdots & \vdots & & \vdots \\ y_\ell & y_{\ell+1} & \cdots & y_{\ell+k-1} \end{bmatrix}}_{Y\ (\text{output})}
=
\underbrace{\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\ell-1} \end{bmatrix}}_{O\ (\text{observability})}
\underbrace{\begin{bmatrix} x_1 & x_2 & \cdots & x_k \end{bmatrix}}_{X\ (\text{state})}
+
\underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ CA^{\ell-2}B & \cdots & CB & D \end{bmatrix}}_{T\ (\text{system matrix})}
\underbrace{\begin{bmatrix} u_1 & u_2 & \cdots & u_k \\ u_2 & u_3 & \cdots & u_{k+1} \\ \vdots & \vdots & & \vdots \\ u_\ell & u_{\ell+1} & \cdots & u_{\ell+k-1} \end{bmatrix}}_{U\ (\text{input})}$$
With a little computation: $Y = OX + TU$, so projecting onto the orthogonal complement of the input rows gives $Y\Pi_U^\perp = OX\Pi_U^\perp$. The projected output matrix is low rank, and its column space recovers the observability matrix $O$. Exploit this via SVD (Subspace ID, n4sid) or via a low-rank / nuclear-norm formulation (Vandenberghe et al.); a sketch of the SVD route follows.
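A bare-bones sketch of the SVD route under ideal conditions (noiseless scalar data, parameter choices mine); a real n4sid implementation is far more careful.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noiseless data from an assumed 2-state system (same illustrative A, B, C, D as above).
A = np.array([[0.6, -0.4], [0.4, 0.6]]); B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]]);              D = np.array([[0.0]])
u = rng.standard_normal(400)
x = np.zeros((2, 1)); y = []
for ut in u:
    y.append((C @ x + D * ut).item())
    x = A @ x + B * ut
y = np.asarray(y)

def block_hankel(v, ell, k):
    # H[i, j] = v[i + j]: ell rows of the shifted scalar signal v.
    return np.array([[v[i + j] for j in range(k)] for i in range(ell)])

ell = 10
k = len(u) - ell + 1
U = block_hankel(u, ell, k)
Y = block_hankel(y, ell, k)

# Project out the inputs: Y Pi = O X Pi, where Pi projects onto null(row space of U).
Pi = np.eye(k) - np.linalg.pinv(U) @ U
s = np.linalg.svd(Y @ Pi, compute_uv=False)
print(np.round(s[:5], 6))  # two dominant singular values -> McMillan degree 2
```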
SISO
[Diagram: input $u[t]$ at times $\dots, -2, -1, 0, 1, 2, \dots$ driving a SISO transfer function $G$.]
Past inputs → Hankel operator → future outputs:
$$\Gamma_G : \ell_2(-\infty, 0) \to \ell_2[0, \infty)$$
Fact: rank of the Hankel operator = McMillan degree. Relax and use the Hankel nuclear norm; but it is not clear how to compute it.
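A quick numerical illustration of this fact (a sketch with made-up poles and residues): the finite Hankel matrix of the impulse response has rank equal to the number of poles.

```python
import numpy as np
from scipy.linalg import hankel

# G(z) = sum_j c_j / (z - w_j) has impulse response g[k] = sum_j c_j w_j^(k-1), k >= 1.
poles = np.array([0.7, 0.3 + 0.4j, 0.3 - 0.4j])   # illustrative: 3 poles
residues = np.array([2.0, 5.0, 5.0])
g = np.array([np.sum(residues * poles**k) for k in range(40)])

# Finite Hankel matrix H[i, j] = g[i + j]; its rank equals the McMillan degree.
H = hankel(g[:20], g[19:])
print(np.linalg.matrix_rank(H, tol=1e-8))   # -> 3
```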
Convex Perspective: SISO Systems
$$G(z) = \frac{2(1 - 0.7^2)}{z - 0.7} + \frac{5(1 - 0.5^2)}{z - (0.3 + 0.4i)} + \frac{5(1 - 0.5^2)}{z - (0.3 - 0.4i)}$$
[Figure: pole locations and magnitude of the frequency response of G.]
Atoms are single-pole systems, indexed by the open unit disk $\mathbb{D}$:
$$\mathcal{A} = \left\{ \frac{1 - |w|^2}{z - w} \;:\; w \in \mathbb{D} \right\}$$
Measurements are noisy linear samples of the transfer function: $y = L(G(z)) + w$.
[Figure: samples of the frequency response.]
Atomic Norm Regularization
Regularize with the atomic norm:
$$\underset{G}{\text{minimize}}\;\; \|y - L(G)\|_2^2 + \mu \|G\|_{\mathcal{A}}$$
Theorem: for all $G$, the atomic norm $\|G\|_{\mathcal{A}}$ and the Hankel nuclear norm $\|\Gamma_G\|_1$ are equivalent up to universal constants, so the atomic norm is a computable surrogate.
How to solve this? Discretize: fix a net $D \subset \mathbb{D}$ and set $\Phi_{k\ell} = L_k\!\left(\frac{1 - |w_\ell|^2}{z - w_\ell}\right)$. The problem above is then equivalent to
$$\underset{c,\,x}{\text{minimize}}\;\; \|y - x\|_2^2 + \mu \sum_{w \in D} \frac{|c_w|}{1 - |w|^2} \quad\text{subject to}\quad x = \sum_{w \in D} c_w\, L\!\left(\tfrac{1}{z - w}\right),$$
i.e., in normalized coordinates, the Lasso $\;\underset{c}{\text{minimize}}\; \tfrac{1}{2}\|y - \Phi c\|_2^2 + \mu \|c\|_1$.
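The equivalence is just a rescaling of coefficients; spelling out the step the slide compresses (notation mine):

```latex
\begin{aligned}
&\frac{1}{z-w} = \frac{a_w}{1-|w|^2}, \qquad a_w = \frac{1-|w|^2}{z-w},\\
&x = \sum_{w\in D} c_w\, L\!\Big(\frac{1}{z-w}\Big)
   = \sum_{w\in D} \underbrace{\tfrac{c_w}{1-|w|^2}}_{\tilde c_w}\, L(a_w)
   = \Phi\,\tilde c,
\qquad
\sum_{w\in D}\frac{|c_w|}{1-|w|^2} = \|\tilde c\|_1,\\
&\text{so the constrained problem is exactly}\quad
\min_{\tilde c}\ \tfrac12\,\|y-\Phi\tilde c\|_2^2 + \mu\,\|\tilde c\|_1 .
\end{aligned}
```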
DAST Analysis
Measure $n$ uniformly spaced samples of the frequency response. Then, with $\sigma$ the noise level, $\|\Gamma_G\|_1$ the Hankel nuclear norm, and $\rho$ the stability radius (the constant absorbed in $\lesssim$ is universal),
$$\|\hat{G} - G\|_{H_2}^2 \;\lesssim\; \frac{\sigma\,\|\Gamma_G\|_1}{(1 - \rho)\sqrt{n}}.$$
Write $G(z) = \sum_\ell c_\ell \frac{1 - |w_\ell|^2}{z - w_\ell}$ and set
$$\Phi_{k\ell} = \frac{1 - |w_\ell|^2}{e^{2\pi i k/n} - w_\ell}.$$
Then solve, over a net on the disk,
$$\hat{x} = \underset{x}{\operatorname{argmin}}\; \tfrac{1}{2}\|y - x\|_2^2 + \mu\|x\|_{\mathcal{A}},$$
where the recovered poles are the net points in the support of the coefficients of $\hat{x}$.
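A runnable sketch of this discretize-and-Lasso pipeline; the grid, noise level, and regularization weight are illustrative choices, and real coefficients are assumed so the complex problem can be stacked into a real Lasso.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# True system: three poles in the unit disk with real coefficients (illustrative).
poles_true = np.array([0.7, 0.3 + 0.4j, 0.3 - 0.4j])
c_true = np.array([2.0, 5.0, 5.0])

# n uniformly spaced samples of the frequency response, plus complex Gaussian noise.
n, sigma = 64, 0.1
z = np.exp(2j * np.pi * np.arange(n) / n)
G = sum(c * (1 - abs(w)**2) / (z - w) for c, w in zip(c_true, poles_true))
y = G + sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# Net on the disk (coarse polar grid) and the matrix Phi[k, l] from the slide.
net = (np.linspace(0.1, 0.95, 18)[:, None]
       * np.exp(1j * np.linspace(0, 2 * np.pi, 60, endpoint=False))[None, :]).ravel()
Phi = (1 - np.abs(net)**2) / (z[:, None] - net[None, :])

# Stack real and imaginary parts so a real Lasso solves the complex least squares.
X = np.vstack([Phi.real, Phi.imag])
b = np.concatenate([y.real, y.imag])
fit = Lasso(alpha=0.01, fit_intercept=False, max_iter=100000).fit(X, b)

support = np.flatnonzero(np.abs(fit.coef_) > 1e-2)
print("net points near the true poles:", net[support])
```

On this coarse grid the conjugate pole pair falls between grid angles, so the support spreads over neighboring net points; a finer net (or the weighted-mean heuristic mentioned under Future Work) tightens the estimates.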
Example: 6 poles, $\rho = 0.95$, $\sigma = 10^{-1}$, $n = 30$; $H_2$ error $= 2 \times 10^{-2}$.
Summary
Optimal bounds for denoising using convex methods
Better empirical performance than classical approaches
Local conditions on signals instead of global dictionary conditions
Discretization works
Acknowledgements: results derived in collaboration with Tang, Shah, and Prof. Recht.
Future Work
• Better convergence results for discretization
• Why does the weighted mean heuristic work so well for discretization?
• Better algorithms for system identification
• Greedy techniques
• More atomic sets; general localization guarantees
Referenced Publications
B., Recht, "Atomic Norm Denoising for Line Spectral Estimation," 49th Allerton Conference on Controls and Systems, 2011.
B., Tang, Recht, "Atomic Norm Denoising for Line Spectral Estimation," submitted to IEEE Transactions on Signal Processing.
Tang, B., Recht, "Near Minimax Line Spectral Estimation," to be submitted.
Shah, B., Tang, Recht, "Linear System Identification using Atomic Norm Minimization," Control and Decision Conference, 2012.
Other Publications
Tang, B., Shah, Recht, "Compressed Sensing off the Grid," submitted to IEEE Transactions on Information Theory.
Dasarathy, Shah, B., Nowak, "Covariance Sketching," 50th Allerton Conference on Controls and Systems, 2012.
Gubner, B., Hao, "Multipath-Cluster Channel Models," IEEE Conference on Ultrawideband Systems, 2012.
Wittenberg, Alimadhi, B., Lau, "ei RxC Hierarchical Multinomial-Dirichlet Ecological Inference Model," in Zelig: Everyone's Statistical Software.
Tang, B., Recht, "Sparse Recovery over Continuous Dictionaries: Just Discretize," submitted to Asilomar Conference, 2013.
Cone Condition
$$\bar\gamma \;=\; \inf_{\substack{z \neq 0 \\ z \in C_\gamma(x)}} \frac{\|z\|_2}{\|z\|_{\mathcal{A}}},
\qquad
C_\gamma(x) \;=\; \operatorname{cone}\{\, z \;:\; \|x + z\|_{\mathcal{A}} \le \|x\|_{\mathcal{A}} + \gamma \|z\|_{\mathcal{A}} \,\}, \qquad \gamma \in (0, 1).$$
$C_\gamma(x)$ is an inflated descent cone (the descent cone itself is $C_0$); the condition requires that no direction in $C_\gamma$ have zero atomic norm.
Minimum separation is necessary
Using two sinusoids: since $\|a(f_1) - a(f_2)\|_2 \approx 2 n^{3/2} |f_1 - f_2|$, we get
$$\|a(f_1) - a(f_2)\|_{\mathcal{A}} \;\le\; n^{1/2}\, \|a(f_1) - a(f_2)\|_2 \;\approx\; 2 n^2 |f_1 - f_2|.$$
So when the separation is below $1/(4n^2)$, the two atoms are too close in atomic norm and the atomic-norm approach cannot recover them. Empirically, failure already occurs with less than $1/n$ separation and an adversarial signal.
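A quick numerical check of the $\ell_2$ scaling used above (a sketch, valid in the regime $n|f_1 - f_2| \ll 1$):

```python
import numpy as np

def atom(f, n):
    # Moment/sinusoid atom a(f) with entries e^{2 pi i f k}, k = 0, ..., n-1.
    return np.exp(2j * np.pi * f * np.arange(n))

n, df = 512, 1e-5                      # chosen so that n * df << 1
d = np.linalg.norm(atom(0.1, n) - atom(0.1 + df, n))
print(d / (n**1.5 * df))               # ~3.6: confirms the n^(3/2) |f1 - f2| scaling
```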