Sparse Recovery Using Sparse (Random) Matrices
Piotr Indyk
MIT
Joint work with: Radu Berinde, Anna Gilbert, Howard Karloff, Martin Strauss and Milan Ruzic
Goal of this talk
“Stay awake until the end”
Linear Compression
(learning Fourier coeffs, linear sketching, finite rate of innovation, compressed sensing...)
• Setup:
– Data/signal in n-dimensional space: x. E.g., x is a 1000x1000 image, so n = 1,000,000
– Goal: compress x into a “sketch” Ax, where A is an m x n “sketch matrix”, m << n
• Requirements:
– Plan A: want to recover x from Ax
• Impossible: underdetermined system of equations
– Plan B: want to recover an “approximation” x* of x
• Sparsity parameter k
• Want x* such that ||x*-x||p ≤ C(k) ||x’-x||q (lp/lq guarantee) over all x’ that are k-sparse (at most k non-zero entries)
• The best x* consists of the k coordinates of x with the largest absolute value (see the code sketch after the figure below)
• Want:
– Good compression (small m)
– Efficient algorithms for encoding and recovery
• Why linear compression?
[Figure: the sketch Ax; an example vector x over coordinates 0-8 and its best k-sparse approximation x* for k=2]
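To make the benchmark concrete, here is a minimal Python sketch (mine, not from the talk; the helper name best_k_sparse is hypothetical) that computes the best k-sparse approximation by keeping the k largest-magnitude coordinates:

```python
import numpy as np

def best_k_sparse(x, k):
    """Best k-sparse approximation of x under any l_p norm:
    keep the k entries of largest absolute value, zero the rest."""
    x = np.asarray(x, dtype=float)
    x_star = np.zeros_like(x)
    top_k = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_i|
    x_star[top_k] = x[top_k]
    return x_star

# Toy example with k = 2, as in the figure above:
x = np.array([0.1, 5.0, -0.2, 0.0, 3.5, 0.3, -0.1, 0.2, 0.0])
print(best_k_sparse(x, 2))  # only the entries 5.0 and 3.5 survive
```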
Linear compression: applications
• Data stream algorithms (e.g., for network monitoring)
– Efficient increments: A(x + Δ) = Ax + AΔ (illustrated in the sketch below)
• Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk’06]
• Pooling, microarray experiments [Kainkaryam, Woolf], [Hassibi et al], [Dai-Sheikh, Milenkovic, Baraniuk]

[Figure: network traffic flowing from source to destination]
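The “efficient increments” property is just linearity of the sketch; a toy illustration (my example, with a dense Gaussian A purely for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 50
A = rng.standard_normal((m, n))  # any fixed sketch matrix works here
x = np.zeros(n)
sketch = A @ x

# A stream update adds delta to coordinate i of x. By linearity,
# A(x + delta * e_i) = Ax + delta * A[:, i], so the sketch is
# maintained in O(m) time without ever storing x explicitly.
i, delta = 7, 3.0
x[i] += delta
sketch += delta * A[:, i]

assert np.allclose(sketch, A @ x)
```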
Constructing matrix A
• Choose encoding matrix A at random (the algorithms for recovering x* are more complex)
– Sparse matrices: data stream algorithms; coding theory (LDPCs)
– Dense matrices: compressed sensing; complexity/learning theory (Fourier matrices)
• “Traditional” tradeoffs:
– Sparse: computationally more efficient, explicit
– Dense: shorter sketches
• Goal: the “best of both worlds”
Prior and New Results

| Paper | R/D | Sketch length | Encode time | Column sparsity | Recovery time | Approx |
|---|---|---|---|---|---|---|
| [CCF’02], [CM’06] | R | k log n | n log n | log n | n log n | l2/l2 |
| | R | k log^c n | n log^c n | log^c n | k log^c n | l2/l2 |
| [CM’04] | R | k log n | n log n | log n | n log n | l1/l1 |
| | R | k log^c n | n log^c n | log^c n | k log^c n | l1/l1 |
| [CRT’04], [RV’05] | D | k log(n/k) | nk log(n/k) | k log(n/k) | n^c | l2/l1 |
| | D | k log^c n | n log n | k log^c n | n^c | l2/l1 |
| [GSTV’06], [GSTV’07] | D | k log^c n | n log^c n | log^c n | k log^c n | l1/l1 |
| | D | k log^c n | n log^c n | k log^c n | k^2 log^c n | l2/l1 |
| [BGIKS’08] | D | k log(n/k) | n log(n/k) | log(n/k) | n^c | l1/l1 |
| [GLR’08] | D | k log n · logloglog n | kn^(1-a) | n^(1-a) | n^c | l2/l1 |
| [NV’07], [DM’08], [NT’08], [BD’08], [GK’09], … | D | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) · T | l2/l1 |
| | D | k log^c n | n log n | k log^c n | n log n · T | l2/l1 |
| [IR’08], [BIR’08] | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1/l1 |
| | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T | l1/l1 |
| [BI’09] (“state of the art”) | D | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) · T · stuff | l1/l1 |
| [GLSP’09] | R | k log(n/k) | n log^c n | log^c n | k log^c n | l2/l1 |

Scale: Excellent, Very Good, Good, Fair (the original slides color-code the entries). T denotes the number of iterations of the recovery algorithm.
Recovery “in principle” (when is a matrix “good”?)
• Restricted Isometry Property (RIP)*, a sufficient property of a dense matrix A:
Δ is k-sparse ⇒ ||Δ||2 ≤ ||AΔ||2 ≤ C ||Δ||2
• Holds w.h.p. for:
– Random Gaussian/Bernoulli: m = O(k log(n/k))
– Random Fourier: m = O(k log^O(1) n)
• Consider random m x n 0-1 matrices with d ones per column (construction sketch below)
• Do they satisfy RIP?
– No, unless m = Ω(k^2) [Chandar’07]
• However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss’08]:
Δ is k-sparse ⇒ d(1-ε) ||Δ||1 ≤ ||AΔ||1 ≤ d ||Δ||1
• Sufficient (and necessary) condition: the underlying graph is a (k, d(1-ε/2))-expander
[Figure: dense vs. sparse matrices]
* Other (weaker) conditions exist, e.g., “kernel of A is a near-Euclidean subspace of l1”. More later.
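A minimal construction sketch for such matrices (my illustration, assuming numpy; column i doubles as the neighbor list of left node i in the underlying bipartite graph):

```python
import numpy as np

def sparse_binary_matrix(m, n, d, seed=0):
    """Random m x n 0-1 matrix with exactly d ones per column."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n), dtype=np.int8)
    for i in range(n):
        rows = rng.choice(m, size=d, replace=False)  # d distinct right nodes
        A[rows, i] = 1
    return A

A = sparse_binary_matrix(m=40, n=200, d=8)
assert (A.sum(axis=0) == 8).all()  # every column has exactly d ones
```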
Expanders
• A bipartite graph is a (k, d(1-ε))-expander if for any left set S with |S| ≤ k, we have |N(S)| ≥ (1-ε) d |S| (a brute-force check is sketched below)
• Constructions:
– Randomized: m = O(k log(n/k))
– Explicit: m = k quasipolylog n
• Plenty of applications in computer science, coding theory, etc.
• In particular, LDPC-like techniques yield good algorithms for exactly k-sparse vectors x [Xu-Hassibi’07, Indyk’08, Jafarpour-Xu-Hassibi-Calderbank’08]
[Figure: bipartite graph with n left nodes, m right nodes, and left degree d; a set S and its neighborhood N(S)]
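For small instances the expansion property can be checked by brute force; a sketch (mine, exponential in k, illustration only):

```python
from itertools import combinations
import numpy as np

def is_expander(A, k, eps):
    """Check that the bipartite graph encoded by the 0-1 matrix A
    (columns = left nodes, rows = right nodes, d ones per column)
    is a (k, d(1-eps))-expander: every left set S with |S| <= k
    satisfies |N(S)| >= (1 - eps) * d * |S|."""
    m, n = A.shape
    d = int(A[:, 0].sum())
    for s in range(1, k + 1):
        for S in combinations(range(n), s):
            n_S = np.count_nonzero(A[:, list(S)].any(axis=1))  # |N(S)|
            if n_S < (1 - eps) * d * s:
                return False
    return True

# Tiny random instance, built as in the construction sketch above:
rng = np.random.default_rng(1)
m, n, d = 40, 20, 4
A = np.zeros((m, n), dtype=np.int8)
for i in range(n):
    A[rng.choice(m, size=d, replace=False), i] = 1
print(is_expander(A, k=2, eps=0.5))
```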
Proof: d(1-ε/2)-expansion ⇒ RIP-1
• Want to show that for any k-sparse Δ we have d(1-ε) ||Δ||1 ≤ ||AΔ||1 ≤ d ||Δ||1
• RHS inequality holds for any Δ
• LHS inequality:
– W.l.o.g. assume |Δ1| ≥ … ≥ |Δk| ≥ |Δk+1| = … = |Δn| = 0
– Consider the edges e = (i,j) in lexicographic order
– For each edge e = (i,j) define r(e) s.t.:
• r(e) = -1 if there exists an edge (i’,j) < (i,j)
• r(e) = 1 if there is no such edge
• Claim 1: ||AΔ||1 ≥ ∑_{e=(i,j)} |Δi| r(e)
• Claim 2: ∑_{e=(i,j)} |Δi| r(e) ≥ (1-ε) d ||Δ||1
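Why Claim 1 holds (my reconstruction of the standard argument from [BGIKS’08]): for each right node j, let i(j) be its smallest neighbor under the ordering, so r(e) = 1 exactly on the edges (i(j), j); the triangle inequality applied per right node then gives:

```latex
\|A\Delta\|_1
  = \sum_{j} \Big| \sum_{i:\,(i,j)\in E} \Delta_i \Big|
  \ \ge\ \sum_{j} \Big( |\Delta_{i(j)}| - \sum_{\substack{i:\,(i,j)\in E \\ i \neq i(j)}} |\Delta_i| \Big)
  \ =\ \sum_{e=(i,j)} |\Delta_i|\, r(e).
```

Claim 2 is where expansion enters: sets of at most k left nodes have few colliding edges, so the large coordinates of Δ lose little weight to the r(e) = -1 terms.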
Recovery: algorithms
Matching Pursuit(s)
• Iterative algorithm: given current approximation x*:
– Find (possibly several) i s.t. Ai “correlates” with Ax - Ax*. This yields i and z s.t. ||x* + z ei - x||p << ||x* - x||p
– Update x*
– Sparsify x* (keep only the k largest entries)
– Repeat
• Norms:
– p=2: CoSaMP, SP, IHT, etc. (dense matrices)
– p=1: SMP, this paper (sparse matrices)
– p=0: LDPC bit flipping (sparse matrices)
[Figure: A(x - x*) = Ax - Ax*, with column i highlighted]
Sequential Sparse Matching Pursuit
• Algorithm:
– x* = 0
– Repeat T times:
• Repeat S = O(k) times:
– Find i and z that minimize* ||A(x* + z ei) - b||1
– x* = x* + z ei
• Sparsify x* (set all but the k largest entries of x* to 0)
• Similar to SMP, but updates are done sequentially (a Python sketch follows the footnote below)
[Figure: column i and its neighborhood N(i); vectors x - x* and Ax - Ax*]
* Set z = median[(Ax* - Ax)N(i)]. Instead, one could first optimize (gradient) over i and then over z [Fuchs’09]
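A compact, unoptimized Python rendering of SSMP (my sketch; b = Ax, the footnoted median rule gives the optimal z per coordinate since the columns are 0-1, and the heap-based speedup from the next slide is omitted):

```python
import numpy as np

def ssmp(A, b, k, T=20, S=None):
    """Sequential Sparse Matching Pursuit, naive O(n) inner scan.
    A: m x n 0-1 measurement matrix, b = Ax, k: target sparsity."""
    m, n = A.shape
    S = 2 * k if S is None else S  # S = O(k); the constant is a guess
    x_star = np.zeros(n)
    for _ in range(T):
        for _ in range(S):
            r = b - A @ x_star  # current residual Ax - Ax*
            best_i, best_z, best_gain = None, 0.0, 0.0
            for i in range(n):
                rows = np.flatnonzero(A[:, i])  # N(i)
                z = np.median(r[rows])  # minimizes ||A(x* + z e_i) - b||_1
                gain = np.abs(r[rows]).sum() - np.abs(r[rows] - z).sum()
                if gain > best_gain:
                    best_i, best_z, best_gain = i, z, gain
            if best_i is None:
                break  # no single-coordinate update improves the l1 error
            x_star[best_i] += best_z
        # Sparsify: keep only the k largest-magnitude entries.
        small = np.argsort(np.abs(x_star))[:-k]
        x_star[small] = 0.0
    return x_star
```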
SSMP: Running time
• Algorithm:
– x* = 0
– Repeat T times:
• For each i = 1…n, compute* the zi that achieves Di = min_z ||A(x* + z ei) - b||1, and store Di in a heap
• Repeat S = O(k) times:
– Pick the i, z that yield the best gain
– Update x* = x* + z ei
– Recompute and store Di’ for all i’ such that N(i) and N(i’) intersect
• Sparsify x* (set all but the k largest entries of x* to 0)
• Running time:
T [ n(d + log n) + k (nd/m) d (d + log n) ]
= T [ n(d + log n) + nd (d + log n) ] = T [ nd (d + log n) ]
(each update touches the d right nodes in N(i), each with about nd/m left neighbors; with m = Θ(k log(n/k)) and d = Θ(log(n/k)) we have kd = Θ(m), so the second term simplifies to nd(d + log n))
SSMP: Approximation guarantee
• Want to find a k-sparse x* that minimizes ||x - x*||1
• By RIP-1, this is approximately the same as minimizing ||Ax - Ax*||1
• Need to show we can do it greedily
[Figure: columns a1, a2 and signal x; the supports of a1 and a2 have small overlap (typically)]
Experiments
• 256x256 test images
• SSMP is run with S = 10000, T = 20. SMP is run for 100 iterations. Matrix sparsity is d = 8.
Conclusions
• Even better algorithms for sparse approximation (using sparse matrices) [this talk]
• State of the art: can do 2 out of 3:
– Near-linear encoding/decoding
– O(k log(n/k)) measurements
– Approximation guarantee with respect to the l2/l1 norm
• Questions:
– 3 out of 3?
• “Kernel of A is a near-Euclidean subspace of l1” [Kashin et al]
• Constructions of sparse A exist [Guruswami-Lee-Razborov, Guruswami-Lee-Wigderson]
• Challenge: match the optimal measurement bound
– Explicit constructions
Thanks!