A Nearly-Linear Time Framework for Graph-Structured Sparsity
Chinmay Hegde, Piotr Indyk, Ludwig Schmidt
MIT
6 July 2015
ICML
Authors ordered alphabetically.

Structured sparsity
Sparsity is widely used in signal processing, machine learning, and statistics (compressive sensing, sparse linear regression, etc.).
Examples of sparsity: cluster sparsity, tree sparsity, group sparsity.
In many cases, there is rich structure in addition to sparsity.
→ How can we exploit this prior information?

Our focus: stable sparse recovery
Goal: Estimate an unknown, sparse vector $\beta \in \mathbb{R}^d$ from observations of the form
$$y = X\beta + e .$$
$X \in \mathbb{R}^{n \times d}$ is the design / measurement matrix.
$y \in \mathbb{R}^n$ are the observations / measurements.
$e \in \mathbb{R}^n$ is an observation noise vector.
We are interested in the regime $n \ll d$ (i.e., $X$ is a fat matrix).
→ Use structured sparsity to reduce sample complexity $n$.
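To make the observation model concrete, here is a minimal sketch that draws one synthetic instance of y = Xβ + e in the fat-matrix regime. The function name `make_instance`, the ±1 values on the support, the 1/√n column scaling, and the noise level are all assumptions chosen for illustration; none of them come from the slides.

```python
import random

def make_instance(n, d, support, noise_std=0.01, seed=0):
    """Return (X, beta, y) with y = X beta + e for a beta supported on `support`."""
    rng = random.Random(seed)
    beta = [0.0] * d
    for i in support:
        beta[i] = rng.choice([-1.0, 1.0])  # nonzero entries only on the support
    # Fat design matrix: n rows, d columns, i.i.d. Gaussian entries scaled by 1/sqrt(n)
    X = [[rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(d)] for _ in range(n)]
    e = [rng.gauss(0.0, noise_std) for _ in range(n)]  # observation noise
    # Only support columns contribute, because beta is zero elsewhere
    y = [sum(X[r][j] * beta[j] for j in support) + e[r] for r in range(n)]
    return X, beta, y

# n = 20 observations of a d = 200 dimensional vector with 4 nonzeros
X, beta, y = make_instance(n=20, d=200, support=[3, 4, 5, 120])
print(len(X), len(X[0]), len(y))
```

Here the support {3, 4, 5} plus the isolated index 120 is the kind of clustered pattern that a structured model can exploit.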

Utilizing structured sparsity in sparse recovery
Large body of work: [Yuan, Lin, 2006], [Eldar, Mishali, 2009], [Jacob, Obozinski, Vert, 2009], [Baraniuk, Cevher, Duarte, Hegde, 2010], [Kim, Xing, 2010], [Bi, Kwok, 2011], [Huang, Zhang, Metaxas, 2011], [Bach, Jenatton, Mairal, Obozinski, 2012b], [Rao, Recht, Nowak, 2012], [Negahban, Ravikumar, Wainwright, Yu, 2012], [Simon, Friedman, Hastie, Tibshirani, 2013], [El Halabi, Cevher, 2015], etc.
Surveys: [Bach, Jenatton, Mairal, Obozinski, 2012a] and [Wainwright, 2014].
Main goals:
Generality: What sparsity structures does the approach apply to?
→ Generalize several previously studied sparsity models.
Statistical efficiency: What is the statistical performance improvement?
→ Asymptotically optimal sample complexity.
Computational efficiency: How fast are the resulting algorithms?
→ Nearly-linear time algorithms.

Generality
The Weighted Graph Model (WGM)

Structured sparsity models
Modeling approach: restrict the set of allowed supports [Baraniuk, Cevher, Duarte, Hegde, 2010].
So far: $\beta$ is a vector.
Now: $\beta$ corresponds to a graph.
[Figure: the coefficients $\beta_1, \dots, \beta_8$ drawn first as a vector, then as the nodes of a graph.]
Restrict size and number of connected components of supports.

Weighted Graph Model (simplified)
Parameters:
Graph $G = ([d], E)$ defined on the index set $[d]$.
Sparsity $s$.
Number of connected components $g$.
Examples for $s = 3$ and $g = 2$:
[Figure: four example supports on a graph; two are labeled "In the model" and two "Not in the model".]
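A small membership test makes the two parameters concrete. The sketch below assumes that a support is in the simplified model when it has exactly s nodes and the subgraph it induces in G has at most g connected components; the slide does not say whether these are exact values or upper bounds, and the names `num_components` and `in_wgm` are illustrative.

```python
def num_components(nodes, edges):
    """Count connected components of the subgraph induced by `nodes` (union-find)."""
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    count = len(nodes)
    for u, v in edges:
        if u in nodes and v in nodes:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                count -= 1
    return count

def in_wgm(support, edges, s, g):
    """Assumed rule: exactly s nodes, at most g connected components."""
    support = set(support)
    return len(support) == s and num_components(support, edges) <= g

# Path graph 0-1-2-3-4-5 (a made-up example, not the graph on the slide)
edges = [(i, i + 1) for i in range(5)]
print(in_wgm({0, 1, 4}, edges, s=3, g=2))  # components {0, 1} and {4}
print(in_wgm({0, 2, 4}, edges, s=3, g=2))  # three isolated nodes
```

The first support is accepted and the second is rejected, because three pairwise non-adjacent nodes form three components.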
Generality
We can encode several sparsity structures via the graph G.
No edges: standard s-sparsity
Tree: hierarchical / tree sparsity
(Almost) line graph: block sparsity
Grid graph: 2D cluster sparsity
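The four encodings listed above differ only in the edge set placed on the index set. As a rough sketch, the helpers below build those edge lists for nodes numbered from 0; the function names are hypothetical, the tree is taken to be a complete binary tree for concreteness, and the plain path ignores whatever modification the slide's "(almost)" refers to.

```python
def no_edges(d):
    """Standard s-sparsity: every node is its own component."""
    return []

def path_graph(d):
    """Path 0-1-...-(d-1), the 'line graph' used for block sparsity."""
    return [(i, i + 1) for i in range(d - 1)]

def binary_tree(d):
    """Tree sparsity: node i > 0 is attached to its parent (i - 1) // 2."""
    return [((i - 1) // 2, i) for i in range(1, d)]

def grid_graph(rows, cols):
    """2D cluster sparsity: 4-neighbor grid, node (r, c) has index r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))      # right neighbor
            if r + 1 < rows:
                edges.append((v, v + cols))   # neighbor below
    return edges

print(len(path_graph(5)), len(binary_tree(7)), len(grid_graph(3, 4)))
```

All four graphs have at most a constant times d edges, which keeps the graph itself cheap to store.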

Weighted Graph Model (full version)
Our structured sparsity model also supports edge weights.
Additional parameter: $B$, bound on the sum of weights in the support.
E.g., $s = 3$, $g = 2$, and $B = 5$:
[Figure: two copies of a graph with weights 1 to 11; the support in the first is labeled "In the model", the support in the second "Not in the model".]
Allows further generalizations, e.g., encoding the EMD model (a model for correlated supports in adjacent columns).
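One way to read the weight budget is that the support must be coverable by a forest with at most g trees whose total edge weight is at most B. The sketch below checks this by running Kruskal's algorithm inside the support and stopping once g components remain. It assumes nonnegative weights and exactly s nodes in the support; the interpretation, the helper names, and the example weights are assumptions for illustration.

```python
def min_forest_weight(support, weighted_edges, g):
    """Cheapest forest covering `support` with at most g trees (None if impossible)."""
    support = set(support)
    parent = {v: v for v in support}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components, total = len(support), 0
    inside = [(w, u, v) for u, v, w in weighted_edges if u in support and v in support]
    for w, u, v in sorted(inside):  # Kruskal: cheapest edges first
        if components <= g:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
            total += w
    return total if components <= g else None

def in_weighted_wgm(support, weighted_edges, s, g, B):
    weight = min_forest_weight(support, weighted_edges, g)
    return len(set(support)) == s and weight is not None and weight <= B

# Path 0-1-2-3-4 with edge weights 2, 4, 1, 7 (made-up numbers)
edges = [(0, 1, 2), (1, 2, 4), (2, 3, 1), (3, 4, 7)]
print(in_weighted_wgm({0, 1, 2}, edges, s=3, g=1, B=5))  # one tree costs 2 + 4 = 6
print(in_weighted_wgm({0, 1, 2}, edges, s=3, g=2, B=5))  # two trees cost 2
```

Allowing a second component lets the same support drop its most expensive edge, which is why the second call succeeds while the first does not.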
Statistical efficiency
Sample complexity of sparse recovery with the WGM

Cardinality of the WGM
Key quantity: $|M|$, the number of allowed supports in the WGM.
→ Counting argument: how many subgraphs with size $s$ and $g$ connected components does $G$ contain?
$|M|$ depends on the graph $G$ and the parameters $s$ and $g$.
Useful graph parameter: $\rho(G)$, the maximum degree of a node in $G$.
[Figure: example graph with $\rho(G) = 4$.]
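For very small graphs the counting question can be answered by brute force, which gives a feel for how quickly |M| shrinks as g decreases. The sketch below assumes that supports have exactly s nodes and at most g connected components, and it enumerates all size-s subsets, so it only works for toy sizes. The names `count_supports` and `max_degree` are illustrative.

```python
from itertools import combinations

def num_components(nodes, edges):
    """Count connected components of the subgraph induced by `nodes` (union-find)."""
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    count = len(nodes)
    for u, v in edges:
        if u in nodes and v in nodes:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                count -= 1
    return count

def max_degree(d, edges):
    """rho(G): the largest number of edges incident to any node."""
    degree = [0] * d
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree)

def count_supports(d, edges, s, g):
    """Number of size-s supports whose induced subgraph has at most g components."""
    return sum(1 for S in combinations(range(d), s) if num_components(S, edges) <= g)

edges = [(i, i + 1) for i in range(5)]  # path on 6 nodes
print(max_degree(6, edges))
for g in (1, 2, 3):
    print(g, count_supports(6, edges, s=3, g=g))
```

On this path the count grows from 4 supports for g = 1 to all 20 three-element subsets for g = 3, which is the unstructured case.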

Sample complexity
Let $\beta \in \mathbb{R}^d$ be in the $(G, s, g, B)$-weighted graph model. Then
$$n = O\left( s \left( \log \rho(G) + \log \frac{B}{s} \right) + g \log \frac{d}{g} \right)$$
i.i.d. Gaussian observations suffice to find an estimate $\hat{\beta}$ such that
$$\| \beta - \hat{\beta} \| \le C \| e \| .$$
Unweighted case: $n = O\left( s \log \rho(G) + g \log \frac{d}{g} \right)$.
“Standard” stable sparse recovery: $n = O\left( s \log \frac{d}{s} \right)$.
Asymptotically optimal sample complexity $n = O(s)$ for:
Block sparsity.
Tree sparsity.
Cluster sparsity in constant-degree graphs (for $g = O(s / \log d)$).
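Plugging numbers into the expressions inside the O(·) shows the size of the gap. The sketch below ignores all hidden constants, uses natural logarithms, and compares against the classical s·log(d/s) bound for unstructured sparsity. The function names and the example values of d, s, g, and ρ(G) are assumptions for illustration.

```python
import math

def wgm_bound(s, g, d, rho, B=None):
    """Expression inside O(.) for the WGM; pass B=None for the unweighted case."""
    weight_term = math.log(B / s) if B is not None else 0.0
    return s * (math.log(rho) + weight_term) + g * math.log(d / g)

def standard_bound(s, d):
    """Expression inside O(.) for unstructured s-sparse recovery."""
    return s * math.log(d / s)

# Hypothetical setting: a grid-like graph (maximum degree 4) with few clusters
d, s, g, rho = 1_000_000, 1000, 10, 4
print(round(wgm_bound(s, g, d, rho)))
print(round(standard_bound(s, d)))
```

With ρ(G) = O(1) and g = O(s / log d), the second term is at most O((s / log d) · log d) = O(s), which is how the O(s) claim for cluster sparsity in constant-degree graphs arises.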
Computational efficiency
Nearly-linear time model projection for the WGM

Model projection
Goal: Given $b \in \mathbb{R}^d$ and a sparsity model $M$, find
$$\Omega^* = \arg\min_{\Omega \in M} \| b - b_\Omega \| .$$
For the $(G, s, g)$-WGM: Find the subgraph of $G$ with size $s$ and $g$ connected components that maximizes the sum of node weights.
[Figure: example graph with node weights 3, 5, 7, 2, 6, 8, 10, shown without and with a selected support.]
This problem is NP-hard.
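On a toy graph the exact projection can still be found by exhaustive search, which is useful as a reference when testing approximate oracles. The sketch below uses the ℓ2 norm, so minimizing ‖b − b_Ω‖ is the same as maximizing the sum of b_i² over Ω. It assumes supports have exactly s nodes and at most g components. Its running time is exponential in s, so it illustrates the problem statement and is not the nearly-linear time method of the talk.

```python
from itertools import combinations

def num_components(nodes, edges):
    """Count connected components of the subgraph induced by `nodes` (union-find)."""
    nodes = set(nodes)
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    count = len(nodes)
    for u, v in edges:
        if u in nodes and v in nodes:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                count -= 1
    return count

def project(b, edges, s, g):
    """Exhaustively find the allowed support that keeps the most energy of b."""
    best_support, best_energy = None, -1.0
    for support in combinations(range(len(b)), s):
        if num_components(support, edges) <= g:
            energy = sum(b[i] ** 2 for i in support)  # node weight = squared entry
            if energy > best_energy:
                best_support, best_energy = set(support), energy
    return best_support

edges = [(i, i + 1) for i in range(5)]  # path on 6 nodes
b = [3.0, 0.1, 5.0, 0.2, 0.1, 4.0]
for g in (1, 2, 3):
    print(g, sorted(project(b, edges, s=3, g=g)))
```

With g = 3 the result is plain top-3 thresholding, while smaller g forces the support to trade large isolated entries for connectivity.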
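
Approximation to the rescue!
Approximation-tolerant model-based sparse recovery [HIS’14].
→ Approximate projections suffice, but two types are necessary.
Tail-approximation oracle $T(b)$: Find a support $\Omega \in M$ such that
$$\| b - b_\Omega \| \le c_T \cdot \min_{\Omega' \in M} \| b - b_{\Omega'} \| .$$
Head-approximation oracle $H(b)$: Find a support $\Omega \in M$ such that
$$\| b_\Omega \| \ge c_H \cdot \max_{\Omega' \in M} \| b_{\Omega'} \| .$$
[Figure: a vector $b$ split into head $b_\Omega$ and tail $b - b_\Omega$; the tail oracle minimizes the tail, the head oracle maximizes the head.]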
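The two guarantees are easy to confuse, so it can help to check them numerically on a model small enough to enumerate. The sketch below represents the model as an explicit list of supports and tests whether a candidate support meets the tail or head condition for given constants. The ℓ2 norm, the function names, and the three-element example are assumptions for illustration.

```python
import math

def head(b, support):
    """Norm of b restricted to the support."""
    return math.sqrt(sum(b[i] ** 2 for i in support))

def tail(b, support):
    """Norm of the part of b outside the support."""
    return math.sqrt(sum(x ** 2 for i, x in enumerate(b) if i not in support))

def is_tail_approx(b, omega, model, c_T):
    """tail(omega) <= c_T * (best achievable tail over the model)."""
    return tail(b, omega) <= c_T * min(tail(b, w) for w in model)

def is_head_approx(b, omega, model, c_H):
    """head(omega) >= c_H * (best achievable head over the model)."""
    return head(b, omega) >= c_H * max(head(b, w) for w in model)

b = [3.0, 0.0, 4.0]
model = [{0}, {1}, {2}]   # toy model: all supports of size one
omega = {0}               # a suboptimal choice; the exact projection is {2}

print(is_tail_approx(b, omega, model, c_T=1.5), is_tail_approx(b, omega, model, c_T=1.2))
print(is_head_approx(b, omega, model, c_H=0.7), is_head_approx(b, omega, model, c_H=0.8))
```

The same support can satisfy one guarantee and fail the other for a given pair of constants, which is consistent with the slide's remark that both oracle types are needed.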

The prize-collecting Steiner tree problem (PCST)
Generalization of the classical Steiner tree problem.
Goal: Given a graph with edge costs $c$ and node prizes $\pi$, find a subtree $T$ minimizing $c(T) + \pi(\bar{T})$ ($\bar{T}$: nodes not in $T$).
[Figure: example graph annotated with node prizes and edge costs.]
The Goemans-Williamson (GW) scheme produces a tree $T$ with
$$c(T) + 2\pi(\bar{T}) \le 2 \min_{T' \text{ is a tree}} \left[ c(T') + \pi(\bar{T'}) \right]$$
and runs in time $O(|V|^2 \log |V|)$ [Goemans, Williamson, 1995].
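The PCST objective charges for every edge that is bought and for every prize that is left outside the tree. The sketch below only evaluates that objective for a given candidate; it does not check that the candidate is a tree and it is not the GW algorithm. The three-node instance and the function name are made up for illustration.

```python
def pcst_objective(tree_nodes, tree_edges, cost, prize):
    """c(T) + pi(T-bar): cost of chosen edges plus prizes of nodes left out."""
    paid = sum(cost[e] for e in tree_edges)
    forgone = sum(p for v, p in prize.items() if v not in tree_nodes)
    return paid + forgone

# Path a - b - c with one cheap and one expensive edge
cost = {("a", "b"): 1, ("b", "c"): 5}
prize = {"a": 3, "b": 2, "c": 4}

small = pcst_objective({"a", "b"}, [("a", "b")], cost, prize)
full = pcst_objective({"a", "b", "c"}, [("a", "b"), ("b", "c")], cost, prize)
print(small, full)
```

Skipping node c is cheaper here because its prize of 4 is smaller than the cost of 5 needed to connect it.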

Our algorithmic contributions
1. Generalize GW to the prize-collecting Steiner forest (PCSF) problem.
We find a forest $F$ with $g$ components such that
$$c(F) + 2\pi(\bar{F}) \le 2 \min_{F' \text{ has } g \text{ components}} \left[ c(F') + \pi(\bar{F'}) \right] .$$
2. Give a nearly-linear time and practical variant of GW.
Building on the dynamic edge splitting idea introduced in [Cole, Hariharan, Lewenstein, Porat, 2001].
[Figure: an edge between two nodes a and b.]
3. Reduce WGM-projection to a sequence of PCSF problems.
Lagrangian relaxation + binary search and graph post-processing.
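The pattern behind the third contribution, replacing a hard budget by a penalty λ and then binary searching over λ, can be seen on a much simpler problem. The sketch below applies it to plain s-sparsity with no graph: the relaxed problem maximizes the collected prizes minus λ per selected node, and the search looks for the smallest penalty whose solution respects the budget. It is a stand-in for the idea only. The actual reduction calls a PCSF solver where this code calls `relaxed_solve`, and all names here are hypothetical.

```python
def relaxed_solve(prizes, lam):
    """Unconstrained maximizer of sum(prizes[i] for i in S) - lam * |S|."""
    return {i for i, p in enumerate(prizes) if p > lam}

def search_lambda(prizes, s, iters=50):
    """Binary search for the smallest penalty whose solution has at most s nodes."""
    lo, hi = 0.0, max(prizes)  # lam = hi selects nothing, so it is always feasible
    best = set()
    for _ in range(iters):
        mid = (lo + hi) / 2
        support = relaxed_solve(prizes, mid)
        if len(support) <= s:
            best, hi = support, mid  # feasible: try a smaller penalty
        else:
            lo = mid                 # too many nodes: increase the penalty
    return best

prizes = [9.0, 1.0, 25.0, 4.0, 16.0]  # e.g. squared entries of a vector b
print(sorted(search_lambda(prizes, s=3)))
```

With ties among the prizes the search may return fewer than s nodes. Handling such gaps is the kind of detail that the post-processing step mentioned on the slide would have to address in the graph setting.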

Running time
Theorem: On a graph with $|E|$ edges and $d$ nodes, GRAPH-COSAMP runs in time
$$O\left( (T_X + |E| \log^3 d) \log d \right),$$
where $T_X$ is the cost of a matrix-vector multiplication with the design / measurement matrix $X$.

| Model | Reference | Previous time | Our time |
| --- | --- | --- | --- |
| 1D-cluster | [CIHB09] | $O(d \log^2 d)$ | $O(d \log^4 d)$ |
| Trees | [HIS14a] | $O(d \log^2 d)$ | $O(d \log^4 d)$ |
| EMD | [HIS14b] | $O(d^2 \log d)$ | $O(d^{3/2} \log^4 d)$ |
| Graph clusters | [HZM11] | $O(d^c)$ | $O(d \log^4 d)$ |
18 / 22
Experiments
19 / 22
Sparse recovery experiments
[Figure: three panels plotting probability of recovery (0 to 1) against oversampling ratio n/s (2 to 7) for Graph-CoSaMP, StructOMP, LaMP, CoSaMP, and Basis Pursuit.]
StructOMP: [HZM11], LaMP: [CDHB09], CoSaMP: [NT09], BP: [CD92].
20 / 22
Running times
Angiogram image, n = 6s observations, subsampled Fourier matrix.
[Figure: two panels plotting recovery time (sec) against problem size d (0 to 4·10^4), one with a linear time axis (0 to 100) and one with a logarithmic time axis (10^-2 to 10^2), for Graph-CoSaMP, StructOMP, LaMP, CoSaMP, and Basis Pursuit.]
Graph-CoSaMP is about 20× faster than StructOMP for d = 10^4 and scales nearly-linearly.
Constant factor: solving more than 20 PCSF instances per recovery.
21 / 22
Conclusions
We introduced the Weighted Graph Model.
Generalizes several structured sparsity models.
Asymptotically optimal sample complexity in many cases.
Nearly-linear time approximate model projections.
Open problems / future directions
Fast measurement matrix for all sparsity levels.
Recovery guarantees beyond RIP.
Learning sparsity models.
Further applications, e.g. in seismic image processing.
[Figure: three panels labeled Noisy input, Human labels, Automatic.]
22 / 22