Constrained Optimization — NYU Courant — 2017. 12. 11
Constrained optimization
DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis
http://www.cims.nyu.edu/~cfgranda/pages/OBDA_fall17/index.html
Carlos Fernandez-Granda
Compressed sensing
Convex constrained problems
Analyzing optimization-based methods
Magnetic resonance imaging
[Figure: MRI image; 2D DFT (magnitude); 2D DFT (log of magnitude)]
Magnetic resonance imaging
Data: Samples from spectrum
Problem: sampling is time-consuming (uncomfortable, kids move, …)
Images are compressible (sparse in wavelet basis)
Can we recover compressible signals from less data?
2D wavelet transform
Full DFT matrix
Regular subsampling
Random subsampling
Toy example
Regular subsampling
Random subsampling
Linear inverse problems
Linear inverse problem
A~x = ~y
Linear measurements, A ∈ R^{m×n}:

~y[i] = ⟨A_{i:}, ~x⟩, 1 ≤ i ≤ m

Aim: recover the signal ~x ∈ R^n from the data ~y ∈ R^m

We need m ≥ n, otherwise the problem is underdetermined

If m < n there are infinitely many solutions ~x + ~w, where ~w ∈ null(A)
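The non-uniqueness can be seen directly: any element of the null space of A can be added to a solution without changing the measurements. A minimal sketch, assuming NumPy is available (the matrix and signal are made up for illustration):

```python
import numpy as np

# Toy underdetermined system: m = 2 measurements, n = 4 unknowns
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
x_true = np.array([1.0, 2.0, 0.0, 0.0])
y = A @ x_true

# The last rows of V^T from the SVD span null(A)
_, _, Vt = np.linalg.svd(A)
w = Vt[-1]
assert np.allclose(A @ w, 0)

# x_true + 3w is a different vector consistent with the same data
x_other = x_true + 3.0 * w
assert np.allclose(A @ x_other, y)
```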
Sparse recovery
Aim: Recover sparse ~x from linear measurements
A~x = ~y
When is the problem well posed?
There shouldn’t be two distinct sparse vectors ~x1 ≠ ~x2 such that A~x1 = A~x2
Spark
The spark of a matrix is the smallest number of columns that are linearly dependent

Let ~y := A~x*, where A ∈ R^{m×n}, ~y ∈ R^m and ~x* ∈ R^n is a sparse vector with s nonzero entries

The vector ~x* is the only vector with sparsity s consistent with the data, i.e. it is the solution of

min_~x ||~x||_0 subject to A~x = ~y

for any choice of ~x* if and only if

spark(A) > 2s
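For tiny matrices the spark can be computed by brute force over column subsets (this is intractable in general, as noted later, so it is only a toy check); a sketch assuming NumPy:

```python
import itertools
import numpy as np

def spark(A, tol=1e-10):
    """Smallest number of columns of A that are linearly dependent
    (returns n + 1 if all n columns are linearly independent)."""
    m, n = A.shape
    for k in range(1, n + 1):
        for cols in itertools.combinations(range(n), k):
            if np.linalg.matrix_rank(A[:, cols], tol=tol) < k:
                return k
    return n + 1

# Columns e1, e2, e1 + e2: no single column or pair is dependent,
# but all three together are, so spark = 3
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
print(spark(A))  # 3
```

Here spark(A) = 3 > 2s for s = 1, so every 1-sparse vector is the unique sparsest solution consistent with its measurements.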
Proof
Equivalent statements:

- For any ~x*, ~x* is the only vector with sparsity s consistent with the data
- For any pair of distinct s-sparse vectors ~x1 ≠ ~x2, A(~x1 − ~x2) ≠ ~0
- For any pair of subsets T1 and T2 of s indices, A_{T1∪T2} ~α ≠ ~0 for any nonzero ~α ∈ R^{|T1∪T2|}
- All submatrices with at most 2s columns have no nonzero vectors in their null space
- All submatrices with at most 2s columns are full rank
Restricted-isometry property
Robust version of spark
If two s-sparse vectors ~x1, ~x2 are far apart, then A~x1, A~x2 should also be far apart

The linear operator should preserve distances (be an isometry) when restricted to act upon sparse vectors
Restricted-isometry property
A satisfies the restricted isometry property (RIP) with constant κ_s if

(1 − κ_s) ||~x||_2 ≤ ||A~x||_2 ≤ (1 + κ_s) ||~x||_2

for any s-sparse vector ~x

If A satisfies the RIP for sparsity level 2s, then for any s-sparse ~x1, ~x2 with ~y1 := A~x1, ~y2 := A~x2

||~y2 − ~y1||_2 = ||A(~x2 − ~x1)||_2 ≥ (1 − κ_{2s}) ||~x2 − ~x1||_2
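The norm-preservation can be illustrated empirically (assuming NumPy; checking random sparse vectors is of course much weaker than the RIP's "for all s-sparse vectors" quantifier):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 200, 1000, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)   # scaled iid Gaussian matrix

# Norm distortion ||Ax||_2 / ||x||_2 over random s-sparse vectors
ratios = []
for _ in range(100):
    x = np.zeros(n)
    x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))

print(min(ratios), max(ratios))  # both close to 1
```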
Regular subsampling
Correlation with column 20
[Plot: correlation of column 20 with every column of the regularly subsampled DFT matrix; x-axis: column index (0–60), y-axis: correlation (0.0–1.0)]
Random subsampling
Correlation with column 20
[Plot: correlation of column 20 with every column of the randomly subsampled DFT matrix; x-axis: column index (0–60), y-axis: correlation (−0.6–1.0)]
Restricted-isometry property
Deterministic matrices tend not to satisfy the RIP

It is NP-hard to check whether the spark or RIP conditions hold

Random matrices satisfy the RIP with high probability

We prove it for iid Gaussian matrices; the ideas in the proof for random Fourier matrices are similar
Restricted-isometry property for Gaussian matrices
Let A ∈ R^{m×n} be a random matrix with iid standard Gaussian entries

(1/√m) A satisfies the RIP with constant κ_s with probability 1 − C₂/n as long as

m ≥ (C₁ s / κ_s²) log(n/s)

for two fixed constants C₁, C₂ > 0
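The concentration behind the theorem can be checked numerically (assuming NumPy): as the aspect ratio m/s grows, all singular values of an m × s Gaussian matrix, rescaled by 1/√m, approach 1 — this is what the singular-value plots below illustrate:

```python
import numpy as np

rng = np.random.default_rng(1)
s = 100
edges = {}
for ratio in (2, 10, 50):
    m = ratio * s
    G = rng.standard_normal((m, s))
    sv = np.linalg.svd(G, compute_uv=False) / np.sqrt(m)
    edges[ratio] = (sv.min(), sv.max())
    print(ratio, edges[ratio])   # extreme values move toward 1 as m/s grows
```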
Proof
For a fixed support T of size s, the bounds follow from bounds on the singular values of Gaussian matrices
Singular values of m × s matrix, s = 100
[Plot: σ_i/√m versus index i (1–100), one curve per aspect ratio m/s ∈ {2, 5, 10, 20, 50, 100, 200}; y-axis range 0.5–1.5]
Singular values of m × s matrix, s = 1000
[Plot: σ_i/√m versus index i (1–1000), one curve per aspect ratio m/s ∈ {2, 5, 10, 20, 50, 100, 200}; y-axis range 0.5–1.5]
Proof
For a fixed submatrix the singular values satisfy

√m (1 − κ_s) ≤ σ_s ≤ σ_1 ≤ √m (1 + κ_s)

with probability at least

1 − 2 (12/κ_s)^s exp(−m κ_s² / 32)

For any vector ~x with support T

√(1 − κ_s) ||~x||_2 ≤ (1/√m) ||A~x||_2 ≤ √(1 + κ_s) ||~x||_2
Union bound
For any events S_1, S_2, …, S_n in a probability space

P(∪_i S_i) ≤ Σ_{i=1}^n P(S_i)
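A quick sanity check with exact rational arithmetic (a made-up dice example, not from the slides; standard library only):

```python
from fractions import Fraction

# Fair six-sided die; two overlapping events
omega = range(1, 7)
P = lambda S: sum((Fraction(1, 6) for x in omega if x in S), Fraction(0))

S1 = {1, 2, 3}        # roll at most 3
S2 = {2, 4, 6}        # roll is even

assert P(S1 | S2) == Fraction(5, 6)        # exact probability of the union
assert P(S1) + P(S2) == Fraction(1, 1)     # the union bound gives 1
assert P(S1 | S2) <= P(S1) + P(S2)         # the bound holds (loosely here)
```

The bound is loose exactly when the events overlap, which is why it is safe to apply over the (overlapping) failure events for each support in the proof below.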
Proof
Number of different supports of size s:

(n choose s) ≤ (en/s)^s

By the union bound,

√(1 − κ_s) ||~x||_2 ≤ (1/√m) ||A~x||_2 ≤ √(1 + κ_s) ||~x||_2

holds for any s-sparse vector ~x with probability at least

1 − 2 (en/s)^s (12/κ_s)^s exp(−m κ_s² / 32)
= 1 − exp(log 2 + s + s log(n/s) + s log(12/κ_s) − m κ_s² / 32)
≥ 1 − C₂/n

as long as m ≥ (C₁ s / κ_s²) log(n/s)
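The counting bound (n choose s) ≤ (en/s)^s used above is easy to spot-check numerically (standard library only):

```python
import math

# Spot-check C(n, s) <= (e n / s)^s over a grid of sizes
for n in (50, 200, 1000):
    for s in (1, 5, 20):
        assert math.comb(n, s) <= (math.e * n / s) ** s
print("bound holds on all tested (n, s)")
```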
Sparse recovery via ℓ1-norm minimization

ℓ0-“norm” minimization is intractable

Instead (as usual) we minimize the ℓ1 norm; the estimate ~x_{ℓ1} is the solution to

min_~x ||~x||_1 subject to A~x = ~y
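This problem can be recast as a linear program by introducing slack variables t with |~x[i]| ≤ t[i]. A sketch assuming SciPy is available (the problem sizes and signal are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, y):
    """min ||x||_1 s.t. Ax = y, as the LP min sum(t) over z = [x, t]
    with x - t <= 0, -x - t <= 0, Ax = y."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.block([[np.eye(n), -np.eye(n)],
                     [-np.eye(n), -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(0)
m, n = 12, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = [2.0, -1.5]          # a 2-sparse signal
y = A @ x_true

x_hat = l1_min(A, y)
# x_hat is feasible and its l1 norm cannot exceed that of x_true
assert np.allclose(A @ x_hat, y, atol=1e-6)
assert np.linalg.norm(x_hat, 1) <= np.linalg.norm(x_true, 1) + 1e-6
```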
Minimum ℓ2-norm solution (regular subsampling)

[Plot: true signal vs. minimum ℓ2 solution; x-axis: index (0–70), y-axis: amplitude (−3 to 4)]
Minimum ℓ1-norm solution (regular subsampling)

[Plot: true signal vs. minimum ℓ1 solution; x-axis: index (0–70), y-axis: amplitude (−3 to 4)]
Minimum ℓ2-norm solution (random subsampling)

[Plot: true signal vs. minimum ℓ2 solution; x-axis: index (0–70), y-axis: amplitude (−3 to 4)]
Minimum ℓ1-norm solution (random subsampling)

[Plot: true signal vs. minimum ℓ1 solution; x-axis: index (0–70), y-axis: amplitude (−3 to 4)]
Geometric intuition
[Figure: the ℓ2-norm ball and the ℓ1-norm ball]
Sparse recovery via ℓ1-norm minimization

If the signal is sparse in a transform domain W, then solve

min_~c ||~c||_1 subject to AW~c = ~y

If we want to recover the original coefficients ~c* then AW should satisfy the RIP

However, we might be fine with any ~c′ such that W~c′ = ~x*
Regular subsampling
Minimum ℓ2-norm solution (regular subsampling)

Minimum ℓ1-norm solution (regular subsampling)

Random subsampling

Minimum ℓ2-norm solution (random subsampling)

Minimum ℓ1-norm solution (random subsampling)
Compressed sensing
Convex constrained problems
Analyzing optimization-based methods
Convex sets
A convex set S is any set such that for any ~x , ~y ∈ S and θ ∈ (0, 1)
θ~x + (1− θ) ~y ∈ S
The intersection of convex sets is convex
Convex vs nonconvex
[Figure: a nonconvex set and a convex set]
Epigraph
[Figure: a function f and its epigraph epi(f)]
A function is convex if and only if its epigraph is convex
Projection onto convex set
The projection of any vector ~x onto a non-empty closed convex set S,

P_S(~x) := argmin_{~y ∈ S} ||~x − ~y||_2,

exists and is unique
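For simple sets the projection has a closed form. A sketch assuming NumPy, using the ℓ∞ ball (a closed convex set) as an example, where the projection decouples per coordinate into clipping:

```python
import numpy as np

def project_box(x):
    """Projection onto S = {y : ||y||_inf <= 1}: clip each coordinate."""
    return np.clip(x, -1.0, 1.0)

x = np.array([2.0, -0.5, -3.0])
p = project_box(x)
print(p)   # each coordinate clipped into [-1, 1]

# Spot check: p is no farther from x than random points of S
rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.uniform(-1, 1, size=3)
    assert np.linalg.norm(x - p) <= np.linalg.norm(x - y) + 1e-12
```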
Proof
Assume there are two distinct projections ~y1 ≠ ~y2

Consider

~y′ := (~y1 + ~y2)/2

~y′ belongs to S, because it is a convex combination of ~y1, ~y2 ∈ S
Proof
⟨~x − ~y′, ~y1 − ~y′⟩ = ⟨~x − (~y1 + ~y2)/2, ~y1 − (~y1 + ~y2)/2⟩
= ⟨(~x − ~y1)/2 + (~x − ~y2)/2, (~x − ~y2)/2 − (~x − ~y1)/2⟩
= (1/4) (||~x − ~y2||_2² − ||~x − ~y1||_2²)
= 0, because ||~x − ~y1||_2 = ||~x − ~y2||_2 (both are projections)

By Pythagoras' theorem

||~x − ~y1||_2² = ||~x − ~y′||_2² + ||~y1 − ~y′||_2²
= ||~x − ~y′||_2² + ||(~y1 − ~y2)/2||_2²
> ||~x − ~y′||_2²

so ~y′ ∈ S is strictly closer to ~x than ~y1, a contradiction
Convex combination

Given $n$ vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n \in \mathbb{R}^n$,

$$\vec{x} := \sum_{i=1}^n \theta_i \vec{x}_i$$

is a convex combination of $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n$ if

$$\theta_i \ge 0, \quad 1 \le i \le n, \qquad \sum_{i=1}^n \theta_i = 1$$
Convex hull

The convex hull $\mathcal{C}(\mathcal{S})$ of a set $\mathcal{S}$ is the set of convex combinations of points in $\mathcal{S}$.

The $\ell_1$-norm ball is the convex hull of the intersection between the $\ell_0$ "norm" ball and the $\ell_\infty$-norm ball.
$\ell_1$-norm ball
$B_{\ell_1} \subseteq \mathcal{C}\left(B_{\ell_0} \cap B_{\ell_\infty}\right)$

Let $\vec{x} \in B_{\ell_1}$. Set $\theta_i := |\vec{x}[i]|$ for $1 \le i \le n$ and $\theta_0 := 1 - \sum_{i=1}^n \theta_i$. Then $\sum_{i=0}^n \theta_i = 1$ by construction, $\theta_i \ge 0$, and

$$\theta_0 = 1 - \sum_{i=1}^n \theta_i = 1 - \|\vec{x}\|_1 \ge 0 \quad \text{because } \vec{x} \in B_{\ell_1}$$

It follows that $\vec{x} \in \mathcal{C}\left(B_{\ell_0} \cap B_{\ell_\infty}\right)$ because

$$\vec{x} = \sum_{i=1}^n \theta_i \operatorname{sign}\left(\vec{x}[i]\right) \vec{e}_i + \theta_0 \vec{0}$$

and $\operatorname{sign}\left(\vec{x}[i]\right)\vec{e}_i$ and $\vec{0}$ all belong to $B_{\ell_0} \cap B_{\ell_\infty}$.
$\mathcal{C}\left(B_{\ell_0} \cap B_{\ell_\infty}\right) \subseteq B_{\ell_1}$

Let $\vec{x} \in \mathcal{C}\left(B_{\ell_0} \cap B_{\ell_\infty}\right)$; then $\vec{x} = \sum_{i=1}^m \theta_i \vec{y}_i$ with $\vec{y}_i \in B_{\ell_0} \cap B_{\ell_\infty}$, so

$$\begin{aligned}
\|\vec{x}\|_1 &\le \sum_{i=1}^m \theta_i \|\vec{y}_i\|_1 && \text{by the triangle inequality} \\
&\le \sum_{i=1}^m \theta_i \|\vec{y}_i\|_\infty && \text{each } \vec{y}_i \text{ has at most one nonzero entry} \\
&\le \sum_{i=1}^m \theta_i = 1
\end{aligned}$$
Convex optimization problem

Given $f_0, f_1, \ldots, f_m, h_1, \ldots, h_p : \mathbb{R}^n \to \mathbb{R}$,

$$\begin{aligned}
\text{minimize} \quad & f_0(\vec{x}) \\
\text{subject to} \quad & f_i(\vec{x}) \le 0, \quad 1 \le i \le m \\
& h_j(\vec{x}) = 0, \quad 1 \le j \le p
\end{aligned}$$
Definitions

- A feasible vector is a vector that satisfies all the constraints
- A solution is any vector $\vec{x}^{\,*}$ such that $f_0(\vec{x}) \ge f_0(\vec{x}^{\,*})$ for all feasible vectors $\vec{x}$
- If a solution exists, $f_0(\vec{x}^{\,*})$ is the optimal value or optimum of the problem
Convex optimization problem

The optimization problem is convex if

- $f_0$ is convex
- $f_1, \ldots, f_m$ are convex
- $h_1, \ldots, h_p$ are affine, i.e. $h_j(\vec{x}) = \vec{a}_j^{\,T}\vec{x} + b_j$ for some $\vec{a}_j \in \mathbb{R}^n$ and $b_j \in \mathbb{R}$
Linear program

$$\begin{aligned}
\text{minimize} \quad & \vec{a}^T\vec{x} \\
\text{subject to} \quad & \vec{c}_i^{\,T}\vec{x} \le d_i, \quad 1 \le i \le m \\
& A\vec{x} = \vec{b}
\end{aligned}$$
$\ell_1$-norm minimization as an LP

The optimization problem

$$\text{minimize } \|\vec{x}\|_1 \quad \text{subject to } A\vec{x} = \vec{b}$$

can be recast as the LP

$$\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^n \vec{t}[i] \\
\text{subject to} \quad & \vec{t}[i] \ge \vec{e}_i^{\,T}\vec{x} \\
& \vec{t}[i] \ge -\vec{e}_i^{\,T}\vec{x} \\
& A\vec{x} = \vec{b}
\end{aligned}$$
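The recast LP can be handed to an off-the-shelf solver. Below is a minimal sketch using SciPy's `linprog`; the Gaussian matrix and the 3-sparse signal are illustrative assumptions, not part of the slides.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative instance: fat Gaussian matrix and a 3-sparse ground truth
rng = np.random.default_rng(0)
m, n = 15, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[2, 7, 15]] = [1.5, -2.0, 1.0]
b = A @ x_true

# Stack the variables as z = [x, t] and minimize sum(t)
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
# t[i] >= e_i^T x and t[i] >= -e_i^T x, written as A_ub @ z <= 0
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.zeros(2 * n)
A_eq = np.hstack([A, np.zeros((m, n))])  # A x = b, t unconstrained

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * (2 * n))
x_hat = res.x[:n]
```

At the optimum the auxiliary variables are tight, $\vec{t}[i] = |\vec{x}[i]|$, so `res.fun` is the minimal $\ell_1$ norm.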
Proof

Let $\vec{x}^{\,\ell_1}$ denote a solution to the $\ell_1$-norm minimization problem and $\left(\vec{x}^{\,\mathrm{lp}}, \vec{t}^{\,\mathrm{lp}}\right)$ a solution to the linear program. Setting $\vec{t}^{\,\ell_1}[i] := \left|\vec{x}^{\,\ell_1}[i]\right|$ makes $\left(\vec{x}^{\,\ell_1}, \vec{t}^{\,\ell_1}\right)$ feasible for the linear program, so

$$\begin{aligned}
\left\|\vec{x}^{\,\ell_1}\right\|_1 &= \sum_{i=1}^n \vec{t}^{\,\ell_1}[i] \\
&\ge \sum_{i=1}^n \vec{t}^{\,\mathrm{lp}}[i] && \text{by optimality of } \left(\vec{x}^{\,\mathrm{lp}}, \vec{t}^{\,\mathrm{lp}}\right) \\
&\ge \left\|\vec{x}^{\,\mathrm{lp}}\right\|_1 && \text{because } \vec{t}^{\,\mathrm{lp}}[i] \ge \left|\vec{x}^{\,\mathrm{lp}}[i]\right|
\end{aligned}$$

Hence $\vec{x}^{\,\mathrm{lp}}$ is a solution to the $\ell_1$-norm minimization problem.
Proof

Conversely, with $\vec{t}^{\,\ell_1}[i] := \left|\vec{x}^{\,\ell_1}[i]\right|$,

$$\begin{aligned}
\sum_{i=1}^n \vec{t}^{\,\ell_1}[i] &= \left\|\vec{x}^{\,\ell_1}\right\|_1 \\
&\le \left\|\vec{x}^{\,\mathrm{lp}}\right\|_1 && \text{by optimality of } \vec{x}^{\,\ell_1} \\
&\le \sum_{i=1}^n \vec{t}^{\,\mathrm{lp}}[i]
\end{aligned}$$

so $\left(\vec{x}^{\,\ell_1}, \vec{t}^{\,\ell_1}\right)$ is a solution to the linear program.
Quadratic program

For a positive semidefinite matrix $Q \in \mathbb{R}^{n \times n}$,

$$\begin{aligned}
\text{minimize} \quad & \vec{x}^T Q \vec{x} + \vec{a}^T\vec{x} \\
\text{subject to} \quad & \vec{c}_i^{\,T}\vec{x} \le d_i, \quad 1 \le i \le m \\
& A\vec{x} = \vec{b}
\end{aligned}$$
$\ell_1$-norm regularized least squares as a QP

The optimization problem

$$\text{minimize } \|A\vec{x} - \vec{y}\|_2^2 + \alpha \|\vec{x}\|_1$$

can be recast (up to the constant $\|\vec{y}\|_2^2$) as the QP

$$\begin{aligned}
\text{minimize} \quad & \vec{x}^T A^T A \vec{x} - 2\vec{y}^T A \vec{x} + \alpha \sum_{i=1}^n \vec{t}[i] \\
\text{subject to} \quad & \vec{t}[i] \ge \vec{e}_i^{\,T}\vec{x} \\
& \vec{t}[i] \ge -\vec{e}_i^{\,T}\vec{x}
\end{aligned}$$
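A sketch of solving the recast QP with `scipy.optimize.minimize` (SLSQP); the instance and the weight `alpha` are illustrative assumptions, and a dedicated QP solver would be the natural choice at scale.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative instance (assumed)
rng = np.random.default_rng(1)
m, n = 8, 5
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)
alpha = 0.5  # regularization weight, chosen arbitrarily for the demo

def qp_objective(z):
    # z = [x, t]; the constant ||y||_2^2 is dropped from the recast objective
    x, t = z[:n], z[n:]
    return x @ (A.T @ A) @ x - 2 * (y @ A) @ x + alpha * t.sum()

# t[i] >= x[i] and t[i] >= -x[i], i.e. t - x >= 0 and t + x >= 0
constraints = [{"type": "ineq", "fun": lambda z: z[n:] - z[:n]},
               {"type": "ineq", "fun": lambda z: z[n:] + z[:n]}]

res = minimize(qp_objective, np.zeros(2 * n), method="SLSQP",
               constraints=constraints)
x_hat, t_hat = res.x[:n], res.x[n:]
```

Since the objective increases with each $\vec{t}[i]$, the constraints are active at the optimum and $\vec{t} = |\vec{x}|$ entrywise, which recovers the original objective up to the constant $\|\vec{y}\|_2^2$.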
Lagrangian

The Lagrangian of a canonical optimization problem is

$$L(\vec{x}, \vec{\alpha}, \vec{\nu}) := f_0(\vec{x}) + \sum_{i=1}^m \vec{\alpha}[i]\, f_i(\vec{x}) + \sum_{j=1}^p \vec{\nu}[j]\, h_j(\vec{x})$$

where $\vec{\alpha} \in \mathbb{R}^m$ and $\vec{\nu} \in \mathbb{R}^p$ are called Lagrange multipliers or dual variables.

If $\vec{x}$ is feasible and $\vec{\alpha}[i] \ge 0$ for $1 \le i \le m$, then

$$L(\vec{x}, \vec{\alpha}, \vec{\nu}) \le f_0(\vec{x})$$
Lagrange dual function

The Lagrange dual function of the problem is

$$l(\vec{\alpha}, \vec{\nu}) := \inf_{\vec{x} \in \mathbb{R}^n} \left\{ f_0(\vec{x}) + \sum_{i=1}^m \vec{\alpha}[i]\, f_i(\vec{x}) + \sum_{j=1}^p \vec{\nu}[j]\, h_j(\vec{x}) \right\}$$

Let $p^*$ be an optimum of the optimization problem. Then

$$l(\vec{\alpha}, \vec{\nu}) \le p^*$$

as long as $\vec{\alpha}[i] \ge 0$ for $1 \le i \le m$.
Dual problem

The dual problem of the (primal) optimization problem is

$$\begin{aligned}
\text{maximize} \quad & l(\vec{\alpha}, \vec{\nu}) \\
\text{subject to} \quad & \vec{\alpha}[i] \ge 0, \quad 1 \le i \le m
\end{aligned}$$

The dual problem is always convex, even if the primal isn't!
Maximum/supremum of convex functions

The pointwise maximum of $m$ convex functions $f_1, \ldots, f_m$

$$f_{\max}(\vec{x}) := \max_{1 \le i \le m} f_i(\vec{x})$$

is convex. The pointwise supremum of a family of convex functions indexed by a set $\mathcal{I}$

$$f_{\sup}(\vec{x}) := \sup_{i \in \mathcal{I}} f_i(\vec{x})$$

is convex.
Proof

For any $0 \le \theta \le 1$ and any $\vec{x}, \vec{y} \in \mathbb{R}^n$,

$$\begin{aligned}
f_{\sup}\left(\theta\vec{x} + (1-\theta)\vec{y}\right) &= \sup_{i \in \mathcal{I}} f_i\left(\theta\vec{x} + (1-\theta)\vec{y}\right) \\
&\le \sup_{i \in \mathcal{I}} \left\{ \theta f_i(\vec{x}) + (1-\theta) f_i(\vec{y}) \right\} && \text{by convexity of the } f_i \\
&\le \theta \sup_{i \in \mathcal{I}} f_i(\vec{x}) + (1-\theta) \sup_{j \in \mathcal{I}} f_j(\vec{y}) \\
&= \theta f_{\sup}(\vec{x}) + (1-\theta) f_{\sup}(\vec{y})
\end{aligned}$$
Weak duality
If p∗ is a primal optimum and d∗ a dual optimum
d∗ ≤ p∗
Strong duality

For convex problems

$$d^* = p^*$$

under very weak conditions:

- LPs: the primal optimum is finite
- General convex programs (Slater's condition): there exists a strictly feasible point, i.e. a point such that $f_i(\vec{x}) < 0$ for $1 \le i \le m$
$\ell_1$-norm minimization

The dual problem of

$$\min_{\vec{x}} \|\vec{x}\|_1 \quad \text{subject to } A\vec{x} = \vec{y}$$

is

$$\max_{\vec{\nu}} \vec{y}^T\vec{\nu} \quad \text{subject to } \left\|A^T\vec{\nu}\right\|_\infty \le 1$$
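Both the primal and the dual are linear programs, so the duality gap can be inspected numerically. The instance below is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n = 10, 30
A = rng.standard_normal((m, n))
x0 = np.zeros(n)
x0[[3, 11]] = [1.0, -2.0]
y = A @ x0

# Primal: min ||x||_1 s.t. Ax = y, via the LP recast with z = [x, t]
c = np.concatenate([np.zeros(n), np.ones(n)])
I = np.eye(n)
primal = linprog(c, A_ub=np.block([[I, -I], [-I, -I]]), b_ub=np.zeros(2 * n),
                 A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=y,
                 bounds=[(None, None)] * (2 * n))

# Dual: max y^T nu s.t. ||A^T nu||_inf <= 1 (linprog minimizes, so negate)
dual = linprog(-y, A_ub=np.vstack([A.T, -A.T]), b_ub=np.ones(2 * n),
               bounds=[(None, None)] * m)
p_star, d_star = primal.fun, -dual.fun
```

For LPs with a finite optimum strong duality holds, so `p_star` and `d_star` agree up to solver tolerance.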
Proof

The Lagrangian is

$$L(\vec{x}, \vec{\nu}) = \|\vec{x}\|_1 + \vec{\nu}^T(\vec{y} - A\vec{x})$$

and the Lagrange dual function is

$$l(\vec{\nu}) := \inf_{\vec{x} \in \mathbb{R}^n} \left\{ \|\vec{x}\|_1 - (A^T\vec{\nu})^T\vec{x} + \vec{\nu}^T\vec{y} \right\}$$

What if $\left|(A^T\vec{\nu})[i]\right| > 1$ for some $i$? Then we can send $\vec{x}[i] \to \pm\infty$, so that $l(\vec{\nu}) \to -\infty$.

What if $\left\|A^T\vec{\nu}\right\|_\infty \le 1$? Then by Hölder's inequality

$$(A^T\vec{\nu})^T\vec{x} \le \|\vec{x}\|_1 \left\|A^T\vec{\nu}\right\|_\infty \le \|\vec{x}\|_1,$$

so the infimum is attained at $\vec{x} = \vec{0}$ and $l(\vec{\nu}) = \vec{\nu}^T\vec{y}$.
Strong duality

The solution $\vec{\nu}^{\,*}$ to

$$\max_{\vec{\nu}} \vec{y}^T\vec{\nu} \quad \text{subject to } \left\|A^T\vec{\nu}\right\|_\infty \le 1$$

satisfies

$$(A^T\vec{\nu}^{\,*})[i] = \operatorname{sign}\left(\vec{x}^{\,*}[i]\right) \quad \text{for all } \vec{x}^{\,*}[i] \neq 0$$

for all solutions $\vec{x}^{\,*}$ to the primal problem

$$\min_{\vec{x}} \|\vec{x}\|_1 \quad \text{subject to } A\vec{x} = \vec{y}$$
Dual solution

[Figure: "True Signal Sign" and "Minimum L1 Dual Solution" plotted over the signal indices; the dual solution $A^T\vec{\nu}^{\,*}$ matches the sign of the signal on its support]
Proof

By strong duality

$$\left\|\vec{x}^{\,*}\right\|_1 = \vec{y}^T\vec{\nu}^{\,*} = (A\vec{x}^{\,*})^T\vec{\nu}^{\,*} = (\vec{x}^{\,*})^T(A^T\vec{\nu}^{\,*}) = \sum_{i=1}^n (A^T\vec{\nu}^{\,*})[i]\, \vec{x}^{\,*}[i]$$

By Hölder's inequality

$$\left\|\vec{x}^{\,*}\right\|_1 \ge \sum_{i=1}^n (A^T\vec{\nu}^{\,*})[i]\, \vec{x}^{\,*}[i]$$

with equality if and only if $(A^T\vec{\nu}^{\,*})[i] = \operatorname{sign}\left(\vec{x}^{\,*}[i]\right)$ for all $\vec{x}^{\,*}[i] \neq 0$.
Another algorithm for sparse recovery

Aim: find the nonzero locations of a sparse vector $\vec{x}$ from $\vec{y} = A\vec{x}$

Insight: we have access to the inner product of $\vec{x}$ with $A^T\vec{w}$ for any $\vec{w}$:

$$\vec{y}^T\vec{w} = (A\vec{x})^T\vec{w} = \vec{x}^T(A^T\vec{w})$$

Idea: maximize $\vec{y}^T\vec{w}$ while bounding the magnitude of the entries of $A^T\vec{w}$ by 1; the entries of $A^T\vec{w}$ where $\vec{x}$ is nonzero should saturate to $1$ or $-1$.
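A sketch of this idea on an illustrative random instance (an assumption for the demo): solve the resulting LP and inspect where $A^T\vec{w}$ saturates. A vertex solution may also saturate some entries off the support; the entries on the support are the ones guaranteed to equal $\pm 1$ when $\ell_1$ recovery succeeds.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative instance (assumed): Gaussian matrix, 3-sparse signal
rng = np.random.default_rng(3)
m, n = 20, 40
A = rng.standard_normal((m, n))
x = np.zeros(n)
support = [5, 17, 29]
x[support] = [2.0, -1.0, 1.5]
y = A @ x

# max y^T w subject to -1 <= (A^T w)[i] <= 1; linprog minimizes, so use -y
res = linprog(-y, A_ub=np.vstack([A.T, -A.T]), b_ub=np.ones(2 * n),
              bounds=[(None, None)] * m)
corr = A.T @ res.x                      # should saturate on the support
est_support = np.where(np.abs(corr) > 0.99)[0]
```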
Compressed sensing
Convex constrained problems
Analyzing optimization-based methods
Analyzing optimization-based methods
Best case scenario: Primal solution has closed form
Otherwise: Use dual solution to characterize primal solution
Minimum $\ell_2$-norm solution

Let $A \in \mathbb{R}^{m \times n}$ be a full-rank matrix with $m < n$. For any $\vec{y} \in \mathbb{R}^m$ the solution to the optimization problem

$$\arg\min_{\vec{x}} \|\vec{x}\|_2 \quad \text{subject to } A\vec{x} = \vec{y}$$

is

$$\vec{x}^{\,*} := V S^{-1} U^T \vec{y} = A^T\left(AA^T\right)^{-1}\vec{y}$$

where $A = USV^T$ is the SVD of $A$.
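The closed form can be checked against NumPy's SVD and pseudoinverse routines; the random instance is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 10
A = rng.standard_normal((m, n))  # full rank with probability one
y = rng.standard_normal(m)

# x* = A^T (A A^T)^{-1} y
x_star = A.T @ np.linalg.solve(A @ A.T, y)

# Same vector via the SVD: x* = V S^{-1} U^T y
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ y) / s)

# The pseudoinverse implements the same formula for a fat full-rank matrix
x_pinv = np.linalg.pinv(A) @ y
```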
Proof

Decompose $\vec{x} = P_{\operatorname{row}(A)}\vec{x} + P_{\operatorname{row}(A)^\perp}\vec{x}$. Since $A$ is full rank, $\operatorname{row}(A)$ is spanned by the columns of $V$, so $P_{\operatorname{row}(A)}\vec{x} = V\vec{c}$ for some vector $\vec{c} \in \mathbb{R}^m$, and

$$A\vec{x} = A P_{\operatorname{row}(A)}\vec{x} = U S V^T V \vec{c} = U S \vec{c}$$

Hence $A\vec{x} = \vec{y}$ is equivalent to $US\vec{c} = \vec{y}$, i.e. $\vec{c} = S^{-1}U^T\vec{y}$.
Proof

For all feasible vectors $\vec{x}$,

$$P_{\operatorname{row}(A)}\vec{x} = V S^{-1} U^T \vec{y}$$

By Pythagoras' theorem, minimizing $\|\vec{x}\|_2$ is equivalent to minimizing

$$\|\vec{x}\|_2^2 = \left\|P_{\operatorname{row}(A)}\vec{x}\right\|_2^2 + \left\|P_{\operatorname{row}(A)^\perp}\vec{x}\right\|_2^2,$$

which is achieved by setting $P_{\operatorname{row}(A)^\perp}\vec{x} = \vec{0}$.
Regular subsampling
Minimum $\ell_2$-norm solution (regular subsampling)

[Figure: "True Signal" and "Minimum L2 Solution" plotted over the signal indices]
Regular subsampling

$$A := \frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}, \qquad F_{m/2}^* F_{m/2} = I, \qquad F_{m/2} F_{m/2}^* = I, \qquad \vec{x} := \begin{bmatrix} \vec{x}_{\mathrm{up}} \\ \vec{x}_{\mathrm{down}} \end{bmatrix}$$
Regular subsampling

$$\begin{aligned}
\vec{x}_{\ell_2} &= \arg\min_{A\vec{x}=\vec{y}} \|\vec{x}\|_2 = A^*\left(AA^*\right)^{-1}\vec{y} \\
&= \frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}
\left(\frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}
\frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}\right)^{-1}
\frac{1}{\sqrt{2}}\begin{bmatrix} F_{m/2} & F_{m/2} \end{bmatrix}
\begin{bmatrix} \vec{x}_{\mathrm{up}} \\ \vec{x}_{\mathrm{down}} \end{bmatrix} \\
&= \frac{1}{2}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}
\left(\frac{1}{2}\left(F_{m/2}F_{m/2}^* + F_{m/2}F_{m/2}^*\right)\right)^{-1}
\left(F_{m/2}\vec{x}_{\mathrm{up}} + F_{m/2}\vec{x}_{\mathrm{down}}\right) \\
&= \frac{1}{2}\begin{bmatrix} F_{m/2}^* \\ F_{m/2}^* \end{bmatrix}
I^{-1}\left(F_{m/2}\vec{x}_{\mathrm{up}} + F_{m/2}\vec{x}_{\mathrm{down}}\right) \\
&= \frac{1}{2}\begin{bmatrix} F_{m/2}^*\left(F_{m/2}\vec{x}_{\mathrm{up}} + F_{m/2}\vec{x}_{\mathrm{down}}\right) \\ F_{m/2}^*\left(F_{m/2}\vec{x}_{\mathrm{up}} + F_{m/2}\vec{x}_{\mathrm{down}}\right) \end{bmatrix} \\
&= \frac{1}{2}\begin{bmatrix} \vec{x}_{\mathrm{up}} + \vec{x}_{\mathrm{down}} \\ \vec{x}_{\mathrm{up}} + \vec{x}_{\mathrm{down}} \end{bmatrix}
\end{aligned}$$
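The conclusion — under regular subsampling the minimum $\ell_2$-norm solution simply averages the two halves of the signal — can be verified numerically with a unitary DFT matrix; the random signal is an arbitrary illustrative choice.

```python
import numpy as np

k = 8                                    # plays the role of m/2
F = np.fft.fft(np.eye(k), norm="ortho")  # unitary DFT matrix: F F* = F* F = I
A = np.hstack([F, F]) / np.sqrt(2)       # regular subsampling operator

rng = np.random.default_rng(5)
x_up, x_down = rng.standard_normal(k), rng.standard_normal(k)
x = np.concatenate([x_up, x_down])
y = A @ x

# Minimum l2-norm solution x* = A^* (A A^*)^{-1} y
x_l2 = A.conj().T @ np.linalg.solve(A @ A.conj().T, y)

avg = (x_up + x_down) / 2
expected = np.concatenate([avg, avg])    # averages the two halves
```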
Minimum ℓ1-norm solution

Problem: $\arg\min_{A\vec{x}=\vec{y}} \|\vec{x}\|_1$ does not have a closed-form solution
Instead we can use a dual variable to certify optimality
Dual solution
The solution $\vec{\nu}^{\,*}$ to

$$\max_{\vec{\nu}} \ \vec{y}^{T}\vec{\nu} \quad \text{subject to} \quad \left\|A^{T}\vec{\nu}\right\|_{\infty} \le 1$$

satisfies

$$\left(A^{T}\vec{\nu}^{\,*}\right)[i] = \operatorname{sign}\left(\vec{x}^{\,*}[i]\right) \quad \text{for all } \vec{x}^{\,*}[i] \neq 0$$

where $\vec{x}^{\,*}$ is a solution to the primal problem

$$\min_{\vec{x}} \ \|\vec{x}\|_1 \quad \text{subject to} \quad A\vec{x} = \vec{y}$$
Dual certificate
If there exists a vector $\vec{\nu} \in \mathbb{R}^{m}$ such that

$$\left(A^{T}\vec{\nu}\right)[i] = \operatorname{sign}\left(\vec{x}^{\,*}[i]\right) \ \text{ if } \vec{x}^{\,*}[i] \neq 0, \qquad \left|\left(A^{T}\vec{\nu}\right)[i]\right| < 1 \ \text{ if } \vec{x}^{\,*}[i] = 0$$

then $\vec{x}^{\,*}$ is the unique solution to the primal problem

$$\min_{\vec{x}} \ \|\vec{x}\|_1 \quad \text{subject to} \quad A\vec{x} = \vec{y}$$

as long as the submatrix $A_{T}$, whose columns are indexed by the support $T$ of $\vec{x}^{\,*}$, is full rank
Proof 1
$\vec{\nu}$ is feasible for the dual problem, so for any primal feasible $\vec{x}$ we have $\vec{y}^{T}\vec{\nu} = \vec{x}^{T}A^{T}\vec{\nu} \le \|\vec{x}\|_1 \left\|A^{T}\vec{\nu}\right\|_{\infty} \le \|\vec{x}\|_1$ by Hölder's inequality, and therefore

$$\begin{aligned}
\|\vec{x}\|_1 \ge \vec{y}^{T}\vec{\nu}
&= \left(A\vec{x}^{\,*}\right)^{T}\vec{\nu} \\
&= \left(\vec{x}^{\,*}\right)^{T}\left(A^{T}\vec{\nu}\right) \\
&= \sum_{i \in T} \vec{x}^{\,*}[i]\operatorname{sign}\left(\vec{x}^{\,*}[i]\right) \\
&= \|\vec{x}^{\,*}\|_1
\end{aligned}$$

so $\vec{x}^{\,*}$ must be a solution
Proof 2
$A^{T}\vec{\nu}$ is a subgradient of the $\ell_1$ norm at $\vec{x}^{\,*}$

For any other feasible vector $\vec{x}$

$$\begin{aligned}
\|\vec{x}\|_1 &\ge \|\vec{x}^{\,*}\|_1 + \left(A^{T}\vec{\nu}\right)^{T}\left(\vec{x} - \vec{x}^{\,*}\right) \\
&= \|\vec{x}^{\,*}\|_1 + \vec{\nu}^{T}\left(A\vec{x} - A\vec{x}^{\,*}\right) \\
&= \|\vec{x}^{\,*}\|_1
\end{aligned}$$
Random subsampling
Minimum ℓ1-norm solution (random subsampling)

[Figure: true signal and minimum ℓ1-norm solution]
Exact sparse recovery via ℓ1-norm minimization

Assumption: There exists a signal $\vec{x}^{\,*} \in \mathbb{R}^{n}$ with $s$ nonzeros such that

$$A\vec{x}^{\,*} = \vec{y}$$

for a random $A \in \mathbb{R}^{m \times n}$ (random Fourier, iid Gaussian, . . . )

Exact recovery: If the number of measurements satisfies

$$m \ge C'\,s \log n$$

the solution of the problem

$$\min_{\vec{x}} \ \|\vec{x}\|_1 \quad \text{subject to} \quad A\vec{x} = \vec{y}$$

is the original signal with probability at least $1 - \frac{1}{n}$
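The ℓ1 problem can be solved as a linear program by splitting $\vec{x}$ into positive and negative parts. The sketch below (an illustration, not part of the slides; the instance sizes and seed are arbitrary, and recovery at these sizes holds with high probability rather than with certainty) checks exact recovery on a random Gaussian instance:

```python
import numpy as np
from scipy.optimize import linprog

# min ||x||_1 s.t. Ax = y, rewritten with x = xp - xm, xp >= 0, xm >= 0:
#   min 1^T xp + 1^T xm   s.t.   [A, -A] [xp; xm] = y
rng = np.random.default_rng(0)
m, n, s = 40, 100, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)        # iid Gaussian measurements
x_true = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
y = A @ x_true

res = linprog(
    c=np.ones(2 * n),
    A_eq=np.hstack([A, -A]),
    b_eq=y,
    bounds=(0, None),
    method="highs",
)
x_hat = res.x[:n] - res.x[n:]
print(np.max(np.abs(x_hat - x_true)))               # recovery error (w.h.p. ~0)
```

At $m = 40$ measurements for $n = 100$ and $s = 3$, the number of measurements comfortably exceeds $C's\log n$ for moderate constants, so the LP typically returns the original signal exactly.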
Proof
Show that a dual certificate always exists

We need

$$A_{T}^{T}\vec{\nu} = \operatorname{sign}\left(\vec{x}^{\,*}_{T}\right) \quad (s \text{ constraints}), \qquad \left\|A_{T^{c}}^{T}\vec{\nu}\right\|_{\infty} < 1$$

Idea: Impose $A_{T}^{T}\vec{\nu} = \operatorname{sign}(\vec{x}^{\,*}_{T})$ and minimize $\left\|A_{T^{c}}^{T}\vec{\nu}\right\|_{\infty}$

Problem: No closed-form solution

How about minimizing the $\ell_2$ norm instead?
Proof of exact recovery
Prove that a dual certificate exists for any $s$-sparse $\vec{x}^{\,*}$

Dual certificate candidate: Solution of

$$\min_{\vec{v}} \ \|\vec{v}\|_2 \quad \text{subject to} \quad A_{T}^{T}\vec{v} = \operatorname{sign}\left(\vec{x}^{\,*}_{T}\right)$$

Closed-form solution:

$$\vec{\nu}_{\ell_2} := A_{T}\left(A_{T}^{T}A_{T}\right)^{-1}\operatorname{sign}\left(\vec{x}^{\,*}_{T}\right)$$

$A_{T}^{T}A_{T}$ is invertible with high probability

We need to prove that $A^{T}\vec{\nu}_{\ell_2}$ satisfies

$$\left\|\left(A^{T}\vec{\nu}_{\ell_2}\right)_{T^{c}}\right\|_{\infty} < 1$$
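The candidate has a closed form, so it can be built and checked directly. Below is a sketch (not from the slides; the Gaussian ensemble, dimensions, and seed are illustrative) that forms $\vec{\nu}_{\ell_2}$ and verifies both certificate conditions on one random instance:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 200, 500, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)   # iid Gaussian, ~unit-norm columns
T = np.sort(rng.choice(n, size=s, replace=False))
sign_T = rng.choice([-1.0, 1.0], size=s)       # sign pattern of x* on support T

# least-squares certificate candidate: nu = A_T (A_T^T A_T)^{-1} sign(x*_T)
A_T = A[:, T]
nu = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)

corr = A.T @ nu
off_support = np.setdiff1d(np.arange(n), T)
assert np.allclose(corr[T], sign_T)            # equality on the support, exact
print(np.max(np.abs(corr[off_support])))       # strictly < 1 w.h.p. at this m
```

The support constraints hold by construction; the interesting question, which the proof addresses, is why the off-support correlations stay strictly below one.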
Dual certificate
[Figure: sign pattern of $\vec{x}^{\,*}$ and dual function $A^{T}\vec{\nu}_{\ell_2}$]
Proof of exact recovery
To control $\left(A^{T}\vec{\nu}_{\ell_2}\right)_{T^{c}}$, we need to bound

$$A_{i}^{T}\left(A_{T}^{T}A_{T}\right)^{-1}\operatorname{sign}\left(\vec{x}^{\,*}_{T}\right) \quad \text{for } i \in T^{c}$$

Let $\vec{w} := \left(A_{T}^{T}A_{T}\right)^{-1}\operatorname{sign}\left(\vec{x}^{\,*}_{T}\right)$

$\left|A_{i}^{T}\vec{w}\right|$ can be bounded using independence

The result then follows from a union bound
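The union-bound argument says the certificate should be valid for the vast majority of supports and sign patterns. A small Monte Carlo experiment (illustrative only, not from the slides; sizes and trial count are arbitrary) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s, trials = 200, 500, 5, 50
A = rng.standard_normal((m, n)) / np.sqrt(m)   # one fixed Gaussian matrix

valid = 0
for _ in range(trials):
    T = rng.choice(n, size=s, replace=False)   # random support
    sign_T = rng.choice([-1.0, 1.0], size=s)   # random sign pattern
    A_T = A[:, T]
    nu = A_T @ np.linalg.solve(A_T.T @ A_T, sign_T)
    corr = A.T @ nu
    off = np.setdiff1d(np.arange(n), T)
    valid += np.max(np.abs(corr[off])) < 1.0   # certificate condition on T^c

print(valid / trials)                          # fraction valid, close to 1
```

At these dimensions the failure probability per trial is tiny, so the empirical fraction of valid certificates is essentially one, matching the high-probability statement.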