Random matrix theory in sparse recovery
Maryia Kabanava, RWTH Aachen University
CoSIP Winter Retreat 2016

Page 1: Random matrix theory in sparse recovery

Maryia Kabanava

RWTH Aachen University

CoSIP Winter Retreat 2016

Page 2: Compressed sensing

Goal: reconstruction of (high-dimensional) signals from a minimal amount of measured data

Key ingredients:

Exploit low complexity of signals (e.g. sparsity/compressibility)

Efficient algorithms (e.g. convex optimization)

Randomness (random matrices)


Page 3: Signal recovery problem

Signal x ∈ R^d is unknown.

Given:

Linear measurement map: M : R^d → R^m, m ≪ d.

Measurement vector: y = Mx + w ∈ R^m, ‖w‖2 ≤ η.

Goal: recover x from y.
Idea: recovery is possible if x belongs to a set of low complexity.

Standard compressed sensing: sparsity (small number of nonzero coefficients)

Cosparsity: sparsity after transformation

Structured sparsity: e.g. block sparsity

Low rank matrix recovery

Low rank tensor recovery


Page 4: Noiseless model

[Figure: y = Mx, an under-determined m × d linear system; the support S = supp x ⊂ {1, 2, . . . , d} and its complement S^c are highlighted.]

ℓ0-minimization:

min_{z ∈ R^d} ‖z‖0 s.t. Mz = y

NP-hard

ℓ1-minimization:

min_{z ∈ R^d} ‖z‖1 s.t. Mz = y

efficient minimization methods

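The ℓ1 program above is a linear program in disguise, so it can be solved with off-the-shelf LP solvers. A minimal sketch (not part of the slides; the dimensions are illustrative, and exact recovery of this particular random instance is only highly likely, not guaranteed):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, m, s = 60, 30, 3

# Gaussian measurement matrix and an s-sparse signal
M = rng.standard_normal((m, d)) / np.sqrt(m)
x = np.zeros(d)
x[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y = M @ x

# min ||z||_1 s.t. Mz = y, rewritten with variables (z, t):
#   min sum(t)  s.t.  z - t <= 0,  -z - t <= 0,  Mz = y
c = np.concatenate([np.zeros(d), np.ones(d)])
A_ub = np.block([[np.eye(d), -np.eye(d)],
                 [-np.eye(d), -np.eye(d)]])
b_ub = np.zeros(2 * d)
A_eq = np.hstack([M, np.zeros((m, d))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * (2 * d))
x_hat = res.x[:d]
print("recovery error:", np.linalg.norm(x - x_hat))
```

With m = 30 ≈ 2s ln(ed/s) Gaussian measurements, the LP returns the sparse vector itself rather than some other solution of Mz = y.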

Page 5: Nonuniform vs. uniform recovery

Nonuniform recovery: a fixed sparse (compressible) vector is recovered with high probability using M.

Sufficient conditions on M: the descent cone of the ℓ1-norm at x intersects ker M trivially; construct an (approximate) dual certificate.

Uniform recovery: with high probability on M, every sparse (compressible) vector is recovered.

Sufficient conditions on M: the null space property; the restricted isometry property.


Page 6: Nonuniform recovery: descent cone

For fixed x ∈ R^d, we define the convex cone

T(x) = cone{z − x : z ∈ R^d, ‖z‖1 ≤ ‖x‖1}.

Theorem

Let M ∈ R^{m×d}. A vector x ∈ R^d is the unique minimizer of ‖z‖1 subject to Mz = Mx if and only if ker M ∩ T(x) = {0}.

[Figure: the affine space x + ker M meeting the shifted descent cone x + T(x) only at x.]

Let S^{d−1} = {x ∈ R^d : ‖x‖2 = 1} and set T := T(x) ∩ S^{d−1}. If

inf_{x ∈ T} ‖Mx‖2 > 0, (1)

then ker M ∩ T = ∅ and ker M ∩ T(x) = {0}.

Page 7: Uniform recovery: null space property (NSP)

M ∈ R^{m×d} is said to satisfy the stable NSP of order s with 0 < ρ < 1, if for any S ⊂ [d] with |S| ≤ s it holds

‖vS‖1 < ρ‖vS^c‖1 for all v ∈ ker M \ {0}. (2)

Theorem

Let M ∈ R^{m×d} satisfy (2). Then, for any x ∈ R^d, the solution x̂ of

min_{z ∈ R^d} ‖z‖1 subject to Mz = y,

with y = Mx, approximates x with ℓ1-error

‖x − x̂‖1 ≤ (2(1 + ρ))/(1 − ρ) · σs(x)1, (3)

where σs(x)1 := inf{‖x − z‖1 : z is s-sparse}.

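The error bound (3) is easy to evaluate: σs(x)1 is just the sum of all but the s largest magnitudes of x. A small sketch with hypothetical numbers:

```python
import numpy as np

def sigma_s_l1(x, s):
    """Best s-term approximation error in l1: sum of all but the s largest |x_i|."""
    return np.sort(np.abs(x))[::-1][s:].sum()

x = np.array([3.0, -2.0, 0.5, 0.1, -0.05])  # compressible: two dominant entries
s, rho = 2, 0.5
error_bound = 2 * (1 + rho) / (1 - rho) * sigma_s_l1(x, s)
print(error_bound)  # 6 * (0.5 + 0.1 + 0.05) = 3.9
```

For an exactly s-sparse x the bound is zero, i.e. the NSP guarantees exact recovery.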

Page 8: Strategy to check NSP

Lemma

Let

Tρ,s := {w ∈ R^d : ‖wS‖1 ≥ ρ‖wS^c‖1 for some S ⊂ [d], |S| ≤ s}.

Set T := Tρ,s ∩ S^{d−1}. If

inf_{w ∈ T} ‖Mw‖2 > 0,

then for any v ∈ ker M \ {0} and any S ⊂ [d] with |S| ≤ s it holds

‖vS‖1 < ρ‖vS^c‖1.


Page 9: Uniform recovery: restricted isometry property (RIP)

Definition

The restricted isometry constant δs of a matrix M ∈ R^{m×d} is defined as the smallest δ ≥ 0 such that

(1 − δ)‖x‖2² ≤ ‖Mx‖2² ≤ (1 + δ)‖x‖2² (4)

for all s-sparse x ∈ R^d.

Requires that all s-column submatrices of M are well-conditioned:

δs = max_{|S| ≤ s} ‖M_S^T M_S − Id‖_{2→2}.

A sufficiently small δ2s implies the stable NSP.

We say that M satisfies the restricted isometry property if δs is small for reasonably large s.

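For tiny dimensions the characterization δs = max_{|S| ≤ s} ‖M_S^T M_S − Id‖ can be checked by brute force. A sketch (illustrative sizes only; the number of supports grows combinatorially, so this is infeasible beyond toy d):

```python
import numpy as np
from itertools import combinations

def rip_constant(M, s):
    """delta_s = max over |S| = s of ||M_S^T M_S - I||_{2->2} (brute force).
    Supports of size exactly s suffice: delta is nondecreasing in |S|."""
    _, d = M.shape
    return max(
        np.linalg.norm(M[:, list(S)].T @ M[:, list(S)] - np.eye(s), 2)
        for S in combinations(range(d), s)
    )

rng = np.random.default_rng(1)
M = rng.standard_normal((40, 10)) / np.sqrt(40)  # normalized Gaussian matrix
print(rip_constant(M, 1), rip_constant(M, 2))
```

Even at this scale one sees δ1 ≤ δ2 and both well below 1, matching the "well-conditioned submatrices" reading of (4).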

Page 10: RIP implies recovery by ℓ1-minimization

(1 − δs)‖x‖2² ≤ ‖Mx‖2² ≤ (1 + δs)‖x‖2² (5)

Theorem

Assume that the restricted isometry constant of M ∈ R^{m×d} satisfies

δ2s < 1/√2 ≈ 0.7071.

Then ℓ1-minimization reconstructs every s-sparse vector x ∈ R^d from y = Mx.


Page 11: Matrices satisfying recovery conditions

Open problem: give explicit matrices M ∈ R^{m×d} that satisfy recovery conditions.

Goal: successful recovery with M ∈ R^{m×d}, if

m ≥ C s ln^α(d),

for constants C and α.

Deterministic matrices are known only in the quadratic regime m ≥ C s².

Way out: consider random matrices.


Page 12: Gaussian random variables

Gaussian random variables

A standard Gaussian random variabel X ∼ N(0, 1) has probabilitydensity function

ψ(x) =1√2π

e−x2/2. (6)

1 The tail of X decays super-exponentially

P(|X | > t) ≤ e−t2/2, t > 0. (7)

2 The absolute moments of X can be computed as

(E |X |p)1/p =√2

(

Γ((1 + p)/2)

Γ(1/2)

)1/p

= O(√p), p ≥ 1.

3 The moment generating function of X equals

E exp(tX ) = et2/2, t ∈ R.

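Property 2 can be sanity-checked against the classical even moments EX² = 1 and EX⁴ = 3, and its O(√p) growth verified directly. A sketch using the Gamma-function formula from the slide:

```python
import math

def abs_moment(p):
    """(E|X|^p)^(1/p) for X ~ N(0,1), via the Gamma-function formula."""
    return math.sqrt(2.0) * (math.gamma((1 + p) / 2) / math.gamma(0.5)) ** (1.0 / p)

print(abs_moment(2))  # 1.0, since E X^2 = 1
print(abs_moment(4))  # 3**0.25, since E X^4 = 3
# O(sqrt(p)) growth: the ratio abs_moment(p) / sqrt(p) stays bounded
print(max(abs_moment(p) / math.sqrt(p) for p in range(1, 101)))
```

The ratio in the last line in fact decreases toward 1/√e, which is the Stirling-asymptotics limit of the formula.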

Page 13: Subgaussian random variables

Lemma

Let X be a random variable with EX = 0. Then the following properties are equivalent.

1. Tails: there exist β, κ > 0 such that

P(|X| > t) ≤ βe^{−κt²} for all t > 0. (8)

2. Moments: there exists C > 0 such that

(E|X|^p)^{1/p} ≤ C√p for all p ≥ 1. (9)

3. Moment generating function: there exists c > 0 such that

E exp(tX) ≤ e^{ct²} for all t ∈ R. (10)

A random variable X with EX = 0 that satisfies one of the properties above is called subgaussian.


Page 14: Subgaussian random variables: examples

1. Gaussian

2. Bernoulli: P{X = −1} = P{X = 1} = 1/2

3. Bounded: |X| ≤ M almost surely for some M


Page 15: Hoeffding-type inequality

Theorem

Let X1, . . . , XN be a sequence of independent subgaussian random variables,

E exp(tXi) ≤ e^{ct²} for all t ∈ R and i ∈ {1, . . . , N}. (11)

For a ∈ R^N, the random variable Z := Σ_{i=1}^N ai Xi is subgaussian, i.e.

E exp(tZ) ≤ exp(c‖a‖2² t²) for all t ∈ R (12)

and

P(|Σ_{i=1}^N ai Xi| ≥ t) ≤ 2 exp(−t²/(4c‖a‖2²)) for all t > 0. (13)

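Inequality (13) can be checked empirically for Rademacher signs, where (11) holds with c = 1/2 (since E exp(tX) = cosh t ≤ e^{t²/2}). A Monte Carlo sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, c = 50, 20000, 0.5
a = rng.standard_normal(N)

# Z = sum_i a_i X_i for Rademacher signs X_i
X = rng.choice([-1.0, 1.0], size=(trials, N))
Z = X @ a

t = 2.0 * np.linalg.norm(a)  # a 2-sigma threshold: Var(Z) = ||a||_2^2
empirical = np.mean(np.abs(Z) >= t)
hoeffding = 2 * np.exp(-t ** 2 / (4 * c * np.linalg.norm(a) ** 2))
print(empirical, hoeffding)  # the empirical tail sits below the bound 2 e^{-2}
```

The bound 2e^{−2} ≈ 0.27 is loose here (the true tail is close to the Gaussian value ≈ 0.046), which is typical: Hoeffding trades sharpness for generality over all subgaussian summands.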

Page 16: Subexponential random variables

A random variable X with EX = 0 is called subexponential if there exist β, κ > 0 such that

P(|X| > t) ≤ βe^{−κt} for all t > 0. (14)

Theorem (Bernstein-type inequality)

Let X1, . . . , XN be a sequence of independent subexponential random variables,

P(|Xi| > t) ≤ βe^{−κt} for all t > 0 and i ∈ {1, . . . , N}. (15)

Then

P(|Σ_{i=1}^N Xi| ≥ t) ≤ 2 exp(−((κt)²/2)/(2βN + κt)) for all t > 0. (16)


Page 17: Random matrices

Definition

Let M ∈ R^{m×d} be a random matrix.

If the entries of M are independent Bernoulli variables (i.e. taking values ±1 with equal probability), then M is called a Bernoulli random matrix.

If the entries of M are independent standard Gaussian random variables, then M is called a Gaussian random matrix.

If the entries of M are independent subgaussian random variables,

P(|Mjk| ≥ t) ≤ βe^{−κt²} for all t > 0,

then M is called a subgaussian random matrix.


Page 18: RIP for subgaussian random matrices

Theorem

Let M ∈ R^{m×d} be a subgaussian random matrix. Then there exists C = C(β, κ) > 0 such that the restricted isometry constant of (1/√m)M satisfies δs ≤ δ with probability at least 1 − ε provided

m ≥ Cδ^{−2}(s ln(ed/s) + ln(2ε^{−1})). (17)

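The bound (17) shows that m only needs to scale like s ln(ed/s), i.e. logarithmically in the ambient dimension. A sketch evaluating the right-hand side (the constant C from the theorem is unspecified; C = 1 below is a purely illustrative stand-in):

```python
import math

def rip_bound(s, d, delta=0.3, eps=1e-3, C=1.0):
    """m >= C delta^-2 (s ln(ed/s) + ln(2/eps)).
    C stands in for the unspecified absolute constant of the theorem."""
    return math.ceil(C / delta ** 2 * (s * math.log(math.e * d / s)
                                       + math.log(2 / eps)))

for d in (10 ** 3, 10 ** 4, 10 ** 5):
    print(d, rip_bound(s=10, d=d))  # grows only logarithmically in d
```

Multiplying d by 1000 barely moves the required m, while doubling s roughly doubles it.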

Page 19: Random matrices with subgaussian rows

Let Y ∈ R^d be random.

If E|〈Y, x〉|² = ‖x‖2² for all x ∈ R^d, then Y is called isotropic.

If, for all x ∈ R^d with ‖x‖2 = 1, the random variable 〈Y, x〉 is subgaussian,

E exp(t〈Y, x〉) ≤ e^{ct²} for all t ∈ R (with c independent of x),

then Y is called a subgaussian random vector.

Theorem

Let M ∈ R^{m×d} be random with independent, isotropic, subgaussian rows with the same parameter c. If

m ≥ Cδ^{−2}(s ln(ed/s) + ln(2ε^{−1})), (18)

then the restricted isometry constant of (1/√m)M satisfies δs ≤ δ with probability at least 1 − ε.


Page 20: Ingredients of the proof: concentration inequality

Let M ∈ R^{m×d} be random with independent, isotropic, subgaussian rows. Then, for all x ∈ R^d and every t ∈ (0, 1),

P(|m^{−1}‖Mx‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m). (19)

Proof.

Let x ∈ R^d, ‖x‖2 = 1. Denote the rows of M by Y1, . . . , Ym ∈ R^d. Define

Zi = |〈Yi, x〉|² − ‖x‖2², i = 1, . . . , m.

Then EZi = 0 and P(|Zi| ≥ r) ≤ β exp(−κr), and

m^{−1}‖Mx‖2² − ‖x‖2² = m^{−1} Σ_{i=1}^m Zi.

Bernstein's inequality yields

P(|m^{−1} Σ_{i=1}^m Zi| ≥ t) ≤ 2 exp(−(κ²/(4β + 2κ)) m t²).


Page 21: Ingredients of the proof: covering argument

Let M ∈ R^{m×d} be random and

P(|m^{−1}‖Mx‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ R^d.

Define M̃ = (1/√m)M. Then

P(|‖M̃x‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ R^d.

For S ⊂ {1, . . . , d}, |S| = s and δ, ε ∈ (0, 1), if

m ≥ Cδ^{−2}(7s + 2 ln(2ε^{−1})), (20)

then with probability at least 1 − ε

‖M̃_S^T M̃_S − Id‖_{2→2} < δ. (21)


Page 22: Ingredients of the proof: union bound

Let M̃ ∈ R^{m×d} be random and

P(|‖M̃x‖2² − ‖x‖2²| ≥ t‖x‖2²) ≤ 2 exp(−ct²m) for all x ∈ R^d.

If for δ, ε ∈ (0, 1),

m ≥ Cδ^{−2}[s(9 + 2 ln(d/s)) + 2 ln(2ε^{−1})], (22)

then with probability at least 1 − ε, the restricted isometry constant δs of M̃ satisfies δs < δ.


Page 23: Gaussian width

For T ⊂ R^d we define its Gaussian width by

ℓ(T) := E sup_{x ∈ T} 〈x, g〉, where g ∈ R^d is a standard Gaussian random vector. (23)

[Figure: the width of a set T in a direction u.]

Due to rotation invariance, (23) can be written as

ℓ(T) = E‖g‖2 · E sup_{x ∈ T} 〈x, u〉,

where u is uniformly distributed on S^{d−1}.

Examples:

ℓ(S^{d−1}) = E sup_{‖x‖2=1} 〈x, g〉 = E‖g‖2 ∼ √d

D := conv{x ∈ S^{d−1} : |supp x| ≤ s}, ℓ(D) ∼ √(s ln(d/s))

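The sphere example ℓ(S^{d−1}) = E‖g‖2 ∼ √d is easy to verify by simulation, since the supremum of 〈x, g〉 over the unit sphere is attained at x = g/‖g‖2. A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 100, 5000

# sup_{||x||_2 = 1} <x, g> = ||g||_2, so the width of the sphere is E||g||_2
g = rng.standard_normal((trials, d))
width = np.linalg.norm(g, axis=1).mean()
print(width, np.sqrt(d))  # the two agree up to lower-order terms
```

The concentration of ‖g‖2 around √d (its standard deviation is O(1), independent of d) is the same phenomenon exploited in Gordon's theorem on the next slide.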

Page 24: Gordon's escape through a mesh

ℓ(T) := E sup_{x ∈ T} 〈x, g〉, g ∈ R^d standard Gaussian.

Em := E‖g‖2 = √2 Γ((m + 1)/2)/Γ(m/2), g ∈ R^m standard Gaussian,

m/√(m + 1) ≤ Em ≤ √m.

Theorem

Let M ∈ R^{m×d} be Gaussian and T ⊂ S^{d−1}. Then, for t > 0, it holds

P(inf_{x ∈ T} ‖Mx‖2 > Em − ℓ(T) − t) ≥ 1 − e^{−t²/2}. (24)

The proof relies on the concentration of measure inequality for Lipschitz functions.

The number of measurements m is determined by requiring

Em ≥ m/√(m + 1) ≥ ℓ(T) + t,

so that m ≳ ℓ(T)².


Page 25: Estimates for Gaussian widths of T(x)

T(x) = cone{z − x : z ∈ R^d, ‖z‖1 ≤ ‖x‖1} (25)

N(x) := {z ∈ R^d : 〈z, w − x〉 ≤ 0 for all w s.t. ‖w‖1 ≤ ‖x‖1} (26)

ℓ(T(x) ∩ S^{d−1}) ≤ E min_{z ∈ N(x)} ‖g − z‖2, where g ∈ R^d is a standard Gaussian random vector.

Let supp(x) = S. Then

N(x) = ⋃_{t ≥ 0} {z ∈ R^d : zi = t sgn(xi), i ∈ S, |zi| ≤ t, i ∈ S^c}

and

[ℓ(T(x) ∩ S^{d−1})]² ≤ 2s ln(ed/s).

Page 26: Nonuniform recovery with Gaussian measurements

Theorem

Let x ∈ R^d be an s-sparse vector and let M ∈ R^{m×d} be a randomly drawn Gaussian matrix. If, for some ε ∈ (0, 1),

m²/(m + 1) ≥ 2s (√(ln(ed/s)) + √(ln(ε^{−1})/s))², (27)

then with probability at least 1 − ε the vector x is the unique minimizer of ‖z‖1 subject to Mz = Mx.

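Condition (27) pins down concrete measurement counts. A sketch that finds the smallest m satisfying it (the parameter values in the example are illustrative):

```python
import math

def required_m(s, d, eps):
    """Smallest m with m^2/(m+1) >= 2s (sqrt(ln(ed/s)) + sqrt(ln(1/eps)/s))^2."""
    rhs = 2 * s * (math.sqrt(math.log(math.e * d / s))
                   + math.sqrt(math.log(1 / eps) / s)) ** 2
    m = 1
    while m * m / (m + 1) < rhs:
        m += 1
    return m

print(required_m(s=10, d=10 ** 4, eps=1e-3))  # roughly 2 s ln(ed/s) measurements
```

Since m²/(m + 1) ≈ m for large m, the condition is essentially m ≥ 2s ln(ed/s) plus a lower-order term for the failure probability, in contrast to the m ~ s² regime of known deterministic constructions.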

Page 27: Estimates for Gaussian widths of Tρ,s

Tρ,s := {w ∈ R^d : ‖wS‖1 ≥ ρ‖wS^c‖1 for some S ⊂ [d], |S| = s} (28)

D := conv{x ∈ S^{d−1} : |supp(x)| ≤ s} (29)

Tρ,s ∩ S^{d−1} ⊂ (1 + ρ^{−1})D

ℓ(D) ≤ √(2s ln(ed/s)) + √s

ℓ(Tρ,s ∩ S^{d−1}) ≤ (1 + ρ^{−1})(√(2s ln(ed/s)) + √s)


Page 28: Uniform recovery with Gaussian measurements

Theorem

Let M ∈ R^{m×d} be Gaussian, 0 < ρ < 1 and 0 < ε < 1. If

m²/(m + 1) ≥ 2s (1 + ρ^{−1})² (√(ln(ed/s)) + 1/√2 + √(ln(ε^{−1})/(s(1 + ρ^{−1})²)))²,

then with probability at least 1 − ε, for every x ∈ R^d a minimizer x̂ of ‖z‖1 subject to Mz = Mx approximates x with ℓ1-error

‖x − x̂‖1 ≤ (2(1 + ρ))/(1 − ρ) · σs(x)1.


Page 29

Thank you for your attention!
