
Simultaneous Blind Deconvolution and Blind Demixing via Convex Programming

Shuyang Ling

Department of Mathematics, UC Davis

Nov. 8th, 2016

Shuyang Ling (UC Davis) Asilomar Conference, 2016 Nov.8th, 2016 1 / 25


Acknowledgements

Research in collaboration with:

Prof. Thomas Strohmer (UC Davis)

This work is sponsored by NSF-DMS and DARPA.


Outline

Setup: blind deconvolution and demixing

Convex relaxation and main result

Numerics and idea of proof


What is blind deconvolution?


Suppose we observe a function y which is the convolution of two unknown functions, the blurring function f and the signal of interest g, plus noise w. How can we reconstruct f and g from y?

y = f ∗ g + w .

It is obviously a highly ill-posed bilinear inverse problem, but an important one in signal processing.
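As a quick illustration of the model and of why it is ill-posed, here is a minimal numpy sketch (dimensions and noise level are made up for illustration); circular convolution is computed via the FFT:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 64

# Unknown blurring function f and signal of interest g (synthetic data).
f = rng.standard_normal(L)
g = rng.standard_normal(L)
w = 0.01 * rng.standard_normal(L)   # additive noise

# Circular convolution via the FFT: f * g = IFFT(FFT(f) . FFT(g)).
y = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real + w

# Bilinear ambiguity: (c f, g / c) produces exactly the same observation.
c = 2.0
y_scaled = np.fft.ifft(np.fft.fft(c * f) * np.fft.fft(g / c)).real + w
assert np.allclose(y, y_scaled)
```

Since (cf, g/c) yields the same y as (f, g) for any c ≠ 0, the pair can only ever be recovered up to this scaling ambiguity.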


Blind deconvolution in wireless communication

Joint channel and signal estimation in wireless communication

Suppose that a signal x, encoded by A, is transmitted through an unknown channel f. How can we reconstruct f and x from y?

y = f ∗ Ax + w .

[Diagram: y = f ∗ Ax + w, where f is the unknown channel, A the encoding matrix, x the unknown signal, w the noise, and y the received signal.]


Blind deconvolution meets demixing?

[Diagram: users 1, …, i, …, r each transmit a signal g_i through an unknown channel f_i; the receiver observes the mixture y = ∑_{i=1}^r f_i ∗ g_i + w, and a decoder produces estimates (f_i, g_i) for every i.]


Blind deconvolution and blind demixing

We start from the original model

y = ∑_{i=1}^r f_i ∗ g_i + w.

This is even more difficult than blind deconvolution, since it is a “mixture” of blind deconvolution problems.

More assumptions

Each impulse response f_i has maximum delay spread K:

f_i(n) = 0 for n > K.

g_i := A_i x_i is the signal x_i ∈ C^N encoded by the matrix A_i ∈ C^{L×N} with L > N.


Model under subspace assumption

In the frequency domain,

ŷ = ∑_{i=1}^r f̂_i ⊙ ĝ_i + ŵ = ∑_{i=1}^r diag(f̂_i) ĝ_i + ŵ,

where “⊙” denotes entry-wise multiplication. We assume y and ŷ are both of length L.

Subspace assumption

Denote by F the L×L DFT matrix.

Let h_i ∈ C^K be the first K nonzero entries of f_i and let B_i be a low-frequency partial DFT matrix. There holds

f̂_i = F f_i = B_i h_i.

ĝ_i := Â_i x_i, where Â_i := F A_i and x_i ∈ C^N.
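The subspace assumption can be checked numerically. The sketch below (assumed dimensions L = 64, K = 8, N = 10; unnormalized DFT for simplicity) verifies that f̂ = F f = B h when f is supported on its first K entries, and that the convolution theorem turns f ∗ Ax into diag(Bh)(FA)x in the frequency domain:

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, N = 64, 8, 10

F = np.fft.fft(np.eye(L))     # L x L (unnormalized) DFT matrix
B = F[:, :K]                  # low-frequency columns of F

h = rng.standard_normal(K) + 1j * rng.standard_normal(K)
f = np.zeros(L, dtype=complex)
f[:K] = h                     # impulse response with delay spread K

A = rng.standard_normal((L, N))
x = rng.standard_normal(N)
g = A @ x

# Subspace assumption: f_hat = F f = B h.
assert np.allclose(F @ f, B @ h)

# Convolution theorem: FFT(f * g) = diag(B h) (F A) x.
y_time = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g))
assert np.allclose(np.fft.fft(y_time), np.diag(B @ h) @ (F @ A) @ x)
```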


Mathematical model

Finally, we end up with the following model,

Model with subspace constraint

y = ∑_{i=1}^r diag(B_i h_i) A_i x_i + w,

Goal: We want to recover (h_i, x_i)_{i=1}^r from (y, B_i, A_i)_{i=1}^r.

Remark: The unknowns have r(K + N) degrees of freedom, while the number of constraints is L. To make the solution identifiable, we require at least L ≥ r(K + N).

Special case if r = 1

In particular, if r = 1, it is a blind deconvolution problem.

y = diag(Bh)Ax + w .


Nonconvex optimization?

Naive approach?

We may want to try nonlinear least squares approach:

min_{(u_i, v_i)} ‖ ∑_{i=1}^r diag(B_i u_i) A_i v_i − y ‖².

This gives a nonconvex objective function.

It may get stuck at local minima, and there are no recovery guarantees.

For r = 1, recovery guarantees exist after adding regularizers, but not for r > 1.

Two-step convex approach

(a) Lifting: convert the nonconvex constraints into linear ones.

(b) Solve an SDP relaxation and hope to recover {h_i x_i^*}_{i=1}^r.


Convex approach of demixing problem

Step 1: lifting

Let a_{i,l} be the l-th column of A_i^* and b_{i,l} be the l-th column of B_i^*.

y_l = ∑_{i=1}^r (B_i h_i)_l x_i^* a_{i,l} + w_l = ∑_{i=1}^r b_{i,l}^* h_i x_i^* a_{i,l} + w_l.

Let X_i := h_i x_i^* and define the linear operator A_i : C^{K×N} → C^L as

A_i(Z) := {b_{i,l}^* Z a_{i,l}}_{l=1}^L = {⟨Z, b_{i,l} a_{i,l}^*⟩}_{l=1}^L.

Then, there holds

y = ∑_{i=1}^r A_i(X_i) + w.

Advantage: linear constraints (convex constraints)

Disadvantage: dimension increases
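A minimal sketch of the lifting step, with assumed small dimensions: the operator A_i(Z) = {b_{i,l}^* Z a_{i,l}} is linear in Z, and applying it to the rank-one matrix X = h x^* reproduces the bilinear measurements diag(Bh)Ax (real A and x are used here to sidestep conjugation details):

```python
import numpy as np

rng = np.random.default_rng(2)
L, K, N = 32, 4, 5

B = np.fft.fft(np.eye(L))[:, :K] / np.sqrt(L)   # partial DFT, B^* B = I_K
A = rng.standard_normal((L, N))                  # Gaussian encoding matrix

h = rng.standard_normal(K)
x = rng.standard_normal(N)

def lift_op(Z):
    # A(Z) = { b_l^* Z a_l }_{l=1}^L, with b_l^* the l-th row of B and
    # a_l the conjugated l-th row of A: a LINEAR map C^{KxN} -> C^L.
    return np.einsum('lk,kn,ln->l', B, Z, A.conj())

# The lifted measurements of the rank-one matrix X = h x^* reproduce
# the original bilinear measurements diag(B h) A x.
X = np.outer(h, x)
assert np.allclose(lift_op(X), np.diag(B @ h) @ A @ x)
```

The price of linearity is the dimension increase: the unknown grows from K + N parameters to a K×N matrix.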


Rank-r matrix recovery

Recast as rank-r matrix recovery

We rewrite y = ∑_{i=1}^r diag(B_i h_i) A_i x_i as

y_l = ⟨ blkdiag(h_1 x_1^*, …, h_r x_r^*), blkdiag(b_{1,l} a_{1,l}^*, …, b_{r,l} a_{r,l}^*) ⟩,

the inner product of two block-diagonal matrices.

Find a rank-r block-diagonal matrix satisfying the linear constraints above.

Finding such a rank-r matrix is also an NP-hard problem.


Convex relaxation

Nuclear norm minimization

Since this system is highly underdetermined, we hope to recover all {Z_i}_{i=1}^r from

min ∑_{i=1}^r ‖Z_i‖_* subject to ∑_{i=1}^r A_i(Z_i) = y.

Once we obtain {Z_i}_{i=1}^r, we can easily extract the leading left and right singular vectors of each Z_i as estimates of (h_i, x_i).

Key question: does the solution to the SDP above really give {h_i x_i^*}_{i=1}^r? What conditions are needed?
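The extraction step can be sketched as follows: given a (here noiseless, synthetic) solution Z = h x^*, the leading singular pair recovers (h, x) up to the unavoidable scalar ambiguity:

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 6, 8
h = rng.standard_normal(K)
x = rng.standard_normal(N)

# Suppose the SDP returned (approximately) Z = h x^*. Recover h, x from
# the leading singular pair, up to the inherent scaling ambiguity.
Z = np.outer(h, x)
U, s, Vh = np.linalg.svd(Z)
h_hat = np.sqrt(s[0]) * U[:, 0]
x_hat = np.sqrt(s[0]) * Vh[0, :].conj()

# h_hat x_hat^* equals Z, so (h_hat, x_hat) matches (h, x) up to a scalar.
assert np.allclose(np.outer(h_hat, x_hat.conj()), Z)
```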


State-of-the-art: blind deconvolution (r = 1)

Nuclear norm minimization

Consider the convex envelope of rank(Z): the nuclear norm ‖Z‖_* = ∑ σ_i(Z).

min ‖Z‖∗ s.t. A(Z ) = A(X ).

The resulting convex program can be solved in polynomial time.

Theorem [Ahmed-Recht-Romberg 14]

Assume y = diag(Bh)Ax, where A ∈ C^{L×N} is a Gaussian random matrix and B ∈ C^{L×K} is a partial DFT matrix with

B^*B = I_K,  L‖Bh‖_∞² ≤ μ_h².

Then the above convex relaxation recovers X = hx^* exactly with high probability if

C_0 max(K, μ_h² N) ≤ L / log³ L.


Main results

Theorem [Ling-Strohmer 15]

Suppose each B_i ∈ C^{L×K} is a partial DFT matrix with B_i^* B_i = I_K and each A_i is a Gaussian random matrix, i.e., each entry of A_i is i.i.d. N(0,1). Let μ_h² := L max_{1≤i≤r} ‖B_i h_i‖_∞² / ‖h_i‖². If

L ≥ C_{α+log r} r² max{K, μ_h² N} log³ L,

then the solution {X̂_i}_{i=1}^r to the convex relaxation satisfies

X̂_i = X_i, for all i = 1, …, r,

with probability at least 1 − O(L^{−α+1}).


Remark

Our result generalizes Ahmed-Recht-Romberg's result to r > 1.

B_i can be chosen beyond the partial DFT matrix.

The incoherence μ_h² does affect the result.

The r² factor is not optimal. One group has claimed to reduce the sampling complexity from r² to r.

Empirically, A_i can be matrices other than Gaussian. However, no theory exists for these so far.


Numerics: does L really scale linearly with r?

A_i is chosen as a Gaussian matrix. Here K = 30 and N = 25 are fixed. Black: failure; white: success.


Numerics: does L really scale linearly with r?

Choose A_i = D_i H, where H is a partial Hadamard matrix (±1 orthogonal matrix) and D_i is a random ±1 diagonal matrix.

[Figure: empirical probability of success vs. r = 1, 2, …, 18 for K = N = 15 and A a Hadamard matrix, with one curve for each of L = 64, 128, 256, 512.]

The number of unknowns r(K + N) = 16 · 30 = 480 is slightly smaller than the number of constraints, 512.


Numerics: L scales linearly with K + N .

Numerical simulations verify our theory that L = O(r(K + N)) suffices for exact recovery. Here r = 2 and L ≈ 1.5 r(K + N).


Numerics: L vs. µ2h

We observe a strong linear correlation between the minimal required L and μ_h²:

L ∝ max_{1≤i≤r} ‖B_i h_i‖_∞² / ‖h_i‖².


Stability theorem

In reality, measurements are noisy. Hence, suppose that ŷ = y + w, where w is noise with ‖w‖ ≤ η.

min ∑_{i=1}^r ‖Z_i‖_* subject to ‖∑_{i=1}^r A_i(Z_i) − ŷ‖ ≤ η. (1)

Theorem

Assume we observe ŷ = y + w = ∑_{i=1}^r A_i(X_i) + w with ‖w‖ ≤ η. Then the minimizer {X̂_i}_{i=1}^r of (1) satisfies

√( ∑_{i=1}^r ‖X̂_i − X_i‖_F² ) ≤ C r √(max{K, N}) η

with probability at least 1 − O(L^{−α+1}).


Stability

We see that the relative error (in dB) is linearly correlated with the SNR (in dB): approximately, a 10 dB increase in SNR yields a 10 dB decrease in relative error.

[Figure: average relative error over 10 samples (dB) vs. SNR (dB). Left panel: L = 256, r = 15, A a partial Hadamard matrix. Right panel: L = 256, r = 3, A Gaussian.]


Sketch of proof

Let’s consider the noiseless version,

min ∑_{i=1}^r ‖Z_i‖_*, subject to ∑_{i=1}^r A_i(Z_i) = y.

Difficulties:

X i is asymmetric.

How to deal with block diagonal structure?

Two-step proof

Find a sufficient condition for exact recovery

Construct an approximate dual certificate via golfing scheme


Sufficient condition

Three key ingredients to achieve exact recovery

1. Local isometry property on T_i:

max_{1≤i≤r} ‖P_{T_i} A_i^* A_i P_{T_i} − P_{T_i}‖ ≤ 1/4,

where T_i = {h_i h_i^* Z + (I − h_i h_i^*) Z x_i x_i^*}.

2. Local incoherence property:

max_{i≠j} ‖P_{T_i} A_i^* A_j P_{T_j}‖ ≤ 1/(4r).

3. Existence of an approximate dual certificate, which is achieved via the celebrated golfing scheme: find a λ ∈ C^L such that for all 1 ≤ i ≤ r,

‖P_{T_i}(A_i^* λ) − h_i x_i^*‖_F ≤ 1/(5rγ),  ‖P_{T_i^⊥}(A_i^* λ)‖ ≤ 1/2,

where γ := max{‖A_i‖}.
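For intuition, the tangent space T_i and its projection P_{T_i} can be realized directly. The sketch below (assumed unit-norm real h, x and small made-up dimensions) checks that P_T is idempotent, i.e., a genuine projection, and that it fixes the rank-one matrix h x^*:

```python
import numpy as np

rng = np.random.default_rng(4)
K, N = 5, 7
h = rng.standard_normal(K); h /= np.linalg.norm(h)
x = rng.standard_normal(N); x /= np.linalg.norm(x)

Ph = np.outer(h, h)          # orthogonal projection onto span(h)
Px = np.outer(x, x)          # orthogonal projection onto span(x)

def P_T(Z):
    # T = { h h^* Z + (I - h h^*) Z x x^* }: the tangent space of the
    # rank-one matrices at h x^*.
    return Ph @ Z + (np.eye(K) - Ph) @ Z @ Px

Z = rng.standard_normal((K, N))
# P_T is idempotent, and h x^* lies in T.
assert np.allclose(P_T(P_T(Z)), P_T(Z))
assert np.allclose(P_T(np.outer(h, x)), np.outer(h, x))
```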


Conclusion and future work

Can we derive a theoretical bound that scales linearly in r, rather than quadratically as in our current theory? (It may have been solved!)

Is it possible to develop satisfactory theoretical bounds for deterministic matrices A_i?

Fast algorithms: extend our nonconvex optimization framework to this blind-deconvolution-blind-demixing scenario.

Can we develop a theoretical framework where the signals x_i belong to some non-linear subspace, e.g., sparse x_i?

See details in our paper: Blind Deconvolution Meets Blind Demixing: Algorithms and Performance Bounds, arXiv:1512.07730.
