
Page 1:

SLRA: choice of a norm and computational issues

Anatoly Zhigljavsky

Cardiff University

Collaborators: Nina Golyandina (St.Petersburg), Jonathan Gillard (Cardiff)

Grenoble, June 2, 2015

Page 2:

SLRA: Problem definition

L, K and r are given positive integers such that 1 ≤ r < L ≤ K.

- M_r = M_r^{L×K} ⊂ R^{L×K}, the set of matrices with rank ≤ r
- H = H^{L×K} ⊂ R^{L×K}, the set of matrices of Hankel structure
- A = M_r ∩ H

Assume we are given a matrix X⋆ ∈ H. The Hankel structured low rank approximation (SLRA) problem is

f(X, X⋆) → min_{X ∈ A}.

Common choice of f: f(X, X⋆) = ||X − X⋆||²_F.

Page 3:

Main application area: time series and signal processing

Map Y = (y_1, y_2, ..., y_N)^T into an L × K Hankel matrix X:

X = X_Y =
  [ y_1    y_2      ...   y_K
    y_2    y_3      ...   y_{K+1}
    ...
    y_L    y_{L+1}  ...   y_N ],

where K = N − L + 1.
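As a quick illustration (a minimal sketch, assuming NumPy and SciPy are available; the function name hankel_embed is not from the slides), this embedding can be written as:

```python
# Sketch: map a series Y of length N into an L x K Hankel matrix, K = N - L + 1.
import numpy as np
from scipy.linalg import hankel

def hankel_embed(Y, L):
    """Return the L x K Hankel (trajectory) matrix of the series Y."""
    Y = np.asarray(Y, dtype=float)
    # First column: y_1 ... y_L; last row: y_L ... y_N.
    return hankel(Y[:L], Y[L - 1:])

Y = np.arange(1.0, 8.0)          # y_1, ..., y_7
X = hankel_embed(Y, L=3)         # 3 x 5 Hankel matrix
print(X)
```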

Page 4:

Common parameterization of elements in A

X ∈ A  ⟷  Y(θ) = (y_1(θ), ..., y_N(θ))^T,  N = L + K − 1,

y_n(θ) = Σ_{l=1}^q a_l exp(d_l n) sin(2πω_l n + ϕ_l).

Known as ‘sums of damped sinusoids’.
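A minimal sketch of generating such a series (assuming NumPy; the function name and the 1-based time index n = 1, ..., N follow the formula above, everything else is illustrative):

```python
import numpy as np

def damped_sinusoids(N, a, d, omega, phi):
    """y_n = sum_l a_l * exp(d_l * n) * sin(2*pi*omega_l*n + phi_l), n = 1..N."""
    n = np.arange(1, N + 1)
    a, d, omega, phi = map(np.atleast_1d, (a, d, omega, phi))
    return sum(a_l * np.exp(d_l * n) * np.sin(2 * np.pi * w_l * n + p_l)
               for a_l, d_l, w_l, p_l in zip(a, d, omega, phi))

# One undamped sinusoid with omega = 0.35, as in the example on the next slides.
Y = damped_sinusoids(N=10, a=1.0, d=0.0, omega=0.35, phi=0.0)
```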

Page 5:

Question: How difficult is the parametric optimization problem?

Answer: Very difficult.

Page 6:

Damped sinusoids: Example

The objective function f(ω) = Σ_{n=1}^N (y_n − sin(2πωn))², ω = 0.35; N = 10 and N = 100.

[Three plots of f(ω) against ω: panels (a) and (b) over ω ∈ [0.2, 1], panel (c) over ω ∈ [0.2, 0.5].]

Figure: Function f(ω).
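How such pictures arise can be seen from a brute-force sketch (assuming NumPy; the grid size and the way local minima are counted are arbitrary choices, not from the slides):

```python
import numpy as np

def f_omega(omega, Y):
    """f(omega) = sum_n (y_n - sin(2*pi*omega*n))^2."""
    n = np.arange(1, len(Y) + 1)
    return np.sum((Y - np.sin(2 * np.pi * omega * n)) ** 2)

N = 100
n = np.arange(1, N + 1)
Y = np.sin(2 * np.pi * 0.35 * n)              # noiseless data, true omega = 0.35
grid = np.linspace(0.2, 1.0, 5001)
vals = np.array([f_omega(w, Y) for w in grid])
print("global minimizer on the grid:", grid[vals.argmin()])
# Counting sign changes of the discrete gradient illustrates that the number of
# local minima grows with N, as stated on a later slide.
d = np.diff(vals)
print("interior local minima on the grid:", np.sum((d[:-1] < 0) & (d[1:] > 0)))
```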

Page 7:

Damped sinusoids: Example

The objective function f(ω) = Σ_{n=1}^N (y_n − sin(2πωn))², ω = 0.35.

For N = 10, the (global) Lipschitz constant of f is approximately 327.86. For N = 100, the Lipschitz constant of f is approximately 6195.88.

[Plots: (a) log|f′(ω)| over ω ∈ [0.2, 0.8]; (b) log|f″(ω)| in the region of the global minimizer ω^(0) = 0.3.]

Figure: First and second derivatives of f(ω).

Page 8:

Damped sinusoids: Example

N = 10, data y_n + ε_n, where {ε_n, n = 1, ..., N} are normally distributed noise terms with mean 0 and variance σ².

[Plots of f(ω) against ω for the three noise levels:
(a) ω∗ = 0.3445, f(ω∗) = 5.17 (5.43);
(b) ω∗ = 0.4710, f(ω∗) = 10.00 (10.86);
(c) ω∗ = 0.4694, f(ω∗) = 13.56 (16.29).]

Figure: Function f(ω), σ² = 0.5, 1.0, 1.5; f(0.35) is given in round brackets.

Page 9:

Damped sinusoids: Example (2)

The objective function:

f(ω_1, ω_2) = Σ_{j=1}^N (y_j − sin(2πω_1 j) − sin(2πω_2 j))²,  ω_1 = 0.3, ω_2 = 0.32.

[Plots: (a) plot of f(ω_1, 0.32); (b) plot of f(ω_1, ω_2); (c) contour plot of f(ω_1, ω_2) over (ω_1, ω_2) ∈ [0.2, 0.5]² with (0.3, 0.32) marked (+).]

Figure: Function f(ω_1, ω_2).

Page 10:

Multi-extremality and existing methods

Local/global:

- Number of local minima: a linear function of N
- Effect of noise: it moves and dampens the true 'global' minimum

Existing methods of solving SLRA:

- Based on the use of AP (alternating projections)
- Based on a local approximation
- Can only reduce the rank of the matrix by one

Page 11:

Question: How good is the method of AP?

Answer: Not good.

Page 12:

Projections

Projection of X onto H, called π_H(X): the closest Hankel matrix (in the Frobenius norm) to any given matrix is obtained by the simple diagonal averaging procedure (each entry of π_H(X) is the average of the corresponding anti-diagonal of X).

Projection of X onto M_r, called π_(r)(X): let σ_i = σ_i(X) denote the singular values of X, ordered so that σ_1 ≥ σ_2 ≥ ... ≥ σ_L. Let Σ_0 = diag(σ_1, σ_2, ..., σ_L) and Σ = diag(σ_1, σ_2, ..., σ_r, 0, ..., 0). The SVD of X can be written as X = U Σ_0 V^T, and the matrix π_(r)(X) = U Σ V^T belongs to M_r.

Page 13:

Alternating projections (AP)

X_0 = X∗,  X_{n+1} = π_H[π_(r)(X_n)]  for n = 0, 1, ...

AP guarantees convergence to A, but typically does not converge to the optimal solution.

The main problem: bad starting point!
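A minimal sketch of the two projections and of the plain AP iteration (assuming NumPy; this illustrates the scheme above and is not the authors' implementation):

```python
import numpy as np

def proj_hankel(X):
    """pi_H(X): closest Hankel matrix in Frobenius norm (anti-diagonal averaging)."""
    L, K = X.shape
    sums, counts = np.zeros(L + K - 1), np.zeros(L + K - 1)
    for l in range(L):                     # entry (l, k) lies on anti-diagonal l + k
        sums[l:l + K] += X[l]
        counts[l:l + K] += 1
    Y = sums / counts
    return np.array([Y[l:l + K] for l in range(L)])

def proj_rank(X, r):
    """pi_(r)(X): keep the r leading singular values, zero out the rest."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0
    return (U * s) @ Vt

def alternating_projections(X_star, r, n_iter=100):
    """AP: X_0 = X_star, X_{n+1} = pi_H[pi_(r)(X_n)]."""
    X = X_star.copy()
    for _ in range(n_iter):
        X = proj_hankel(proj_rank(X, r))
    return X
```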

Page 14:

NOTE: a correction to any approximation

Theorem. Let X ∈ R^{L×K} and β ∈ R. The function f(β) = ||βX − X∗||²_F has a unique minimizer at

β = tr(X^T X∗) / tr(X^T X).

Corollary. tr((βX − X∗)^T X) = 0, which is the so-called 'orthogonality condition'.

Proof. The function f(β) is quadratic in β, and we may write

f(β) = ||βX − X∗||²_F = tr((βX − X∗)^T (βX − X∗)).

The derivative is given by

∂f/∂β = 2β tr(X^T X) − 2 tr(X^T X∗).

Setting this derivative to zero and solving for β yields the result.
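A quick numerical check of the theorem and its corollary (a sketch, assuming NumPy):

```python
import numpy as np

def optimal_beta(X, X_star):
    """beta = tr(X^T X_star) / tr(X^T X), the minimizer of ||beta*X - X_star||_F^2."""
    return np.trace(X.T @ X_star) / np.trace(X.T @ X)

rng = np.random.default_rng(0)
X, X_star = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
beta = optimal_beta(X, X_star)
# Orthogonality condition from the corollary: tr((beta*X - X_star)^T X) = 0.
print(np.trace((beta * X - X_star).T @ X))       # ~1e-16
```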

Page 15:

Family of algorithms (example)

Ingredients: Backtracking, Randomization, Corrections

- U: a random number with the uniform distribution on [0, 1]
- X: a random Hankel matrix corresponding to Y = (ξ_1, ..., ξ_N) with {ξ_n} i.i.d. Gaussian r.v.'s with mean 0 and variance s² ≥ 0.

Multistart APBR. Run N_0 independent trajectories X_{0,j} for j = 1, ..., N_0, with X_{0,j} = (1 − s_0) X∗ + s_0 X and

X_{n+1,j} = ( tr(Z_{n,j}^T X∗) / tr(Z_{n,j}^T Z_{n,j}) ) Z_{n,j},  with
Z_{n,j} = (1 − δ_n) π_H[π_(r)(X_{n,j})] + δ_n X∗ + σ_n X,

where j = 1, ..., N_0 and

δ_n = U/(n + 1)^p,  σ_n = 1/(n + 1)^q   for n = 0, 1, ..., N_I − 1,
δ_n = 0,  σ_n = 0                        for n = N_I, ..., N_I + N_II − 1.
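A minimal sketch of Multistart APBR (assuming NumPy, and repeating the projection helpers from the AP sketch above so that the block runs on its own). The slide leaves some details open, e.g. whether U is drawn once per trajectory and whether the random Hankel matrix X is redrawn at every step; the sketch makes one possible choice and marks it as an assumption:

```python
import numpy as np

# Same helpers as in the AP sketch above.
def proj_hankel(X):
    L, K = X.shape
    sums, counts = np.zeros(L + K - 1), np.zeros(L + K - 1)
    for l in range(L):
        sums[l:l + K] += X[l]
        counts[l:l + K] += 1
    Y = sums / counts
    return np.array([Y[l:l + K] for l in range(L)])

def proj_rank(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0
    return (U * s) @ Vt

def random_hankel(L, K, s, rng):
    """Hankel matrix of an i.i.d. N(0, s^2) series of length L + K - 1."""
    Y = s * rng.standard_normal(L + K - 1)
    return np.array([Y[l:l + K] for l in range(L)])

def apbr_trajectory(X_star, r, s0, s, p, q, n_I, n_II, rng):
    """One APBR trajectory: randomized, beta-corrected alternating projections."""
    L, K = X_star.shape
    U = rng.uniform()                                  # one U per trajectory (an assumption)
    X = (1 - s0) * X_star + s0 * random_hankel(L, K, s, rng)
    for n in range(n_I + n_II):
        delta = U / (n + 1) ** p if n < n_I else 0.0
        sigma = 1.0 / (n + 1) ** q if n < n_I else 0.0
        Z = ((1 - delta) * proj_hankel(proj_rank(X, r))
             + delta * X_star
             + sigma * random_hankel(L, K, s, rng))    # fresh noise each step (an assumption)
        X = (np.trace(Z.T @ X_star) / np.trace(Z.T @ Z)) * Z   # beta-correction
    return X

def multistart_apbr(X_star, r, N0, **params):
    rng = np.random.default_rng(0)
    runs = [apbr_trajectory(X_star, r, rng=rng, **params) for _ in range(N0)]
    return min(runs, key=lambda X: np.linalg.norm(X - X_star) ** 2)
```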

Page 16:

Example

- Y∗^(m) = (0, 3 − 2m, 0, −1, 0, m, 0, −1, 0, 3 − 2m, 0)^T, where m = −1, 2, 3.
- Fix L = 3 and r = 2. Set X∗^(m) = H(Y∗^(m)).
- rank(X∗^(m)) = 3 for m = −1, 2, 3.
- The parameters of Multistart APBR are M = 1000, c = 1, s_0 = 0.25, s = 1, p = 0.5 and q = 1.5.
- The total number of iterations was fixed at 250 with N_I = 200.

 m  | AP      | OAP     | Local AP | Med (APBR) | Min (APBR)
 −1 | 68.3077 | 68.1548 | 68.3077  | 56.8699    | 56.7487
  2 | 17.0769 | 17.0769 | 17.0769  | 12.9900    | 12.8791
  3 | 50.1888 | 50.1873 | 49.9663  | 36.2506    | 36.2357

Table: Frobenius distances to X∗^(m).

Page 17:

Example

De Moor's data: Y = (3, 4, 1, 2, 5, 6, 7, 1, 2)^T, N = 9, L = 4, K = 6 and r = 3.

1. Alternating projections: ||X∗ − X_AP||²_F = 14.8251.
2. Minimization of a Lagrange function: ||X∗ − X_DeMoor||²_F = 14.1481.

APBR: N_0 = 3, s_0 = 1/2, s = 0.1, p = 1/2, q = 1, N_I = 100 and N_II = 50. Set σ_n = 0 for all n.

||X∗ − X^(1)_APBR||²_F = ||X∗ − X^(3)_APBR||²_F = 14.1478

[Plot of the distances over roughly 140 iterations.]

Figure: Distances ||X_n − X∗||²_F. AP iterations in grey, APBR in black.

Page 18:

Weighted (unstructured) low rank approximation

Problem definition

min_{X ∈ M_r} f(X) = min_{X ∈ M_r} vec^T(X − X∗) W vec(X − X∗)

Including the rank constraint:

min_{U ∈ R^{L×r}, V ∈ R^{r×K}} vec^T(UV − X∗) W vec(UV − X∗)

Alternating projections. Start from an initial U_0 (using the SVD):

V_n = arg min_{V ∈ R^{r×K}} vec^T(U_{n−1}V − X∗) W vec(U_{n−1}V − X∗)

U_n = arg min_{U ∈ R^{L×r}} vec^T(U V_n − X∗) W vec(U V_n − X∗)
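A minimal sketch of one pass of these alternating least-squares updates (assuming NumPy, a symmetric positive definite W, and column-major vec(·); it relies on the standard identities vec(UV) = (I_K ⊗ U)vec(V) = (V^T ⊗ I_L)vec(U)):

```python
import numpy as np

def weighted_als_step(U, X_star, W, r):
    """One (V, U) update for min vec(UV - X_star)^T W vec(UV - X_star)."""
    L, K = X_star.shape
    C = np.linalg.cholesky(W).T          # W = C^T C, so x^T W x = ||C x||^2
    x = X_star.reshape(-1, order="F")    # column-major vec(X_star)

    # V-step: vec(U V) = (I_K kron U) vec(V)
    A = C @ np.kron(np.eye(K), U)
    v = np.linalg.lstsq(A, C @ x, rcond=None)[0]
    V = v.reshape(r, K, order="F")

    # U-step: vec(U V) = (V^T kron I_L) vec(U)
    A = C @ np.kron(V.T, np.eye(L))
    u = np.linalg.lstsq(A, C @ x, rcond=None)[0]
    return u.reshape(L, r, order="F"), V

# Initial U_0 from the SVD of X_star, as suggested on the slide; W = I as a sanity check.
rng = np.random.default_rng(1)
L, K, r = 4, 6, 2
X_star = rng.standard_normal((L, K))
U0 = np.linalg.svd(X_star)[0][:, :r]
U1, V1 = weighted_als_step(U0, X_star, np.eye(L * K), r)
```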

Page 19:

Weighted (unstructured) low rank approximation

A. Similar (but not as severe!) problems with locality/globality.

B. Thm. vec^T(βX − X∗) W vec(βX) = 0 with

β = vec^T(X∗) W vec(X) / (vec^T(X) W vec(X)).

Page 20:

Weighted SLRA: a slight change of notation

Y = (y_0, y_1, ..., y_N)^T;

(L+1) × (K+1) Hankel matrix X:

X = X_Y =
  [ y_0    y_1      ...   y_K
    y_1    y_2      ...   y_{K+1}
    ...
    y_L    y_{L+1}  ...   y_N ],

so that now N = L + K.

Page 21:

Weighted SLRA: Two norms

A matrix W = (w_{n,n'})_{n,n'=0}^N ∈ M_N^> defines a (semi-)norm in R^{N+1}:

||Y||_W = sqrt(Y^T W Y) = ( Σ_{n,n'=0}^N y_n w_{n,n'} y_{n'} )^{1/2},

where Y = (y_0, ..., y_N)^T ∈ R^{N+1}.

Let L be such that 1 < L < N, and set K = N − L. For two matrices Q = (q_{l,l'})_{l,l'=0}^L ∈ M_L^> and R = (r_{k,k'})_{k,k'=0}^K ∈ M_K^>, we define the (Q,R)-norm (or semi-norm) on R^{(L+1)×(K+1)} by

||X||_{Q,R} = sqrt( tr(Q X R X^T) ),

where X ∈ M_{L×K} is an arbitrary matrix of size (L+1) × (K+1).

Page 22:

Weighted SLRA: SVD

Numerically, computing the SVD of X in the (Q,R)-norm is equivalent to computing the SVD of Q^{1/2} X R^{1/2} in the Frobenius norm. In this respect, the (Q,R)-norm is equivalent to the Frobenius norm.
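A minimal sketch of this reduction (assuming NumPy and positive definite Q and R; the helper names are illustrative):

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

def rank_r_in_QR_norm(X, Q, R, r):
    """Best rank-r approximation of X in the (Q,R)-norm, via a Frobenius-norm SVD."""
    Qh, Rh = sym_sqrt(Q), sym_sqrt(R)
    B = Qh @ X @ Rh                              # ||X - Y||_{Q,R} = ||Qh (X - Y) Rh||_F
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    s[r:] = 0.0
    Br = (U * s) @ Vt                            # ordinary rank-r truncation of B
    return np.linalg.solve(Qh, Br) @ np.linalg.inv(Rh)   # map back: Qh^{-1} Br Rh^{-1}

# Sanity check: with Q = I and R = I this is the usual Frobenius-norm truncation.
rng = np.random.default_rng(2)
X = rng.standard_normal((4, 7))
Xr = rank_r_in_QR_norm(X, np.eye(4), np.eye(7), r=2)
print(np.linalg.matrix_rank(Xr))                 # 2
```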

Page 23:

Weighted SLRA: Equivalence of the two norms

Theorem. Consider the (Q,R)-norm for the matrix X_Y associated with an arbitrary vector Y ∈ R^{N+1} and defined by Q ∈ M_L^> and R ∈ M_K^>. Then

||X_Y||_{Q,R} = ||Y||_W,

where W = Q ⋆ R. The matrix W is diagonal if and only if both Q and R are diagonal.

Page 24:

Convolution of matrices

Def. For two arbitrary matrices A = (a_{i,i'}) ∈ M_{A×A'} and B = (b_{j,j'}) ∈ M_{B×B'}, their convolution is the matrix C = A ⋆ B = (c_{m,n}) ∈ M_{(A+B)×(A'+B')} with elements

c_{m,n} = Σ_{k,k'} a_{k,k'} b_{m−k,n−k'}   (m = 0, 1, ..., A+B;  n = 0, 1, ..., A'+B'),

where the summation in the double sum is taken over the pairs of indices (k, k') such that the elements a_{k,k'} and b_{m−k,n−k'} are defined; that is, max{0, m−B} ≤ k ≤ min{A, m} and max{0, n−B'} ≤ k' ≤ min{A', n}.
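A minimal sketch of this convolution (assuming NumPy and SciPy; scipy.signal.convolve2d with mode='full' computes exactly this double sum); it also checks numerically that tr(Q X_Y R X_Y^T) = Y^T W Y with W = Q ⋆ R, as in the theorem above:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(3)
L, K = 3, 5
N = L + K                                   # series indexed y_0, ..., y_N

# Random symmetric positive definite Q ((L+1)x(L+1)) and R ((K+1)x(K+1)).
A = rng.standard_normal((L + 1, L + 1)); Q = A @ A.T + np.eye(L + 1)
B = rng.standard_normal((K + 1, K + 1)); R = B @ B.T + np.eye(K + 1)

W = convolve2d(Q, R, mode="full")           # the matrix convolution Q * R, (N+1)x(N+1)

Y = rng.standard_normal(N + 1)
X = np.array([Y[l:l + K + 1] for l in range(L + 1)])   # (L+1)x(K+1) Hankel matrix X_Y

lhs = np.trace(Q @ X @ R @ X.T)             # ||X_Y||_{Q,R}^2
rhs = Y @ W @ Y                             # ||Y||_W^2
print(np.isclose(lhs, rhs))                 # True
```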

Page 25:

Convolution of matrices: generating functions

The generating function (g.f.) of the elements of A = (a_{i,i'}) is

f_A(x, y) = Σ_{i=0}^A Σ_{j=0}^{A'} a_{i,j} x^i y^j.

For the convolution, f_{A⋆B}(x, y) = f_A(x, y) · f_B(x, y) for all x and y.

Page 26:

Proof of the theorem ||X_Y||_{Q,R} = ||Y||_W.

Consider the norm ||X||²_{Q,R} with X = X_Y (so that x_{l,k} = y_{l+k} for l = 0, ..., L and k = 0, ..., K). We have

||X_Y||²_{Q,R} = tr(Q X_Y R X_Y^T)
  = Σ_{l=0}^L Σ_{k=0}^K Σ_{l'=0}^L Σ_{k'=0}^K q_{l,l'} x_{l',k'} r_{k',k} x_{l,k}
  = Σ_{l=0}^L Σ_{k=0}^K Σ_{l'=0}^L Σ_{k'=0}^K q_{l,l'} y_{l'+k'} r_{k',k} y_{l+k}
  = Σ_{l=0}^L Σ_{k=0}^K Σ_{l'=0}^L Σ_{k'=0}^K y_{n'} q_{l,l'} r_{n'−l',n−l} y_n,

where n = k + l and n' = k' + l'. By changing the summation indices k → n and k' → n' we obtain the required identity ||X_Y||²_{Q,R} = Y^T W Y with W = Q ⋆ R.

Page 27:

The case of diagonal matrices

If the matrices are diagonal, W = diag(W), Q = diag(Q), R = diag(R) (with weight vectors W, Q, R on the diagonals), then

W = Q ⋆ R (as matrices)  ⇔  W = Q ⋆ R (as vectors).

Here W is the vector of weights in the weighted sum of squares

||Y||²_W = Σ_{n=0}^N w_n y_n²,

where W = (w_0, ..., w_N)^T and Y = (y_0, ..., y_N)^T.

The Frobenius norm corresponds to Q = (1, ..., 1)^T ∈ R^{L+1}, R = (1, ..., 1)^T ∈ R^{K+1}. This gives strange (trapezoidal) weights W.

Page 28:

W given. Find Q and R so that W = Q ⋆ R or W ≈ Q ⋆ R.

Examples of W:

- W = (1, ..., 1)^T
- W = (1, β, ..., β^N)^T
- W = (1, 1, ..., 1, 0, 1, ..., 1)^T
- W = (1, 1, ..., 1, ∞, 1, ..., 1)^T

W = (1, 1, ..., 1)^T has the generating function

W(t) = 1 + t + t^2 + ... + t^N = (1 − t^{N+1})/(1 − t).

We need to find polynomials Q(t) and R(t) such that W(t) = Q(t)R(t). For W = (1, 1, ..., 1)^T, the solution is given by cyclotomic polynomials.
Page 29:

Find Q and R so that W = Q ⋆ R. Example.

N = 11, L = 3, K = 8, W = (1, 1, ..., 1)^T. We have

1 + t + t^2 + ... + t^11 = (1 + t + t^2 + t^3)(1 + t^4 + t^8)
                         = (1 + t^3)(1 + t + t^2 + t^6 + t^7 + t^8).

This implies that W = Q ⋆ R with

{Q^T, R^T} = {(1, 1, 1, 1), (1, 0, 0, 0, 1, 0, 0, 0, 1)}

or

{Q^T, R^T} = {(1, 0, 0, 1), (1, 1, 1, 0, 0, 0, 1, 1, 1)}.
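A one-line check of these factorizations (a sketch, assuming NumPy): convolving each pair of weight vectors should recover W = (1, ..., 1)^T of length N + 1 = 12.

```python
import numpy as np

Q1, R1 = [1, 1, 1, 1], [1, 0, 0, 0, 1, 0, 0, 0, 1]
Q2, R2 = [1, 0, 0, 1], [1, 1, 1, 0, 0, 0, 1, 1, 1]
print(np.convolve(Q1, R1))   # [1 1 1 1 1 1 1 1 1 1 1 1]
print(np.convolve(Q2, R2))   # [1 1 1 1 1 1 1 1 1 1 1 1]
```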