SLRA: choice of a norm and computational issues
Anatoly Zhigljavsky
Cardiff University
Collaborators: Nina Golyandina (St.Petersburg), Jonathan Gillard (Cardiff)
Grenoble, June 2, 2015
SLRA: Problem definition
L, K and r are given positive integers such that 1 ≤ r < L ≤ K.
I M_r = M^{L×K}_r ⊂ R^{L×K}, the set of matrices of rank ≤ r
I H = H^{L×K} ⊂ R^{L×K}, the set of matrices of Hankel structure
I A = M_r ∩ H
Assume we are given a matrix X⋆ ∈ H. The Hankel structured low rank approximation (SLRA) problem is:

f(X, X⋆) → min over X ∈ A

Common choice of f: f(X, X⋆) = ||X − X⋆||²_F
Main application area: time series and signal processing
Map Y = (y1, y2, . . . , yN)^T into an L × K Hankel matrix X:

X = X_Y =
⎡ y1  y2    · · ·  yK   ⎤
⎢ y2  y3    · · ·  yK+1 ⎥
⎢ ⋮   ⋮            ⋮    ⎥
⎣ yL  yL+1  · · ·  yN   ⎦
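As a concrete illustration, this embedding can be sketched in a few lines of numpy (the helper name `hankel_matrix` is ours, not from the slides):

```python
# Minimal sketch (not from the slides): embed a series Y of length N
# into an L x K Hankel matrix with K = N - L + 1.
import numpy as np

def hankel_matrix(y, L):
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    # Row l holds y_{l+1}, ..., y_{l+K} (1-based, as on the slide)
    return np.array([y[l:l + K] for l in range(L)])

Y = np.arange(1, 8)          # y1, ..., y7, so N = 7
X = hankel_matrix(Y, L=3)    # a 3 x 5 Hankel matrix
```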
Common parameterization of elements in A
X ∈ A
⇕
Y(θ) = (y1(θ), . . . , yN(θ))T ,N = L+ K − 1
⇕
y_n(θ) = Σ_{l=1}^{q} a_l exp(d_l n) sin(2π ω_l n + φ_l).
Known as ‘sums of damped sinusoids’.
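A minimal sketch of evaluating this model (the function name is ours; the single term used below matches the undamped sinusoid of the examples that follow):

```python
# Illustrative sketch: evaluate the 'sums of damped sinusoids' model y_n(theta).
import numpy as np

def damped_sinusoids(n, a, d, omega, phi):
    """y_n = sum_l a_l * exp(d_l * n) * sin(2*pi*omega_l*n + phi_l)."""
    n = np.asarray(n, dtype=float)[:, None]
    a, d, omega, phi = map(np.asarray, (a, d, omega, phi))
    return np.sum(a * np.exp(d * n) * np.sin(2 * np.pi * omega * n + phi), axis=1)

n = np.arange(1, 11)
# q = 1 term: an undamped sinusoid with omega = 0.35
y = damped_sinusoids(n, a=[1.0], d=[0.0], omega=[0.35], phi=[0.0])
```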
Question: How difficult is the parametric optimization problem?
Answer: Very difficult.
Damped sinusoids: Example
The objective function f(ω) = Σ_{n=1}^{N} (y_n − sin(2πωn))², with ω = 0.35; N = 10 and N = 100.

Figure: Function f(ω); panels (a) and (b) over ω ∈ [0.2, 1], panel (c) over ω ∈ [0.2, 0.5].
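The multi-extremality is easy to reproduce numerically; a sketch (grid resolution and counting rule are our choices) that counts grid-local minima of f(ω) for N = 10 and N = 100:

```python
import numpy as np

def f_vals(omegas, y, n):
    # f(omega) = sum_n (y_n - sin(2*pi*omega*n))^2, vectorized over a grid
    return np.sum((y[None, :] - np.sin(2 * np.pi * np.outer(omegas, n))) ** 2, axis=1)

def count_local_minima(v):
    inner = v[1:-1]
    return int(np.sum((inner < v[:-2]) & (inner < v[2:])))

omegas = np.linspace(0.01, 1.0, 5000)
counts = {}
for N in (10, 100):
    n = np.arange(1, N + 1)
    y = np.sin(2 * np.pi * 0.35 * n)      # noiseless data with omega = 0.35
    counts[N] = count_local_minima(f_vals(omegas, y, n))
# the number of local minima grows roughly linearly with N
```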
Damped sinusoids: Example
The objective function f(ω) = Σ_{n=1}^{N} (y_n − sin(2πωn))², ω = 0.35.
For N = 10, the (global) Lipschitz constant of f is approximately 327.86. For N = 100, the Lipschitz constant of f is approximately 6195.88.
Figure: First and second derivatives of f(ω). (a) log |f′(ω)| over ω ∈ [0.2, 0.8]; (b) log |f″(ω)| in the region of the global minimizer ω(0) = 0.3.
Damped sinusoids: Example
N = 10, observations y_n + ε_n, where {ε_n, n = 1, . . . , N} are normally distributed noise terms with mean 0 and variance σ².
Figure: Function f(ω) for σ² = 0.5, 1.0, 1.5; f(0.35) is given in round brackets.
(a) ω∗ = 0.3445, f(ω∗) = 5.17 (5.43)
(b) ω∗ = 0.4710, f(ω∗) = 10.00 (10.86)
(c) ω∗ = 0.4694, f(ω∗) = 13.56 (16.29)
Damped sinusoids: Example (2)
The objective function:

f(ω1, ω2) = Σ_{j=1}^{N} (y_j − sin(2πω1 j) − sin(2πω2 j))², ω1 = 0.3, ω2 = 0.32.
Figure: Function f(ω1, ω2). (a) Plot of f(ω1, 0.32); (b) plot of f(ω1, ω2); (c) contour plot of f(ω1, ω2) with (0.3, 0.32) marked (+).
Multi-extremality and existing methods
Local/global:
I Number of local minima: a linear function of N
I Effect of noise: moves and dampens the true 'global' minimum
Existing methods of solving SLRA:
I Based on the use of AP (alternating projections)
I Based on a local approximation
I Can only reduce the rank of the matrix by one
Question: How good is the method of AP?
Answer: Not good.
Projections
Projection of X onto H, denoted πH(X): the closest Hankel matrix (in the Frobenius norm) to any given matrix is obtained by the simple diagonal averaging procedure.
Projection of X onto M_r, denoted π(r)(X): let σ_i = σ_i(X) denote the singular values of X, ordered so that σ1 ≥ σ2 ≥ . . . ≥ σL. Let Σ0 = diag(σ1, σ2, . . . , σL) and Σ = diag(σ1, σ2, . . . , σr, 0, . . . , 0). The SVD of X can be written as X = U Σ0 V^T, and the matrix π(r)(X) = U Σ V^T belongs to M_r.
Alternating projections (AP)
X0 = X∗,  X_{n+1} = πH[π(r)(X_n)]  for n = 0, 1, . . .

AP guarantees convergence to A, but typically does not converge to the optimal solution.
The main problem: bad starting point!
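The two projections and the AP iteration can be sketched in a few lines of numpy (a sketch under our naming, applied here to De Moor's data from a later slide):

```python
import numpy as np

def pi_hankel(X):
    """Project onto Hankel structure by averaging along antidiagonals."""
    L, K = X.shape
    N = L + K - 1
    y = np.array([X[::-1, :].diagonal(k - L + 1).mean() for k in range(N)])
    return np.array([y[l:l + K] for l in range(L)])

def pi_rank(X, r):
    """Project onto rank <= r by truncating the SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def alternating_projections(X_star, r, iters=200):
    X = X_star.copy()
    for _ in range(iters):
        X = pi_hankel(pi_rank(X, r))
    return X

# De Moor's data: N = 9, L = 4, K = 6, r = 3
y = np.array([3, 4, 1, 2, 5, 6, 7, 1, 2], dtype=float)
X_star = np.array([y[l:l + 6] for l in range(4)])
X_ap = alternating_projections(X_star, r=3)
dist = np.linalg.norm(X_ap - X_star) ** 2   # the slides report 14.8251 for AP
```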
NOTE: a correction to any approximation

Theorem. Let X ∈ R^{L×K} and β ∈ R. The function f(β) = ||βX − X∗||²_F has a unique minimizer at

β = tr(X^T X∗) / tr(X^T X).

Corollary. tr((βX − X∗)^T X) = 0, which is the so-called 'orthogonality condition'.

Proof. The function f(β) is quadratic in β, and we may write

f(β) = ||βX − X∗||²_F = tr((βX − X∗)^T (βX − X∗)).

The derivative is given by

∂f/∂β = 2β tr(X^T X) − 2 tr(X^T X∗).

Setting this derivative to zero and solving for β yields the result.
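A quick numerical check of the theorem and corollary (random matrices, our sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))
X_star = rng.standard_normal((4, 6))

# the minimizer of f(beta) = ||beta*X - X_star||_F^2
beta = np.trace(X.T @ X_star) / np.trace(X.T @ X)

f = lambda b: np.linalg.norm(b * X - X_star) ** 2
assert f(beta) <= min(f(beta + 1e-3), f(beta - 1e-3))       # unique minimizer
assert abs(np.trace((beta * X - X_star).T @ X)) < 1e-10     # orthogonality condition
```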
Family of algorithms (example)
Ingredients: Backtracking, Randomization, Corrections
I U: random number with uniform distribution in [0, 1]
I X: random Hankel matrix corresponding to Y = (ξ1, . . . , ξN)^T with {ξn} i.i.d. Gaussian r.v.'s, mean 0 and variance s² ≥ 0.

Multistart APBR. Run N0 independent trajectories X_{0,j} for j = 1, . . . , N0, with X_{0,j} = (1 − s0)X∗ + s0 X, and

X_{n+1,j} = ( tr(Z_{n,j}^T X∗) / tr(Z_{n,j}^T Z_{n,j}) ) Z_{n,j}, where
Z_{n,j} = (1 − δn) πH[π(r)(X_{n,j})] + δn X∗ + σn X,

for j = 1, . . . , N0, and

δn = U/(n + 1)^p, σn = 1/(n + 1)^q  for n = 0, 1, . . . , N_I − 1,
δn = 0, σn = 0                      for n = N_I, . . . , N_I + N_II − 1.
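A single APBR trajectory can be sketched as follows (our reconstruction of the recursion above, with the projections written inline; parameter values loosely follow the De Moor example on a later slide):

```python
import numpy as np

def pi_hankel(X):
    L, K = X.shape
    y = np.array([X[::-1, :].diagonal(k - L + 1).mean() for k in range(L + K - 1)])
    return np.array([y[l:l + K] for l in range(L)])

def pi_rank(X, r):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def apbr_trajectory(X_star, r, s0=0.5, s=0.1, p=0.5, q=1.0, NI=100, NII=50, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    L, K = X_star.shape
    N = L + K - 1
    def random_hankel():                   # Hankel matrix of i.i.d. N(0, s^2) entries
        xi = s * rng.standard_normal(N)
        return np.array([xi[l:l + K] for l in range(L)])
    X = (1 - s0) * X_star + s0 * random_hankel()
    for n in range(NI + NII):
        delta = rng.uniform() / (n + 1) ** p if n < NI else 0.0
        sigma = 1.0 / (n + 1) ** q if n < NI else 0.0
        Z = (1 - delta) * pi_hankel(pi_rank(X, r)) + delta * X_star + sigma * random_hankel()
        X = (np.trace(Z.T @ X_star) / np.trace(Z.T @ Z)) * Z   # beta-correction step
    return X

y = np.array([3, 4, 1, 2, 5, 6, 7, 1, 2], dtype=float)
X_star = np.array([y[l:l + 6] for l in range(4)])
X_out = apbr_trajectory(X_star, r=3, rng=np.random.default_rng(0))
```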
Example
I Y∗^{(m)} = (0, 3 − 2m, 0, −1, 0, m, 0, −1, 0, 3 − 2m, 0)^T, where m = −1, 2, 3.
I Fix L = 3 and r = 2. Set X∗^{(m)} = H(Y∗^{(m)}).
I rank(X∗^{(m)}) = 3 for m = −1, 2, 3.
I The parameters of Multistart APBR are M = 1000, c = 1, s0 = 0.25, s = 1, p = 0.5 and q = 1.5.
I The total number of iterations was fixed at 250, with N_I = 200.
 m    AP        OAP       Local AP   Med (APBR)   Min (APBR)
−1    68.3077   68.1548   68.3077    56.8699      56.7487
 2    17.0769   17.0769   17.0769    12.9900      12.8791
 3    50.1888   50.1873   49.9663    36.2506      36.2357

Table: Frobenius distances to X∗^{(m)}.
Example
De Moor's data: Y = (3, 4, 1, 2, 5, 6, 7, 1, 2)^T, N = 9, L = 4, K = 6 and r = 3.
1. Alternating projections: ||X∗ − X_AP||²_F = 14.8251.
2. Minimization of the Lagrange function: ||X∗ − X_DeMoor||²_F = 14.1481.
APBR: N0 = 3, s0 = 1/2, s = 0.1, p = 1/2, q = 1, N_I = 100 and N_II = 50. Set σn = 0 for all n.

||X∗ − X_APBR^{(1)}||²_F = ||X∗ − X_APBR^{(3)}||²_F = 14.1478
Figure: Distances ||X_n − X∗||²_F over the iterations; AP iterations in grey, APBR in black.
Weighted (unstructured) low rank approximation
Problem definition:

min_{X ∈ M_r} f(X) = min_{X ∈ M_r} vec^T(X − X∗) W vec(X − X∗)

Including the rank constraint:

min_{U ∈ R^{L×r}, V ∈ R^{r×K}} vec^T(UV − X∗) W vec(UV − X∗)

Alternating projections. Start from an initial U0 (obtained from the SVD):

V_n = argmin_{V ∈ R^{r×K}} vec^T(U_{n−1}V − X∗) W vec(U_{n−1}V − X∗)
U_n = argmin_{U ∈ R^{L×r}} vec^T(UV_n − X∗) W vec(UV_n − X∗)
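One way to sketch this iteration is to solve each weighted least-squares subproblem through the Kronecker (vec) form; this is our illustrative implementation, practical only for small L, K since W is (LK) × (LK):

```python
import numpy as np

def weighted_lra(X_star, r, W, iters=50):
    """Alternating weighted LS for min vec(UV - X*)^T W vec(UV - X*).
    vec(.) is row-major flattening; W must be symmetric positive definite."""
    L, K = X_star.shape
    x = X_star.flatten()
    Wh = np.linalg.cholesky(W).T            # W = Wh^T Wh
    U = np.linalg.svd(X_star)[0][:, :r]     # initial U0 from the SVD
    for _ in range(iters):
        MV = np.kron(U, np.eye(K))          # vec(U V) = MV @ vec(V)
        V = np.linalg.lstsq(Wh @ MV, Wh @ x, rcond=None)[0].reshape(r, K)
        MU = np.kron(np.eye(L), V.T)        # vec(U V) = MU @ vec(U)
        U = np.linalg.lstsq(Wh @ MU, Wh @ x, rcond=None)[0].reshape(L, r)
    return U @ V

rng = np.random.default_rng(0)
X_star = rng.standard_normal((5, 6))
W = np.eye(30)                              # identity W: plain Frobenius LRA
X_hat = weighted_lra(X_star, r=2, W=W)
```

With W = I this reduces to ordinary low rank approximation, so the result can be checked against the truncated SVD.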
Weighted (unstructured) low rank approximation
A. Similar (but not as severe!) problems with locality/globality.
B. Theorem: vec^T(βX − X∗) W vec(X) = 0 with

β = vec^T(X∗) W vec(X) / vec^T(X) W vec(X).
Weighted SLRA: a slight change of notation
Y = (y0, y1, . . . , yN)T ;
(L+ 1)× (K + 1) Hankel matrix X:
X = X_Y =
⎡ y0  y1    · · ·  yK   ⎤
⎢ y1  y2    · · ·  yK+1 ⎥
⎢ ⋮   ⋮            ⋮    ⎥
⎣ yL  yL+1  · · ·  yN   ⎦
Weighted SLRA: Two norms
The matrix W = (w_{n,n′})_{n,n′=0}^{N} ∈ M^>_N defines a (semi-)norm on R^{N+1}:

||Y||_W = sqrt(Y^T W Y) = ( Σ_{n,n′=0}^{N} y_n w_{n,n′} y_{n′} )^{1/2},

where Y = (y0, . . . , yN)^T ∈ R^{N+1}.

Let L be such that 1 < L < N, and set K = N − L. For two matrices Q = (q_{l,l′})_{l,l′=0}^{L} ∈ M^>_L and R = (r_{k,k′})_{k,k′=0}^{K} ∈ M^>_K, we define the (Q,R)-norm (or semi-norm) on R^{(L+1)×(K+1)} by

||X||_{Q,R} = sqrt( tr(Q X R X^T) ),

where X is an arbitrary matrix of size (L+1) × (K+1).
Weighted SLRA: SVD
Numerically, computing the SVD of X in the (Q,R)-norm is equivalent to computing the SVD of Q^{1/2} X R^{1/2} in the Frobenius norm. In this respect, the (Q,R)-norm is equivalent to the Frobenius norm.
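A numerical check of this equivalence (our sketch; the symmetric square roots are computed through an eigendecomposition):

```python
import numpy as np

def sqrt_psd(M):
    """Symmetric square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)); Q = A @ A.T + np.eye(4)   # positive definite Q
B = rng.standard_normal((5, 5)); R = B @ B.T + np.eye(5)   # positive definite R
X = rng.standard_normal((4, 5))

# ||X||_{Q,R}^2 = tr(Q X R X^T) = ||Q^{1/2} X R^{1/2}||_F^2
lhs = np.trace(Q @ X @ R @ X.T)
rhs = np.linalg.norm(sqrt_psd(Q) @ X @ sqrt_psd(R)) ** 2
```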
Weighted SLRA: Equivalence of the two norms
Theorem. Consider the (Q,R)-norm for the matrix X_Y associated with an arbitrary vector Y ∈ R^{N+1} and defined by Q ∈ M^>_L and R ∈ M^>_K. Then we have

||X_Y||_{Q,R} = ||Y||_W,

where W = Q ⋆ R. The matrix W is diagonal if and only if both Q and R are diagonal.
Convolution of matrices
Def. For two arbitrary matrices A = (a_{i,i′}) ∈ M_{A×A′} and B = (b_{j,j′}) ∈ M_{B×B′}, their convolution is the matrix C = A ⋆ B = (c_{m,n}) ∈ M_{(A+B)×(A′+B′)} with elements

c_{m,n} = Σ_{k,k′} a_{k,k′} b_{m−k,n−k′}  (m = 0, 1, . . . , A + B; n = 0, 1, . . . , A′ + B′),

where the summation in the double sum is taken over the pairs of indices (k, k′) such that the elements a_{k,k′} and b_{m−k,n−k′} are both defined; that is, max{0, m − B} ≤ k ≤ min{A, m} and max{0, n − B′} ≤ k′ ≤ min{A′, n}.
Convolution of matrices: generating functions
The generating function (g.f.) of the elements of A = (a_{i,i′}) is

f_A(x, y) = Σ_{i=0}^{A} Σ_{j=0}^{A′} a_{i,j} x^i y^j.

Then f_{A⋆B}(x, y) = f_A(x, y) · f_B(x, y) for all x and y.
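Both the definition and the generating-function property are easy to verify numerically (our sketch; note that a matrix with maximal indices A, A′ has shape (A+1) × (A′+1)):

```python
import numpy as np

def conv2(A, B):
    """Full 2-D convolution: C[m, n] = sum_{k,k'} A[k, k'] * B[m-k, n-k']."""
    a0, a1 = A.shape
    b0, b1 = B.shape
    C = np.zeros((a0 + b0 - 1, a1 + b1 - 1))
    for k in range(a0):
        for kp in range(a1):
            C[k:k + b0, kp:kp + b1] += A[k, kp] * B
    return C

def gf(A, x, y):
    """Generating function f_A(x, y) = sum_{i,j} a_{i,j} x^i y^j."""
    i = np.arange(A.shape[0])[:, None]
    j = np.arange(A.shape[1])[None, :]
    return float(np.sum(A * x ** i * y ** j))

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
C = conv2(A, B)
```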
Proof of the theorem ||X_Y||_{Q,R} = ||Y||_W.

Consider the norm ||X||²_{Q,R} with X = X_Y (so that x_{l,k} = y_{l+k} for l = 0, . . . , L and k = 0, . . . , K). We have

||X_Y||²_{Q,R} = tr(Q X_Y R X_Y^T)
  = Σ_{l=0}^{L} Σ_{k=0}^{K} Σ_{l′=0}^{L} Σ_{k′=0}^{K} q_{l,l′} x_{l′,k′} r_{k′,k} x_{l,k}
  = Σ_{l=0}^{L} Σ_{k=0}^{K} Σ_{l′=0}^{L} Σ_{k′=0}^{K} q_{l,l′} y_{l′+k′} r_{k′,k} y_{l+k}
  = Σ_{l=0}^{L} Σ_{l′=0}^{L} Σ_{n,n′} y_{n′} q_{l,l′} r_{n′−l′,n−l} y_n,

where n = k + l and n′ = k′ + l′. Changing the summation indices k → n and k′ → n′ yields the required identity.
The case of diagonal matrices
If W = diag(w), Q = diag(q), R = diag(r), then

W = Q ⋆ R ⇔ w = q ⋆ r,

where the right-hand side is the ordinary (vector) convolution. Here w is the vector of weights in the weighted sum of squares:

||Y||²_W = Σ_{n=0}^{N} w_n y_n²,

where w = (w0, . . . , wN)^T and Y = (y0, . . . , yN)^T. The Frobenius norm corresponds to q = (1, . . . , 1)^T ∈ R^{L+1}, r = (1, . . . , 1)^T ∈ R^{K+1}. This gives strange weights w: the triangular weights w_n = #{(l, k) : l + k = n}.
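The 'strange weights' can be checked directly: with all-ones q and r, the vector convolution gives triangular weights, and the Frobenius norm of the Hankel matrix matches the weighted norm of the series (our sketch):

```python
import numpy as np

N, L = 8, 3
K = N - L                                   # slide convention: indices 0..N, 0..L, 0..K
q = np.ones(L + 1)
r = np.ones(K + 1)
w = np.convolve(q, r)                       # w = q * r, triangular weights, length N+1

rng = np.random.default_rng(3)
y = rng.standard_normal(N + 1)
X = np.array([y[l:l + K + 1] for l in range(L + 1)])   # (L+1) x (K+1) Hankel matrix

frob2 = np.linalg.norm(X) ** 2              # ||X_Y||_F^2
weighted2 = np.sum(w * y ** 2)              # ||Y||_W^2 with w = q * r
```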
W given. Find Q and R so that W = Q ⋆ R or W ≃ Q ⋆ R.
Examples of W :
I W = (1, . . . , 1)T
I W = (1, β, . . . , βN)T
I W = (1, 1, . . . , 1, 0, 1, . . . , 1)T
I W = (1, 1, . . . , 1,∞, 1, . . . , 1)T
W = (1, 1, . . . , 1)^T has the generating function

W(t) = 1 + t + t² + . . . + t^N = (1 − t^{N+1})/(1 − t).

We need to find polynomials Q(t) and R(t) such that W(t) = Q(t)R(t). For W = (1, 1, . . . , 1)^T, the solutions are given by products of cyclotomic polynomials.
Find Q and R so that W = Q ⋆ R. Example.

N = 11, L = 3, K = 8, W = (1, 1, . . . , 1)^T. We have

1 + t + t² + . . . + t^{11} = (1 + t + t² + t³)(1 + t⁴ + t⁸)
                            = (1 + t³)(1 + t + t² + t⁶ + t⁷ + t⁸).

This implies that W = Q ⋆ R with

{Q^T, R^T} = {(1, 1, 1, 1), (1, 0, 0, 0, 1, 0, 0, 0, 1)}  or
{Q^T, R^T} = {(1, 0, 0, 1), (1, 1, 1, 0, 0, 0, 1, 1, 1)}.
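Both factorizations are immediate to verify, since multiplying polynomials is the same as convolving their coefficient vectors (our check):

```python
import numpy as np

w = np.ones(12, dtype=int)                     # W(t) = 1 + t + ... + t^11
q1 = np.array([1, 1, 1, 1])                    # 1 + t + t^2 + t^3
r1 = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1])     # 1 + t^4 + t^8
q2 = np.array([1, 0, 0, 1])                    # 1 + t^3
r2 = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])     # 1 + t + t^2 + t^6 + t^7 + t^8

# polynomial multiplication = coefficient convolution
assert np.array_equal(np.convolve(q1, r1), w)
assert np.array_equal(np.convolve(q2, r2), w)
```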