
On the convergence of higher-order orthogonality iteration and its extension

Yangyang Xu

IMA, University of Minnesota

SIAM Conference LA15, Atlanta

October 27, 2015


Best low-multilinear-rank approximation of tensors

min_{C,A} ‖X − C ×_1 A_1 ⋯ ×_N A_N‖_F^2,

subject to A_n ∈ O^{I_n×r_n} := {A_n ∈ R^{I_n×r_n} : A_n^⊤ A_n = I}, ∀n.

- Any tensor has a multilinear SVD [Lathauwer-Moor-Vandewalle'00]

- Truncated multilinear SVD is not the best low-multilinear-rank approximation [L-M-V'00] (see the sketch below)
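A minimal NumPy sketch of the truncated multilinear SVD (HOSVD) referred to here; the array X, the rank tuple ranks, and the helper names unfold, mode_mult, and truncated_hosvd are illustrative assumptions, not code from the talk. The same helpers are reused in the later sketches.

import numpy as np

def unfold(X, n):
    # Mode-n unfolding: bring mode n to the front, then flatten the rest.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_mult(X, A, n):
    # Mode-n product X ×_n A, for a matrix A with A.shape[1] == X.shape[n].
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

def truncated_hosvd(X, ranks):
    # A_n = leading r_n left singular vectors of the mode-n unfolding of X.
    A = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
         for n, r in enumerate(ranks)]
    C = X
    for n, An in enumerate(A):
        C = mode_mult(C, An.T, n)   # core C = X ×_1 A_1^⊤ ⋯ ×_N A_N^⊤
    return C, A

This one-pass truncation gives a good but generally suboptimal low-multilinear-rank approximation; HOOI on the following slides refines the factors A_n iteratively.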

2/17


Face recognition by low-rank tensor approximation

Matrix Decomposition

D ≈ UC

Tensor Decomposition

D ≈ C ×_1 U_1 ⋯ ×_5 U_5

- Tensor decomposition can improve recognition accuracy by over 50% [Vasilescu-Terzopoulos'02]

3/17


Higher-order Orthogonality Iteration (HOOI)

Given A, the optimal C is

C = X ×_1 A_1^⊤ ⋯ ×_N A_N^⊤.

With this C plugged in, the approximation problem becomes

min_A ‖X − X ×_1 A_1 A_1^⊤ ⋯ ×_N A_N A_N^⊤‖_F^2,

subject to A_n ∈ O^{I_n×r_n}, ∀n,

which is equivalent to

max_A ‖X ×_1 A_1^⊤ ⋯ ×_N A_N^⊤‖_F^2 = ‖A_n^⊤ unfold_n(X ×_{i≠n} A_i^⊤)‖_F^2,

subject to A_n ∈ O^{I_n×r_n}, ∀n.
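The equivalence rests on X ×_1 A_1A_1^⊤ ⋯ ×_N A_NA_N^⊤ being an orthogonal projection of X, so ‖X − X ×_n A_nA_n^⊤‖_F^2 = ‖X‖_F^2 − ‖X ×_n A_n^⊤‖_F^2. A small numeric check of this identity, reusing the assumed helpers from the HOSVD sketch above:

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 7, 8))
_, A = truncated_hosvd(X, (2, 3, 4))       # any orthonormal factors will do

proj, core = X, X
for n, An in enumerate(A):
    proj = mode_mult(proj, An @ An.T, n)   # X ×_n A_n A_n^⊤
    core = mode_mult(core, An.T, n)        # X ×_n A_n^⊤

lhs = np.linalg.norm(X - proj) ** 2
rhs = np.linalg.norm(X) ** 2 - np.linalg.norm(core) ** 2
assert np.isclose(lhs, rhs)                # minimizing lhs = maximizing ‖core‖_F^2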

4/17


Higher-order Orthogonality Iteration (HOOI)

Algorithm 1: Higher-order orthogonality iteration (HOOI)

Input: X and (r_1, ..., r_N)
Initialization: choose (A_1^0, ..., A_N^0) with A_n^0 ∈ O^{I_n×r_n}, ∀n, and set k = 0
while not convergent do
    for n = 1, ..., N do
        set A_n^{k+1} to an orthonormal basis of the dominant r_n-dimensional left singular subspace of
            G_n^k = unfold_n(X ×_{i<n} (A_i^{k+1})^⊤ ×_{i>n} (A_i^k)^⊤)
    increase k ← k + 1

- Widely used and well suited to large-scale problems (a NumPy sketch follows below)

- Better scalability than Newton-type methods such as Newton-Grassmann [Elden-Savas'09] and Riemannian trust region [Ishteva et al.'11]

- Convergence is still an open question
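A minimal NumPy sketch of Algorithm 1, reusing the assumed unfold, mode_mult, and truncated_hosvd helpers from the earlier sketch; the initialization and stopping rule are illustrative choices, not prescribed by the talk.

def hooi(X, ranks, iters=500, tol=1e-8):
    _, A = truncated_hosvd(X, ranks)       # A_n^0: any orthonormal bases work
    for k in range(iters):
        prev = [An.copy() for An in A]
        for n, r in enumerate(ranks):
            # G_n^k: contract every mode except n with the current factors
            # (updated factors for i < n, previous ones for i > n).
            Y = X
            for i, Ai in enumerate(A):
                if i != n:
                    Y = mode_mult(Y, Ai.T, i)
            G = unfold(Y, n)
            # Orthonormal basis of the dominant r-dimensional left singular subspace.
            A[n] = np.linalg.svd(G, full_matrices=False)[0][:, :r]
        if max(np.linalg.norm(An - Pn) for An, Pn in zip(A, prev)) < tol:
            break
    C = X
    for n, An in enumerate(A):
        C = mode_mult(C, An.T, n)
    return C, A

Note that the stopping test compares bases rather than subspaces; the basis returned by the SVD is not unique, which is exactly the issue the next slides address.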

4/17


This talk addresses the open question: subspace convergence of HOOI

- Why care about convergence
  - Without convergence, HOOI can return severely different multilinear subspaces when run for different numbers of iterations

- Challenges
  - Non-convexity
  - Non-uniqueness of the dominant subspace
  - Non-uniqueness of the orthonormal basis

- How to establish convergence
  - Greedy selection of the orthonormal basis
  - Kurdyka-Łojasiewicz property

5/17


Greedy higher-order orthogonality iteration

Algorithm 2: Greedy HOOI

Input: X and (r_1, ..., r_N)
Initialization: choose (A_1^0, ..., A_N^0) with A_n^0 ∈ O^{I_n×r_n}, ∀n, and set k = 0
while not convergent do
    for n = 1, ..., N do
        set A_n^{k+1} ∈ argmin_{A_n} ‖A_n − A_n^k‖_F over all orthonormal bases of the dominant r_n-dimensional left singular subspace of
            G_n^k = unfold_n(X ×_{i<n} (A_i^{k+1})^⊤ ×_{i>n} (A_i^k)^⊤)
    increase k ← k + 1

- The iterate mapping is continuous (a Procrustes-based sketch of the basis selection follows below)

- Will be used to establish convergence of the original HOOI
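The argmin over all orthonormal bases of a fixed subspace is an orthogonal Procrustes problem: if U spans the dominant subspace, every orthonormal basis has the form U Q with Q orthogonal, and the basis closest to A_n^k uses the polar factor of U^⊤ A_n^k. A hedged NumPy sketch under the same assumed helpers:

def greedy_basis(G, A_prev, r):
    # Dominant r-dimensional left singular subspace of G.
    U = np.linalg.svd(G, full_matrices=False)[0][:, :r]
    # Among all bases U @ Q of that subspace, pick the one closest to A_prev
    # in Frobenius norm: Q is the orthogonal polar factor of U^⊤ A_prev.
    W, _, Vt = np.linalg.svd(U.T @ A_prev)
    return U @ (W @ Vt)

Replacing the plain truncation A[n] = np.linalg.svd(G, ...)[0][:, :r] in the earlier hooi sketch with A[n] = greedy_basis(G, A[n], r) turns it into greedy HOOI; the selected basis then depends continuously on the data, which is what the convergence analysis needs.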

6/17


Block-nondegeneracy

A is a block-nondegenerate point if σ_{r_n}(G_n) > σ_{r_n+1}(G_n), ∀n, where

G_n = unfold_n(X ×_{i≠n} A_i^⊤)

- The dominant multilinear subspace is unique (a small numerical check follows below)

- Equivalent to negative definiteness of the block Hessian on O^{I_n×r_n} at a local maximizer

- Necessary condition for convergence
  - A degenerate first-order optimal point is not stationary
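A small check of the gap condition σ_{r_n}(G_n) > σ_{r_n+1}(G_n), again under the assumed helpers; a sketch only, not part of the talk's code.

def is_block_nondegenerate(X, A, ranks):
    for n, r in enumerate(ranks):
        Y = X
        for i, Ai in enumerate(A):
            if i != n:
                Y = mode_mult(Y, Ai.T, i)
        s = np.linalg.svd(unfold(Y, n), compute_uv=False)
        # Require a strict gap between the r-th and (r+1)-th singular values.
        if r < len(s) and not (s[r - 1] > s[r]):
            return False
    return True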

7/17


Convergence analysis of greedy HOOI

Let {A^k}_{k≥1} be the sequence generated by greedy HOOI. Assume there is a block-nondegenerate limit point Ā.

- Ā is a block-wise maximizer and a critical point

- There is α > 0 such that if A^k is sufficiently close to Ā, then

    α ‖A^{k+1} − A^k‖_F^2 ≤ F(A^{k+1}) − F(A^k),

  where F(A) = ‖X ×_1 A_1^⊤ ⋯ ×_N A_N^⊤‖_F^2 − Σ_{n=1}^N ι_{O^{I_n×r_n}}(A_n)

- If A^k is sufficiently close to Ā, then

    dist(0, ∂F(A^k)) ≤ C ‖A^k − A^{k−1}‖_F.

- Further, with the Kurdyka-Łojasiewicz property ⇒ A^k → Ā as k → ∞
  - Local convergence to a globally optimal solution

8/17


Convergence analysis of greedy HOOI

Kurdyka-Łojasiewicz property: the following quantity is bounded near Ā for a certain θ ∈ [0, 1):

    |F(Ā) − F(A)|^θ / dist(0, ∂F(A))

Use (1 − θ)(a − b) ≤ a^θ (a^{1−θ} − b^{1−θ}) for a, b ≥ 0, together with the previous inequalities, to obtain

‖A^{k+1} − A^k‖_F^2 ≲ F(A^{k+1}) − F(A^k)

  ≲ (F(Ā) − F(A^k))^θ ((F(Ā) − F(A^k))^{1−θ} − (F(Ā) − F(A^{k+1}))^{1−θ})

  ≲ dist(0, ∂F(A^k)) ((F(Ā) − F(A^k))^{1−θ} − (F(Ā) − F(A^{k+1}))^{1−θ})

  ≲ ‖A^k − A^{k−1}‖_F ((F(Ā) − F(A^k))^{1−θ} − (F(Ā) − F(A^{k+1}))^{1−θ}).

Together with Young's inequality, this gives Σ_k ‖A^{k+1} − A^k‖_F < ∞ (the omitted step is expanded below).
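A sketch of that last step, following the standard Kurdyka-Łojasiewicz argument; the constant c below absorbs the hidden constants of the ≲ chain and is an assumption of this sketch. Write S_k = (F(Ā) − F(A^k))^{1−θ}, so the chain reads

    ‖A^{k+1} − A^k‖_F^2 ≤ c ‖A^k − A^{k−1}‖_F (S_k − S_{k+1}).

Young's inequality √(uv) ≤ u/2 + v/2 then gives

    ‖A^{k+1} − A^k‖_F ≤ (1/2) ‖A^k − A^{k−1}‖_F + (c/2) (S_k − S_{k+1}).

Summing over k ≥ k_0 telescopes S_k on the right, and moving (1/2) Σ_k ‖A^{k+1} − A^k‖_F to the left yields

    Σ_{k≥k_0} ‖A^{k+1} − A^k‖_F ≤ ‖A^{k_0} − A^{k_0−1}‖_F + c S_{k_0} < ∞,

so {A^k} is a Cauchy sequence and converges to Ā.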

8/17


Convergence analysis of original HOOI

Starting from A^{k_0} sufficiently close to a block-nondegenerate limit point Ā of {A^k}_{k≥1}, we have

    A_n^k (A_n^k)^⊤ = Â_n^k (Â_n^k)^⊤, ∀n, ∀k ≥ k_0,

where {A^k}_{k≥k_0} and {Â^k}_{k≥k_0} are the sequences generated by the original and greedy HOOI, respectively.

- Ā is a block-wise maximizer and a critical point

- A_n^k (A_n^k)^⊤ → Ā_n Ā_n^⊤ as k → ∞

- Local convergence to the globally optimal multilinear subspace

9/17


Best low-rank tensor approximation with missing values

min_{X,C,A} ‖X − C ×_1 A_1 ⋯ ×_N A_N‖_F^2,

subject to A_n ∈ O^{I_n×r_n}, ∀n;  P_Ω(X) = P_Ω(M)

- Reduces to the best low-rank tensor approximation if Ω contains all elements

- Can estimate the original tensor M simultaneously

10/17


Best low-rank tensor approximation with missing values

Low-rank tensor approximation with missing values [Geng-Miles-Zhou-Wang'11]

10/17


Incomplete higher-order orthogonality iteration

Algorithm 3: incomplete HOOI (iHOOI)

Input: P_Ω(M) and (r_1, ..., r_N)
Initialization: choose X^0 and (A_1^0, ..., A_N^0) and set k = 0
while not convergent do
    for n = 1, ..., N do
        set A_n^{k+1} to an orthonormal basis of the dominant r_n-dimensional left singular subspace of
            G_n^k = unfold_n(X^k ×_{i<n} (A_i^{k+1})^⊤ ×_{i>n} (A_i^k)^⊤)
    X^{k+1} ← P_Ω(M) + P_{Ω^c}(X^k ×_{n=1}^N A_n^{k+1} (A_n^{k+1})^⊤)
    increase k ← k + 1

- The X-update is one step of the alternating minimization for

    min_{C,X} ‖X − C ×_{i=1}^N A_i^{k+1} (A_i^{k+1})^⊤‖_F^2,  subject to P_Ω(X) = P_Ω(M)

  (a NumPy sketch follows below)
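A hedged NumPy sketch of iHOOI under the same assumed helpers; mask is a boolean array marking the observed set Ω and M_obs holds P_Ω(M) (zeros off Ω). Both names are illustrative.

def ihooi(M_obs, mask, ranks, iters=200):
    X = np.where(mask, M_obs, 0.0)         # X^0: observed entries, zeros elsewhere
    _, A = truncated_hosvd(X, ranks)
    for k in range(iters):
        # Factor sweep: same updates as HOOI, applied to the current estimate X^k.
        for n, r in enumerate(ranks):
            Y = X
            for i, Ai in enumerate(A):
                if i != n:
                    Y = mode_mult(Y, Ai.T, i)
            A[n] = np.linalg.svd(unfold(Y, n), full_matrices=False)[0][:, :r]
        # X-update: keep the observed entries, fill the rest with
        # X^k ×_1 A_1 A_1^⊤ ⋯ ×_N A_N A_N^⊤.
        P = X
        for n, An in enumerate(A):
            P = mode_mult(P, An @ An.T, n)
        X = np.where(mask, M_obs, P)
    return X, A

When Ω covers all entries, the X-update becomes a no-op and the sketch reduces to plain HOOI on X = M, matching the previous slide.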

11/17


Experimental results

Low-multilinear-rank tensor approximation

- Relation between the iterate sequences of the original and greedy HOOI

Low-rank tensor completion

- Phase transition plots (simulations of how many samples guarantee exact reconstruction)

12/17


Comparing HOOIs on subspace learning

[Figure: subspace relative changes vs. number of iterations for greedy HOOI and original HOOI on a Gaussian random tensor (full size 50×50×50, core size 5×5×5); the projections A_n^k (A_n^k)^⊤ produced by the two methods coincide for all n, k. Side panels: the r_n-th and (r_n+1)-th singular values for n = 1, 2, 3 over the iterations.]

13/17


Comparing HOOIs on subspace learning

[Figure: the same comparison on the Yale Face Database (full size 38×64×2958, core size 5×5×20); again the projections A_n^k (A_n^k)^⊤ from the two methods coincide for all n, k, with the r_n-th and (r_n+1)-th singular values shown for n = 1, 2, 3.]

13/17


Phase transition plots

[Figure: phase transition plots (sample ratio 0.1–0.8 vs. rank r = 5–30) for iHOOI (proposed), WTucker, and geomCG.]

- 50×50×50 randomly generated tensor with rank (r, r, r)

- True rank provided (adaptive rank estimation can be applied)

- WTucker [Filipovic-Jukic'13]: alternating minimization for min_{A,C} ‖P_Ω(C ×_{i=1}^N A_i) − P_Ω(M)‖_F^2

- geomCG [Kressner-Steinlechner-Vandereycken'13]: Riemannian optimization method

The proposed method achieves near-optimal reconstruction.

14/17


Convergence to globally optimal solution

Observation: ‖P_Ω(C ×_{n=1}^N A_n) − P_Ω(M)‖_F / ‖P_Ω(M)‖_F is always small

- Low sampling ratio ⇒ possibly more than one feasible low-rank solution

- A^0 sufficiently close to the bases of M ⇒ C^k ×_{n=1}^N A_n^k → M

[Figure: number of successes out of 50 runs vs. noise level δ ∈ [0, 5]; tensor size (50,50,50), rank (20,20,20), sample ratio 10%; A^0 ← truncated HOSVD of M + δ ‖M‖_F · N/‖N‖_F.]

15/17


Conclusions

- Gave subspace sequence convergence of the HOOI method
  - First result on the convergence of HOOI
  - Via the convergence of a greedy HOOI and the Kurdyka-Łojasiewicz inequality
  - Block-nondegeneracy is sufficient and necessary

- Proposed an incomplete HOOI method for applications with missing values
  - Subspace learning and tensor completion simultaneously
  - Empirically, near-optimal recovery of low-rank tensors

16/17


References

- Y. Xu. On the convergence of higher-order orthogonality iteration. arXiv, 2015.

- Y. Xu. On higher-order singular value decomposition from incomplete data. arXiv, 2014.

17/17