![Page 1: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/1.jpg)
Donald Goldfarb
IEOR Department
Columbia University
1
TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAAAAAAA
Optimization for Tensor Models
UCLA Mathematics Department
Distinguished Lecture Series
May 17 – 19, 2016
![Page 2: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/2.jpg)
Tensors
2
Matrix
Tensor: higher-order matrix
three-way tensor:
p-way tensor:
![Page 3: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/3.jpg)
Tensors
3
Tensor == Vector ? Fibers?
(Kolda & Bader, 2009)
can be viewed as a collection of fibers
![Page 4: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/4.jpg)
Tensors
4
Tensor == Matrix ? Slices?
(Kolda & Bader, 2009)
can be viewed as a collection of slices
![Page 5: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/5.jpg)
Tensors
5
Why tensors?
tensors capture multilinear structure
Tensor: object in its own right
its own geometrical, statistical and computational issues
much harder to work with than a matrix
more flexible and powerful models
e.g. parameter estimation in latent variable modelling (to be discussed shortly)
![Page 6: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/6.jpg)
Dataset: 𝑚-word documents
Goal: learn parameters
𝑘 topics (dists. over 𝑑 words)
sample topic ℎ:
𝑥1, 𝑥2, … , 𝑥𝑚 sampled i.i.d. from 𝝁𝑖
Model:
each document has 𝒎 words
Example: Single Topic Model
6
voc.
prob.
𝒐
prob ℎ = 𝑖 = 𝑤𝑖 (𝑖 ∈ [𝑘])
![Page 7: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/7.jpg)
Key idea: find parameters (approx.) consistent with observed moments (Pearson, 1894)
Karl Pearson (1857~1936)
Method of moments:
Example: Single Topic Model
7
Procedure: setup equations
𝑴𝒐𝒎𝒆𝒏𝒕𝒑𝒐𝒑𝒖𝒍𝒂𝒕𝒊𝒐𝒏 𝜽 = 𝑴𝒐𝒎𝒆𝒏𝒕 𝒔𝒂𝒎𝒑𝒍𝒆
solve approximately
![Page 8: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/8.jpg)
Example: Single Topic Model
8
Goal: model the topics of the documents in a given corpus
𝜽
Samples of documents generated based on single topic model with params 𝜽 = ( 𝝁𝒊 , 𝑤𝑖 )
Which moments to use? 1st -order? 2nd -order? 3rd -order? pth -order?
Model parameters Unsupervised learning alg. Method of moments setup equations solve them approx.
![Page 9: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/9.jpg)
Binary encoding:
𝒙𝑡 = 𝒆𝑖 the 𝑡-th word in the document is 𝑖-th word in the vocabulary
𝑡, 𝑖 ∈ 𝑚 × [𝑑]
9
Example: Single Topic Model
vocab. 𝒙𝟏 𝒙𝟐 𝒙𝟑
cats 0 0 1
dogs 0 0 0
fear 0 0 0
I 1 0 0
like 0 1 0
raise 0 0 0
want 0 0 0
E.g.
topic: animal
3-word document
7-word vocabulary
![Page 10: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/10.jpg)
First moment
Not identifiable: only 𝑑 numbers for 𝑑 + 1 𝑘 parameters.
Binary encoding:
10
Example: Single Topic Model
𝒙𝑡 = 𝒆𝑖 the 𝑡-th word in the document is 𝑖-th word in the vocabulary
𝑡, 𝑖 ∈ 𝑚 × [𝑑]
![Page 11: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/11.jpg)
Second moment
Matrix-mode: still not identifiable even though 𝑑 𝑑+1
2> 𝑑 + 1 𝑘. (Why?)
Example: Single Topic Model
11
![Page 12: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/12.jpg)
Identifiable? NO!
Matrix mode: INSUFFICIENT to identify parameters.!
Example: Single Topic Model
12
𝑼 ∈ ℝ𝑛×𝑘 is a solution
𝑴 = 𝑼𝑼𝑇 ∈ ℝ𝑛×𝑛
𝑴 = 𝑼𝑸(𝑼𝑸)𝑇, for any 𝑸 ∈ 𝒪𝑘 (𝑸𝑸𝑇 = 𝑸𝑇𝑸 = 𝑰𝑘)
𝑼 ∈ ℝ𝑛×𝑘 is a solution for any 𝑸 ∈ 𝒪 𝑘
AMBUIGUITY!
![Page 13: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/13.jpg)
Third-order moment
Tensor mode
Example: Single Topic Model
13
![Page 14: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/14.jpg)
Claim: uniquely determine the parameters 𝜽
Example: Single Topic Model
14
![Page 15: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/15.jpg)
Reduction to orthogonal case via whitening
Whiten:
project to 𝑘 dimensions
transform to orthogonality
apply 𝑾 to 𝑴 and 𝑻
𝒗i i∈[𝑘] forms an
orthonormal basis for ℝ𝑘
Example: Single Topic Model
15
𝑴 = 𝑼𝑫𝑼𝑇
𝑾 = 𝑼𝑫−𝟏/𝟐
Reduced Eigen-Decomp.
![Page 16: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/16.jpg)
Spectral theorem and eigen-decompositions
symmetric matrix symmetric tensor
𝒗i i∈[𝑘] forms an orthnormal basis for ℝ𝑘
eigen-decomp. is unique iff 𝜆𝑖 ≠ 𝜆𝑗
if such decomp. exists, then it is always unique (even if 𝜆𝑖‘s all same)
Identifiability issue is resolved via tensor mode!
Uniqueness of orthogonal decomp. uniquely determine
the parameters 𝜽
Example: Single Topic Model
16
![Page 17: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/17.jpg)
2nd Example: Mixture of Spherical Gaussians
17
Dataset: multiple points
Goal: learn parameters
𝑘 means
Sample cluster ℎ = 𝑖 with probability 𝑤𝑖 (𝑖 ∈ [𝑘])
Observe 𝒙, with i.i.d. homogeneous spherical noise
Model:
![Page 18: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/18.jpg)
2nd Example: Mixture of Spherical Gaussians
18
Identifiable using 1st, 2nd and 3rd –order moments together [Hsu & Kakade, ‘13]
Same approach as simple topic model!
![Page 19: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/19.jpg)
General Principle
19
Similar structures prevail in latent variable models
Latent Dirichlet Allocatioin (LDA) Mixture of Gaussians (MoG) Hidden Markov Models (HMMs) Mixed Multinomial Logit Model
![Page 20: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/20.jpg)
Orthogonal Decomposition
How to find (𝜆𝑖 , 𝒗i) i∈[𝑘] ?
symmetric matrix symmetric tensor
successive rank-one approximation (SROA)
generalized SROA?
20
( < 𝒗i, 𝒗j> = 𝜹𝑖𝑗 )
![Page 21: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/21.jpg)
Successive Rank-One Approximation
Symmetric matrix
SROA: rank-one approximation + deflation
Repeat 𝑘 times
Recover exactly ( )
(rank-one approx.)
(deflation)
21
![Page 22: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/22.jpg)
Successive Rank-One Approximation
Symmetric orthogonal decomposable (SOD) tensor
Recover exactly up to sign flips [Zhang & Golub, 01]
N.B.: no requirement on 𝜆’s to be distinct 22
SROA: rank-one approximation + deflation
Repeat 𝑘 times
(rank-one approx.)
(deflation)
![Page 23: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/23.jpg)
Successive Rank-One Approximation
# samples = ∞
# samples < ∞
Other potential sources of perturbation
Model misspecification Numerical error ……
Is SROA robust to the perturbation?
Recall matrix perturbation theory (e.g. Davis-Kahan), which requires 𝑝𝑒𝑟𝑡𝑢𝑟𝑏𝑎𝑡𝑖𝑜𝑛 𝑚𝑎𝑡𝑟𝑖𝑥 < min
𝑖≠𝑗 |𝜆𝑖 − 𝜆𝑗|.
23
Sampling error
![Page 24: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/24.jpg)
Successive Rank-One Approximation
Perturbed SOD tensor
24
SROA: rank-one approximation + deflation
Repeat 𝑘 times
(rank-one approx.)
(deflation)
![Page 25: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/25.jpg)
Successive Rank-One Approximation
Output
perm. on s.t.
25
Theorem (MHG, ‘15)
Repeat 𝑘 times
(rank-one approx.)
(deflation)
Input: the perturbed SOD tensor
Output:
N.B. generalizes matrix perturbation analysis NO spectral gap quantity involved
![Page 26: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/26.jpg)
Best Rank-One Tensor Approximation
26
eliminate , then solve
&
![Page 27: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/27.jpg)
SDP Relaxation [Jiang, Ma & Zhang, ‘14]
27
reshape to
induced symmetry &
convex relaxation
(*)
The 𝑝 = 3 tensor problem can be transformed into a 𝑝 = 4 tensor problem [JMZ, ‘14]. (*)
![Page 28: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/28.jpg)
SDP Relaxation
28
Linear constraints
𝑿 =
𝑎 𝑑 𝑑 𝑏𝑑 𝑏 𝑏 𝑒𝑑 𝑏 𝑏 𝑒𝑏 𝑒 𝑒 𝑐
E.g. 𝒗 = 𝑣1, 𝑣2 𝑇
𝑿 = 𝑣𝑒𝑐(𝒗⊗ 𝒗) 𝑣𝑒𝑐 𝒗⊗ 𝒗 𝑇 =
𝑣12
𝑣1𝑣2𝑣2𝑣1𝑣22
𝑣12 𝑣1𝑣2 𝑣2𝑣1 𝑣2
2
=
𝑣14 𝑣13𝑣2 𝑣2𝑣1
3 𝑣12𝑣22
𝑣13𝑣2 𝑣1
2𝑣22 𝑣12𝑣22 𝑣1𝑣2
3
𝑣2𝑣13 𝑣12𝑣22 𝑣12𝑣22 𝑣1𝑣2
3
𝑣12𝑣22 𝑣1𝑣2
3 𝑣1𝑣23 𝑣24
hyper- symmetry
spherical constr. 1 = 𝑣 22 = 𝑣12 + 𝑣22
= 𝑣12 + 𝑣22 2 = 𝑡𝑟𝑎𝑐𝑒(𝑿)
![Page 29: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/29.jpg)
SDP Relaxation
29
N.B. SDP relaxation proposed by [Jiang, Ma & Zhang, ‘14] Square reshaping trick for low-rank tensor recovery [MHWG, ‘14]
Equivalent SDP (of reduced size) proposed by [Nie & Wang, ‘14]
using moment-based convex relaxation
Empirically, observed almost always! i.e. , SDP relaxation solves nonconvex problem!
![Page 30: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/30.jpg)
Solving SDP
Semidefinite programming (SDP)
Linear constraints:
30
![Page 31: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/31.jpg)
Deriving the dual problem
Lagrangian function
Dual problem
31
![Page 32: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/32.jpg)
Augmented Lagrangian Method
Augmented Lagrangian function
Augmented Lagrangian method (ALM)
𝒌-th iteration:
compute
update
minimizing jointly over is hard! 32
![Page 33: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/33.jpg)
Alternating Direction Method of Multipliers (ADMM)
ADMM
𝒌-th iteration:
𝒚-update:
Remedy: alternating direction
minimize along 𝒚-direction and 𝑺-direction alternatively
𝑺-update:
𝑿-update:
Each step is (relatively) easy to compute! 33
![Page 34: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/34.jpg)
Update 𝒚
First-order optimality condition:
assume is onto (surjective)
34
![Page 35: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/35.jpg)
Update 𝑺
( Eigen-Decomp.)
with
35
![Page 36: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/36.jpg)
Update muliplier 𝑿
36
![Page 37: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/37.jpg)
Convergence
Assumption:
Convergence result
is onto ∃ s.t. and (Slater’s condition)
a primal and dual solution
From any starting point,
Semidefinite programming (SDP)
37
Theorem (WGY, ‘10)
![Page 38: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/38.jpg)
Another ADMM for Tensor Problem
38
Tensor robust principal component anaysis (T-RPCA)
𝓧 = 𝓛 + 𝓢 ∈ ℝ𝑛1×𝑛2×⋯×𝑛𝐾
Structural assumptions:
𝓛: low rank in each mode ( )
𝓢: sparsely supported
i.e. rank(𝓛(𝒌)) is small for all 𝑘 ∈ [𝐾]
i.e. cardinality 𝓢: 𝓢 ≠ 0 is small
Problem: Given 𝓧, recover 𝓛 and 𝓢.
𝓛(𝒌)
![Page 39: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/39.jpg)
T-RPCA
39
Convex surrogates
Convex optimization
rank(𝓛(𝒊)) 𝓛(𝒊) ∗≔ ∑𝝈𝒋(𝓛(𝒊))
cardinality(𝓢) 𝓢 1 ≔ ∑ |𝓢𝒊𝟏𝒊𝟐⋯𝒊𝑲|
min𝓛,𝓢 𝜆𝑖 𝓛 𝑖 ∗ + 𝓢 1
s.t. 𝓛 + 𝓢 = 𝓧
N.B. generalizes matrix robust PCA [CLMW, 11]
theoretical guarantees [MHWG, ‘14] [HMGW, ‘15]
![Page 40: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/40.jpg)
T-RPCA
40
Variable Splitting
λ𝑖 𝓛𝑖, 𝑖 ∗𝑖
λ𝒊 𝓛(𝒊) ∗𝑖
𝓛
𝓛1 = 𝓛2 = ⋯ = 𝓛𝐾
![Page 41: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/41.jpg)
T-RPCA
min𝑳𝑖 , 𝑺 𝜆𝑖 𝓛𝑖, 𝑖 ∗ + 𝓢 1
s.t. 𝓛𝑖 + 𝓢 = 𝓧, ∀ 𝑖 ∈ [𝐾]
Reformulated problem
Augmented Lagrangian function
ADMM 𝒌-th iteration:
𝓛𝑖 -update:
𝓢-update:
𝚲𝑖-update: 41
![Page 42: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/42.jpg)
References
42
[ZG] Zhang, T. and Golub G. H.. "Rank-one approximation to high order tensors." SIAM Journal on Matrix Analysis and Applications 23.2 (2001): 534-550. [AGHKT] Anandkumar, A., et al. "Tensor decompositions for learning latent variable models." The Journal of Machine Learning Research 15.1 (2014): 2773-2832. [HS] Hsu, D. and Sham M. K.. "Learning mixtures of spherical gaussians: moment methods and spectral decompositions." Proceedings of the 4th conference on Innovations in Theoretical Computer Science. ACM, 2013. [TB] Kolda, T. G. and Bader B. W. Bader.. "Tensor decompositions and applications." SIAM review 51.3 (2009): 455-500. [JMZ] Jiang, B., Ma, S. and Zhang S.. "Tensor principal component analysis via convex optimization." Mathematical Programming150.2 (2015): 423- 457. [NW] Nie, J. and Wang, L.. "Semidefinite relaxations for best rank-1 tensor approximations." SIAM Journal on Matrix Analysis and Applications 35.3 (2014): 1155-1179. [CLMW] Candès, E. J., et al. "Robust principal component analysis?."Journal of the ACM (JACM) 58.3 (2011): 11.
![Page 43: Optimization for Tensor Models€¦ · Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor](https://reader034.vdocument.in/reader034/viewer/2022042219/5ec597f5516b3934497279ef/html5/thumbnails/43.jpg)
References
43
[WGY] Wen, Z., Goldfarb, D., & Yin, W. (2010). Alternating direction augmented Lagrangian methods for semidefinite programming. Mathematical Programming Computation, 2(3-4), 203-230. [MHWG] Mu, C., et al. "Square deal: Lower bounds and improved relaxations for tensor recovery." Proceedings of the international conference on Machine learning. ACM, 2014 [GQ] Goldfarb, D., & Qin, Z.. "Robust low-rank tensor recovery: Models and algorithms." SIAM Journal on Matrix Analysis and Applications 35.1 (2014): 225-253. [HMGW] Huang, B., et al. "Provable models for robust low-rank tensor completion." Pacific Journal of Optimization 11 (2015): 339-364. [MHG] Mu, C., Hsu, D., & Goldfarb, D.. "Successive Rank-One Approximations for Nearly Orthogonally Decomposable Symmetric Tensors." SIAM Journal on Matrix Analysis and Applications 36.4 (2015): 1638-1659.