An Online Learning Algorithm for Bilinear Models
Yuanbin Wu Shiliang Sun
East China Normal University
Yuanbin Wu Shiliang Sun An Online Learning Algorithm for Bilinear Models 1 / 27
Introduction
- Bilinear models
- Online learning
- Regret analysis
Introduction: bilinear models
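Linear model for multi-class classification

$$h(x) = \arg\max_{y \in \mathcal{Y}} w^\top \phi(x, y)$$

Matrix form linear model

$$h(x) = \arg\max_{y \in \mathcal{Y}} \operatorname{Tr}\left(W^\top \Phi(x, y)\right)$$

Bilinear model

$$h(x) = \arg\max_{y \in \mathcal{Y}} \alpha^\top \Phi(x, y)\, \beta$$

- Matrix feature
- Rank 1 constraint on $W$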
Introduction: online learning
- Online convex optimization
  - Convexity is violated by rank constraints
  - $\Omega_1 = \{W \mid \operatorname{rank}(W) \le 1\}$ is not a convex set
- The primal-dual perspective can help
  - The dual problem is always convex
- Gradients for matrix norms
  - Singular value decomposition
Introduction: regret analysis
The regret of an online algorithm w.r.t. strategy $U$

$$R_N(U) = \frac{1}{N} \sum_{t=1}^{N} L_t(W_t) - \frac{1}{N} \sum_{t=1}^{N} L_t(U).$$

Bound of the Hessian (strong smoothness)

$$f(x + y) \le f(x) + \nabla f(x)^\top y + \frac{\beta}{2} \|y\|^2$$

Can we have similar bounds for rank constrained problems?
Outlines
1 Bilinear Model
2 Online Learning Algorithm
3 Regret Analysis
4 Experiments
5 Conclusion
Bilinear Model
Definition. We define the bilinear model by the discriminant function

$$h(x) = \arg\max_{y \in \mathcal{Y}} \alpha^\top \Phi(x, y)\, \beta$$

where $\alpha \in \mathbb{R}^m$, $\beta \in \mathbb{R}^n$. The model parameter $W = \alpha\beta^\top$ is a rank 1 matrix.

Why the bilinear formulation?
- semantic relations among features
- more compact model
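As a small numerical illustration of the definition, the sketch below checks that the bilinear score $\alpha^\top \Phi \beta$ coincides with the matrix-form score $\operatorname{Tr}(W^\top \Phi)$ when $W = \alpha\beta^\top$. The sizes and the random stand-in for $\Phi(x, y)$ are assumptions made for the example, not values from the slides.

```python
import numpy as np

def bilinear_score(alpha, beta, phi):
    """Score alpha^T Phi beta for one candidate label."""
    return float(alpha @ phi @ beta)

def trace_score(w, phi):
    """Matrix-form linear score Tr(W^T Phi)."""
    return float(np.trace(w.T @ phi))

rng = np.random.default_rng(0)
m, n = 4, 3                       # illustrative sizes (assumption)
alpha = rng.normal(size=m)
beta = rng.normal(size=n)
phi = rng.normal(size=(m, n))     # random stand-in for Phi(x, y)

w = np.outer(alpha, beta)         # W = alpha beta^T, a rank 1 matrix
same = np.isclose(bilinear_score(alpha, beta, phi), trace_score(w, phi))
print(same, np.linalg.matrix_rank(w))
```

The compactness claim can be read off the shapes: the rank 1 model stores $m + n$ numbers, while an unconstrained $W$ stores $m \cdot n$.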
Bilinear Model
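Example: sequential labelling

The linear model:

$$h(x) = \arg\max_{y \in \mathcal{Y}} \sum_{i=1}^{n} w^\top \Phi(x, y_i, y_{i-1})$$

$$
\underbrace{\begin{array}{c|ccccccccc}
y_i y_{i-1} & \text{BB} & \text{BI} & \text{BO} & \text{IB} & \text{II} & \text{IO} & \text{OB} & \text{OI} & \text{OO} \\ \hline
 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{array}}_{\Phi(x, y_i, y_{i-1})}
\;\Rightarrow\;
\underbrace{\begin{array}{c|ccc}
 & \text{B} & \text{I} & \text{O} \\ \hline
\text{B} & 0 & 0 & 1 \\
\text{I} & 0 & 0 & 0 \\
\text{O} & 0 & 0 & 0
\end{array}}_{\Phi(x, y_i, y_{i-1})}
=
\underbrace{\begin{array}{c|c}
\text{B} & 1 \\
\text{I} & 0 \\
\text{O} & 0
\end{array}}_{\zeta_1(x, y_i)}
\;
\underbrace{\begin{array}{ccc}
\text{B} & \text{I} & \text{O} \\ \hline
0 & 0 & 1
\end{array}}_{\zeta_2^\top(x, y_{i-1})}
$$

The bilinear model:

$$h(x) = \arg\max_{y \in \mathcal{Y}} \sum_{i=1}^{n} \alpha^\top \left[ \zeta(x, y_i) \otimes \zeta(x, y_{i-1}) \right] \beta$$

Number of parameters from $O(n^2)$ to $O(n)$

[Diagram: chain of labels $y_0, y_1, \ldots, y_i, \ldots, y_{n-1}, y_n$ over the input $x$]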
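The factorisation in the B/I/O example can be sketched in a few lines. The helper names (`one_hot`, `bigram_vector`, `bigram_matrix`) are hypothetical, and the features here depend on the labels only, which is a simplification of the $\zeta(x, y)$ features in the slides.

```python
import numpy as np

LABELS = ["B", "I", "O"]

def one_hot(label):
    """Indicator vector over the label set (stand-in for zeta)."""
    vec = np.zeros(len(LABELS))
    vec[LABELS.index(label)] = 1.0
    return vec

def bigram_vector(y_i, y_prev):
    """Flat indicator over the 9 bigrams, ordered BB, BI, BO, IB, ..., OO."""
    vec = np.zeros(len(LABELS) ** 2)
    vec[LABELS.index(y_i) * len(LABELS) + LABELS.index(y_prev)] = 1.0
    return vec

def bigram_matrix(y_i, y_prev):
    """Same information as a 3x3 matrix: outer product of two indicators."""
    return np.outer(one_hot(y_i), one_hot(y_prev))

flat = bigram_vector("B", "O")
mat = bigram_matrix("B", "O")
print(flat)   # the 1 sits in the BO position
print(mat)    # the 1 sits in row B, column O
```

Flattening the matrix row by row recovers the bigram vector, so the matrix feature carries the same information while exposing the two factors that $\alpha$ and $\beta$ act on.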
Online Learning Algorithm
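Large margin optimization problem

$$\min_{W = \alpha\beta^\top \in \Omega_1} \; \frac{1}{2}\|W\|_F^2 + C \sum_{j=1}^{N} \left[ 1 - \langle W, \Delta\Phi_j \rangle \right]_+,$$

where $\Delta\Phi_j \triangleq \Phi(x_j, y_j) - \Phi(x_j, h(x_j))$, and $\Omega_1$ is the set of rank 1 matrices.

Biconvex problem

$$\min_{\alpha, \beta} \; \frac{1}{2}\|\alpha\|^2 + \frac{1}{2}\|\beta\|^2 + C \sum_{j=1}^{N} \left[ 1 - \alpha^\top \Delta\Phi_j \beta \right]_+,$$

- blockwise coordinate descent
- degenerate cases: only solve a 0-order model on $\zeta(x, y_i)$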
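To make "biconvex" concrete, the sketch below evaluates the objective and checks midpoint convexity in $\alpha$ with $\beta$ held fixed, which is the property blockwise coordinate descent relies on. The data are random stand-ins for $\Delta\Phi_j$, and the function name is an assumption made for the example.

```python
import numpy as np

def biconvex_objective(alpha, beta, delta_phis, C=1.0):
    """0.5||alpha||^2 + 0.5||beta||^2 + C * sum of hinge terms."""
    hinge = sum(max(0.0, 1.0 - float(alpha @ d @ beta)) for d in delta_phis)
    return 0.5 * float(alpha @ alpha) + 0.5 * float(beta @ beta) + C * hinge

rng = np.random.default_rng(1)
delta_phis = [rng.normal(size=(3, 2)) for _ in range(5)]  # stand-ins for Delta Phi_j
beta = rng.normal(size=2)                                  # held fixed
a1, a2 = rng.normal(size=3), rng.normal(size=3)

# With beta fixed, the objective is convex in alpha:
# f((a1 + a2) / 2) <= (f(a1) + f(a2)) / 2.
mid = biconvex_objective(0.5 * (a1 + a2), beta, delta_phis)
avg = 0.5 * (biconvex_objective(a1, beta, delta_phis)
             + biconvex_objective(a2, beta, delta_phis))
print(mid <= avg + 1e-12)
```

The same check holds in $\beta$ with $\alpha$ fixed, but not jointly in $(\alpha, \beta)$, so alternating the two convex subproblems gives no guarantee of reaching a global minimum.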
Online Learning Algorithm
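Our plan: from the dual
- mirror descent style updates

$$
\begin{array}{ccc}
W_{t-1} & \xrightarrow{\;\nabla F\;} & \Theta_{t-1} \\
 & & \Big\downarrow {\scriptstyle -\eta_t \nabla L_t} \\
W_t & \xleftarrow{\;\nabla F^*\;} & \Theta_t
\end{array}
$$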
Online Learning Algorithm
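Define $F_1(W) = \frac{1}{2}\|W\|_F^2$ if $W \in \Omega_1$, $+\infty$ otherwise.

The dual problem

$$
\begin{aligned}
D(\eta) &= \sum_{j=1}^{N} \eta_j - \max_{W \in \Omega_1} \left( \Big\langle W, \sum_{j=1}^{N} \eta_j \Delta\Phi_j \Big\rangle - \frac{1}{2}\|W\|_F^2 \right) \\
&= \sum_{j=1}^{N} \eta_j - F_1^*(\Theta_N), \qquad \eta_j \in [0, C],
\end{aligned}
$$

where
- $\Theta_N = \Theta_{N-1} + \eta_N \Delta\Phi_N$ (gradients of hinge loss, mirror space)
- $F_1^*(\Theta) = \max_{W \in \Omega_1} \langle W, \Theta \rangle - \frac{1}{2}\|W\|_F^2$ (the Fenchel dual)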
Online Learning Algorithm
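The dual problem

$$D(\eta) = \sum_{j=1}^{N} \eta_j - F_1^*(\Theta_N) = \sum_{j=1}^{N} \eta_j - \frac{1}{2}\|\Theta_N\|_2^2$$

$$\Theta_N = \Theta_{N-1} + \eta_N \Delta\Phi_N, \qquad F_1^*(\Theta) = \max_{W \in \Omega_1} \langle W, \Theta \rangle - \frac{1}{2}\|W\|_F^2$$

Proposition: $F_1^*(\Theta) = \frac{1}{2}\|\Theta\|_2^2 = \frac{1}{2}\|\Theta\|_{s(\infty)}^2 = \frac{1}{2}\sigma_1(\Theta)^2$
- SVD has the property of "the best low rank approximation"

A series of dual problems $D_{t+1}(\eta) = \sum_{j=1}^{t} \eta_j - F_1^*(\Theta_t)$, $t = 1, 2, \ldots, N$
- uses $W_{t-1} = \alpha_{t-1}\beta_{t-1}^\top$ to predict $x_t$, $\hat{y}_t = h(x_t)$;
- sets the dual variable $\eta_t$ as

$$\eta_t = \begin{cases} 0 & \hat{y}_t = y_t \\ C & \hat{y}_t \ne y_t \end{cases}$$

- updates $W_t$:

$$W_t = \nabla F_1^*(\Theta_t) = \arg\max_{W \in \Omega_1} \langle W, \Theta_t \rangle - \frac{1}{2}\|W\|_F^2 \;\overset{\sigma_1 \ne \sigma_2}{=}\; \sigma_1 u_1 v_1^\top$$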
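The proposition can be checked numerically. This is a sketch under the assumption that a full SVD (`numpy.linalg.svd`) is acceptable for a small matrix; the function names are hypothetical. It builds the maximiser $\sigma_1 u_1 v_1^\top$, confirms that it attains $\frac{1}{2}\sigma_1(\Theta)^2$, and compares against random rank 1 candidates.

```python
import numpy as np

def fenchel_dual_rank1(theta):
    """Return (0.5 * sigma_1^2, sigma_1 u_1 v_1^T) from the SVD of theta."""
    u, s, vt = np.linalg.svd(theta, full_matrices=False)
    w = s[0] * np.outer(u[:, 0], vt[0])
    return 0.5 * s[0] ** 2, w

def objective(w, theta):
    """<W, Theta> - 0.5 ||W||_F^2, the quantity maximised over rank 1 W."""
    return float(np.sum(w * theta) - 0.5 * np.sum(w * w))

rng = np.random.default_rng(2)
theta = rng.normal(size=(4, 3))
value, w_star = fenchel_dual_rank1(theta)

# No random rank 1 candidate a b^T should beat the closed-form maximiser.
best_random = max(
    objective(np.outer(rng.normal(size=4), rng.normal(size=3)), theta)
    for _ in range(200)
)
print(np.isclose(objective(w_star, theta), value), best_random <= value)
```

Random search only probes the claim; the actual argument is the best low rank approximation property of the SVD mentioned in the slide.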
Online Learning Algorithm
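- uses $W_{t-1} = \alpha_{t-1}\beta_{t-1}^\top$ to predict $x_t$,

$$\hat{y}_t = h(x_t) = \arg\max_{y \in \mathcal{Y}} \alpha_{t-1}^\top \Delta\Phi_t(x_t, y)\, \beta_{t-1}$$

- sets the dual variable $\eta_t$ as

$$\eta_t = \begin{cases} 0 & \hat{y}_t = y_t \\ C & \hat{y}_t \ne y_t \end{cases}$$

- updates $W_t$:

$$\Theta_t = \Theta_{t-1} + \eta_t \Delta\Phi_t = \sum_{i=1}^{p} \sigma_i u_i v_i^\top, \qquad W_t = \sigma_1 u_1 v_1^\top$$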
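The three steps above can be sketched as a short loop. Everything concrete here is an assumption made for illustration: the feature map $\Phi(x, y) = x e_y^\top$, the toy multi-class stream, and the use of a full SVD in place of the cheaper leading-vector computation. Prediction scores each label with $\alpha^\top \Phi(x, y) \beta$ as in the model definition.

```python
import numpy as np

def feature(x, y, num_classes):
    """Hypothetical matrix feature Phi(x, y) = x e_y^T (shape m x K)."""
    phi = np.zeros((x.shape[0], num_classes))
    phi[:, y] = x
    return phi

def predict(alpha, beta, x, num_classes):
    scores = [alpha @ feature(x, y, num_classes) @ beta for y in range(num_classes)]
    return int(np.argmax(scores))

def leading_rank1(theta):
    """Split sigma_1 u_1 v_1^T into factors alpha, beta with W = alpha beta^T."""
    u, s, vt = np.linalg.svd(theta, full_matrices=False)
    root = np.sqrt(s[0])
    return root * u[:, 0], root * vt[0]

def online_bilinear(stream, dim, num_classes, C=1.0):
    theta = np.zeros((dim, num_classes))
    alpha, beta = np.zeros(dim), np.zeros(num_classes)
    mistakes = 0
    for x, y in stream:
        y_hat = predict(alpha, beta, x, num_classes)   # predict with W_{t-1}
        if y_hat != y:                                 # eta_t = C on a mistake, else 0
            mistakes += 1
            theta += C * (feature(x, y, num_classes) - feature(x, y_hat, num_classes))
            alpha, beta = leading_rank1(theta)         # W_t = sigma_1 u_1 v_1^T
    return alpha, beta, theta, mistakes

rng = np.random.default_rng(0)
stream = [(rng.normal(size=4), int(rng.integers(0, 3))) for _ in range(50)]
alpha, beta, theta, mistakes = online_bilinear(stream, dim=4, num_classes=3)
w = np.outer(alpha, beta)
print(mistakes, np.linalg.matrix_rank(w))
```

Only $\Theta_t$ accumulates information across rounds; $W_t$ is recomputed from it, which is why the learned parameter stays rank 1 without any explicit projection step on $W$.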
Online Learning Algorithm
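$$W_t = \nabla F_1^*(\Theta_t) = \sigma_1 u_1 v_1^\top$$

- Full SVD is expensive; only the leading singular vectors are needed
- Power iteration: if $\sigma_1(\Theta) \ne \sigma_2(\Theta)$,

$$\alpha^{(\tau+1)} = \Theta\Theta^\top \alpha^{(\tau)}, \qquad \frac{\alpha^{(\tau+1)}}{\|\alpha^{(\tau+1)}\|} \to u_1$$

$$\beta^{(\tau+1)} = \Theta^\top\Theta\, \beta^{(\tau)}, \qquad \frac{\beta^{(\tau+1)}}{\|\beta^{(\tau+1)}\|} \to v_1$$

- Initial value and normalization
  - $\Theta_t = \Theta_{t-1} + \eta_t \Delta\Phi_t$
  - if $\Delta\Phi_t$ is "small", $\alpha_t$ is close to $\alpha_{t-1}$
  - if $\Delta\Phi_t$ is "sparse", normalization could be efficient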
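A minimal sketch of the power iteration, assuming the usual SVD convention $\Theta = \sum_i \sigma_i u_i v_i^\top$, so that $u_1$ is the dominant eigenvector of $\Theta\Theta^\top$. The function name, stopping rule, and test matrix are assumptions; `alpha0` is a hypothetical warm-start argument reflecting the remark that $\alpha_t$ stays close to $\alpha_{t-1}$ when $\Delta\Phi_t$ is small.

```python
import numpy as np

def leading_singular_triple(theta, alpha0=None, iters=200, tol=1e-10):
    """Power iteration on Theta Theta^T; returns (sigma_1, u_1, v_1) up to sign."""
    m = theta.shape[0]
    alpha = np.ones(m) if alpha0 is None else np.asarray(alpha0, dtype=float)
    alpha = alpha / np.linalg.norm(alpha)
    for _ in range(iters):
        nxt = theta @ (theta.T @ alpha)      # two matrix-vector products
        norm = np.linalg.norm(nxt)
        if norm == 0.0:                      # theta is zero: nothing to find
            break
        nxt = nxt / norm
        converged = np.linalg.norm(nxt - alpha) < tol
        alpha = nxt
        if converged:
            break
    v = theta.T @ alpha
    sigma = float(np.linalg.norm(v))
    beta = v / sigma if sigma > 0 else v
    return sigma, alpha, beta

# Test matrix with known singular values 3, 1, 0.5 (so sigma_1 != sigma_2).
rng = np.random.default_rng(3)
u_basis, _ = np.linalg.qr(rng.normal(size=(5, 3)))
v_basis, _ = np.linalg.qr(rng.normal(size=(4, 3)))
theta = u_basis @ np.diag([3.0, 1.0, 0.5]) @ v_basis.T

sigma, alpha, beta = leading_singular_triple(theta)
print(round(sigma, 6))
```

The error shrinks by roughly $(\sigma_2/\sigma_1)^2$ per step, so a small gap between the two leading singular values slows convergence. Passing the previous round's vector as `alpha0` is one way to use the warm start.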
Regret Analysis
The regret w.r.t. strategy $U$

$$R_N(U) = \frac{1}{N} \sum_{t=1}^{N} L_t(W_t) - \frac{1}{N} \sum_{t=1}^{N} L_t(U).$$

- $W_t$ are the weights at each round
- $L_t$ is the hinge loss
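The definition translates directly into code. This is a sketch assuming the hinge loss takes the form $L_t(W) = [1 - \langle W, \Delta\Phi_t \rangle]_+$ used in the large margin problem; the function names and the tiny example are hypothetical.

```python
import numpy as np

def hinge(w, delta_phi):
    """L_t(W) = [1 - <W, Delta Phi_t>]_+ ."""
    return max(0.0, 1.0 - float(np.sum(w * delta_phi)))

def average_regret(ws, delta_phis, u):
    """Mean loss of the learner's iterates minus mean loss of a fixed U."""
    n = len(delta_phis)
    learner = sum(hinge(w, d) for w, d in zip(ws, delta_phis)) / n
    comparator = sum(hinge(u, d) for d in delta_phis) / n
    return learner - comparator

delta_phis = [np.eye(2), np.eye(2)]
ws = [np.zeros((2, 2)), np.zeros((2, 2))]   # a learner that never moves
u = np.eye(2)                               # comparator with zero hinge loss here
regret = average_regret(ws, delta_phis, u)
print(regret)
```

A learner that plays the comparator itself in every round has regret exactly zero, which gives a quick sanity check on any implementation.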
Regret Analysis
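Previous analysis (mirror descent)

$$
\begin{array}{ccc}
W_{t-1} & \xrightarrow{\;\nabla F\;} & \Theta_{t-1} \\
 & & \Big\downarrow {\scriptstyle -\eta_t \nabla L_t} \\
W_t & \xleftarrow{\;\nabla F^*\;} & \Theta_t
\end{array}
$$

If $L_t$ is convex and $F$ is strongly convex, then $R_N(U) = O\left(\frac{1}{\sqrt{N}}\right)$

In the bilinear model, $F_1(W) = \frac{1}{2}\|W\|_F^2$ if $W \in \Omega_1$, $+\infty$ otherwise.
- not convex
- $F_1^{**}(W) = \frac{1}{2}\|W\|_2^2 \ne F_1$

The analysis of mirror descent is not directly applicable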
Regret Analysis
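- Lower bound of dual objective + weak duality
- Bound the increase of the dual objective

$$
\begin{aligned}
\Delta_t &= D_{t+1}(\eta_1, \ldots, \eta_t) - D_t(\eta_1, \ldots, \eta_{t-1}) \\
&= C - \frac{1}{2}\|\Theta_{t-1} + C\Delta\Phi_t\|_2^2 + \frac{1}{2}\|\Theta_{t-1}\|_2^2.
\end{aligned}
$$

By the Taylor expansion:

$$\frac{1}{2}\|\Theta + E\|_2^2 \le \frac{1}{2}\|\Theta\|_2^2 + \langle \nabla\|\Theta\|_2, E \rangle + \operatorname{vec}(E)^\top H(\tilde{\Theta}) \operatorname{vec}(E)$$

where $\tilde{\Theta} = \Theta + \theta E$, $\theta \in (0, 1)$

Bound the Hessian term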
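The expression for $\Delta_t$ follows by subtracting two consecutive dual objectives. The steps below are a sketch that assumes the earlier proposition $F_1^*(\Theta) = \frac{1}{2}\|\Theta\|_2^2$ and the update $\Theta_t = \Theta_{t-1} + \eta_t \Delta\Phi_t$:

```latex
\begin{align*}
D_{t+1}(\eta_1,\dots,\eta_t) &= \sum_{j=1}^{t}\eta_j - \tfrac{1}{2}\|\Theta_t\|_2^2,
  \qquad \Theta_t = \Theta_{t-1} + \eta_t\,\Delta\Phi_t,\\
D_t(\eta_1,\dots,\eta_{t-1}) &= \sum_{j=1}^{t-1}\eta_j - \tfrac{1}{2}\|\Theta_{t-1}\|_2^2,\\
\Delta_t &= \eta_t - \tfrac{1}{2}\|\Theta_{t-1} + \eta_t\,\Delta\Phi_t\|_2^2
  + \tfrac{1}{2}\|\Theta_{t-1}\|_2^2 .
\end{align*}
```

On a mistake round $\eta_t = C$, which gives the expression in the slide; on a correct round $\eta_t = 0$ and $\Delta_t = 0$. Taking $E = C\Delta\Phi_t$ in the Taylor expansion is what turns a bound on the Hessian term into a lower bound on $\Delta_t$.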
Regret Analysis
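Our result (by bounding the Hessian): if $\sigma_1(\tilde{\Theta}) \ne \sigma_2(\tilde{\Theta}) > 0$,

$$\frac{1}{2}\|\Theta + E\|_2^2 \le \frac{1}{2}\|\Theta\|_2^2 + \langle \nabla\|\Theta\|_2, E \rangle + \|E\|_F^2 \, \frac{2l}{1 - \frac{\sigma_2}{\sigma_1}}$$

where $[\sigma_1, \ldots, \sigma_l] = \sigma(\tilde{\Theta})$, $\tilde{\Theta} = \Theta + \theta E$, $\theta \in (0, 1)$

Known result on Schatten norm (Ball et al., 1994; Kakade et al., 2012):
- Schatten norm: $\|\Theta\|_{s(p)} = \|\sigma(\Theta)\|_p$, $\|\Theta\|_{s(\infty)} = \|\Theta\|_2 = \sigma_1(\Theta)$
- for $p \in [2, \infty]$, $\frac{1}{p} + \frac{1}{q} = 1$,

$$\frac{1}{2}\|\Theta + E\|_{s(p)}^2 \le \frac{1}{2}\|\Theta\|_{s(p)}^2 + \langle \nabla\|\Theta\|_{s(p)}, E \rangle + \frac{\|E\|_{s(q)}^2}{2(q - 1)}.$$

- The bound is trivial if $p = \infty$.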
Regret Analysis
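Proposition (Regret). Assume for all $\Theta = \Theta_{t-1}$, $E = C\Delta\Phi_t$, the bound of Hessian holds. Then

$$R_N(U) \le \frac{1}{2CN}\|U\|_F^2 + \frac{2lC}{N} \sum_{t=1}^{N} \frac{\|\Delta\Phi_t\|_F^2}{1 - \frac{\sigma_2^t}{\sigma_1^t}}.$$

The role of $\frac{\sigma_1^t}{\sigma_2^t}$
- the speed of power iteration
- the regret bound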
Regret Analysis
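Bound $\frac{\sigma_1}{\sigma_2}$: margin requirement + "$\sigma_1$ is uniformly greater than $\sigma_2$"

Proposition. Assume that $\sup_{j,W} \|\Delta\Phi_j\|_2 \le M_1$, $\sup_{j,W} \|\Delta\Phi_j\|_{k(2)} \le M_2$. If $M_1 > \frac{M_2}{2}$ and $\exists W$ has margin $\gamma$ w.r.t. $\|\cdot\|_{s(1)}$, where $\gamma \in \left(\frac{M_2}{2}, M_1\right)$, then

$$\frac{\sigma_2^t}{\sigma_1^t} \le \frac{M_2 - \gamma}{\gamma}.$$

Corollary. The regret is bounded by

$$R_N(U) \le \frac{1}{2CN}\|U\|_F^2 + 2Cl^2M_1^2 \, \frac{\gamma}{2\gamma - M_2}.$$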
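One way to see how the corollary follows from the two propositions is sketched below. It assumes that $l$ also counts the singular values of $\Delta\Phi_t$ (which has the same shape as $\Theta$), so that $\|\Delta\Phi_t\|_F^2 \le l\,\|\Delta\Phi_t\|_2^2$; the slides do not spell this step out.

```latex
\begin{align*}
\frac{\sigma_2^t}{\sigma_1^t} \le \frac{M_2-\gamma}{\gamma}
  &\;\Longrightarrow\;
  \frac{1}{1-\sigma_2^t/\sigma_1^t} \le \frac{\gamma}{2\gamma - M_2}
  \qquad (\text{using } \gamma > M_2/2),\\
\|\Delta\Phi_t\|_F^2 = \sum_{i=1}^{l}\sigma_i(\Delta\Phi_t)^2
  &\le l\,\|\Delta\Phi_t\|_2^2 \le l\,M_1^2,\\
\frac{2lC}{N}\sum_{t=1}^{N}\frac{\|\Delta\Phi_t\|_F^2}{1-\sigma_2^t/\sigma_1^t}
  &\le \frac{2lC}{N}\cdot N\cdot l\,M_1^2\cdot\frac{\gamma}{2\gamma-M_2}
  = 2Cl^2M_1^2\,\frac{\gamma}{2\gamma-M_2}.
\end{align*}
```

The second term no longer depends on $N$, while the first decays as $1/N$, so the bound is driven by the margin $\gamma$ and the choice of $C$.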
Experiments
Two sequential labelling tasks
- Chinese word segmentation
- Text chunking

Baselines
- Linear model (structured perceptron)
- Blockwise coordinate descent of the biconvex problem
- Batch learner (CRF+L2, CRF+L1)
Experiments
[Figure: Chinese word segmentation. Four panels (pku, msr, cityu, as); x-axis ticks from 0.1 to 1.0, y-axis ticks between −0.3 and 0.4; legend: bol, bcd, sp. Values printed along the x-axis ticks of each panel:
pku: 89.7 92.0 92.7 93.2 93.5 93.8 94.0 94.1 94.4 94.4
msr: 91.5 93.3 94.5 95.1 95.7 95.8 96.1 96.2 96.4 96.5
cityu: 87.5 89.6 90.7 91.5 92.1 92.5 92.7 93.5 93.8 94.0
as: 88.5 91.1 92.6 93.3 93.8 93.9 93.9 94.0 94.1 94.2]
Experiments
[Figure: Text chunking. One panel (Chunking); x-axis ticks from 0.1 to 1.0, y-axis ticks between −0.4 and 0.3; legend: bol, bcd, sp. Values printed along the x-axis ticks:
90.2 91.4 92.2 92.7 92.8 93.0 93.2 93.3 93.4 93.6]
Experiments
- Compared with linear models
  - When the training set is small, the advantage of bol is more obvious
  - The model is more compact
- Compared with blockwise coordinate descent
  - Avoids being attracted to solutions of the 0-order model
Experiments
[Figure: Convergence. Two panels, x-axis ticks from 0 to 100. Left panel: y-axis ticks from 0.00 to 0.30, curves bol, crf2, crf1. Right panel: y-axis ticks from 0.000 to 0.040, curves bol, sp, bcd.]
Conclusion
- An online learning algorithm for bilinear models
- A second order approximation of the squared spectral norm
- Future work
  - rank k constraints
  - roughly, needs to compute the leading k singular vectors
Thanks