
On the k-Support and Related Norms

Massimiliano Pontil

Department of Computer Science

Centre for Computational Statistics and Machine Learning

University College London

(Joint work with Andrew McDonald and Dimitris Stamos)

Massimiliano Pontil (UCL) On the k-Support and Related Norms Sestri Levante, Sept 2014 1 / 14


Plan

Problem

Spectral regularization

k-support norm

Box norm

Link to cluster norm


Problem

Learn a matrix from a set of linear measurements:

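$$y_i = \langle W^*, X_i \rangle + \text{noise}_i, \quad i = 1, \dots, n$$

Method

$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n \left(y_i - \langle W, X_i \rangle\right)^2 + \lambda\, \Omega(W)$$

Matrix completion: $X_i = e_r e_c^\top$

Multitask learning: $X_i = e_r x_i^\top$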

Regularizer Ω encourages matrix structure
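
As a small illustration of the measurement model above, the following sketch builds a matrix-completion measurement $X_i = e_r e_c^\top$, for which $\langle W, X_i \rangle$ simply reads off the entry $W_{rc}$, and evaluates the regularized squared-loss objective for a user-supplied regularizer. It assumes NumPy; the helper names `inner`, `completion_measurement`, and `objective` are hypothetical and do not come from the talk.

```python
import numpy as np

def inner(W, X):
    """Trace inner product <W, X> = sum_ij W_ij * X_ij."""
    return float(np.sum(W * X))

def completion_measurement(d, m, r, c):
    """Matrix-completion measurement X = e_r e_c^T (0-based indices)."""
    X = np.zeros((d, m))
    X[r, c] = 1.0
    return X

def objective(W, Xs, y, lam, omega):
    """Squared loss over all measurements plus lam * omega(W)."""
    residuals = np.array([yi - inner(W, Xi) for yi, Xi in zip(y, Xs)])
    return float(residuals @ residuals + lam * omega(W))

# Illustrative data: a 3 x 2 matrix and one observed entry.
W = np.arange(6, dtype=float).reshape(3, 2)
X = completion_measurement(3, 2, r=2, c=1)
observed = inner(W, X)  # equals the entry W[2, 1]
```

Passing the regularizer as a function keeps the loss term separate from the choice of $\Omega$, which is what the remaining slides vary.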


Spectral Regularization

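$$\min_{W \in \mathbb{R}^{d \times m}} \sum_{i=1}^n \left(y_i - \langle W, X_i \rangle\right)^2 + \lambda\, \Omega(W)$$

Ω favors matrix structure (low rank, low variance, clustering, etc.)

Choose an orthogonally invariant (OI) norm: $\Omega(W) \equiv \|W\| = \|UWV\|$ for all orthogonal $U, V$

von Neumann (1937): $\|W\| = g(\sigma(W))$, where $\sigma(W)$ is the vector of singular values of $W$ and $g$ is a symmetric gauge (SG) function

A well-studied example is the trace norm: $g(\cdot) = \|\cdot\|_1$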
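
The orthogonal invariance of the trace norm can be checked numerically. The sketch below assumes NumPy; the helper name `trace_norm` and the random test matrices are illustrative choices, not part of the talk. It computes the trace norm as the $\ell_1$ norm of the singular values and compares $\|UWV\|$ with $\|W\|$ for random orthogonal $U$ and $V$.

```python
import numpy as np

def trace_norm(W):
    """Trace (nuclear) norm: the l1 norm of the singular values of W."""
    return float(np.linalg.svd(W, compute_uv=False).sum())

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3))

# Random orthogonal matrices from QR factorizations of Gaussian matrices.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))

# The gap should be zero up to floating-point rounding.
gap = abs(trace_norm(U @ W @ V) - trace_norm(W))
```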


k-Support Norm [Argyriou et al. 2012]

Special case of group lasso with overlap [Jacob et al., 2009]

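$$\|w\|_{(k)} = \inf \Big\{ \sum_{|J| \le k} \|v_J\|_2 \;:\; \sum_{|J| \le k} v_J = w,\ \operatorname{supp}(v_J) \subseteq J \Big\}$$

Includes the $\ell_1$-norm (k = 1) and $\ell_2$-norm (k = d)

Unit ball of $\|\cdot\|_{(k)}$ is the convex hull of $\{w : \operatorname{card}(w) \le k,\ \|w\|_2 \le 1\}$

Dual norm: $\|u\|_{*,(k)} = \sqrt{\sum_{i=1}^k \big(|u|^\downarrow_i\big)^2}$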
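
The dual norm above is the $\ell_2$ norm of the $k$ largest entries of $|u|$, which takes only a few lines to compute. This is a minimal sketch assuming NumPy; the function name `dual_k_support_norm` is hypothetical.

```python
import numpy as np

def dual_k_support_norm(u, k):
    """l2 norm of the k largest-magnitude entries of u."""
    z = np.sort(np.abs(np.asarray(u, dtype=float)))[::-1]  # |u| in nonincreasing order
    if not 1 <= k <= z.size:
        raise ValueError("k must satisfy 1 <= k <= d")
    return float(np.sqrt(np.sum(z[:k] ** 2)))

u = [3.0, -4.0, 1.0]
dual_k1 = dual_k_support_norm(u, 1)  # k = 1: the l-infinity norm
dual_k3 = dual_k_support_norm(u, 3)  # k = d: the l2 norm
```

The two extreme cases match the duals of the $\ell_1$-norm and the $\ell_2$-norm, in line with the inclusions stated above.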


Spectral k-Support Norm

k-support norm is an SG-function, inducing the OI-norm

$$\|W\|_{(k)} := \|\sigma(W)\|_{(k)}$$

Proposition. Unit ball of $\|\sigma(\cdot)\|_{(k)}$ is the convex hull of

$$\{W : \operatorname{rank}(W) \le k,\ \|W\|_F \le 1\}$$

Includes trace norm (k = 1) and Frobenius norm (k = d)


Matrix Completion Experiment

dataset                 norm   test error   r    k      a
ML 100k (ρ = 50%)       tr     0.2017       13   -      -
                        en     0.2017       13   -      -
                        ks     0.1990       9    1.87   -
                        box    0.1989       10   2.00   1e-5
ML 1M (ρ = 50%)         tr     0.1790       17   -      -
                        en     0.1789       15   -      -
                        ks     0.1782       17   1.80   -
                        box    0.1777       19   2.00   1e-6
Jester1 (20 per line)   tr     0.1752       11   -      -
                        en     0.1752       11   -      -
                        ks     0.1739       11   6.38   -
                        box    0.1726       11   6.40   2e-5


MTL Experiment

Table: Multitask learning clustering on Lenk dataset, with simple thresholding.

dataset             norm    test error      k      a
Lenk (8 per task)   fr      3.7869 (0.07)   -      -
                    tr      1.9058 (0.04)   -      -
                    en      1.8974 (0.04)   -      -
                    ks      1.8933 (0.04)   1.02   -
                    box     1.8916 (0.04)   1.01   5.5e-3
                    c-fr    1.8667 (0.08)   -      -
                    c-tr    1.7904 (0.03)   -      -
                    c-en    1.7896 (0.03)   -      -
                    c-ks    1.7775 (0.03)   1.89   -
                    c-box   1.7754 (0.03)   1.12   9.5e-3


Box Norm

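Let $\Theta \subseteq \mathbb{R}^d_{++}$ be bounded and convex, and consider the norm and its dual:

$$\|w\|_\Theta^2 = \inf_{\theta \in \Theta} \sum_{i=1}^d \frac{w_i^2}{\theta_i}, \qquad \|u\|_{*,\Theta}^2 = \sup_{\theta \in \Theta} \sum_{i=1}^d \theta_i u_i^2$$

Box norm: $\Theta = \{\theta : a < \theta_i \le b,\ \sum_{i=1}^d \theta_i \le c\}$

Includes k-support norm for a = 0, b = 1, c = k

Unit ball is the convex hull of

$$\bigcup_{|J| \le k} \Big\{ w \in \mathbb{R}^d : \sum_{i \in J} \frac{w_i^2}{b} + \sum_{i \notin J} \frac{w_i^2}{a} \le 1 \Big\}$$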
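
For the box set, the supremum defining the dual norm is a linear program in $\theta$, so it can be evaluated greedily: start every $\theta_i$ at $a$ and spend the remaining budget on the largest $u_i^2$ first, raising each $\theta_i$ by at most $b - a$. The sketch below assumes NumPy, $0 \le a \le b$ and $c \ge da$, and evaluates the supremum over the closure of $\Theta$ (that is, with $a \le \theta_i$); the function name `dual_box_norm_sq` is hypothetical.

```python
import numpy as np

def dual_box_norm_sq(u, a, b, c):
    """Squared dual box norm: sup of sum_i theta_i * u_i^2 over the closed box set."""
    u2 = np.sort(np.asarray(u, dtype=float) ** 2)[::-1]  # u_i^2 in nonincreasing order
    d = u2.size
    if not (0 <= a <= b and c >= d * a):
        raise ValueError("need 0 <= a <= b and c >= d * a")
    theta = np.full(d, float(a))
    budget = min(c, d * b) - d * a  # mass left after every theta_i is set to a
    for i in range(d):
        if budget <= 0:
            break
        step = min(b - a, budget)   # raise theta_i as far as the cap and budget allow
        theta[i] += step
        budget -= step
    return float(theta @ u2)

u = [3.0, -4.0, 1.0]
ks_case = dual_box_norm_sq(u, a=0.0, b=1.0, c=2.0)   # k-support case with k = 2
box_case = dual_box_norm_sq(u, a=0.5, b=1.0, c=2.0)
```

With a = 0, b = 1 and integer c = k, the greedy allocation puts weight 1 on the k largest squares, recovering the squared dual k-support norm.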


Unit Balls

Figure: Unit balls of the box norm in $\mathbb{R}^2$ for k = 1, a ∈ {0.01, 0.25, 0.50}.

Figure: Unit balls of the dual box norm in $\mathbb{R}^2$ for k = 1, a ∈ {0.01, 0.25, 0.50}.


Cluster Norm

Box norm is an SG-function, inducing the OI-norm

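$$\|W\|_\Theta^2 = \|\sigma(W)\|_\Theta^2 = \inf \Big\{ \sum_{i=1}^d \frac{\sigma_i(W)^2}{\theta_i} \;:\; \theta \in (a, b]^d,\ \sum_{i=1}^d \theta_i \le c \Big\}$$

Associated OI-norm has been used to favour task clustering [Jacob et al. 2008]. It can be written as

$$\|W\|_\Theta^2 = \inf \big\{ \operatorname{tr}(W \Sigma^{-1} W^\top) \;:\; aI \preceq \Sigma \preceq bI,\ \operatorname{tr} \Sigma \le c \big\}$$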

Includes spectral k-support norm for a = 0, b = 1, c = k


Interpretation of “a”

Proposition. If $c = da + k(b - a)$, the solution of the regularization problem is given by $W = V + Z$, where

$$(V, Z) = \operatorname*{argmin}_{V, Z} \sum_{i=1}^n \left(y_i - \langle V + Z, X_i \rangle\right)^2 + \lambda \left( \frac{1}{a} \|V\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2 \right)$$

Parameter ‘a’ balances the relative importance of the two components

Cluster norm is the Moreau envelope of the spectral k-support norm:

$$\|W\|_\Theta^2 = \min_{Z \in \mathbb{R}^{d \times m}} \frac{1}{a} \|W - Z\|_F^2 + \frac{1}{b - a} \|Z\|_{(k)}^2$$


Computation of the Θ norm

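Assume w.l.o.g. $w \ge 0$ with nonincreasing components

$$\|w\|_\Theta^2 = \frac{1}{b} \|w_{[1:q]}\|_2^2 + \frac{1}{c - qb - \ell a} \|w_{[q+1:d-\ell]}\|_1^2 + \frac{1}{a} \|w_{[d-\ell+1:d]}\|_2^2,$$

where $q, \ell \in \{0, \dots, d\}$ are uniquely determined

In particular: $\|w\|_{(k)}^2 = \|w_{[1:q]}\|_2^2 + \frac{1}{k - q} \|w_{[q+1:d]}\|_1^2$

where $q \in \{0, \dots, k - 1\}$ is determined by $|w|^\downarrow_q \ge \frac{1}{k - q} \sum_{j=q+1}^d |w|^\downarrow_j > |w|^\downarrow_{q+1}$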

Computation of norm is O(d log(d))

For the k-support norm, this improves on the previous O(kd) method

Efficient optimization using proximal-gradient methods
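
The closed form for the k-support norm can be sketched as follows: sort $|w|$, precompute tail sums and head sums of squares, and scan $q = 0, \dots, k-1$. The sketch picks the smallest $q$ whose tail average $\frac{1}{k-q} \sum_{j>q} |w|^\downarrow_j$ is at least $|w|^\downarrow_{q+1}$; this one-sided test and its handling of ties are assumptions of the sketch, as are the use of NumPy and the function name `k_support_norm`. The sort dominates the cost, giving O(d log d) overall.

```python
import numpy as np

def k_support_norm(w, k):
    """k-support norm of w from the sorted closed form (1 <= k <= d)."""
    z = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # |w| in nonincreasing order
    d = z.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    # tail[q] = z[q] + ... + z[d-1], i.e. the sum over j = q+1..d in 1-based indexing.
    tail = np.cumsum(z[::-1])[::-1]
    # head_sq[q] = sum of squares of the q largest entries.
    head_sq = np.concatenate(([0.0], np.cumsum(z ** 2)))
    for q in range(k):
        avg = tail[q] / (k - q)
        if avg >= z[q]:  # z[q] is the (q+1)-th largest entry in 1-based indexing
            return float(np.sqrt(head_sq[q] + tail[q] ** 2 / (k - q)))
    raise AssertionError("unreachable: q = k - 1 always passes the test")

w = [3.0, -4.0, 0.0]
norm_k1 = k_support_norm(w, 1)  # k = 1: the l1 norm
norm_k3 = k_support_norm(w, 3)  # k = d: the l2 norm
```

The two extreme values of k reproduce the $\ell_1$ and $\ell_2$ norms. Applying the same function to the singular values of a matrix would give the spectral k-support norm.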


Extensions/Open Problems

Other sets Θ allow for an exact prox, e.g. $\Theta = \{\theta : \theta_1 \ge \dots \ge \theta_d > 0\}$. Can we give a general characterization?

Online learning / stochastic optimization

Kernel extensions
