
Page 1

Regularization Strategies and Empirical Bayesian Learning for MKL

Ryota Tomioka^1, Taiji Suzuki^1
^1 Department of Mathematical Informatics, The University of Tokyo

2010-12-11, NIPS 2010 Workshop: New Directions in Multiple Kernel Learning

(Slides: ttic.uchicago.edu/~ryotat/talks/tomioka-mklworkshop10.pdf)

Page 2: Overview

Our contribution

- Relationships between different regularization strategies:
  - Ivanov regularization (kernel weights)
  - Tikhonov regularization (kernel weights)
  - (Generalized) block-norm formulation (no kernel weights)
  Are they equivalent? In which way?
- Empirical Bayesian learning algorithm for MKL:
  - Maximizes the marginalized likelihood.
  - Can be considered as a non-separable regularization on the kernel weights.

Page 3: Overview

Learning with a fixed kernel combination

Fixed kernel combination k_d(x, x') = \sum_{m=1}^M d_m k_m(x, x'). The problem

  \min_{\bar f \in \mathcal{H}(d),\, b \in \mathbb{R}} \sum_{i=1}^N \ell(y_i, \bar f(x_i) + b) + \frac{C}{2} \|\bar f\|_{\mathcal{H}(d)}^2

(\mathcal{H}(d) is the RKHS corresponding to the combined kernel k_d) is equivalent to learning M functions (f_1, ..., f_M) as follows:

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m}    (1)

where \bar f(x) = \sum_{m=1}^M f_m(x). See Sec. 6 in Aronszajn (1950), Micchelli & Pontil (2005).
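To make the equivalence concrete, here is a minimal numpy sketch of the fixed-weight problem for the squared loss with no bias term; the function name and the direct linear solve are illustrative choices for this sketch, not code from the talk.

    import numpy as np

    def fit_fixed_weights(Ks, d, y, C):
        """Solve the fixed-kernel-combination problem (squared loss, no bias).

        Ks : list of (N, N) Gram matrices K_m; d : (M,) nonnegative weights.
        With f_bar = K_d a, the objective 0.5*||y - K_d a||^2 + (C/2)*a'K_d a
        is minimized by a = (K_d + C*I)^{-1} y.
        """
        N = len(y)
        K_d = sum(dm * Km for dm, Km in zip(d, Ks))      # combined kernel k_d
        a = np.linalg.solve(K_d + C * np.eye(N), y)      # representer coefficients
        # component functions: f_m(x_i) = d_m * (K_m a)_i, with
        # ||f_m||_{H_m}^2 = d_m^2 * a'K_m a, so sum_m ||f_m||^2 / d_m = a'K_d a.
        fs = [dm * (Km @ a) for dm, Km in zip(d, Ks)]
        return a, fs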

Page 4: Regularization Strategies

Ivanov regularization

We can constrain the size of the kernel weights d_m by

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m},    (2)

  \text{s.t.} \sum_{m=1}^M h(d_m) \leq 1   (h is convex, increasing).

Equivalent to the more common expression:

  \min_{f \in \mathcal{H}(d),\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \sum_{i=1}^N \ell(y_i, f(x_i) + b) + \frac{C}{2} \|f\|_{\mathcal{H}(d)}^2,  \text{s.t.} \sum_{m=1}^M h(d_m) \leq 1.

Page 5: Regularization Strategies

Tikhonov regularization

We can penalize the size of the kernel weights d_m by

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \sum_{m=1}^M \Bigl( \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m} + \mu h(d_m) \Bigr).    (3)

Note that the above is equivalent to

  \min_{f \in \mathcal{H}(d),\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \underbrace{\sum_{i=1}^N \ell(y_i, f(x_i) + b)}_{\text{data-fit}} + \underbrace{\frac{C}{2} \|f\|_{\mathcal{H}(d)}^2}_{f\text{-prior}} + \underbrace{\frac{C\mu}{2} \sum_{m=1}^M h(d_m)}_{d_m\text{-hyper-prior}}.
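One simple way to use (3) in practice is block coordinate descent: with d fixed the problem is a single combined-kernel problem, and with f fixed the d_m-update separates over kernels. The sketch below assumes squared loss, no bias, and h(d_m) = d_m, for which the d-update has the closed form d_m = ||f_m||_{H_m} / sqrt(mu); the function name, iteration count, and the eps safeguard are illustrative choices, not the talk's implementation.

    import numpy as np

    def tikhonov_mkl(Ks, y, C, mu, iters=50, eps=1e-10):
        """Block coordinate descent on eq. (3): squared loss, no bias, h(d) = d."""
        N, M = len(y), len(Ks)
        d = np.ones(M)
        for _ in range(iters):
            K_d = sum(dm * Km for dm, Km in zip(d, Ks))
            a = np.linalg.solve(K_d + C * np.eye(N), y)      # f-step: combined-kernel ridge
            # ||f_m||_{H_m} = d_m * sqrt(a'K_m a); closed-form d-step for h(d) = d
            norms = np.array([dm * np.sqrt(max(a @ Km @ a, 0.0))
                              for dm, Km in zip(d, Ks)])
            d = norms / np.sqrt(mu) + eps                    # minimizes ||f_m||^2/d + mu*d
        return d, a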

Page 6: Regularization Strategies

Are these two formulations equivalent?

Previously thought that...
  Yes. But the choice of the pair (C, \mu) is complicated.
  ⇒ In the Tikhonov formulation we have to choose both C and \mu! (Kloft et al., 2010)

We show that...
  If you give up the constant 1 in the Ivanov formulation \sum_{m=1}^M h(d_m) \leq 1:
  - Correspondence via equivalent block-norm formulations.
  - C and \mu can be chosen independently.
  - The constant 1 has no meaning.

Page 7: Regularization Strategies

Ivanov ⇒ block-norm formulation 1 (known)

Let h(d_m) = d_m^p (ℓp-norm MKL); see Kloft et al. (2010).

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m},  \text{s.t.} \sum_{m=1}^M d_m^p \leq 1.

  ⇓ Jensen's inequality

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \Bigl( \sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^q \Bigr)^{2/q},

where q = 2p/(1+p). The minimum is attained at d_m \propto \|f_m\|_{\mathcal{H}_m}^{2/(1+p)}.
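A quick numerical sanity check of this reduction, with made-up component norms: minimizing \sum_m \|f_m\|^2 / d_m over the constraint set \{\sum_m d_m^p \le 1\} at the stated d_m reproduces the squared block q-norm.

    import numpy as np

    p = 1.5
    q = 2 * p / (1 + p)
    f_norms = np.array([0.3, 1.2, 0.7])        # stand-ins for ||f_m||_{H_m}

    d = f_norms ** (2 / (1 + p))               # d_m proportional to ||f_m||^{2/(1+p)}
    d /= (d ** p).sum() ** (1 / p)             # rescale so that sum_m d_m^p = 1
    lhs = (f_norms ** 2 / d).sum()             # constrained minimum of sum_m ||f_m||^2 / d_m
    rhs = (f_norms ** q).sum() ** (2 / q)      # (sum_m ||f_m||^q)^{2/q}
    print(lhs, rhs)                            # the two values coincide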

Page 8: Regularization Strategies

Tikhonov ⇒ block-norm formulation 2 (new)

Let h(d_m) = d_m^p, \mu = 1/p (ℓp-norm MKL).

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R},\, d_1 \geq 0, \ldots, d_M \geq 0} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{2} \sum_{m=1}^M \Bigl( \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m} + \frac{d_m^p}{p} \Bigr).

  ⇓ Young's inequality

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{q} \sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^q,

where q = 2p/(1+p). The minimum is attained at d_m = \|f_m\|_{\mathcal{H}_m}^{2/(1+p)}.
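Again a small numerical check, with an illustrative value for ||f_m||: minimizing ||f_m||^2/d_m + d_m^p/p over d_m >= 0 gives (2/q)||f_m||^q at d_m = ||f_m||^{2/(1+p)}, which is how the factor C/2 turns into C/q.

    import numpy as np

    p = 1.5
    q = 2 * p / (1 + p)
    f = 0.8                                    # stand-in for ||f_m||_{H_m}

    obj = lambda d: f ** 2 / d + d ** p / p    # per-kernel Tikhonov term (without C/2)
    d_star = f ** (2 / (1 + p))                # claimed minimizer
    ds = np.linspace(0.05, 5.0, 2000)          # brute-force grid over d_m
    print(obj(ds).min(), obj(d_star), (2 / q) * f ** q)   # all three agree (up to the grid)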

Page 9: Regularization Strategies

The two block-norm formulations are equivalent

Block-norm formulation 1 (from Ivanov):

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{\tilde C}{2} \Bigl( \sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^q \Bigr)^{2/q}.

Block-norm formulation 2 (from Tikhonov):

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + \frac{C}{q} \sum_{m=1}^M \|f_m\|_{\mathcal{H}_m}^q.

We just have to map C and \tilde C. The implied kernel weights are normalized (Ivanov) / unnormalized (Tikhonov).

Page 10: Regularization Strategies

Generalized block-norm formulation

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + C \sum_{m=1}^M g(\|f_m\|_{\mathcal{H}_m}^2),    (4)

where g is a concave block-norm-based regularizer.

Example (Elastic-net MKL): g(x) = (1 - \lambda)\sqrt{x} + \frac{\lambda}{2} x,

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, b \in \mathbb{R}} \sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i) + b\Bigr) + C \sum_{m=1}^M \Bigl( (1 - \lambda)\|f_m\|_{\mathcal{H}_m} + \frac{\lambda}{2} \|f_m\|_{\mathcal{H}_m}^2 \Bigr).

Page 11: Regularization Strategies

Generalized block-norm ⇒ Tikhonov regularization

Theorem
The correspondence between the convex (kernel-weight-based) regularizer h(d_m) and the concave (block-norm-based) regularizer g(x) is given as follows:

  \mu h(d_m) = -2 g^*\Bigl( \frac{1}{2 d_m} \Bigr),

where g^* is the concave conjugate of g.

Proof: Use the concavity of g as

  \frac{\|f_m\|_{\mathcal{H}_m}^2}{2 d_m} \geq g(\|f_m\|_{\mathcal{H}_m}^2) + g^*\Bigl( \frac{1}{2 d_m} \Bigr).  □

See also Palmer et al. (2006).
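The theorem can be checked numerically by approximating the concave conjugate on a grid. For g(x) = sqrt(x) the induced regularizer should come out as mu*h(d_m) = -2 g^*(1/(2 d_m)) = d_m, i.e. the block 1-norm (L1-MKL) row of the table two slides ahead; the grid and test values below are arbitrary choices for this sketch.

    import numpy as np

    g = np.sqrt
    xs = np.linspace(1e-6, 200.0, 400001)      # grid over x for the conjugate

    def g_star(y):
        """Concave conjugate g^*(y) = inf_x [x*y - g(x)], approximated on the grid."""
        return np.min(xs * y - g(xs))

    for d in (0.3, 1.0, 2.5):
        print(d, -2 * g_star(1.0 / (2 * d)))   # second column should reproduce d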

Page 12: Regularization Strategies

Examples

Generalized Young's inequality:

  x y \geq g(x) + g^*(y),

where g is concave, and g^* is the concave conjugate of g.

Example 1: let g(x) = \sqrt{x}, then g^*(y) = -1/(4y) and

  \frac{\|f_m\|_{\mathcal{H}_m}^2}{2 d_m} + \frac{d_m}{2} \geq \|f_m\|_{\mathcal{H}_m}   (L1-MKL).

Example 2: let g(x) = x^{q/2}/q (1 \leq q \leq 2), then g^*(y) = \frac{q-2}{2q} (2y)^{q/(q-2)} and

  \frac{\|f_m\|_{\mathcal{H}_m}^2}{2 d_m} + \frac{d_m^p}{2p} \geq \frac{1}{q} \|f_m\|_{\mathcal{H}_m}^q   (ℓp-norm MKL),

where p := q/(2 - q).

Page 13: Regularization Strategies

Correspondence

  MKL model                              | block-norm g(x)                | kernel weight reg h(d_m)  | const μ
  block 1-norm MKL                       | sqrt(x)                        | d_m                       | 1
  ℓp-norm MKL                            | (1+p)/(2p) * x^{p/(1+p)}       | d_m^p                     | 1/p
  uniform-weight MKL (block 2-norm MKL)  | x/2                            | I_{[0,1]}(d_m)            | +0
  block q-norm MKL (q > 2)               | (1/q) x^{q/2}                  | d_m^{-q/(q-2)}            | -(q-2)/q
  Elastic-net MKL                        | (1-λ) sqrt(x) + (λ/2) x        | (1-λ) d_m / (1 - λ d_m)   | 1-λ

I_{[0,1]}(x) is the indicator function of the closed interval [0, 1]; i.e., I_{[0,1]}(x) = 0 if x \in [0, 1], and +\infty otherwise.
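The less obvious rows can be verified with the same grid-based conjugate as above. For the Elastic-net MKL row, g(x) = (1-λ)sqrt(x) + (λ/2)x, the theorem on page 11 gives -2 g^*(1/(2 d_m)) = (1-λ)^2 d_m / (1 - λ d_m), i.e. μ = 1-λ and h(d_m) = (1-λ) d_m / (1 - λ d_m) for d_m < 1/λ. The values of λ, the grid, and the test points below are illustrative.

    import numpy as np

    lam = 0.5
    g = lambda x: (1 - lam) * np.sqrt(x) + 0.5 * lam * x
    xs = np.linspace(1e-8, 500.0, 500001)      # grid over x for the conjugate

    def g_star(y):                             # concave conjugate, approximated on the grid
        return np.min(xs * y - g(xs))

    for d in (0.3, 1.0, 1.5):                  # valid range here is d < 1/lam = 2
        table_entry = (1 - lam) ** 2 * d / (1 - lam * d)
        print(-2 * g_star(1.0 / (2 * d)), table_entry)   # the two columns should match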

Page 14: Empirical Bayesian MKL

Bayesian view

Tikhonov regularization as a hierarchical MAP estimation:

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M,\, d_1 \geq 0, \ldots, d_M \geq 0} \underbrace{\sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i)\Bigr)}_{\text{likelihood}} + \underbrace{\sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{2 d_m}}_{f_m\text{-prior}} + \underbrace{\mu \sum_{m=1}^M h(d_m)}_{d_m\text{-hyper-prior}}.

Hyper-prior over the kernel weights:

  d_m \sim \frac{1}{Z_1(\mu)} \exp(-\mu h(d_m))   (m = 1, \ldots, M).

Gaussian process prior for the functions:

  f_m \sim GP(f_m; 0, d_m k_m)   (m = 1, \ldots, M).

Likelihood:

  y_i \sim \frac{1}{Z_2(x_i)} \exp\Bigl( -\ell\bigl(y_i, \sum_{m=1}^M f_m(x_i)\bigr) \Bigr).
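For intuition, the hierarchical model can be sampled directly at the training inputs. The sketch below assumes a Gaussian likelihood, precomputed Gram matrices, and the hyper-prior with h(d) = d (so each d_m is exponentially distributed); all names are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_hierarchical_mkl(Ks, mu, sigma_y):
        """Draw (d, f_1..f_M, y) from the hierarchical model at the training inputs.

        Assumes h(d) = d, so the hyper-prior exp(-mu*h(d))/Z_1 is Exponential(rate=mu),
        and a Gaussian likelihood with noise level sigma_y.
        """
        N, M = Ks[0].shape[0], len(Ks)
        d = rng.exponential(scale=1.0 / mu, size=M)           # kernel-weight hyper-prior
        fs = [rng.multivariate_normal(np.zeros(N), dm * Km)   # f_m ~ GP(0, d_m k_m) at the inputs
              for dm, Km in zip(d, Ks)]
        y = sum(fs) + sigma_y * rng.standard_normal(N)        # y_i around sum_m f_m(x_i)
        return d, fs, y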

Page 15: Empirical Bayesian MKL

Marginalized likelihood

Assume a Gaussian likelihood:

  \ell(y, z) = \frac{1}{2\sigma_y^2} (y - z)^2.

The marginalized likelihood (omitting the hyper-prior for simplicity):

  -\log p(y | d) = \underbrace{\frac{1}{2\sigma_y^2} \Bigl\| y - \sum_{m=1}^M f_m^{\mathrm{MAP}} \Bigr\|^2}_{\text{likelihood}} + \underbrace{\frac{1}{2} \sum_{m=1}^M \frac{\|f_m^{\mathrm{MAP}}\|_{\mathcal{H}_m}^2}{d_m}}_{f_m\text{-prior}} + \underbrace{\frac{1}{2} \log |\bar K(d)|}_{\text{volume-based regularization}}.

f_m^{\mathrm{MAP}}: MAP estimate for fixed kernel weights d_m (m = 1, \ldots, M).
\bar K(d) := \sigma_y^2 I_N + \sum_{m=1}^M d_m K_m.
See also Wipf & Nagarajan (2009).
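In the Gaussian case this decomposition can be checked against the direct expression -log p(y|d) = (1/2) y' \bar K(d)^{-1} y + (1/2) log|\bar K(d)| (up to constants), using f_m^MAP = d_m K_m \bar K(d)^{-1} y at the training points. A small numpy sketch (function name illustrative):

    import numpy as np

    def neg_log_marginal(Ks, d, y, sigma_y):
        """-log p(y|d) up to constants, computed two ways (they should coincide)."""
        N = len(y)
        K_bar = sigma_y ** 2 * np.eye(N) + sum(dm * Km for dm, Km in zip(d, Ks))
        a = np.linalg.solve(K_bar, y)                         # K_bar^{-1} y
        logdet = np.linalg.slogdet(K_bar)[1]
        direct = 0.5 * y @ a + 0.5 * logdet                   # Gaussian marginal likelihood
        # slide's decomposition, with f_m^MAP = d_m K_m a, so sum_m f_m^MAP = (K_bar - sigma^2 I) a
        fit = 0.5 / sigma_y ** 2 * np.sum((y - (K_bar - sigma_y ** 2 * np.eye(N)) @ a) ** 2)
        prior = 0.5 * sum(dm * (a @ Km @ a) for dm, Km in zip(d, Ks))
        return direct, fit + prior + 0.5 * logdet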

Page 16: Empirical Bayesian MKL

Comparing MAP and empirical Bayes objectives

Hyper-prior MAP (MKL):

  \underbrace{\sum_{i=1}^N \ell\Bigl(y_i, \sum_{m=1}^M f_m(x_i)\Bigr)}_{\text{likelihood}} + \underbrace{\frac{1}{2} \sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m}}_{f_m\text{-prior}} + \underbrace{\mu \sum_{m=1}^M h(d_m)}_{d_m\text{-hyper-prior (separable)}}.

Empirical Bayes:

  \underbrace{\frac{1}{2\sigma_y^2} \Bigl\| y - \sum_{m=1}^M f_m^{\mathrm{MAP}} \Bigr\|^2}_{\text{likelihood}} + \underbrace{\frac{1}{2} \sum_{m=1}^M \frac{\|f_m^{\mathrm{MAP}}\|_{\mathcal{H}_m}^2}{d_m}}_{f_m\text{-prior}} + \underbrace{\frac{1}{2} \log |\bar K(d)|}_{\text{volume-based regularization (non-separable)}}.

Page 17: Experiments

Caltech 101 dataset (classification)

[Figure: classification accuracy vs. number of samples per class for the "Cannon vs Cup" task; methods compared: MKL (logit), Uniform, MKL (square), Elastic-net MKL (λ=0.5), BayesMKL.]

Regularization constant C chosen by 2×4-fold cross-validation on the training set.

Page 18: Experiments

Caltech 101 dataset: kernel weights

1,760 kernel functions:
- 4 SIFT features (hsvsift, sift, sift4px, sift8px)
- 22 spatial decompositions (including the spatial pyramid kernel)
- 2 kernel functions (Gaussian and χ²)
- 10 kernel parameters

[Figure: learned kernel weights over the 1,760 kernels for each method; accuracies: MKL (logit) 0.82, Uniform 0.92, MKL (square) 0.80, Elastic-net MKL (λ=0.5) 0.97, BayesMKL 0.82.]

Page 19: Experiments

Caltech 101 dataset: kernel weights (detail)

[Figure: detail of the learned kernel weights (kernels roughly 200 to 700), separating the χ²-kernel and Gaussian-kernel groups, for MKL (logit), Uniform, MKL (square), Elastic-net MKL, and BayesMKL.]

Page 20: Summary

Summary

- Two regularized kernel-weight learning formulations, Ivanov regularization and Tikhonov regularization, are equivalent. No additional tuning parameter!
- Both formulations reduce to block-norm formulations via Jensen's inequality / (generalized) Young's inequality.
- Probabilistic view of MKL: a hierarchical Gaussian process model.
- Elastic-net MKL performs similarly to uniform-weight MKL, but shows grouping of mutually dependent kernels.
- Empirical-Bayes MKL and L1-MKL seem to make the solution overly sparse, but they often choose slightly different sets of kernels.
- Code for Elastic-net MKL is available from http://www.simplex.t.u-tokyo.ac.jp/~s-taiji/software/SpicyMKL

Page 21: Stuffs

Acknowledgements

We would like to thank Hisashi Kashima and Shinichi Nakajima for helpful discussions. This work was supported in part by MEXT KAKENHI 22700138, 22700289, and NTT Communication Science Laboratories.

Page 22: Stuffs

A brief proof

Minimize the Lagrangian:

  \min_{f_1 \in \mathcal{H}_1, \ldots, f_M \in \mathcal{H}_M} \frac{1}{2} \sum_{m=1}^M \frac{\|f_m\|_{\mathcal{H}_m}^2}{d_m} + \Bigl\langle g, \underbrace{\bar f - \sum_{m=1}^M f_m}_{\text{equality constraint}} \Bigr\rangle_{\mathcal{H}(d)},

where g \in \mathcal{H}(d) is a Lagrange multiplier.

Fréchet derivative:

  \Bigl\langle h_m, \frac{f_m}{d_m} - \langle g, k_m \rangle_{\mathcal{H}(d)} \Bigr\rangle_{\mathcal{H}_m} = 0 \;\Rightarrow\; f_m(x) = \langle g, d_m k_m(\cdot, x) \rangle_{\mathcal{H}(d)}.

Maximize the dual:

  \max_{g \in \mathcal{H}(d)} -\frac{1}{2} \|g\|_{\mathcal{H}(d)}^2 + \langle g, \bar f \rangle_{\mathcal{H}(d)} = \frac{1}{2} \|\bar f\|_{\mathcal{H}(d)}^2.

Page 23: Stuffs

References

Aronszajn. Theory of Reproducing Kernels. TAMS, 1950.
Lanckriet et al. Learning the Kernel Matrix with Semidefinite Programming. JMLR, 2004.
Bach et al. Multiple kernel learning, conic duality, and the SMO algorithm. ICML, 2004.
Micchelli & Pontil. Learning the kernel function via regularization. JMLR, 2005.
Cortes. Can learning kernels help performance? ICML, 2009.
Cortes et al. Generalization Bounds for Learning Kernels. ICML, 2010.
Kloft et al. Efficient and accurate lp-norm multiple kernel learning. NIPS 22, 2010.
Tomioka & Suzuki. Sparsity-accuracy trade-off in MKL. arXiv, 2010.
Varma & Babu. More Generality in Efficient Multiple Kernel Learning. ICML, 2009.
Gehler & Nowozin. Let the kernel figure it out; principled learning of pre-processing for kernel classifiers. CVPR, 2009.
Tipping. Sparse Bayesian learning and the relevance vector machine. JMLR, 2001.
Palmer et al. Variational EM Algorithms for Non-Gaussian Latent Variable Models. NIPS, 2006.
Wipf & Nagarajan. A new view of automatic relevance determination. NIPS, 2008.

Page 24: Stuffs

Method A: upper-bounding the log-det term

Use the upper bound

  \log |\bar K(d)| \leq \sum_{m=1}^M z_m d_m - \psi^*(z).

Eliminate the kernel weights by explicit minimization (AM-GM inequality).

Update f_m as

  (f_m)_{m=1}^M \leftarrow \mathop{\mathrm{argmin}}_{(f_m)_{m=1}^M} \Bigl( \frac{1}{2\sigma_y^2} \Bigl\| y - \sum_{m=1}^M f_m \Bigr\|^2 + \sum_{m=1}^M \sqrt{z_m}\, \|f_m\|_{K_m} \Bigr).

Update z_m as (tightening the upper bound)

  z_m \leftarrow \mathrm{Tr}\Bigl( \bigl( \sigma_y^2 I_N + \sum_{m=1}^M d_m K_m \bigr)^{-1} K_m \Bigr),

where d_m = \|f_m\|_{\mathcal{H}_m} / \sqrt{z_m}.

- Each update step is a reweighted L1-MKL problem.
- Each update step minimizes an upper bound of the negative log of the marginalized likelihood.
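A rough sketch of Method A for the Gaussian case is below. One caveat: the reweighted L1-MKL subproblem is handled here by a few inner iterations of the variational update d_m = ||f_m||/sqrt(z_m) followed by a MAP step, rather than by a dedicated solver such as SpicyMKL; that substitution, the function name, and the iteration counts are assumptions of this sketch, not the talk's implementation.

    import numpy as np

    def method_a_mkl(Ks, y, sigma_y, outer=30, inner=20, eps=1e-10):
        """Method A sketch: alternate z-updates with an approximate reweighted L1-MKL step."""
        N, M = len(y), len(Ks)
        z = np.ones(M)
        d = np.ones(M)
        for _ in range(outer):
            for _ in range(inner):                  # approximate reweighted L1-MKL step
                K_bar = sigma_y ** 2 * np.eye(N) + sum(dm * Km for dm, Km in zip(d, Ks))
                a = np.linalg.solve(K_bar, y)
                fnorm = np.array([dm * np.sqrt(max(a @ Km @ a, 0.0))
                                  for dm, Km in zip(d, Ks)])
                d = fnorm / np.sqrt(z) + eps        # AM-GM: d_m = ||f_m|| / sqrt(z_m)
            # tighten the log-det bound: z_m = Tr((sigma^2 I + sum_m d_m K_m)^{-1} K_m)
            K_bar = sigma_y ** 2 * np.eye(N) + sum(dm * Km for dm, Km in zip(d, Ks))
            K_inv = np.linalg.inv(K_bar)
            z = np.array([max(np.trace(K_inv @ Km), eps) for Km in Ks])
        return d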

Page 25: Stuffs

Method B: MacKay update

Use the fixed-point condition for the update of the weights:

  -\frac{\|f_m^{\mathrm{FKL}}\|_{K_m}^2}{d_m^2} + \mathrm{Tr}\Bigl( \bigl( \sigma^2 I_N + \sum_{m=1}^M d_m K_m \bigr)^{-1} K_m \Bigr) = 0.

Update f_m as

  (f_m)_{m=1}^M \leftarrow \mathop{\mathrm{argmin}}_{(f_m)_{m=1}^M} \Bigl( \frac{1}{2\sigma_y^2} \Bigl\| y - \sum_{m=1}^M f_m \Bigr\|^2 + \frac{1}{2} \sum_{m=1}^M \frac{\|f_m\|_{K_m}^2}{d_m} \Bigr).

Update the kernel weights d_m as

  d_m \leftarrow \frac{\|f_m\|_{K_m}^2}{\mathrm{Tr}\Bigl( \bigl( \sigma^2 I_N + \sum_{m=1}^M d_m K_m \bigr)^{-1} d_m K_m \Bigr)}.

- Each update step is a fixed-kernel-weight learning problem (easy).
- Convergence empirically OK (e.g., RVM).
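Method B is straightforward to prototype for the Gaussian likelihood, since the f-update is then a closed-form MAP step, f_m = d_m K_m \bar K(d)^{-1} y. A minimal sketch; the function name, initialization, and stopping rule are illustrative choices, not the talk's code.

    import numpy as np

    def mackay_mkl(Ks, y, sigma_y, iters=100, tol=1e-8, eps=1e-12):
        """Method B (MacKay-style) kernel-weight updates, Gaussian likelihood, no bias."""
        N, M = len(y), len(Ks)
        d = np.ones(M)
        for _ in range(iters):
            K_bar = sigma_y ** 2 * np.eye(N) + sum(dm * Km for dm, Km in zip(d, Ks))
            K_inv = np.linalg.inv(K_bar)
            a = K_inv @ y                               # so that f_m = d_m K_m a
            d_new = np.empty(M)
            for m, (dm, Km) in enumerate(zip(d, Ks)):
                norm2 = dm ** 2 * (a @ Km @ a)          # ||f_m||_{K_m}^2
                denom = np.trace(K_inv @ (dm * Km))     # Tr(K_bar^{-1} d_m K_m)
                d_new[m] = norm2 / max(denom, eps)
            if np.max(np.abs(d_new - d)) < tol:         # stop when the fixed point is reached
                d = d_new
                break
            d = d_new
        return d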