A Generalized Maximum Entropy Approach to Bregman Co-clustering

Author: Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha
Source: KDD ’04, August 22-25, 2004, ACM, pp. 509-514
Presenter: Allen Wu
111/06/06

Upload: guest00a636

Posted on 03-Dec-2014


DESCRIPTION

This presentation uses Bregman divergences to define the loss function for co-clustering and finds the best clusters by minimizing that loss function.

TRANSCRIPT

Page 1: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Author: Arindam Banerjee, Inderjit Dhillon, Joydeep Ghosh, Srujana Merugu, and Dharmendra S. Modha
Source: KDD ’04, August 22-25, 2004, ACM, pp. 509-514
Presenter: Allen Wu

112/04/09


Page 2: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Introduction
Bregman divergences
Bregman co-clustering
Algorithm
Experiments
Conclusion


Page 3: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Information-theoretic co-clustering (ITCC) models the co-clustering problem in terms of the joint probability distribution p(X, Y).

We seek a co-clustering of both dimensions such that the loss in mutual information is minimized, given a fixed number of row and column clusters:

$$\min_{\hat{X},\hat{Y}} \; I(X;Y) - I(\hat{X};\hat{Y})$$
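To make the objective concrete, here is a minimal sketch that evaluates I(X;Y) − I(X̂;Ŷ) for a fixed co-clustering. The 4×4 joint distribution, the helper names (`mutual_information`, `cluster_joint`, `mi_loss`) and the integer row/column cluster maps are all hypothetical; this only scores given clusterings and does not implement the ITCC search itself.

```python
import math

def mutual_information(p):
    """I(X;Y) in nats for a joint distribution p given as a list of rows."""
    px = [sum(row) for row in p]
    py = [sum(col) for col in zip(*p)]
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(p)
        for j, pxy in enumerate(row)
        if pxy > 0  # convention: 0 log 0 = 0
    )

def cluster_joint(p, row_map, col_map, k, l):
    """p(x_hat, y_hat): total probability mass inside each co-cluster."""
    p_hat = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(p):
        for j, pxy in enumerate(row):
            p_hat[row_map[i]][col_map[j]] += pxy
    return p_hat

def mi_loss(p, row_map, col_map, k, l):
    """ITCC objective I(X;Y) - I(X_hat;Y_hat) for a fixed co-clustering."""
    return mutual_information(p) - mutual_information(cluster_joint(p, row_map, col_map, k, l))

# Hypothetical 4x4 joint distribution with two row blocks and two column blocks.
p = [[0.20, 0.20, 0.00, 0.00],
     [0.05, 0.05, 0.00, 0.00],
     [0.00, 0.00, 0.10, 0.15],
     [0.00, 0.00, 0.15, 0.10]]

good = mi_loss(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)  # follows the block structure
bad = mi_loss(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2)   # mixes the blocks
print(f"loss(good) = {good:.4f}, loss(bad) = {bad:.4f}")
```

The loss is never negative, because clustering rows and columns can only lose information; a co-clustering that respects the block structure loses much less than one that mixes the blocks.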


Page 4: A Generalized Maximum Entropy Approach To Bregman Co Clustering

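The loss in mutual information equals

$$I(X;Y) - I(\hat{X};\hat{Y}) = D_{\mathrm{KL}}\big(p(x,y) \,\|\, q(x,y)\big),$$

where

$$q(x,y) = p(\hat{x},\hat{y})\,p(x \mid \hat{x})\,p(y \mid \hat{y}), \quad x \in \hat{x},\; y \in \hat{y}.$$

It can be shown that q(x,y) is a “maximum entropy” approximation to p(x,y).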
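The identity above can be checked numerically. The sketch below builds q(x,y) = p(x̂,ŷ) p(x|x̂) p(y|ŷ) for a hypothetical joint distribution and co-clustering, then compares D_KL(p||q) with I(X;Y) − I(X̂;Ŷ). The function names are illustrative, and the code assumes every row and column cluster carries positive probability mass.

```python
import math

def marginals(m):
    """Row sums and column sums of a matrix given as a list of rows."""
    return [sum(r) for r in m], [sum(c) for c in zip(*m)]

def mutual_information(p):
    px, py = marginals(p)
    return sum(v * math.log(v / (px[i] * py[j]))
               for i, r in enumerate(p) for j, v in enumerate(r) if v > 0)

def max_entropy_approx(p, row_map, col_map, k, l):
    """Return (q, p_hat) with q(x,y) = p(x_hat,y_hat) p(x|x_hat) p(y|y_hat)."""
    px, py = marginals(p)
    p_hat = [[0.0] * l for _ in range(k)]
    for i, r in enumerate(p):
        for j, v in enumerate(r):
            p_hat[row_map[i]][col_map[j]] += v
    px_hat, py_hat = marginals(p_hat)
    q = [[p_hat[row_map[i]][col_map[j]]
          * (px[i] / px_hat[row_map[i]])   # p(x | x_hat) for x in x_hat
          * (py[j] / py_hat[col_map[j]])   # p(y | y_hat) for y in y_hat
          for j in range(len(py))]
         for i in range(len(px))]
    return q, p_hat

def kl(p, q):
    """D_KL(p || q) summed over all matrix entries (0 log 0 = 0)."""
    return sum(a * math.log(a / b)
               for rp, rq in zip(p, q) for a, b in zip(rp, rq) if a > 0)

# Hypothetical joint distribution and co-clustering.
p = [[0.20, 0.20, 0.00, 0.00],
     [0.05, 0.05, 0.00, 0.00],
     [0.00, 0.00, 0.10, 0.15],
     [0.00, 0.00, 0.15, 0.10]]
q, p_hat = max_entropy_approx(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
lhs = mutual_information(p) - mutual_information(p_hat)
rhs = kl(p, q)
print(f"I(X;Y) - I(X^;Y^) = {lhs:.6f}, D_KL(p||q) = {rhs:.6f}")
```

Note that q is itself a distribution (it sums to 1) and is zero only where p must also be zero, so the KL divergence is always finite here.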


Page 5: A Generalized Maximum Entropy Approach To Bregman Co Clustering

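p(y) = (0.18, 0.18, 0.14, 0.14, 0.18, 0.18)
p(x) = (0.15, 0.15, 0.15, 0.15, 0.2, 0.2)
p(ŷ) = (0.5, 0.5)
p(x̂) = (0.3, 0.3, 0.4)

$$q(x,y) = p(\hat{x},\hat{y})\,p(x \mid \hat{x})\,p(y \mid \hat{y}) = p(\hat{x},\hat{y})\,\frac{p(x)}{p(\hat{x})}\,\frac{p(y)}{p(\hat{y})}, \quad x \in \hat{x},\; y \in \hat{y}$$

For example, with p(x̂_1, ŷ_1) = 0.3:

$$q(x_1,y_1) = 0.3 \times \frac{0.15}{0.3} \times \frac{0.18}{0.5} = 0.054$$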
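The slide lists only marginals, not p(x, y) itself. The sketch below assumes a hypothetical 6×6 joint distribution chosen so that its marginals reproduce the ones above (we believe it matches the running example of the original ITCC paper, but treat that as an assumption), with row clusters {x1, x2}, {x3, x4}, {x5, x6} and column clusters {y1, y2, y3}, {y4, y5, y6}. It recomputes the marginals and q(x1, y1).

```python
# Hypothetical joint distribution p(x, y), assumed so that its marginals
# reproduce the ones listed on this slide.
p = [
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.05, 0.05, 0.05, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.00, 0.00, 0.00, 0.05, 0.05, 0.05],
    [0.04, 0.04, 0.00, 0.04, 0.04, 0.04],
    [0.04, 0.04, 0.04, 0.00, 0.04, 0.04],
]
row_map = [0, 0, 1, 1, 2, 2]  # x_hat_1 = {x1, x2}, x_hat_2 = {x3, x4}, x_hat_3 = {x5, x6}
col_map = [0, 0, 0, 1, 1, 1]  # y_hat_1 = {y1, y2, y3}, y_hat_2 = {y4, y5, y6}

def marginals(m):
    """Row sums and column sums of a matrix given as a list of rows."""
    return [sum(r) for r in m], [sum(c) for c in zip(*m)]

p_x, p_y = marginals(p)
p_hat = [[0.0] * 2 for _ in range(3)]  # p(x_hat, y_hat)
for i, r in enumerate(p):
    for j, v in enumerate(r):
        p_hat[row_map[i]][col_map[j]] += v
p_xhat, p_yhat = marginals(p_hat)

# q(x, y) = p(x_hat, y_hat) * p(x)/p(x_hat) * p(y)/p(y_hat)
q = [[p_hat[row_map[i]][col_map[j]] * p_x[i] / p_xhat[row_map[i]] * p_y[j] / p_yhat[col_map[j]]
      for j in range(6)] for i in range(6)]

print([round(v, 2) for v in p_x])     # [0.15, 0.15, 0.15, 0.15, 0.2, 0.2]
print([round(v, 2) for v in p_y])     # [0.18, 0.18, 0.14, 0.14, 0.18, 0.18]
print([round(v, 2) for v in p_xhat])  # [0.3, 0.3, 0.4]
print(round(q[0][0], 3))              # 0.054
```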


Page 6: A Generalized Maximum Entropy Approach To Bregman Co Clustering

[Figure: worked example annotated with D(p||q) values, e.g., 0.0419, 0.05696, 0.04964 and 0.0376.]

Page 7: A Generalized Maximum Entropy Approach To Bregman Co Clustering

[Figure: continuation of the worked example, annotated with D(p||q) values between roughly 0.02 and 0.05.]

Page 8: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 9: A Generalized Maximum Entropy Approach To Bregman Co Clustering

However, the data matrix may contain negative entries, or a distortion measure other than KL-divergence may be needed; for example, the squared Euclidean distance might be more appropriate.

This paper addresses the general situation by extending ITCC in three directions:
“Nearness” is now measured by any Bregman divergence.
A larger class of constraints can be specified.
The maximum entropy approach is generalized.


Page 10: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 11: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 12: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 13: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 14: A Generalized Maximum Entropy Approach To Bregman Co Clustering

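The objective function is

$$\min_{\{\pi_1,\dots,\pi_k\}} \sum_{h=1}^{k} \sum_{x \in \pi_h} \lVert x - \mu_h \rVert^2$$

where π_h is the h-th cluster and μ_h is its mean.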
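Assuming the objective above is the standard k-means (squared Euclidean) objective, a minimal Lloyd-style sketch alternates two steps that each never increase it: assign every point to its nearest centroid, then reset each centroid to the mean of its cluster. The function names, the starting centroids and the toy points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]

def objective(points, labels, centroids):
    """Sum over clusters h of sum over x in pi_h of ||x - mu_h||^2."""
    return sum(sq_dist(x, centroids[h]) for x, h in zip(points, labels))

def lloyd(points, centroids, iters=10):
    """Alternate nearest-centroid assignment and mean updates."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda h: sq_dist(x, centroids[h]))
                  for x in points]
        new_centroids = []
        for h, c in enumerate(centroids):
            members = [x for x, lab in zip(points, labels) if lab == h]
            new_centroids.append(mean(members) if members else c)  # keep empty clusters fixed
        centroids = new_centroids
    return labels, centroids

points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels, centroids = lloyd(points, [[0.0, 0.0], [5.0, 5.0]])
print(labels, centroids, objective(points, labels, centroids))
```

Bregman co-clustering generalizes exactly this pattern: the squared distance is replaced by a Bregman divergence, and the mean remains the optimal cluster representative.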


Page 15: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Let φ be a real-valued strictly convex function defined on the convex set S = dom(φ) ⊆ ℝ, and let φ be differentiable on int(S), the interior of S.

The Bregman divergence d_φ : S × int(S) → [0, ∞) is defined as

$$d_\phi(z_1, z_2) = \phi(z_1) - \phi(z_2) - \langle z_1 - z_2, \nabla\phi(z_2) \rangle$$
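The definition translates directly into code. In this scalar sketch the inner product reduces to an ordinary product; `bregman` and the two wrappers are illustrative names, and the I-divergence wrapper assumes strictly positive arguments so that log is defined.

```python
import math
from typing import Callable

def bregman(phi: Callable[[float], float], grad_phi: Callable[[float], float],
            z1: float, z2: float) -> float:
    """d_phi(z1, z2) = phi(z1) - phi(z2) - (z1 - z2) * phi'(z2), scalar case."""
    return phi(z1) - phi(z2) - (z1 - z2) * grad_phi(z2)

def squared_euclidean(z1: float, z2: float) -> float:
    # phi(z) = z^2, phi'(z) = 2z
    return bregman(lambda z: z * z, lambda z: 2.0 * z, z1, z2)

def i_divergence(z1: float, z2: float) -> float:
    # phi(z) = z log z, phi'(z) = log z + 1; this sketch requires z1, z2 > 0
    return bregman(lambda z: z * math.log(z), lambda z: math.log(z) + 1.0, z1, z2)

print(squared_euclidean(3.0, 1.0))  # 4.0
print(i_divergence(2.0, 1.0), i_divergence(1.0, 2.0))  # not symmetric in general
```

Only φ and its derivative change between the two cases, which is why one co-clustering algorithm can serve every Bregman divergence.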


Page 16: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 17: A Generalized Maximum Entropy Approach To Bregman Co Clustering

I-divergence: given z ∈ ℝ₊, let φ(z) = z log z. For z_1, z_2 ∈ ℝ₊,

$$d_\phi(z_1, z_2) = z_1 \log(z_1 / z_2) - (z_1 - z_2)$$

Squared Euclidean distance: given z ∈ ℝ, let φ(z) = z². For z_1, z_2 ∈ ℝ,

$$d_\phi(z_1, z_2) = (z_1 - z_2)^2$$
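For matrices, a natural way to apply these closed forms is entrywise, summing over all entries. The sketch below does this with an unweighted sum (an assumption for illustration; an expectation over entries would add weights). The function names and matrices are hypothetical.

```python
import math

def i_divergence(Z, W):
    """Sum over entries of z log(z/w) - (z - w); needs z >= 0, w > 0 (0 log 0 = 0)."""
    total = 0.0
    for z_row, w_row in zip(Z, W):
        for z, w in zip(z_row, w_row):
            total += (z * math.log(z / w) if z > 0 else 0.0) - (z - w)
    return total

def squared_euclidean(Z, W):
    """Sum over entries of (z - w)^2."""
    return sum((z - w) ** 2 for z_row, w_row in zip(Z, W) for z, w in zip(z_row, w_row))

Z = [[1.0, 2.0], [3.0, 4.0]]
W = [[2.5, 2.5], [2.5, 2.5]]  # every entry replaced by the overall mean
print(squared_euclidean(Z, W))  # 5.0
print(i_divergence(Z, W))
```

The I-divergence version also works for non-negative data with zeros (through the 0 log 0 = 0 convention), while the squared Euclidean version accepts negative entries, which matches the motivation for going beyond KL-divergence.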


Page 18: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Bregman information is defined as the expected Bregman divergence to the expectation:

$$I_\phi(Z) = E\big[d_\phi(Z, E[Z])\big]$$

I-divergence: given a real non-negative random variable Z, the Bregman information is

$$I_\phi(Z) = E\big[Z \log(Z / E[Z])\big]$$

Squared Euclidean distance: given any real random variable Z, the Bregman information is

$$I_\phi(Z) = E\big[(Z - E[Z])^2\big],$$

i.e., the variance of Z.
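For a discrete random variable the definition is a two-line computation. This sketch (hypothetical values and probabilities) confirms that the squared Euclidean case gives the variance, and that the I-divergence case matches E[Z log(Z/E[Z])], because the extra term E[Z − E[Z]] is zero.

```python
import math

def bregman_information(values, probs, d):
    """I_phi(Z) = E[d_phi(Z, E[Z])] for a discrete Z with P(Z = values[i]) = probs[i]."""
    mean = sum(p * z for z, p in zip(values, probs))
    return sum(p * d(z, mean) for z, p in zip(values, probs))

def squared_euclidean(a, b):
    return (a - b) ** 2

def i_divergence(a, b):
    # a >= 0, b > 0, with 0 log 0 = 0
    return (a * math.log(a / b) if a > 0 else 0.0) - (a - b)

values, probs = [1.0, 2.0, 4.0], [0.25, 0.5, 0.25]
mean = sum(p * z for z, p in zip(values, probs))  # E[Z] = 2.25

var = bregman_information(values, probs, squared_euclidean)
print(var)  # 1.1875, the variance of Z

info = bregman_information(values, probs, i_divergence)
closed_form = sum(p * z * math.log(z / mean) for z, p in zip(values, probs))
print(info, closed_form)  # equal up to rounding
```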


Page 19: A Generalized Maximum Entropy Approach To Bregman Co Clustering

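Let (X, Y) ~ p(X, Y) be jointly distributed random variables, with X taking values in {x_u}, u = 1, …, m, and Y taking values in {y_v}, v = 1, …, n.

p(X, Y) can be written in the form of a matrix Z:

$$Z = [z_{uv}] \in \mathbb{R}^{m \times n}, \quad z_{uv} = p(x_u, y_v)$$

The quality of the co-clustering can be defined as

$$E[d_\phi(Z, \hat{Z})] = \sum_{u=1}^{m} \sum_{v=1}^{n} w_{uv}\, d_\phi(z_{uv}, \hat{z}_{uv}),$$

where w_uv is the probability measure over the matrix entries and Ẑ is uniquely determined by the co-clustering (ρ, γ).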
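The co-clustering quality is a weighted sum of entrywise divergences. This sketch computes it for any entrywise divergence `d`; the matrices, the uniform weights and the function name are hypothetical, and Ẑ is simply given here rather than derived from a co-clustering.

```python
def expected_divergence(Z, Z_hat, W, d):
    """E[d_phi(Z, Z_hat)] = sum_u sum_v w_uv * d_phi(z_uv, zhat_uv).

    W is a probability measure over the entries (non-negative, summing to 1).
    """
    return sum(w * d(z, zh)
               for z_row, zh_row, w_row in zip(Z, Z_hat, W)
               for z, zh, w in zip(z_row, zh_row, w_row))

def squared_euclidean(a, b):
    return (a - b) ** 2

Z = [[1.0, 3.0, 5.0],
     [2.0, 2.0, 2.0]]
Z_hat = [[3.0, 3.0, 3.0],   # hypothetical approximation: one value per row
         [2.0, 2.0, 2.0]]
W = [[1.0 / 6] * 3 for _ in range(2)]  # uniform measure over the 6 entries
print(expected_divergence(Z, Z_hat, W, squared_euclidean))
```

Setting some weights to zero removes those entries from the objective entirely, which is the mechanism the missing-value experiment relies on.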


Page 20: A Generalized Maximum Entropy Approach To Bregman Co Clustering

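The co-clustering (ρ, γ) involves four random variables, U, V, Û and V̂, corresponding to the various partitionings of the matrix Z (U and V index the rows and columns; Û and V̂ index the row and column clusters).

We can obtain different matrix approximations based on the statistics of Z corresponding to the non-trivial combinations of {U, V, Û, V̂}:

$$\{\{U,\hat{V}\}, \{\hat{U},V\}, \{\hat{U},\hat{V}\}, \{U\}, \{V\}, \{\hat{U}\}, \{\hat{V}\}\}$$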
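The slide lists seven combinations without spelling out why exactly these seven. One reading, which is our assumption: Û is a function of U (and V̂ of V), so a set containing U gains nothing from Û; and any set containing both U and V pins down a single entry, which just reproduces Z. Enumerating subsets under those two rules recovers the list.

```python
from itertools import combinations

VARS = ["U", "V", "U_hat", "V_hat"]

def drop_redundant(combo):
    """U_hat is determined by U (and V_hat by V), so it adds no information."""
    s = set(combo)
    if "U" in s:
        s.discard("U_hat")
    if "V" in s:
        s.discard("V_hat")
    return frozenset(s)

def nontrivial_combinations():
    out = set()
    for r in range(1, len(VARS) + 1):
        for combo in combinations(VARS, r):
            s = drop_redundant(combo)
            if {"U", "V"} <= s:  # (U, V) identifies one entry: Z itself
                continue
            out.add(s)
    return out

combos = nontrivial_combinations()
print(len(combos))  # 7
```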


Page 21: A Generalized Maximum Entropy Approach To Bregman Co Clustering

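(Γ) denotes the class of matrix approximation schemes based on (ρ, γ).

The set of approximations M_A(ρ, γ, C) consists of all Z′ ∈ S^{m×n}.

Constraint sets C:

$$C_1 = \{\{\hat{U}\},\{\hat{V}\}\}, \qquad C_2 = \{\{\hat{U},\hat{V}\}\}$$

$$C_3 = \{\{\hat{U},\hat{V}\},\{U\},\{V\}\}, \qquad C_4 = \{\{U,\hat{V}\},\{\hat{U},V\}\}$$

The “best” approximation Ẑ is

$$\hat{Z} = \arg\min_{Z' \in M_A(\rho,\gamma,C)} E[d_\phi(Z, Z')]$$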
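For the simplest scheme, C2 = {{Û, V̂}}, a simplified reading is that Ẑ takes one value per co-cluster. Under that reading and a uniform measure, the optimal value on each co-cluster is the block mean, because the mean minimizes the expected Bregman divergence for every Bregman divergence. The sketch below uses the squared Euclidean case; the matrix, the cluster maps `rho` and `gamma`, and the helper names are hypothetical.

```python
def block_mean_approx(Z, rho, gamma, k, l):
    """Z_hat[u][v] = mean of Z over co-cluster (rho[u], gamma[v]); uniform measure assumed."""
    sums = [[0.0] * l for _ in range(k)]
    counts = [[0] * l for _ in range(k)]
    for u, row in enumerate(Z):
        for v, z in enumerate(row):
            sums[rho[u]][gamma[v]] += z
            counts[rho[u]][gamma[v]] += 1
    # every co-cluster is assumed non-empty
    means = [[s / c for s, c in zip(s_row, c_row)] for s_row, c_row in zip(sums, counts)]
    return [[means[rho[u]][gamma[v]] for v in range(len(Z[0]))] for u in range(len(Z))]

def sq_loss(Z, Z_hat):
    """Average squared Euclidean divergence under the uniform measure."""
    n = sum(len(r) for r in Z)
    return sum((z - zh) ** 2 for zr, zhr in zip(Z, Z_hat) for z, zh in zip(zr, zhr)) / n

Z = [[1.0, 1.0, 5.0],
     [3.0, 3.0, 7.0],
     [10.0, 10.0, 0.0]]
rho, gamma = [0, 0, 1], [0, 0, 1]  # hypothetical row and column cluster maps
Z_hat = block_mean_approx(Z, rho, gamma, 2, 2)
print(Z_hat)  # [[2.0, 2.0, 6.0], [2.0, 2.0, 6.0], [10.0, 10.0, 0.0]]
print(sq_loss(Z, Z_hat))
```

Richer constraint sets such as C3 or C4 preserve more statistics and therefore yield more detailed approximations; this sketch covers only the block-average case.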


Page 22: A Generalized Maximum Entropy Approach To Bregman Co Clustering


Page 23: A Generalized Maximum Entropy Approach To Bregman Co Clustering

We present brief case studies to demonstrate two salient features:
Dimensionality reduction
Missing value prediction


Page 24: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Clustering interleaved with implicit dimensionality reduction

Superior performance as compared to one-sided clustering


Page 25: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Assign zero measure to missing elements, co-cluster, and use the reconstructed matrix for prediction

Implicit discovery of correlated sub-matrices
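The prediction step can be sketched as follows, under two assumptions: the co-clustering (`rho`, `gamma`) is already given (hypothetical maps here; in the method it would come from co-clustering with the missing entries at zero measure), and the reconstruction is the block-average one. Missing entries, marked `None`, get weight zero, so they do not affect the block means, and are then filled with the mean of the observed entries in their co-cluster.

```python
def predict_missing(Z, rho, gamma, k, l):
    """Fill None entries with the mean of the observed entries in their co-cluster."""
    sums = [[0.0] * l for _ in range(k)]
    mass = [[0.0] * l for _ in range(k)]
    for u, row in enumerate(Z):
        for v, z in enumerate(row):
            w = 0.0 if z is None else 1.0  # zero measure for missing entries
            if w > 0:
                sums[rho[u]][gamma[v]] += w * z
                mass[rho[u]][gamma[v]] += w
    observed = [z for row in Z for z in row if z is not None]
    global_mean = sum(observed) / len(observed)  # fallback for fully missing blocks
    filled = []
    for u, row in enumerate(Z):
        new_row = []
        for v, z in enumerate(row):
            if z is not None:
                new_row.append(z)
            else:
                s, w = sums[rho[u]][gamma[v]], mass[rho[u]][gamma[v]]
                new_row.append(s / w if w > 0 else global_mean)
        filled.append(new_row)
    return filled

Z = [[1.0, 1.0, 5.0, 5.0],
     [1.0, None, 5.0, 5.0],
     [9.0, 9.0, 2.0, None],
     [9.0, 9.0, 2.0, 2.0]]
filled = predict_missing(Z, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2)
print(filled[1][1], filled[2][3])  # 1.0 2.0
```

Because the prediction comes from the co-cluster rather than from a single row or column, correlated sub-matrices are what make the imputed values informative.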


Page 26: A Generalized Maximum Entropy Approach To Bregman Co Clustering

Bregman divergences serve as the co-clustering loss function, e.g., I-divergence and squared Euclidean distance.

Approximation models of various complexities are possible, depending on which statistics are preserved.

The minimum Bregman information principle generalizes the maximum entropy principle.
