Transcript
Page 1: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

C. Ding, NMF => Unsupervised Clustering 1

(Semi-)Nonnegative Matrix Factorization and

K-means Clustering

Xiaofeng He (Lawrence Berkeley Nat'l Lab), Horst Simon (Lawrence Berkeley Nat'l Lab), Tao Li (Florida Int'l Univ.), Michael Jordan (UC Berkeley), Haesun Park (Georgia Tech)

Chris Ding, Lawrence Berkeley National Laboratory

Page 2: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Nonnegative Matrix Factorization (NMF)

Data matrix: n points in p-dim:  X = (x_1, x_2, …, x_n)

Decomposition (low-rank approximation):  X ≈ F G^T

Nonnegative matrices:  X_ij ≥ 0,  F_ij ≥ 0,  G_ij ≥ 0

F = (f_1, f_2, …, f_k),   G = (g_1, g_2, …, g_k)

Each x_i is an image, document, webpage, etc.

Page 3: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Some historical notes

• Earlier work by statistics people (G. Golub)
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of whole (no cancellation)
  – A multiplicative update algorithm
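Lee and Seung's multiplicative updates minimize ||X − F G^T||² via F ← F ∘ (XG)/(F G^T G) and G ← G ∘ (X^T F)/(G F^T F). The following is a minimal pure-Python sketch (the helper functions and the small eps guard are mine, for illustration only):

```python
# Sketch of Lee-Seung multiplicative updates for NMF, X ≈ F G^T,
# minimizing the Frobenius norm. Pure Python, nested lists as matrices.

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(X, F, G, iters=200, eps=1e-9):
    """Multiplicative updates: factors stay nonnegative if initialized so."""
    for _ in range(iters):
        # F <- F * (X G) / (F G^T G), elementwise
        XG = matmul(X, G)
        FGtG = matmul(F, matmul(transpose(G), G))
        F = [[F[i][k] * XG[i][k] / (FGtG[i][k] + eps)
              for k in range(len(F[0]))] for i in range(len(F))]
        # G <- G * (X^T F) / (G F^T F), elementwise
        XtF = matmul(transpose(X), F)
        GFtF = matmul(G, matmul(transpose(F), F))
        G = [[G[j][k] * XtF[j][k] / (GFtF[j][k] + eps)
              for k in range(len(G[0]))] for j in range(len(G))]
    return F, G

# Tiny example: a rank-1 nonnegative matrix is recovered almost exactly.
X = [[1.0, 2.0], [2.0, 4.0]]
F, G = nmf(X, [[0.5], [0.6]], [[0.4], [0.7]])
approx = matmul(F, transpose(G))
err = max(abs(approx[i][j] - X[i][j]) for i in range(2) for j in range(2))
print(err < 1e-3)
```

Note the "no cancellation" property: every entry of F and G stays nonnegative throughout, because each update multiplies by a nonnegative ratio.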

Page 4: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

[Figure: a face image flattened into a pixel vector, a column vector of nonnegative gray-level intensities, e.g. (0, 0.50, 0.71, …, 0.80, 0.20, …, 0)^T]

Page 5: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Lee and Seung (1999): Parts-based Perspective

X ≈ F G^T,   X = (x_1, x_2, …, x_n)

F = (f_1, f_2, …, f_k),   G = (g_1, g_2, …, g_k)

[Figure: original face images and their parts-based NMF factors]

Page 6: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

"Parts of Whole" Picture

X ≈ F G^T,   F = (f_1, f_2, …, f_k)

Straightforward NMF doesn't give the parts-based picture.

Several people explicitly sparsify F to get the parts-based picture (Li, et al, 2001; Hoyer 2003).

Donoho & Stodden (2003) study conditions for parts-of-whole.

Page 7: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Meanwhile … a number of studies empirically show the usefulness of NMF for pattern discovery/clustering:

• Xu et al (SIGIR'03)
• Brunet et al (PNAS'04)
• Many others

We claim:

NMF factors give holistic pictures of the data

Page 8: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Our Experiments: NMF gives holistic pictures

Page 9: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Our Experiments: NMF gives holistic pictures

Page 10: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Task: Prove NMF is doing "Data Clustering"

NMF => K-means Clustering

Page 11: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF-Kmeans Theorem

min_{F ≥ 0, G ≥ 0, G^T G = I} ||X − F G^T||²  =  min_{G ≥ 0, G^T G = I} Tr( X^T X − G^T X^T X G )

G-orthogonal NMF is equivalent to relaxed K-means clustering.

Proof. (Ding, He, Simon, SDM 2005)

Page 12: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

K-means clustering

• Also called "isodata", "vector quantization"
• Developed in the 1960's (Lloyd, MacQueen, Hartigan, etc.)
• Computationally efficient (order mN)
• Most widely used in practice
  – Benchmark to evaluate other algorithms

Given n points in m-dim:  X = (x_1, x_2, …, x_n)

K-means objective:  min J_K = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||x_i − c_k||²
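The objective above is usually minimized by Lloyd's alternating algorithm: assign each point to its nearest centroid, then move each centroid to its cluster mean. A minimal pure-Python sketch (toy data and the fixed iteration count are mine; a real implementation would also handle empty clusters and random restarts):

```python
# Lloyd's K-means on 2-D points, pure Python, illustrative only.
def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        labels = [min(range(len(centroids)),
                      key=lambda k: sum((p - c) ** 2
                                        for p, c in zip(x, centroids[k])))
                  for x in points]
        # update step: each centroid moves to the mean of its cluster
        for k in range(len(centroids)):
            members = [x for x, l in zip(points, labels) if l == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
labels, cents = kmeans(pts, [[0.0, 0.1], [4.0, 4.0]])
print(labels)  # → [0, 0, 1, 1]
```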

Page 13: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Reformulate K-means Clustering

J_K = Σ_i ||x_i||² − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} x_i^T x_j

Cluster membership indicators:

h_k = (0, …, 0, 1, …, 1, 0, …, 0)^T / n_k^{1/2}

J_K = Σ_i ||x_i||² − Σ_{k=1}^{K} h_k^T X^T X h_k,   H = (h_1, …, h_K)

Solving K-means  ⇒  max_{H^T H = I, H ≥ 0} Tr( H^T X^T X H )

(Zha, Ding, Gu, He, Simon, NIPS 2001) (Ding & He, ICML 2004)
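The identity J_K = Σ_i ||x_i||² − Σ_k h_k^T X^T X h_k can be checked numerically; the toy points and the hand-picked assignment below are mine:

```python
# Numeric check: within-cluster squared error equals
# total norm minus the Gram-matrix trace term with normalized indicators.
pts = [(0.0, 0.0), (0.2, 0.0), (4.0, 4.0), (4.0, 4.2)]
clusters = [[0, 1], [2, 3]]  # a fixed assignment, chosen by hand

# direct K-means objective: squared distance to each cluster mean
J = 0.0
for C in clusters:
    mean = [sum(pts[i][d] for i in C) / len(C) for d in range(2)]
    J += sum(sum((pts[i][d] - mean[d]) ** 2 for d in range(2)) for i in C)

# trace form: sum_k h_k^T (X^T X) h_k with h_k = indicator / sqrt(n_k)
gram = [[sum(a * b for a, b in zip(pts[i], pts[j])) for j in range(4)]
        for i in range(4)]
trace_term = sum(sum(gram[i][j] for i in C for j in C) / len(C)
                 for C in clusters)
total_norm = sum(gram[i][i] for i in range(4))
print(abs(J - (total_norm - trace_term)) < 1e-9)
```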

Page 14: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Reformulate K-means Clustering

Cluster membership indicators (shown here without the n_k^{-1/2} normalization):

h_1 = (1, 1, 0, 0, 0, 0, 0)^T   (C1)
h_2 = (0, 0, 1, 1, 1, 0, 0)^T   (C2)
h_3 = (0, 0, 0, 0, 0, 1, 1)^T   (C3)

H = (h_1, h_2, h_3)

Page 15: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF-Kmeans Theorem

min_{F ≥ 0, G ≥ 0, G^T G = I} ||X − F G^T||²  =  min_{G ≥ 0, G^T G = I} Tr( X^T X − G^T X^T X G )

G-orthogonal NMF is equivalent to relaxed K-means clustering.

Proof. (Ding, He, Simon, SDM 2005)

Page 16: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Kernel K-means Clustering

Map feature vector to higher-dim space:  x_i → φ(x_i)

Kernel K-means objective:

min J_K^φ = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||φ(x_i) − c_k^φ||²,   c_k^φ ≡ (1/n_k) Σ_{i ∈ C_k} φ(x_i)

Kernel K-means optimization:

J_K^φ = Σ_i ||φ(x_i)||² − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} φ(x_i)^T φ(x_j)

max J̃^φ = Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} ⟨φ(x_i), φ(x_j)⟩ = Tr( H^T W H )

W = matrix of pairwise similarities,  W_ij = ⟨φ(x_i), φ(x_j)⟩

Page 17: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Symmetric NMF

Symmetric NMF:  W ≈ H H^T   (W a symmetric nonnegative matrix)

max_{H^T H = I, H ≥ 0} Tr( H^T W H )   is equivalent to   min_{H^T H = I, H ≥ 0} ||W − H H^T||²

Orthogonal symmetric NMF is equivalent to Kernel K-means clustering.
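Symmetric NMF, W ≈ H H^T, can be computed by a damped multiplicative update; the rule below is one common form from this line of work, and the toy similarity matrix and initialization are mine:

```python
# A sketch of a damped multiplicative update for symmetric NMF,
# W ≈ H H^T: H <- H * (1 - beta + beta * (W H) / (H H^T H)).
# Illustrative only; the exact damping is a design choice.
def sym_nmf(W, H, iters=500, beta=0.5, eps=1e-12):
    n, k = len(H), len(H[0])
    for _ in range(iters):
        WH = [[sum(W[i][j] * H[j][c] for j in range(n)) for c in range(k)]
              for i in range(n)]
        HtH = [[sum(H[i][a] * H[i][b] for i in range(n)) for b in range(k)]
               for a in range(k)]
        HHtH = [[sum(H[i][a] * HtH[a][c] for a in range(k)) for c in range(k)]
                for i in range(n)]
        H = [[H[i][c] * (1 - beta + beta * WH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)] for i in range(n)]
    return H

# block-diagonal similarity matrix: two obvious clusters
W = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
H0 = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.9], [0.3, 0.8]]
H = sym_nmf(W, H0)
labels = [row.index(max(row)) for row in H]
print(labels)  # each row's dominant column gives the cluster
```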

Page 18: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Orthogonality in NMF

X = (x_1, x_2, …, x_n),   H = (h_1, h_2)

Strictly orthogonal G: hard clustering
Non-orthogonal G: soft clustering of ambiguous/outlier points

Page 19: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

K-means Clustering Theorem

min_{G ≥ 0, G^T G = I} ||X_± − F_± G_+^T||²

G-orthogonal NMF is equivalent to relaxed K-means clustering.

Proof requires only G-orthogonality and nonnegativity.

F = (f_1, f_2, …, f_k)  ⇒  cluster centroids
G = (g_1, g_2, …, g_k)  ⇒  cluster indicators

(Ding, Li, Jordan, 2006)

Page 20: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF Generalizations

SVD:         X_± = F_± G_±^T = U Σ V^T
Semi-NMF:    X_± = F_± G_+^T             (Ding, Li, Jordan, 2006)
Tri-NMF:     X_± = F_+ S_± G_+^T         (Ding, Li, Peng, Park, KDD 2006)
Convex-NMF:  X_± = X_± W_+ G_+^T
Kernel-NMF:  φ(X_±) = φ(X_±) W_+ G_+^T

Page 21: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Semi-NMF:  X_± = F_± G_+^T

• For any mixed-sign input data (centered data)
• Clustering and low-rank approximation

min ||X − F G^T||²

Update F:  F = X G (G^T G)^{-1}

Update G:  G_ik ← G_ik · sqrt( [ (X^T F)^+_ik + (G (F^T F)^-)_ik ] / [ (X^T F)^-_ik + (G (F^T F)^+)_ik ] )

where A^+ = (|A| + A)/2 and A^- = (|A| − A)/2 are the positive and negative parts of a matrix.

(Ding, Li, Jordan, 2006)
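The alternating updates above can be sketched in pure Python for the simplest case k = 1, where (G^T G)^{-1} is just a scalar; the `pos`/`neg` helpers and the toy mixed-sign data are mine:

```python
# Semi-NMF alternating updates, rank k = 1 sketch.
# F step is the exact least-squares solution; G step is the
# multiplicative rule with positive/negative parts, which keeps g >= 0.
def pos(a): return (abs(a) + a) / 2
def neg(a): return (abs(a) - a) / 2

def semi_nmf_rank1(X, g, iters=200):
    n, m = len(X), len(X[0])
    for _ in range(iters):
        gg = sum(v * v for v in g)                 # G^T G (scalar)
        # F = X G (G^T G)^{-1}
        f = [sum(X[i][j] * g[j] for j in range(m)) / gg for i in range(n)]
        ff = sum(v * v for v in f)                 # F^T F (scalar)
        xtf = [sum(X[i][j] * f[i] for i in range(n)) for j in range(m)]
        g = [g[j] * ((pos(xtf[j]) + g[j] * neg(ff)) /
                     (neg(xtf[j]) + g[j] * pos(ff) + 1e-12)) ** 0.5
             for j in range(m)]
    return f, g

# mixed-sign rank-1 data: X = (-1, 2)^T (1, 2); F may be mixed-sign,
# G stays nonnegative, and the factorization is recovered.
X = [[-1.0, -2.0], [2.0, 4.0]]
f, g = semi_nmf_rank1(X, [0.5, 0.5])
err = max(abs(f[i] * g[j] - X[i][j]) for i in range(2) for j in range(2))
print(err < 1e-4)
```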

Page 22: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Convex-NMF

In NMF:       X_+ = F_+ G_+^T
In Semi-NMF:  X_± = F_± G_+^T

F_± = (f_1, f_2, …, f_k) is in a large space.

For the factors f_k to capture the notion of cluster centroids, require f_k to be a convex combination of the input data:

f_k = w_{1k} x_1 + … + w_{nk} x_n,   F = X W_+

Convex-NMF:  X_± = X_± W_+ G_+^T   (Ding, Li, Jordan, 2006)

(For F interpretability: F = X W_± gives only an affine combination; requiring W ≥ 0 gives convex combinations.)

Page 23: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Convex-NMF:  X_± = X_± W_+ G_+^T

Computing algorithm:  min ||X − X W G^T||²

Update W:

W_ik ← W_ik · sqrt( [ ((X^T X)^+ G)_ik + ((X^T X)^- W G^T G)_ik ] / [ ((X^T X)^- G)_ik + ((X^T X)^+ W G^T G)_ik ] )

Update G:

G_ik ← G_ik · sqrt( [ ((X^T X)^+ W)_ik + (G W^T (X^T X)^- W)_ik ] / [ ((X^T X)^- W)_ik + (G W^T (X^T X)^+ W)_ik ] )

(Ding, Li, Jordan, 2006)

Page 24: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Semi-NMF factors: Convex-NMF factors:

Page 25: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Semi-NMF factors: Convex-NMF factors:

Page 26: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Page 27: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Sparsity of Convex-NMF

- Sparse factorization is a recent trend.
- Sparsity is usually explicitly enforced.
- Convex-NMF factors are naturally sparse.

||X − X W G^T||²_F = ||X (I − W G^T)||²_F = Σ_k σ_k² ||v_k^T (I − W G^T)||²

Consider  ||I − W G^T||² = Σ_k ||e_k^T (I − W G^T)||².  The solution is W = G = (e_1, …, e_k), i.e. columns with a single nonzero entry.

From this we infer that convex-NMF factors are naturally sparse.

Page 28: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

A Simple Example

[Figure: data points in two clusters]

||F_semi − C_Kmeans|| = 0.53        ||F_convex − C_Kmeans|| = 0.08

||X − F G^T||:   SVD 0.27940,   Semi 0.27944,   Convex 0.30877

Page 29: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


Experiments on 7 datasets

NMF variants always perform better than K-means

Page 30: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Kernel NMF -- Generalized Convex NMF

Map feature vector to higher-dim space:  x_i → φ(x_i),   φ(X) = [φ(x_1), φ(x_2), …, φ(x_n)]

NMF/semi-NMF:  φ(X) = F G^T  depends on the explicit mapping function φ(·)

Kernel NMF:  φ(X) = φ(X) W G^T

Minimization objective depends on the kernel only:

||φ(X) − φ(X) W G^T||² = Tr[ (I − G W^T) φ(X)^T φ(X) (I − W G^T) ]

(Ding & He, ICML 2004)

Page 31: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Kernel K-means Clustering

Map feature vector to higher-dim space:  x_i → φ(x_i)

Kernel K-means objective:

min J_K^φ = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||φ(x_i) − c_k^φ||²,   c_k^φ ≡ (1/n_k) Σ_{i ∈ C_k} φ(x_i)

Kernel K-means optimization:

J_K^φ = Σ_i ||φ(x_i)||² − Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} φ(x_i)^T φ(x_j)

max J̃^φ = Σ_{k=1}^{K} (1/n_k) Σ_{i,j ∈ C_k} ⟨φ(x_i), φ(x_j)⟩ = Tr( H^T W H )

W = matrix of pairwise similarities,  W_ij = ⟨φ(x_i), φ(x_j)⟩

Page 32: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering


NMF and PLSI : Equivalence

So far we have used only the Frobenius norm as the NMF objective function. Another objective is the KL divergence.


Page 33: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Kernel-NMF Algorithm

Update W:

W_ik ← W_ik · sqrt( [ ((X^T X)^+ G)_ik + ((X^T X)^- W G^T G)_ik ] / [ ((X^T X)^- G)_ik + ((X^T X)^+ W G^T G)_ik ] )

Update G:

G_ik ← G_ik · sqrt( [ ((X^T X)^+ W)_ik + (G W^T (X^T X)^- W)_ik ] / [ ((X^T X)^- W)_ik + (G W^T (X^T X)^+ W)_ik ] )

With X^T X replaced by ⟨φ(X), φ(X)⟩, the computing algorithm depends only on the kernel.

(Ding, Li, Jordan, 2006)

Page 34: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Orthogonal Nonnegative Tri-Factorization

3-factor NMF with explicit orthogonality constraints:

min_{F ≥ 0, F^T F = I;  G ≥ 0, G^T G = I} ||X − F S G^T||²

Simultaneous K-means clustering of rows and columns:

F = (f_1, f_2, …, f_k)  ⇒  row cluster indicators
G = (g_1, g_2, …, g_k)  ⇒  column cluster indicators

1. Solution is unique
2. Can't reduce to 2-factor NMF

(Ding, Li, Peng, Park, KDD 2006)

Page 35: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF-like algorithms are different ways to relax F, G!

X = (x_1, x_2, …, x_n)  = input data
F = (f_1, f_2, …, f_k)  = cluster centroids
G = (g_1, g_2, …, g_k)  = cluster indicators

K-means clustering objective function:

J_K = Σ_{k=1}^{K} Σ_{i ∈ C_k} ||x_i − f_k||² = Σ_{k=1}^{K} Σ_{i=1}^{n} g_ik ||x_i − f_k||² = ||X − F G^T||²

f_k = X g_k / n_k,   F = X G D_n^{-1},   D_n = diag(n_1, …, n_k)

J_K = ||X − X G D_n^{-1} G^T||² = ||X − X G̃ G̃^T||²,   G̃ = G D_n^{-1/2},   G̃^T G̃ = I
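The key identity J_K = ||X − F G^T||² (with G the 0/1 indicator and F the centroids) can be verified numerically; the toy points and assignment below are mine:

```python
# Numeric check: within-cluster squared error equals the Frobenius
# residual ||X - F G^T||_F^2 when G is the 0/1 indicator matrix
# and F holds the cluster centroids f_k = X g_k / n_k as columns.
pts = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
G = [[1, 0], [1, 0], [0, 1], [0, 1]]            # n x k indicator
n_k = [sum(col) for col in zip(*G)]
F = [[sum(pts[i][d] * G[i][k] for i in range(4)) / n_k[k] for k in range(2)]
     for d in range(2)]                          # dim x k centroids

# left side: K-means objective, distances to own-cluster centroid
J = sum(sum((pts[i][d] - F[d][k]) ** 2 for d in range(2))
        for i in range(4) for k in range(2) if G[i][k])

# right side: ||X - F G^T||_F^2, X stored with points as columns
resid = sum((pts[i][d] - sum(F[d][k] * G[i][k] for k in range(2))) ** 2
            for i in range(4) for d in range(2))
print(abs(J - resid) < 1e-9)
```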

Page 36: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF ⇔ PLSI

NMF objective functions:
• Frobenius norm
• KL-divergence:

J_KL = Σ_{i=1}^{m} Σ_{j=1}^{n} [ x_ij log( x_ij / (F G^T)_ij ) − x_ij + (F G^T)_ij ]

Probabilistic LSI (Hofmann, 1999) is a latent variable model for clustering:

J_PLSI = Σ_{i=1}^{m} Σ_{j=1}^{n} x(w_i, d_j) log p(w_i, d_j)

p(w_i, d_j) = Σ_k p(w_i | z_k) p(z_k) p(d_j | z_k)

We can show:  J_PLSI = −J_NMF-KL + constant   (Ding, Li, Peng, AAAI 2006)

Page 37: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Summary

• NMF is doing K-means clustering (or PLSI)
• Interpretability is key to motivate new NMF-like factorizations
  – Semi-NMF, Convex-NMF, Kernel-NMF, Tri-NMF
• NMF-like algorithms always outperform K-means clustering
• Advantage: hard/soft clustering
• Convex-NMF enforces the notion of cluster centroids and is naturally sparse

NMF: A new/rich paradigm for unsupervised learning

Page 38: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

References

• On the Equivalence of Nonnegative Matrix Factorization and K-means/Spectral Clustering, Chris Ding, Xiaofeng He, Horst Simon, SDM 2005.
• Convex and Semi-Nonnegative Matrix Factorization, Chris Ding, Tao Li, Michael Jordan, submitted.
• Orthogonal Non-negative Matrix Tri-Factorization for Clustering, Chris Ding, Tao Li, Wei Peng, Haesun Park, KDD 2006.
• Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square and a Hybrid Algorithm, Chris Ding, Tao Li, Wei Peng, AAAI 2006.

Page 39: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

Data Clustering: NMF and PCA

min_{G ≥ 0, G^T G = I} ||X_± − F_± G_+^T||²

G-orthogonality and nonnegativity; NMF is useful due to nonnegativity.

F = (f_1, f_2, …, f_k)  ⇒  cluster centroids
G = (g_1, g_2, …, g_k)  ⇒  cluster indicators

What happens if we ignore nonnegativity?

Page 40: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

K-means clustering ⇔ PCA

Ignore nonnegativity ⇒ orthogonal transform R:

min_{G ≥ 0, G^T G = I} ||X_± − (F_± R)(G R)^T||²

Equivalent to:  max_{GR} Tr[ (G R)^T X^T X (G R) ]

Solution is given by SVD:  X = U Σ V^T,   U = F R,   V = G R

Centroid subspace projection:  F F^T = (F R)(F R)^T = U U^T
Cluster indicator projection:  G G^T = (G R)(G R)^T = V V^T

PCA/SVD is automatically doing K-means clustering.  (Ding & He, ICML 2004)
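The claim that PCA recovers K-means structure can be illustrated numerically: for well-separated clusters, the sign of the leading right singular vector of the centered data splits the points. The toy data and the power-iteration shortcut below are my own choices:

```python
# Illustration: the leading principal direction of centered data
# separates two well-separated clusters by sign. We get the top
# eigenvector of the Gram matrix K = Xc Xc^T by power iteration.
pts = [(0.0, 0.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.0)]
n = len(pts)
mean = [sum(p[d] for p in pts) / n for d in range(2)]
Xc = [[p[d] - mean[d] for d in range(2)] for p in pts]  # centered

K = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) for j in range(n)]
     for i in range(n)]
v = [1.0, 0.9, -1.0, -1.1]  # asymmetric start vector
for _ in range(100):        # power iteration
    w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
    norm = sum(x * x for x in w) ** 0.5
    v = [x / norm for x in w]

labels = [0 if x > 0 else 1 for x in v]
print(labels)  # the sign of the top eigenvector recovers the clusters
```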

Page 41: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

A Simple Example

[Figure: data points in two clusters]

||F_semi − C_Kmeans|| = 0.53        ||F_convex − C_Kmeans|| = 0.08

||X − F G^T||:   SVD 0.27940,   Semi 0.27944,   Convex 0.30877

Page 42: (Semi-)Nonnegative Matrix Factorization and K-mean Clustering

NMF = Spectral Clustering (Normalized Cut)

J_Ncut(h_1, …, h_K) = [h_1^T (D − W) h_1] / [h_1^T D h_1] + … + [h_K^T (D − W) h_K] / [h_K^T D h_K]   (Gu, et al, 2001)

Normalized Cut ⇒ cluster indicators:

y_k = D^{1/2} h_k / ||D^{1/2} h_k||,   h_k = (0, …, 0, 1, …, 1, 0, …, 0)^T

Re-write:

J_Ncut(y_1, …, y_K) = y_1^T (I − W̃) y_1 + … + y_K^T (I − W̃) y_K = Tr( Y^T (I − W̃) Y ),   W̃ = D^{-1/2} W D^{-1/2}

Optimize:  max Tr( Y^T W̃ Y )  subject to  Y^T Y = I

⇒  min_{H^T H = I, H ≥ 0} ||W̃ − H H^T||²

