(semi-)nonnegative matrix factorization and k-mean clustering


C. Ding, NMF => Unsupervised Clustering 1

(Semi-)Nonnegative Matrix Factorization and

K-mean Clustering

Chris Ding, Lawrence Berkeley National Laboratory

with: Xiaofeng He (Lawrence Berkeley Nat'l Lab), Horst Simon (Lawrence Berkeley Nat'l Lab), Tao Li (Florida Int'l Univ.), Michael Jordan (UC Berkeley), Haesun Park (Georgia Tech)


Nonnegative Matrix Factorization (NMF)

Data matrix, n points in p-dim: $X = (x_1, x_2, \dots, x_n)$; each $x_i$ is an image, document, webpage, etc.

Decomposition (low-rank approximation): $X \approx F G^T$

Nonnegative matrices: $X_{ij} \ge 0$, $F_{ij} \ge 0$, $G_{ij} \ge 0$

$F = (f_1, f_2, \dots, f_k), \qquad G = (g_1, g_2, \dots, g_k)$
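As a concrete sketch (not part of the slides): such a factorization can be computed with the classic Lee-Seung multiplicative updates mentioned on the next slide. Data, rank, and iteration count below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 20))       # p=30 dims, n=20 points, all entries nonnegative
k = 4                          # number of factors

F = rng.random((30, k))        # nonnegative random initialization
G = rng.random((20, k))
err_init = np.linalg.norm(X - F @ G.T)

for _ in range(500):
    # Lee-Seung multiplicative updates for min ||X - F G^T||_F^2;
    # the small epsilon guards against division by zero
    F *= (X @ G) / (F @ (G.T @ G) + 1e-9)
    G *= (X.T @ F) / (G @ (F.T @ F) + 1e-9)

err = np.linalg.norm(X - F @ G.T)   # far below err_init after the updates
```

The updates multiply by nonnegative ratios, so $F, G$ stay elementwise nonnegative throughout, which is the whole point of NMF.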


Some historical notes

• Earlier work by statistics people (G. Golub)
• P. Paatero (1994), Environmetrics
• Lee and Seung (1999, 2000)
  – Parts of whole (no cancellation)
  – A multiplicative update algorithm

Pixel vector (figure: a grayscale image unrolled into a single nonnegative column vector)


Lee and Seung (1999): Parts-based Perspective

$$X \approx F G^T, \qquad X = (x_1, x_2, \dots, x_n), \quad F = (f_1, \dots, f_k), \quad G = (g_1, \dots, g_k)$$

(figure: original images vs. parts-based factors)


"Parts of Whole" Picture

$$X \approx F G^T, \qquad F = (f_1, f_2, \dots, f_k)$$

Straightforward NMF doesn't give the parts-based picture. Several groups explicitly sparsify F to obtain a parts-based picture (Li et al., 2001; Hoyer, 2003). Donoho & Stodden (2003) study conditions for parts-of-whole.


Meanwhile … a number of studies empirically show the usefulness of NMF for pattern discovery/clustering: Xu et al. (SIGIR'03), Brunet et al. (PNAS'04), and many others.

We claim:

NMF factors give holistic pictures of the data


Our Experiments: NMF gives holistic pictures



Task: Prove NMF is doing "Data Clustering"

NMF => K-means Clustering

NMF-Kmeans Theorem

G-orthogonal NMF,

$$\min_{F \ge 0,\; G \ge 0,\; G^T G = I} \|X - F G^T\|^2,$$

is equivalent to relaxed K-means clustering.

Proof. With G fixed, the optimal $F = XG$, which reduces the problem to

$$\min_{G \ge 0,\; G^T G = I} \mathrm{Tr}\big(X^T X - G^T X^T X G\big)$$

(Ding, He, Simon, SDM 2005)


K-means clustering

• Also called "isodata", "vector quantization"
• Developed in the 1960's (Lloyd, MacQueen, Hartigan, etc.)
• Computationally efficient (order mN)
• Most widely used in practice
  – Benchmark to evaluate other algorithms

Given n points in m-dim: $X = (x_1, x_2, \dots, x_n)^T$

K-means objective:

$$\min J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - c_k\|^2$$
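A minimal Lloyd-iteration sketch of this objective (illustrative code, not from the talk; the two-blob demo data and all parameters are made up):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm for min J_K = sum_k sum_{i in C_k} ||x_i - c_k||^2.
    X: (n, m) array -- n points in m dimensions."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assignment step: each point goes to its nearest centroid
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each centroid becomes its cluster mean
        # (keep the old centroid if a cluster happens to be empty)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# demo: two well-separated blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(5, 0.1, (5, 2))])
labels, centers = kmeans(X, 2)
```

Each iteration can only decrease $J_K$, which is why Lloyd's algorithm converges (to a local optimum).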


Reformulate K-means Clustering

$$J_K = \sum_i \|x_i\|^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} x_i^T x_j$$

Cluster membership indicators:

$$h_k = (0,\dots,0,\,1,\dots,1,\,0,\dots,0)^T / n_k^{1/2}$$

(with $n_k$ ones marking the members of cluster $C_k$)

$$J_K = \sum_i \|x_i\|^2 - \sum_{k=1}^{K} h_k^T X^T X h_k, \qquad H = (h_1, \dots, h_K)$$

Solving K-means =>

$$\max_{H \ge 0,\; H^T H = I} \mathrm{Tr}\big(H^T X^T X H\big)$$

(Zha, Ding, Gu, He, Simon, NIPS 2001) (Ding & He, ICML 2004)


Reformulate K-means Clustering

Cluster membership indicators, e.g. for 7 points in 3 clusters, $H = (h_1, h_2, h_3)$ with

$$h_1^T = (1\,1\,0\,0\,0\,0\,0), \quad h_2^T = (0\,0\,1\,1\,1\,0\,0), \quad h_3^T = (0\,0\,0\,0\,0\,1\,1)$$

marking clusters $C_1, C_2, C_3$ (up to the $1/\sqrt{n_k}$ normalization).
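The trace identity is easy to check numerically. A small sketch with made-up data, using the indicators above with the $1/\sqrt{n_k}$ normalization:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 8))            # 8 points in 3 dims, stored as columns
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
K = 3

# normalized cluster indicators h_k = (0..0,1..1,0..0)^T / sqrt(n_k)
H = np.zeros((8, K))
for k in range(K):
    idx = labels == k
    H[idx, k] = 1.0 / np.sqrt(idx.sum())

# direct K-means objective: sum of within-cluster squared distances to the mean
J = sum(np.sum((X[:, labels == k]
                - X[:, labels == k].mean(axis=1, keepdims=True)) ** 2)
        for k in range(K))

# trace form: J_K = sum_i ||x_i||^2 - Tr(H^T X^T X H)
J_trace = np.sum(X ** 2) - np.trace(H.T @ X.T @ X @ H)
```

The two values agree exactly, since $h_k^T X^T X h_k = n_k \|c_k\|^2$.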

NMF-Kmeans Theorem

G-orthogonal NMF,

$$\min_{F \ge 0,\; G \ge 0,\; G^T G = I} \|X - F G^T\|^2,$$

is equivalent to relaxed K-means clustering.

Proof. With G fixed, the optimal $F = XG$, which reduces the problem to

$$\min_{G \ge 0,\; G^T G = I} \mathrm{Tr}\big(X^T X - G^T X^T X G\big)$$

(Ding, He, Simon, SDM 2005)


Kernel K-means Clustering

Map the feature vectors to a higher-dimensional space: $x_i \to \phi(x_i)$.

Kernel K-means objective:

$$\min J_K^\phi = \sum_{k=1}^{K} \sum_{i \in C_k} \|\phi(x_i) - \phi(c_k)\|^2, \qquad \phi(c_k) \equiv \frac{1}{n_k} \sum_{i \in C_k} \phi(x_i)$$

Kernel K-means optimization:

$$J_K^\phi = \sum_i \|\phi(x_i)\|^2 - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} \phi(x_i)^T \phi(x_j)$$

$$\max \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i,j \in C_k} \langle \phi(x_i), \phi(x_j) \rangle = \mathrm{Tr}\big(H^T W H\big)$$

where $W$ is the matrix of pairwise similarities, $W_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$.


Symmetric NMF: $W \approx H H^T$, where $W$ is a symmetric nonnegative matrix.

$$\max_{H \ge 0,\; H^T H = I} \mathrm{Tr}\big(H^T W H\big)$$

is equivalent to the symmetric NMF problem

$$\min_{H \ge 0,\; H^T H = I} \|W - H H^T\|^2$$

Orthogonal symmetric NMF is equivalent to Kernel K-means clustering.
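The equivalence rests on the expansion $\|W - HH^T\|^2 = \|W\|^2 - 2\,\mathrm{Tr}(H^TWH) + \|H^TH\|^2$: with $H^TH = I$ fixed, minimizing the left side is the same as maximizing the trace. A quick numerical check (made-up similarity matrix and hard-cluster H):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = A + A.T                          # symmetric nonnegative similarity matrix

# an H with orthonormal columns: a normalized hard-cluster indicator
labels = np.array([0, 0, 1, 1, 2, 2])
H = np.zeros((6, 3))
for k in range(3):
    H[labels == k, k] = 1 / np.sqrt(2)

lhs = np.linalg.norm(W - H @ H.T) ** 2
# expansion: ||W - HH^T||^2 = ||W||^2 - 2 Tr(H^T W H) + ||H^T H||^2
rhs = (np.linalg.norm(W) ** 2
       - 2 * np.trace(H.T @ W @ H)
       + np.linalg.norm(H.T @ H) ** 2)
```

Since $\|W\|^2$ and $\|H^TH\|^2 = \|I\|^2$ are constants under the constraint, only the trace term varies.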


Orthogonality in NMF

Strictly orthogonal G: hard clustering.

Non-orthogonal G: soft clustering for ambiguous/outlier points.

(figure: data $X = (x_1, x_2, \dots, x_n)$ with two overlapping clusters, $H = (h_1, h_2)$)


K-means Clustering Theorem

$$\min_{G \ge 0,\; G^T G = I} \|X_\pm - F_\pm G_+^T\|^2$$

$F = (f_1, f_2, \dots, f_k)$ => cluster centroids
$G = (g_1, g_2, \dots, g_k)$ => cluster indicators

G-orthogonal NMF is equivalent to relaxed K-means clustering. The proof requires only G-orthogonality and nonnegativity of G.

(Ding, Li, Jordan, 2006)


NMF Generalizations

SVD: $X_\pm = F_\pm G_\pm^T = U \Sigma V^T$

Semi-NMF: $X_\pm = F_\pm G_+^T$ (Ding, Li, Jordan, 2006)

Convex-NMF: $X_\pm = X_\pm W_+ G_+^T$ (Ding, Li, Jordan, 2006)

Kernel-NMF: $\phi(X)_\pm = \phi(X)_\pm W_+ G_+^T$

Tri-NMF: $X_\pm = F_+ S_\pm G_+^T$ (Ding, Li, Peng, Park, KDD 2006)


Semi-NMF: $X_\pm = F_\pm G_+^T$

• For any mixed-sign input data (e.g. centered data)
• Clustering and low-rank approximation

$$\min \|X - F G^T\|^2$$

Update F (least squares): $F = X G (G^T G)^{-1}$

Update G (multiplicative, keeps $G \ge 0$):

$$G_{ik} \leftarrow G_{ik} \sqrt{\frac{(X^T F)^+_{ik} + [G (F^T F)^-]_{ik}}{(X^T F)^-_{ik} + [G (F^T F)^+]_{ik}}}$$

where $A^+ = (|A| + A)/2$ and $A^- = (|A| - A)/2$ are the elementwise positive and negative parts.

(Ding, Li, Jordan, 2006)
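A runnable sketch of this alternating scheme (random mixed-sign data; the epsilon and iteration count are arbitrary choices, not from the slides):

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2   # elementwise positive part
def neg(A): return (np.abs(A) - A) / 2   # elementwise negative part (>= 0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))             # mixed-sign data: 12 points in 5 dims
k = 3
G = rng.random((12, k)) + 0.1            # strictly positive initialization

errs = []
for _ in range(200):
    # Update F: unconstrained least squares, F = X G (G^T G)^{-1}
    F = X @ G @ np.linalg.inv(G.T @ G)
    # Update G: multiplicative rule, preserves G >= 0
    XtF = X.T @ F
    FtF = F.T @ F
    G = G * np.sqrt((pos(XtF) + G @ neg(FtF)) /
                    (neg(XtF) + G @ pos(FtF) + 1e-12))
    errs.append(np.linalg.norm(X - F @ G.T))
```

The paper proves this alternation monotonically decreases $\|X - FG^T\|^2$; the ratio inside the square root is nonnegative by construction, so G never goes negative.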


Convex-NMF

In NMF, $X_+ = F_+ G_+^T$; in Semi-NMF, $X_\pm = F_\pm G_+^T$. In Semi-NMF, $F$ ranges over a large space.

For the factor $f_k$ to capture the notion of a cluster centroid, require $f_k$ to be a convex combination of the input data:

$$f_k = w_{1k} x_1 + \cdots + w_{nk} x_n, \qquad F = X W_+$$

Convex-NMF: $X_\pm = X_\pm W_+ G_+^T$

For F interpretability one may also take $F = X W_\pm$ (an affine combination of the data).

(Ding, Li, Jordan, 2006)


Convex-NMF: computing algorithm

$$X_\pm = X_\pm W_+ G_+^T, \qquad \min \|X - X W G^T\|^2$$

Update W:

$$W_{ik} \leftarrow W_{ik} \sqrt{\frac{[(X^T X)^+ G]_{ik} + [(X^T X)^- W G^T G]_{ik}}{[(X^T X)^- G]_{ik} + [(X^T X)^+ W G^T G]_{ik}}}$$

Update G:

$$G_{ik} \leftarrow G_{ik} \sqrt{\frac{[(X^T X)^+ W]_{ik} + [G W^T (X^T X)^- W]_{ik}}{[(X^T X)^- W]_{ik} + [G W^T (X^T X)^+ W]_{ik}}}$$

(Ding, Li, Jordan, 2006)
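A matching sketch of the Convex-NMF updates (illustrative random data; note that all dependence on X enters through $X^TX$, which is what enables the kernel variant on a later slide):

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2   # elementwise positive part
def neg(A): return (np.abs(A) - A) / 2   # elementwise negative part (>= 0)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 10))             # mixed-sign data, 10 points as columns
k = 2
Y = X.T @ X                              # the only X-dependence in the updates
W = rng.random((10, k)) + 0.1            # F = X W: convex-combination weights
G = rng.random((10, k)) + 0.1

errs = []
for _ in range(300):
    GtG = G.T @ G
    W = W * np.sqrt((pos(Y) @ G + neg(Y) @ W @ GtG) /
                    (neg(Y) @ G + pos(Y) @ W @ GtG + 1e-12))
    G = G * np.sqrt((pos(Y) @ W + G @ (W.T @ neg(Y) @ W)) /
                    (neg(Y) @ W + G @ (W.T @ pos(Y) @ W) + 1e-12))
    errs.append(np.linalg.norm(X - X @ W @ G.T))
```

Both factors stay nonnegative, and the residual $\|X - XWG^T\|$ decreases as the alternation proceeds.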

(figures: Semi-NMF factors vs. Convex-NMF factors)

Sparsity of Convex-NMF

- Sparse factorization is a recent trend.
- Sparsity is usually explicitly enforced.
- Convex-NMF factors are naturally sparse.

$$\|X - X W G^T\|_F^2 = \|X (I - W G^T)\|_F^2 = \sum_k \sigma_k^2 \|v_k^T (I - W G^T)\|^2$$

where $X = \sum_k \sigma_k u_k v_k^T$ is the SVD. Consider instead

$$\sum_k \|e_k^T (I - W G^T)\|^2$$

Its solution is $G = W = (e_1, \dots, e_k)$, i.e. columns of the identity. From this we infer that Convex-NMF factors are naturally sparse.


A Simple Example

$$\underbrace{x\,x\,x\,x\,x\,x\,x\,x}_{\text{cluster 1}} \qquad \underbrace{x\,x\,x\,x\,x\,x\,x\,x}_{\text{cluster 2}}$$

$$\|F_{semi} - C_{Kmeans}\| = 0.53, \qquad \|F_{convex} - C_{Kmeans}\| = 0.08$$

$$\|X - F G^T\|: \quad \text{SVD: } 0.27940, \quad \text{Semi: } 0.27944, \quad \text{Convex: } 0.30877$$


Experiments on 7 datasets

NMF variants always perform better than K-means


Kernel NMF -- Generalized Convex NMF

Map the feature vectors to a higher-dimensional space: $x_i \to \phi(x_i)$, with $\phi(X) = [\phi(x_1), \phi(x_2), \dots, \phi(x_n)]$.

NMF/semi-NMF $\phi(X) = F G^T$ depends on the explicit mapping function $\phi(\cdot)$.

Kernel NMF: $\phi(X) = [\phi(X) W]\, G^T$

The minimization objective depends on the kernel only:

$$\|\phi(X) - \phi(X) W G^T\|^2 = \mathrm{Tr}\big[(I - G W^T)\, \phi(X)^T \phi(X)\, (I - W G^T)\big]$$

(Ding & He, ICML 2004)



NMF and PLSI: Equivalence

So far we have used only the Frobenius norm as the NMF objective function. Another objective is the KL divergence.



Kernel-NMF Algorithm

Writing $K = \phi(X)^T \phi(X)$ for the kernel matrix:

Update W:

$$W_{ik} \leftarrow W_{ik} \sqrt{\frac{[K^+ G]_{ik} + [K^- W G^T G]_{ik}}{[K^- G]_{ik} + [K^+ W G^T G]_{ik}}}$$

Update G:

$$G_{ik} \leftarrow G_{ik} \sqrt{\frac{[K^+ W]_{ik} + [G W^T K^- W]_{ik}}{[K^- W]_{ik} + [G W^T K^+ W]_{ik}}}$$

These are the Convex-NMF updates with $X^T X$ replaced by $K$: the computing algorithm depends only on the kernel $\langle \phi(X), \phi(X) \rangle$.

(Ding, Li, Jordan, 2006)
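Since the updates touch the data only through the Gram matrix, swapping in any kernel gives Kernel-NMF with the same code; a sketch using an RBF kernel (bandwidth and data are illustrative assumptions):

```python
import numpy as np

def pos(A): return (np.abs(A) + A) / 2   # elementwise positive part
def neg(A): return (np.abs(A) - A) / 2   # elementwise negative part (>= 0)

rng = np.random.default_rng(2)
X = rng.normal(size=(3, 9))              # 9 points in 3 dims, as columns

# RBF kernel standing in for X^T X: K_ij = exp(-||x_i - x_j||^2 / 2)
sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-sq / 2.0)

k = 2
W = rng.random((9, k)) + 0.1
G = rng.random((9, k)) + 0.1

def obj():   # kernelized residual Tr[(I - W G^T)^T K (I - W G^T)]
    M = np.eye(9) - W @ G.T
    return np.trace(M.T @ K @ M)

obj0 = obj()
for _ in range(300):
    GtG = G.T @ G
    W = W * np.sqrt((pos(K) @ G + neg(K) @ W @ GtG) /
                    (neg(K) @ G + pos(K) @ W @ GtG + 1e-12))
    G = G * np.sqrt((pos(K) @ W + G @ (W.T @ neg(K) @ W)) /
                    (neg(K) @ W + G @ (W.T @ pos(K) @ W) + 1e-12))
obj1 = obj()
```

For an RBF kernel $K \ge 0$, so the negative parts vanish and the updates simplify, but the code works for any symmetric kernel matrix.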


Orthogonal Nonnegative Tri-Factorization

$$\min_{F \ge 0,\; F^T F = I,\; G \ge 0,\; G^T G = I} \|X_\pm - F_+ S_\pm G_+^T\|^2$$

3-factor NMF with explicit orthogonality constraints => simultaneous K-means clustering of rows and columns.

$F = (f_1, f_2, \dots, f_k)$ => row cluster indicators
$G = (g_1, g_2, \dots, g_k)$ => column cluster indicators

1. The solution is unique.
2. It can't be reduced to (2-factor) NMF.

(Ding, Li, Peng, Park, KDD 2006)


K-means clustering objective function

$X = (x_1, x_2, \dots, x_n)$ = input data
$F = (f_1, f_2, \dots, f_k)$ = cluster centroids
$G = (g_1, g_2, \dots, g_k)$ = cluster indicators

$$J_K = \sum_{k=1}^{K} \sum_{i \in C_k} \|x_i - f_k\|^2 = \sum_{i=1}^{n} \Big\|x_i - \sum_k g_{ik} f_k\Big\|^2 = \|X - F G^T\|^2$$

With $f_k = X g_k / n_k$, i.e. $F = X G D_n^{-1}$, $D_n = \mathrm{diag}(n_1, \dots, n_k)$:

$$J_K = \|X - X G D_n^{-1} G^T\|^2 = \|X - X \tilde G \tilde G^T\|^2, \qquad \tilde G = G D_n^{-1/2}, \quad \tilde G^T \tilde G = I$$

NMF-like algorithms are different ways to relax F, G!
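The identity $J_K = \|X - FG^T\|^2$ with 0/1 indicator columns and centroid columns can be checked directly (toy data, made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 7))                 # 7 points in 2 dims, as columns
labels = np.array([0, 0, 1, 1, 1, 2, 2])
k = 3

G = np.zeros((7, k))                        # 0/1 cluster indicator matrix
G[np.arange(7), labels] = 1.0
# centroid matrix: column j is the mean of cluster j
F = np.column_stack([X[:, labels == j].mean(axis=1) for j in range(k)])

# direct K-means objective
J_direct = sum(np.sum((X[:, labels == j] - F[:, [j]]) ** 2) for j in range(k))
# matrix form: column i of F G^T is the centroid of the cluster containing x_i
J_matrix = np.linalg.norm(X - F @ G.T) ** 2
```

The agreement is exact because $\sum_k g_{ik} f_k$ picks out exactly the centroid of point $i$'s cluster.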


NMF and PLSI

NMF objective functions:

• Frobenius norm
• KL-divergence:

$$J_{NMF\text{-}KL} = \sum_{i=1}^{m} \sum_{j=1}^{n} \Big[ x_{ij} \log \frac{x_{ij}}{(F G^T)_{ij}} - x_{ij} + (F G^T)_{ij} \Big]$$

Probabilistic LSI (Hofmann, 1999) is a latent variable model for clustering:

$$J_{PLSI} = \sum_{i=1}^{m} \sum_{j=1}^{n} x(w_i, d_j) \log p(w_i, d_j), \qquad p(w_i, d_j) = \sum_k p(w_i | z_k)\, p(z_k)\, p(d_j | z_k)$$

We can show $J_{PLSI} = -J_{NMF\text{-}KL} + \text{constant}$.

(Ding, Li, Peng, AAAI 2006)


Summary

• NMF is doing K-means clustering (or PLSI)
• Interpretability is key to motivating new NMF-like factorizations
  – Semi-NMF, Convex-NMF, Kernel-NMF, Tri-NMF
• NMF-like algorithms always outperform K-means clustering
• Advantage: hard/soft clustering
• Convex-NMF enforces the notion of cluster centroids and is naturally sparse

NMF: A new/rich paradigm for unsupervised learning


References

• On the Equivalence of Nonnegative Matrix Factorization and K-means/Spectral Clustering. Chris Ding, Xiaofeng He, Horst Simon. SDM 2005.

• Convex and Semi-Nonnegative Matrix Factorizations. Chris Ding, Tao Li, Michael Jordan. Submitted.

• Orthogonal Non-negative Matrix Tri-Factorization for Clustering. Chris Ding, Tao Li, Wei Peng, Haesun Park. KDD 2006.

• Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square and a Hybrid Algorithm. Chris Ding, Tao Li, Wei Peng. AAAI 2006.


Data Clustering: NMF and PCA

$$\min_{G \ge 0,\; G^T G = I} \|X_\pm - F_\pm G_+^T\|^2$$

$F = (f_1, f_2, \dots, f_k)$ => cluster centroids
$G = (g_1, g_2, \dots, g_k)$ => cluster indicators

NMF is useful due to nonnegativity; the proof uses only G-orthogonality and nonnegativity.

What happens if we ignore nonnegativity?


K-means Clustering and PCA

Ignore nonnegativity => an arbitrary orthogonal transform R is allowed:

$$\min_{G^T G = I} \|X_\pm - (F_\pm R)(G_\pm R)^T\|^2$$

Equivalent to

$$\max_{GR} \mathrm{Tr}\big[(G R)^T X^T X (G R)\big]$$

The solution is given by the SVD: $X = U \Sigma V^T$, $FR = U$, $GR = V$.

Centroid subspace projection: $F F^T = (F R)(F R)^T = U U^T$

Cluster indicator projection: $G G^T = (G R)(G R)^T = V V^T$

PCA/SVD is automatically doing K-means clustering.

(Ding & He, ICML 2004)
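An illustrative demo of this point (synthetic two-cluster data, all parameters made up): after centering, the sign of the leading right singular vector already recovers the two-cluster K-means partition.

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated clusters of 10 points each, in 5 dims, as columns
A = rng.normal(0.0, 0.2, size=(5, 10))
B = rng.normal(3.0, 0.2, size=(5, 10))
X = np.hstack([A, B])
X = X - X.mean(axis=1, keepdims=True)       # center the data

# SVD: X = U S V^T; the leading column of V plays the role of the
# relaxed cluster indicator (GR in the slide's notation)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]

# the sign of v1 splits the points into the two clusters
# (up to a global sign flip)
side = v1 > 0
```

This is the relaxed solution: projecting it back to a 0/1 indicator (here, by sign) gives the hard clustering.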



NMF = Spectral Clustering (Normalized Cut)

Normalized Cut => scaled cluster indicators:

$$y_k = D^{1/2} h_k / \|D^{1/2} h_k\|, \qquad h_k = (0,\dots,0,\,1,\dots,1,\,0,\dots,0)^T$$

$$J_{Ncut}(h_1,\dots,h_k) = \frac{h_1^T (D - W) h_1}{h_1^T D h_1} + \cdots + \frac{h_k^T (D - W) h_k}{h_k^T D h_k}$$

Re-write:

$$J_{Ncut}(y_1,\dots,y_k) = y_1^T (I - \tilde W) y_1 + \cdots + y_k^T (I - \tilde W) y_k = \mathrm{Tr}\big(Y^T (I - \tilde W) Y\big), \qquad \tilde W = D^{-1/2} W D^{-1/2}$$

Optimize: $\max \mathrm{Tr}(Y^T \tilde W Y)$ subject to $Y^T Y = I$, i.e. the symmetric NMF problem

$$\min_{H \ge 0,\; H^T H = I} \|\tilde W - H H^T\|^2$$

(Gu et al., 2001)
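The chain of identities can be verified numerically; a sketch with a random symmetric similarity matrix (made-up data and partition):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = A + A.T                                  # symmetric similarity matrix
np.fill_diagonal(W, 0)
D = np.diag(W.sum(axis=1))                   # degree matrix

labels = np.array([0, 0, 0, 1, 1, 1])
K = 2

# raw 0/1 indicators h_k and scaled indicators y_k = D^{1/2} h_k / ||D^{1/2} h_k||
H = np.zeros((6, K))
H[np.arange(6), labels] = 1.0
Dh = np.sqrt(np.diag(D))
Y = (Dh[:, None] * H) / np.linalg.norm(Dh[:, None] * H, axis=0)

# direct Ncut objective: sum_k h_k^T (D - W) h_k / (h_k^T D h_k)
J_ncut = sum(H[:, k] @ (D - W) @ H[:, k] / (H[:, k] @ D @ H[:, k])
             for k in range(K))

# trace form with W~ = D^{-1/2} W D^{-1/2}
Wt = W / np.outer(Dh, Dh)
J_trace = np.trace(Y.T @ (np.eye(6) - Wt) @ Y)
```

The scaled indicators also satisfy $Y^TY = I$, which is exactly the orthogonality constraint in the symmetric NMF formulation.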
