a dissimilarity measure for the k-modes clustering algorithm

A dissimilarity measure for the K-Modes clustering algorithm
Presenter: Bo-Sheng Wang
Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang
KBS, 2012


Page 1: A dissimilarity measure for the K-Modes clustering algorithm

A dissimilarity measure for the K-Modes clustering algorithm

Presenter: Bo-Sheng Wang

Authors: Fuyuan Cao, Jiye Liang, Deyu Li, Liang Bai, Chuangyin Dang

KBS, 2012

Page 2: A dissimilarity measure for the K-Modes clustering algorithm

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Page 3: A dissimilarity measure for the K-Modes clustering algorithm

Motivation

• In this paper, the limitations of the simple matching dissimilarity measure and Ng’s dissimilarity measure are revealed using some illustrative examples.

Page 4: A dissimilarity measure for the K-Modes clustering algorithm

Limitations of simple matching dissimilarity measure

• Simple matching is a common approach. The simple matching dissimilarity measure is defined as:

$$\delta(x, y) = \begin{cases} 1, & \text{if } x \neq y \\ 0, & \text{otherwise} \end{cases}$$

• However, simple matching often results in:
– Weak intra-cluster similarity.
– Disregard of the similarity hidden between categorical values.
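As a minimal sketch of the definition above, simple matching over whole objects can be written as the number of attributes on which two objects disagree. The function name `simple_matching` and the sample objects are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    # Each attribute contributes 1 on a mismatch and 0 on a match.
    return sum(1 for xa, ya in zip(x, y) if xa != ya)


# Hypothetical objects described by three categorical attributes.
a = ("red", "small", "round")
b = ("red", "large", "round")
c = ("blue", "large", "square")

print(simple_matching(a, b))  # 1
print(simple_matching(a, c))  # 3
```

Every mismatch counts the same regardless of how the values are distributed in the data, which is the sense in which the measure disregards similarity hidden between categorical values.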

Page 5: A dissimilarity measure for the K-Modes clustering algorithm

Limitations of Ng’s dissimilarity measure

• For the k-Modes algorithm with Ng’s dissimilarity measure, the simple matching dissimilarity measure is still used in the first iteration.

– Disregards the similarity hidden between categorical values.

Page 6: A dissimilarity measure for the K-Modes clustering algorithm

Objectives

• Based on the idea of biological and genetic taxonomy and the rough membership function, a new dissimilarity measure for the k-Modes algorithm is defined.

• The dissimilarity measure between a mode of a cluster and an object is given by improving Ng’s dissimilarity measure.

Page 7: A dissimilarity measure for the K-Modes clustering algorithm

Methodology

• Review some basic concepts of rough set theory.
– Definition 1: Categorical information system
  • IS = (U, A, V, f)
– Definition 2: Binary relation IND(P)
  • 1.
  • 2.
– Definition 3: The rough membership function $\mu_P^X: U \to [0, 1]$
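The formulas behind these definitions did not survive in the transcript. The sketch below assumes the standard rough set definitions: IND(P) groups objects that take equal values on every attribute in P, and the rough membership of an object x in a set X is |[x]_P ∩ X| / |[x]_P|. The table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Partition object ids by their values on attrs (the classes of IND(P))."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        classes[key].add(obj)
    return list(classes.values())


def rough_membership(table, attrs, target, obj):
    """Return |[obj]_P & X| / |[obj]_P| for a target set X of object ids."""
    for block in equivalence_classes(table, attrs):
        if obj in block:
            return len(block & target) / len(block)
    raise KeyError(obj)


# Hypothetical categorical information system with four objects.
table = {
    "x1": {"color": "red", "size": "small"},
    "x2": {"color": "red", "size": "small"},
    "x3": {"color": "red", "size": "large"},
    "x4": {"color": "blue", "size": "large"},
}
X = {"x1", "x3"}

print(rough_membership(table, ["color", "size"], X, "x1"))  # 0.5
print(rough_membership(table, ["color"], X, "x1"))
```

With both attributes, x1 shares its class only with x2, so half of the class lies in X. With `color` alone the class grows to {x1, x2, x3} and two of its three members lie in X.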

Page 8: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between two objects

• Definition 4: A similarity measure between objects x and y with respect to a

Page 9: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between two objects

• Definition 5: The dissimilarity measure between x and y with respect to P

Page 10: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between two objects

• Example: A new dissimilarity measure between two objects
– Simple matching dissimilarity measure:
– New dissimilarity measure:

Page 11: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between a mode and an object

• Ng’s dissimilarity measure
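The slide's formula for Ng’s measure is missing from the transcript. The sketch below assumes the form in which the measure is usually stated in the k-modes literature: a mismatching attribute contributes 1, and a matching attribute contributes 1 minus the fraction of objects in the cluster that share that value. The function name and sample cluster are hypothetical.

```python
def ng_dissimilarity(mode, obj, cluster):
    """Dissimilarity between a cluster mode and an object (assumed form of Ng's measure)."""
    if not cluster:
        # No cluster statistics exist yet, e.g. before the first assignment.
        raise ValueError("cluster must contain at least one object")
    total = 0.0
    for j, (z, x) in enumerate(zip(mode, obj)):
        if z != x:
            total += 1.0
        else:
            # Fraction of cluster members sharing the value x on attribute j.
            matches = sum(1 for member in cluster if member[j] == x)
            total += 1.0 - matches / len(cluster)
    return total


# Hypothetical cluster of four objects with two categorical attributes.
cluster = [("red", "small"), ("red", "large"), ("red", "small"), ("blue", "small")]
mode = ("red", "small")

print(ng_dissimilarity(mode, ("red", "small"), cluster))   # 0.5
print(ng_dissimilarity(mode, ("blue", "small"), cluster))  # 1.25
```

The measure needs cluster membership counts, which is consistent with the earlier remark that simple matching is still used in the first iteration, when no clusters have been formed.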

Page 12: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between a mode and an object

• Definition 7: The new dissimilarity measure between $x_i$ and $z_l$ with respect to P

Page 13: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-A new dissimilarity measure between a mode and an object

• Example: A new dissimilarity measure between a mode and an object
– Ng’s dissimilarity measure
– New dissimilarity measure

Page 14: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

• The objective of clustering a set of n = |U| objects into k clusters is to find W and Z that minimize:
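The formula itself is missing from the transcript. Assuming the slide follows the standard k-modes formulation, with $W = [w_{li}]$ a $k \times n$ binary partition matrix, $Z = \{z_1, \dots, z_k\}$ the set of modes, and $d$ the chosen dissimilarity measure (presumably the proposed one), the objective would read:

```latex
\min_{W,Z} F(W, Z) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{li}\, d(z_l, x_i)
\quad \text{subject to} \quad
w_{li} \in \{0, 1\}, \qquad
\sum_{l=1}^{k} w_{li} = 1 \ (1 \le i \le n), \qquad
0 < \sum_{i=1}^{n} w_{li} < n \ (1 \le l \le k).
```

The constraints say that each object belongs to exactly one cluster and that no cluster is empty or contains every object.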

Page 15: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

• This process can be formulated as the following k-Modes algorithm:
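The algorithm listing did not survive in the transcript. The sketch below shows the usual alternating scheme of k-modes under stated assumptions: fix the modes Z and update the assignment W, then fix W and update Z, until the assignment stops changing. It uses simple matching as the default dissimilarity, not the paper's proposed measure, and the function names, the explicit `initial_modes` argument, and the sample data are hypothetical.

```python
from collections import Counter


def simple_matching(x, y):
    """Count the attributes on which two categorical objects differ."""
    return sum(1 for xa, ya in zip(x, y) if xa != ya)


def update_mode(members):
    """Take the most frequent value of each attribute among the cluster members."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def k_modes(objects, initial_modes, dissimilarity=simple_matching, max_iter=100):
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        # Fix Z, update W: assign each object to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda l: dissimilarity(modes[l], x))
            for x in objects
        ]
        if new_labels == labels:
            break  # assignment unchanged, so the procedure has converged
        labels = new_labels
        # Fix W, update Z: an empty cluster keeps its previous mode.
        for l in range(len(modes)):
            members = [x for x, lab in zip(objects, labels) if lab == l]
            if members:
                modes[l] = update_mode(members)
    return labels, modes


# Hypothetical data set with three categorical attributes.
objects = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "small", "square"),
]
labels, modes = k_modes(objects, initial_modes=[objects[0], objects[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)   # [('red', 'small', 'round'), ('blue', 'large', 'square')]
```

A measure that depends on cluster statistics, such as Ng’s or the proposed one, would need the current cluster members passed to `dissimilarity`; that extension is left out to keep the sketch short.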

Page 16: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

• Now we consider the convergence of the k-Modes algorithm with the proposed dissimilarity measure $NDis_P(z_l, x_i)$.

Page 17: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

• Proof. For a given W, we have:

Page 18: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

Page 19: A dissimilarity measure for the K-Modes clustering algorithm

Methodology-Convergence and complexity analysis

Page 20: A dissimilarity measure for the K-Modes clustering algorithm

Experiments

• Evaluation on scalability

Page 21: A dissimilarity measure for the K-Modes clustering algorithm

Experiments

• Evaluation on scalability

Page 22: A dissimilarity measure for the K-Modes clustering algorithm

Experiments

• Evaluation on clustering efficiency

Page 23: A dissimilarity measure for the K-Modes clustering algorithm

Conclusions

• The new measure unifies the dissimilarity measure between two objects and the dissimilarity measure between an object and a mode.

• The k-Modes algorithm using the new dissimilarity measure can be safely and effectively used on large data sets.

• The results of experiments on synthetic data sets and five real data sets from UCI show the effectiveness of the new dissimilarity measure.

Page 24: A dissimilarity measure for the K-Modes clustering algorithm

Comments

• Advantages
– The method can save some time.

• Applications
– Dissimilarity measure