Data Clustering: 50 years beyond K-means

Intelligent Database Systems Lab, N.Y.U.S.T. I.M. Data Clustering: 50 years beyond K-means. Presenter: Jiang-Shan Wang. Authors: Anil K. Jain. PRL 2010. National Yunlin University of Science and Technology.



Page 1: Data Clustering: 50 years beyond K-means

Intelligent Database Systems Lab

N.Y.U.S.T.

I. M.

Data Clustering: 50 years beyond K-means

Presenter : Jiang-Shan Wang

Authors : Anil K. Jain

PRL 2010

國立雲林科技大學 National Yunlin University of Science and Technology

Page 2: Data Clustering: 50 years beyond K-means

Outline

Motivation

Objective

Data clustering

User’s dilemma

K-means

Extensions of K-means

Trends in data clustering

Summary

Comments

Page 3: Data Clustering: 50 years beyond K-means

Motivation

Provide a brief overview of clustering and point out some of the emerging and useful research directions.

Page 4: Data Clustering: 50 years beyond K-means

Objective

Summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions.

Page 5: Data Clustering: 50 years beyond K-means

Data clustering

Three main purposes:

Underlying structure

Natural classification

Compression

Page 6: Data Clustering: 50 years beyond K-means

K-means

Three user-specified parameters:

Number of clusters

Cluster initialization

Distance metrics
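The three parameters listed above can be made concrete with a minimal sketch. It is not taken from the slides or the paper: the function names `kmeans` and `squared_euclidean`, the toy points, and the initial centers are all assumptions chosen for illustration.

```python
import random

def squared_euclidean(a, b):
    """Distance metric (an assumed default): squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, init_centers=None, distance=squared_euclidean,
           max_iter=100, seed=0):
    """Minimal K-means sketch exposing the three user-specified parameters:
    number of clusters (k), initialization (init_centers), and distance."""
    rng = random.Random(seed)
    # Cluster initialization: use the given centers, else sample k data points.
    centers = [tuple(c) for c in (init_centers or rng.sample(points, k))]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        # Note: the mean is the natural update for squared Euclidean distance;
        # other metrics may call for a different update rule.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(points, k=2, init_centers=[(0.0, 0.0), (10.0, 10.0)])
```

Changing any one of `k`, `init_centers`, or `distance` can change the resulting partition, which is why the slide singles out these three choices.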

Page 7: Data Clustering: 50 years beyond K-means

Extensions of K-means

Fuzzy C-means

Bisecting K-means

X-means

K-medoid

Kernel K-means
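As one illustration of these extensions, here is a rough sketch of the K-medoid idea, in which each cluster is represented by an actual data point rather than a mean. It uses a simple alternating scheme, which is only one of several K-medoid procedures; the names `kmedoids` and `manhattan` and the toy data are assumptions, not from the slides.

```python
def manhattan(a, b):
    """An assumed dissimilarity; any pairwise dissimilarity could be used."""
    return sum(abs(x - y) for x, y in zip(a, b))

def kmedoids(points, medoids, distance=manhattan, max_iter=100):
    """Alternating K-medoid sketch: cluster centers are always data points."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assign each point to its nearest medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            j = min(range(len(medoids)), key=lambda i: distance(p, medoids[i]))
            clusters[j].append(p)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(c, key=lambda m: sum(distance(m, q) for q in c)) if c else medoids[j]
            for j, c in enumerate(clusters)
        ]
        if new_medoids == medoids:
            break  # medoids stopped changing
        medoids = new_medoids
    return medoids, clusters

# Hypothetical toy data and starting medoids.
points = [(1, 1), (2, 1), (1, 2), (9, 9), (10, 9), (9, 10)]
medoids, clusters = kmedoids(points, medoids=[(2, 1), (10, 9)])
```

Because only pairwise dissimilarities are needed, this style of method also applies when a mean of the data is not defined.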

Page 8: Data Clustering: 50 years beyond K-means

User’s dilemma

Representation

Page 9: Data Clustering: 50 years beyond K-means

User’s dilemma

Purpose of grouping

Page 10: Data Clustering: 50 years beyond K-means

User’s dilemma

Number of clusters

Page 11: Data Clustering: 50 years beyond K-means

User’s dilemma

Cluster validity

Page 12: Data Clustering: 50 years beyond K-means

User’s dilemma

Comparing clustering algorithms

Page 13: Data Clustering: 50 years beyond K-means

User’s dilemma

Comparing clustering algorithms

Page 14: Data Clustering: 50 years beyond K-means

User’s dilemma

Admissibility analysis of clustering algorithms

Fisher and Van Ness’s criteria:

Convex

Cluster proportion

Cluster omission

Monotone

Kleinberg’s criteria:

Scale invariance

Richness

Consistency
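For reference, Kleinberg's three criteria are usually stated as follows. This is a paraphrase with assumed notation: a clustering function f maps a distance function d on a finite set S to a partition of S. Kleinberg's impossibility theorem states that no clustering function satisfies all three at once.

```latex
% Assumed notation: f maps a distance function d on a finite set S
% to a partition f(d) of S.
\begin{align*}
\text{Scale invariance:}\quad & f(\alpha \cdot d) = f(d) \quad \text{for all } \alpha > 0 \\
\text{Richness:}\quad & \operatorname{Range}(f) = \{\text{all partitions of } S\} \\
\text{Consistency:}\quad & f(d') = f(d) \ \text{whenever } d'(i,j) \le d(i,j)
  \text{ for } i, j \text{ in the same cluster of } f(d) \\
& \text{and } d'(i,j) \ge d(i,j) \text{ for } i, j \text{ in different clusters of } f(d)
\end{align*}
```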

Page 15: Data Clustering: 50 years beyond K-means

Trends in data clustering

Clustering ensembles
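One common way to combine several partitions of the same data is a co-association matrix, which records how often each pair of points lands in the same cluster. The sketch below is illustrative only; the function name `co_association` and the three example partitions are assumptions, not taken from the slides.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]

# Three hypothetical partitions of the same four points.
partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, with the label names swapped
    [0, 0, 0, 1],
]
matrix = co_association(partitions)
```

The matrix depends only on which points are grouped together, not on the label names, so it can be treated as a similarity matrix and clustered again to obtain a consensus partition.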

Page 16: Data Clustering: 50 years beyond K-means

Trends in data clustering

Semi-supervised clustering
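Side information in semi-supervised clustering is often given as pairwise must-link and cannot-link constraints. As a small sketch, the hypothetical helper below counts how many such constraints a candidate labeling violates; the labels and constraint pairs are made-up examples.

```python
def count_violations(labels, must_link, cannot_link):
    """Return (violated must-link count, violated cannot-link count)."""
    # A must-link pair is violated when the two points get different labels.
    ml = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when the two points get the same label.
    cl = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml, cl

# Hypothetical labeling of five points and user-supplied constraints (index pairs).
labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 4), (2, 3)]
violations = count_violations(labels, must_link, cannot_link)
```

A constrained clustering method could use such a count as a penalty term, or reject assignments that violate any constraint.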

Page 17: Data Clustering: 50 years beyond K-means

Trends in data clustering

Large-scale clustering

Studies:

Efficient nearest neighbor search

Data summarization

Distributed computing

Incremental clustering

Sampling-based methods
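To illustrate the incremental clustering item above, here is a rough sketch of a sequential K-means update that sees each point once and never stores the whole dataset. The function name `incremental_kmeans`, the seed centers, and the stream are assumptions for illustration.

```python
def incremental_kmeans(stream, centers):
    """Process points one at a time, updating the nearest center as a running mean."""
    centers = [list(c) for c in centers]
    counts = [0] * len(centers)  # seed centers carry no weight of their own
    for x in stream:
        # Find the nearest center by squared Euclidean distance.
        j = min(range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
        counts[j] += 1
        # Running-mean update: c <- c + (x - c) / n
        centers[j] = [c + (a - c) / counts[j] for a, c in zip(x, centers[j])]
    return centers, counts

# Hypothetical stream of 2-D points and two seed centers.
stream = [(0.0, 0.0), (10.0, 10.0), (2.0, 2.0), (12.0, 12.0)]
centers, counts = incremental_kmeans(stream, centers=[(1.0, 1.0), (9.0, 9.0)])
```

Memory use is proportional to the number of centers rather than the number of points, at the cost of a result that depends on the order in which points arrive.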

Page 18: Data Clustering: 50 years beyond K-means

Trends in data clustering

Multi-way clustering

Heterogeneous data:

Rank data

Dynamic data

Graph data

Relational data

Page 19: Data Clustering: 50 years beyond K-means

Summary

There needs to be a suite of benchmark data.

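Clustering algorithms need tighter integration with application needs.

Most clustering methods are cast as optimization problems, so computational cost matters for large-scale data.

Stability or consistency: a good clustering should be stable under perturbations of the data.

Choose clustering principles according to how well they satisfy the stated axioms.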

Develop semi-supervised clustering.

Page 20: Data Clustering: 50 years beyond K-means

Comments

Advantage: Many figures that aid understanding.

Drawback: …

Application: Clustering.