Modified global k-means algorithm for minimum sum-of-squares clustering problems

Post on 29-Jan-2016


DESCRIPTION

Modified global k-means algorithm for minimum sum-of-squares clustering problems. Presenter: Lin, Shu-Han. Author: Adil M. Bagirov. Pattern Recognition (PR, 2008). Outline: Motivation, Objective, Methodology, Experiments, Conclusion, Comments.

TRANSCRIPT

Intelligent Database Systems Lab

N.Y.U.S.T. I.M.

Modified global k-means algorithm for minimum sum-of-squares clustering problems

Pattern Recognition (PR, 2008)

Presenter: Lin, Shu-Han

Author: Adil M. Bagirov


Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments

Motivation

The k-means algorithm is sensitive to the choice of starting points

and is inefficient for solving clustering problems in large data sets.

The global k-means (GKM) algorithm is an incremental algorithm: it dynamically adds one cluster center at a time

and uses each data point as a candidate for the k-th cluster center.

Objectives

Propose a new version of GKM

Methodology – k-Means

k-means is sensitive to the choice of starting points
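To make the sensitivity concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is not taken from the slides: the function names, the four-point data set and the two starting configurations are illustrative assumptions, and the objective is the plain sum of squared distances without any 1/m scaling.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Run k-means from the given starting centers.

    Returns the final centers and the sum of squared distances
    from each point to its nearest center.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Hypothetical data: the corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting configurations for k = 2.
_, sse_poor_start = kmeans(points, [(2, 0), (2, 1)])
_, sse_good_start = kmeans(points, [(0, 0), (4, 0)])
print(sse_poor_start, sse_good_start)
```

On this toy data the first start stalls immediately at a sum of squares of 16 (a top/bottom split), while the second reaches 1 (a left/right split), even though both runs use the same algorithm and the same k.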

Methodology – The GKM algorithm
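As stated in the Motivation slide, GKM adds one cluster center at a time and tries each data point as a candidate for the k-th center. The sketch below illustrates that scheme under stated assumptions: it starts from the centroid of the data for k = 1, runs a plain k-means from every candidate, and keeps the best result. The function names and the toy data are hypothetical, and the slide's own algorithm listing did not survive extraction, so this should not be read as the authors' exact procedure.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def kmeans(points, centers, max_iter=100):
    """Plain k-means from given starting centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        new_centers = [mean(cl) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def global_kmeans(points, k):
    """Incremental clustering: solve for 1, 2, ..., k centers in turn."""
    centers = [mean(points)]                      # k = 1: centroid of all data
    sse = sum(sq_dist(p, centers[0]) for p in points)
    for _ in range(2, k + 1):
        best = None
        for candidate in points:                  # every data point is tried
            trial = kmeans(points, centers + [tuple(candidate)])
            if best is None or trial[1] < best[1]:
                best = trial
        centers, sse = best
    return centers, sse


points = [(0, 0), (0, 1), (4, 0), (4, 1)]         # hypothetical toy data
centers, sse = global_kmeans(points, 2)
print(sorted(centers), sse)
```

Because every data point is tried at every step, each added center costs one k-means run per data point, which is what makes the basic scheme expensive on large data sets.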

Objective function

Methodology – Objective function

Old version

Reformulated version
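The formulas on this slide were images and are not in the transcript. As a hedged reconstruction, assuming the slide follows the standard minimum sum-of-squares clustering formulation, with data points $a^1,\dots,a^m$, cluster centers $x^1,\dots,x^k$ and assignment weights $w_{ij}$ (all symbol names are assumptions), the two versions would read roughly as follows.

```latex
% Old version (assumed): assignment-based formulation
\min_{x,\,w}\ \psi(x, w) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij}\, \lVert x^{j} - a^{i} \rVert^{2}
\quad \text{s.t.} \quad \sum_{j=1}^{k} w_{ij} = 1,\quad w_{ij} \in \{0, 1\}

% Reformulated version (assumed): each point is charged to its nearest center
\min_{x}\ f_{k}(x^{1}, \dots, x^{k}) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \dots, k} \lVert x^{j} - a^{i} \rVert^{2}
```

The second form removes the assignment variables: for fixed centers, the best assignment always sends each point to its nearest center, so the inner sum collapses to a minimum.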

Methodology – fast GKM algorithm

Old version

Proposed version (auxiliary cluster function)
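The slide's formula for the auxiliary cluster function is not in the transcript. The sketch below assumes it has the form: average over all data points of min(d_i, squared distance from y to a_i), where d_i is the squared distance from data point a_i to the nearest of the k-1 centers already found. Under that assumption, choosing the starting point for the k-th center among the data points amounts to picking the data point with the smallest auxiliary value, with no k-means run per candidate. The names `aux_value` and `best_data_point` and the one-dimensional data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_sq_dists(points, centers):
    """d_i: squared distance from each point to its nearest existing center."""
    return [min(sq_dist(p, c) for c in centers) for p in points]


def aux_value(y, points, d_prev):
    """Assumed auxiliary cluster function evaluated at a candidate center y.

    Each point contributes either its current cost d_i or, if y is
    closer than its current center, the squared distance to y.
    """
    return sum(min(d, sq_dist(y, p)) for p, d in zip(points, d_prev)) / len(points)


def best_data_point(points, centers):
    """Data point that minimizes the auxiliary function."""
    d_prev = nearest_sq_dists(points, centers)
    return min(points, key=lambda y: aux_value(y, points, d_prev))


# Hypothetical 1-D data with one existing center at 0.5.
points = [(0,), (1,), (10,), (11,), (12,)]
centers = [(0.5,)]
start = best_data_point(points, centers)
print(start, aux_value(start, points, nearest_sq_dists(points, centers)))
```

Here the selected starting point is the middle of the uncovered group, (11,), because it lowers the cost of the three far points the most.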

Methodology – modified GKM algorithm

Proposed version
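The Conclusions slide states that the modified version changes how starting points are computed, by minimizing the auxiliary cluster function. The sketch below shows one way this could work, and both of its ingredients are assumptions rather than the slide's content: the auxiliary function is taken to be the average of min(d_i, squared distance from y to a_i), and it is minimized by a k-means-like iteration that repeatedly moves y to the mean of the points it would attract. The names `refine` and `modified_start` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def aux_value(y, points, d_prev):
    """Assumed auxiliary cluster function at a candidate center y."""
    return sum(min(d, sq_dist(y, p)) for p, d in zip(points, d_prev)) / len(points)


def refine(y, points, d_prev, max_iter=100):
    """Move y to the mean of the points it attracts until it stops changing.

    A point is attracted when y is strictly closer to it than its
    nearest existing center (squared distance below d_i).
    """
    for _ in range(max_iter):
        attracted = [p for p, d in zip(points, d_prev) if sq_dist(y, p) < d]
        if not attracted:
            break
        new_y = tuple(sum(c) / len(attracted) for c in zip(*attracted))
        if new_y == y:
            break
        y = new_y
    return y


def modified_start(points, centers):
    """Starting point for the next center: refine from every data point
    and keep the refined point with the smallest auxiliary value."""
    d_prev = [min(sq_dist(p, c) for c in centers) for p in points]
    candidates = [refine(tuple(p), points, d_prev) for p in points]
    return min(candidates, key=lambda y: aux_value(y, points, d_prev))


# Hypothetical 1-D data with one existing center at 0.5.
points = [(0,), (1,), (10,), (12,)]
centers = [(0.5,)]
print(modified_start(points, centers))
```

On this toy data the refined starting point is (11.0,), which is not a data point; its auxiliary value of 0.625 is lower than the 1.125 obtained by the best data point, (10,) or (12,). The returned point would then be combined with the existing centers as the start for a k-means run.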


Experiments

MS k-means: Multi-start k-means

GKM: fast Global k-Means

MGKM: Modified Global k-Means


Overall (14 datasets, 140 results)

The MS k-means algorithm finds the best known (or near best known) solutions 42 times (33.3%)

The GKM algorithm finds such solutions 76 times (60.3%)

The MGKM algorithm finds such solutions 102 times (81.0%)

Large k in large data sets (m)

The MS k-means algorithm failed to find the best known (or near best known) solutions

The GKM algorithm finds such solutions 22 times (45.8%)

The MGKM algorithm finds such solutions 42 times (87.5%)
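As a quick arithmetic check on the reported figures, the sketch below searches for the totals that are consistent with each group of counts and percentages. The function name and the search range are illustrative assumptions; only the counts and percentages come from the slide.

```python
def consistent_totals(reported, max_total=200):
    """Return every total n for which each (count, percent) pair
    matches 100 * count / n to one decimal place."""
    largest = max(count for count, _ in reported)
    return [
        n
        for n in range(largest, max_total + 1)
        if all(abs(100 * count / n - pct) < 0.05 for count, pct in reported)
    ]


overall = [(42, 33.3), (76, 60.3), (102, 81.0)]   # MS k-means, GKM, MGKM
large = [(22, 45.8), (42, 87.5)]                  # GKM, MGKM

print(consistent_totals(overall))
print(consistent_totals(large))
```

The overall percentages are consistent with a total of 126 results rather than the 140 stated on the slide, and the large-k percentages with a total of 48. The transcript does not say which figure is the intended one, so the 140 may be a slip on the slide or may count something different.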


Conclusions

A new version of the GKM algorithm

It changes the computation of starting points

by minimizing the auxiliary cluster function

for a given tolerance

It is more effective than GKM, especially on large data sets

The choice of starting points in k-means is crucial


Comments

Advantage: theoretical analysis

Drawback: the authors should describe why each thing they choose to modify is important or needs to be modified.

Application: GKM outperforms the k-means algorithm
