Fast accurate fuzzy clustering through data reduction

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology, Department of Information Management
Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Steven Eschrich, Jingwei Ke, Lawrence O. Hall, Dmitry B. Goldgof
IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 11, NO. 2, APRIL 2003


Page 1: Fast accurate fuzzy clustering through data reduction

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Fast accurate fuzzy clustering through data reduction

Advisor: Dr. Hsu
Graduate: Sheng-Hsuan Wang
Authors: Steven Eschrich, Jingwei Ke, Lawrence O. Hall, Dmitry B. Goldgof

Department of Information Management

IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 11, NO. 2, APRIL 2003

Page 2: Fast accurate fuzzy clustering through data reduction

Outline

Motivation
Objective
Introduction
Related Work
BRFCM
BRFCM Implementation
Experiments
Conclusion
Personal Opinion
Review

Page 3: Fast accurate fuzzy clustering through data reduction

Motivation

The problem of clustering.
Fuzzy c-means (FCM).

Page 4: Fast accurate fuzzy clustering through data reduction

Objective

As the title says: "Fast Accurate Fuzzy Clustering Through Data Reduction" (brFCM).

Reduce the number of distinct patterns that must be clustered without adversely affecting partition quality.

The reduction is done by aggregating similar examples and then using a weighted exemplar in the clustering process.

Page 5: Fast accurate fuzzy clustering through data reduction

Introduction

Clustering in images.
Some modifications to the fuzzy c-means clustering algorithm.
Two experiments test speedup and correspondence with FCM results:
Infrared images of natural scenes.
Magnetic resonance images of the human brain.

Page 6: Fast accurate fuzzy clustering through data reduction

Related Work (1/2)

For large data sets, the problem with FCM is that it requires significant amounts of CPU time.

Variants of FCM: AFCM, mrFCM, and a subsampling algorithm.

In this paper, the combination of similar feature vectors is used to speed up FCM.

Page 7: Fast accurate fuzzy clustering through data reduction

Related Work (2/2)

Our work on speeding up fuzzy c-means has some connection to vector quantization, in the sense that our first step can be seen as a quantization of the data.

Page 8: Fast accurate fuzzy clustering through data reduction

BRFCM

2rFCM: reducing the precision of the data in order to speed up the clustering.

The brFCM algorithm consists of two phases:
Data reduction.
Fuzzy clustering using FCM.

We attempt to reduce the number of distinct examples to be clustered from $n$ to $n_0$, for some $n_0 \ll n$.

Page 9: Fast accurate fuzzy clustering through data reduction

BRFCM - Data Reduction: Overview

The first step is quantization. Quantization forces different continuous values into the same quantization level or bin.

The second step is aggregation. Aggregation combines identical feature vectors into a single, weighted exemplar that represents the quantization bin (e.g., the mean value of all full-precision feature vectors in the bin).

When both quantization and aggregation are used, significant data reduction can be obtained.
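
The two data-reduction steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: features are non-negative integers, quantization drops the `r` least significant bits, and the names `quantize` and `reduce_data` are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def quantize(vector, r):
    """Drop the r least significant bits of every integer feature (assumed scheme)."""
    mask = ~((1 << r) - 1)
    return tuple(v & mask for v in vector)


def reduce_data(data, r):
    """Group vectors by quantization bin; return (exemplars, weights).

    Each exemplar is the mean of the full-precision members of its bin,
    and each weight is the number of members aggregated into it.
    """
    bins = defaultdict(list)
    for vec in data:
        bins[quantize(vec, r)].append(vec)

    exemplars, weights = [], []
    for members in bins.values():
        count = len(members)
        exemplars.append(tuple(sum(col) / count for col in zip(*members)))
        weights.append(count)
    return exemplars, weights


# Five two-feature examples collapse into two weighted exemplars for r = 2.
data = [(200, 13), (201, 14), (203, 12), (90, 77), (91, 78)]
exemplars, weights = reduce_data(data, r=2)
print(exemplars, weights)
```

The weights always sum to the original number of examples, so no example is discarded; it is only represented by its bin.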

Page 10: Fast accurate fuzzy clustering through data reduction

BRFCM - Example

Page 11: Fast accurate fuzzy clustering through data reduction

BRFCM - Data Reduction: Overview

Quantization is an optional step in data reduction.

brFCM with only aggregation is functionally equivalent to the original FCM.

If data redundancy is significant, the dataset can be represented in a more compact form for clustering.

Page 12: Fast accurate fuzzy clustering through data reduction

BRFCM - brFCM Details

Data reduction -> brFCM. In more formal terms:

$X'$ is the set of example vectors representing a reduced-precision view of the dataset $X$.

There are $n_0$ such vectors, $n_0 \le n$.

Each $X_k$ represents the mean of all full-precision members in the $k$-th quantization bin, $1 \le k \le n_0$.

$w_k$ represents the number of feature vectors aggregated into $X_k$.

Page 13: Fast accurate fuzzy clustering through data reduction

BRFCM - brFCM Details

The cluster centroids are calculated by

$$V_i = \frac{\sum_{k=1}^{n_0} w_k (u_{ik})^m X_k}{\sum_{k=1}^{n_0} w_k (u_{ik})^m}, \quad 1 \le i \le c \tag{1}$$

The cluster membership values are calculated by

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\|X_k - V_i\|}{\|X_k - V_j\|} \right)^{2/(m-1)} \right]^{-1}, \quad \text{where } 1 \le i \le c \text{ and } 1 \le k \le n_0 \tag{2}$$
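
One iteration of the weighted updates in (1) and (2) might look like the following pure-Python sketch. The function names, the list-of-tuples data layout (`U[i][k]` holds the membership of exemplar k in cluster i), and the handling of an exemplar that coincides with a centroid are assumptions made for illustration; they are not taken from the paper.

```python
import math


def update_centroids(X, w, U, m):
    """Equation (1): weighted mean of exemplars, one centroid per row of U."""
    dims = len(X[0])
    centroids = []
    for u_i in U:
        # w_k * (u_ik)^m is shared by the numerator and the denominator.
        coeff = [w_k * (u_ik ** m) for w_k, u_ik in zip(w, u_i)]
        total = sum(coeff)
        centroids.append(tuple(
            sum(c_k * x_k[d] for c_k, x_k in zip(coeff, X)) / total
            for d in range(dims)
        ))
    return centroids


def update_memberships(X, V, m):
    """Equation (2): memberships from distance ratios; weights play no role here."""
    exponent = 2.0 / (m - 1.0)
    c = len(V)
    U = [[0.0] * len(X) for _ in range(c)]
    for k, x_k in enumerate(X):
        dist = [math.dist(x_k, v_i) for v_i in V]
        zeros = [i for i, d in enumerate(dist) if d == 0.0]
        if zeros:
            # Assumed convention: split membership among coinciding centroids.
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((dist[i] / dist[j]) ** exponent for j in range(c))
    return U


# Three one-dimensional exemplars; the first stands for three original examples.
X = [(0.0,), (4.0,), (10.0,)]
w = [3, 1, 1]
U = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
V = update_centroids(X, w, U, m=2)
U_next = update_memberships(X, V, m=2)
print(V)
```

Only the centroid update uses the weights, which is why setting every weight to 1 gives back the ordinary FCM updates.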

Page 14: Fast accurate fuzzy clustering through data reduction

BRFCM - brFCM Details

Two particular features of this algorithm:

When no quantization occurs and the aggregation step does not reduce the dataset, $n_0 = n$ and $w_i = 1$ for all $i$. The algorithm reduces to FCM.

When the aggregation step is used by itself, the algorithm also reduces to FCM. This formulation can significantly improve the speed of clustering without a loss of accuracy.

Page 15: Fast accurate fuzzy clustering through data reduction

BRFCM - Image Characteristics

An RGB image has $2^8 \times 2^8 \times 2^8 = 2^{24}$ possible values (the pixel count of a 4096 * 4096 image).

Quantizing RGB space by $r = 2$ creates a space of size $2^6 \times 2^6 \times 2^6 = 2^{18}$ (the pixel count of a 512 * 512 image).
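
The arithmetic on this slide can be checked directly. The helper name `feature_space_size` is hypothetical, and the sketch assumes three 8-bit channels with `r` bits dropped from each.

```python
def feature_space_size(bits_per_channel, channels, r=0):
    """Number of distinct feature vectors after dropping r bits per channel."""
    return 2 ** ((bits_per_channel - r) * channels)


full = feature_space_size(8, 3)          # 2**24 possible RGB values
reduced = feature_space_size(8, 3, r=2)  # 2**18 values after quantization
print(full, reduced, full // reduced)
```

Dropping two bits from each of three channels shrinks the space by a factor of 2**6 = 64.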

Page 16: Fast accurate fuzzy clustering through data reduction

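BRFCM Implementation

For this work, quantization was implemented via bit-masking and aggregation was done using a hashing scheme.

A. Formula Implementation

The cluster centroids in (1): the term $(u_{ik})^m$ appears in both the numerator and the denominator.
The membership values in (2): when $i = j$, the ratio inside the sum equals 1.

$$V_i = \frac{\sum_{k=1}^{n_0} w_k (u_{ik})^m X_k}{\sum_{k=1}^{n_0} w_k (u_{ik})^m}, \quad 1 \le i \le c \tag{1}$$

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\|X_k - V_i\|}{\|X_k - V_j\|} \right)^{2/(m-1)} \right]^{-1}, \quad \text{where } 1 \le i \le c \text{ and } 1 \le k \le n_0 \tag{2}$$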

Page 17: Fast accurate fuzzy clustering through data reduction

BRFCM Implementation

B. Quantization

Quantization of a feature space can be done using either fixed-size bins or variable-sized bins.

brFCM can be implemented efficiently using fixed-size bins.

A more general approach to quantization is

$$X_q = Q_r \left\lfloor \frac{X}{Q_r} \right\rfloor \tag{3}$$

where $Q_r$ is the quantization size and $\lfloor \cdot \rfloor$ is the integer floor function.
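
The relationship between the floor form in (3) and bit-masking can be illustrated as follows. This sketch assumes non-negative integer feature values and treats a mask of `r` bits as a bin size of `2**r`; both function names are hypothetical.

```python
def quantize_floor(x, q_r):
    """Equation (3): X_q = Q_r * floor(X / Q_r) for an integer feature value."""
    return q_r * (x // q_r)


def quantize_mask(x, r):
    """Clear the r least significant bits of a non-negative integer."""
    return x & ~((1 << r) - 1)


# For power-of-two bin sizes the two forms agree on every 8-bit value.
agree = all(
    quantize_floor(x, 1 << r) == quantize_mask(x, r)
    for r in range(8)
    for x in range(256)
)
print(agree, quantize_floor(203, 4), quantize_mask(203, 2))
```

The floor form also accepts bin sizes that are not powers of two, for example `quantize_floor(17, 5)`, which bit-masking cannot express.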

Page 18: Fast accurate fuzzy clustering through data reduction

BRFCM Implementation

C. Aggregation Using Hashing

The hash function is given by

$$h_a(x) = \left( \sum_{i=0}^{s} a_i x_i \right) \bmod m \tag{4}$$

where $s$ denotes the number of features in the example and each $a_i$ is randomly chosen from the range $0 \le a_i < m$.

$$\text{Expected Number of Collisions} = \frac{\text{Expected Number of Items}}{m} \tag{5}$$
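
A hash function of the form in (4) can be used to aggregate identical vectors, as in the sketch below. Several details are assumptions for illustration: the table size `m = 101` (a prime), one coefficient per feature, collision handling by chaining, and the names `make_hash` and `aggregate`.

```python
import random


def make_hash(num_features, m, seed=None):
    """Build h_a(x) = (sum_i a_i * x_i) mod m with random coefficients a_i in [0, m)."""
    rng = random.Random(seed)
    a = [rng.randrange(m) for _ in range(num_features)]

    def h(x):
        return sum(a_i * x_i for a_i, x_i in zip(a, x)) % m

    return h


def aggregate(vectors, m=101, seed=0):
    """Count identical vectors with a chained hash table; return (vector, count) pairs."""
    h = make_hash(len(vectors[0]), m, seed)
    table = [[] for _ in range(m)]  # one chain per slot resolves collisions
    for x in vectors:
        bucket = table[h(x)]
        for entry in bucket:
            if entry[0] == x:
                entry[1] += 1  # identical vector already stored: bump its weight
                break
        else:
            bucket.append([x, 1])
    return [(key, count) for bucket in table for key, count in bucket]


quantized = [(8, 4), (8, 4), (12, 0), (8, 4), (12, 0)]
result = sorted(aggregate(quantized))
print(result)
```

Because entries are compared for equality inside each chain, two different vectors that collide in the same slot are still counted separately.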

Page 19: Fast accurate fuzzy clustering through data reduction

Experiments

The experiments cover two image domains:
A set of infrared images.
Magnetic resonance images of the normal human brain, which are segmented into gray matter, white matter and cerebro-spinal fluid.

Data reduction. Clustering time. Cluster result.

Page 20: Fast accurate fuzzy clustering through data reduction

Experiments - Infrared Images

Our 172 ATR images are 8-bit (256 value) infrared images of size 398400 pixels.
The images were clustered into c=5 clusters.
We use two features: intensity and one Laws' Texture Energy feature.
Table 3 shows the remarkable level of reduction seen in these images.

Page 21: Fast accurate fuzzy clustering through data reduction

Experiments - Infrared Images

Page 22: Fast accurate fuzzy clustering through data reduction

Experiments - Correspondence With FCM

To measure the correspondence between clustering results and FCM, consider two partitions of $X = \{x_1, x_2, \ldots, x_n\}$:

$$P_1 = \{C_i^1 \mid i = 1, 2, \ldots, c\}; \quad P_2 = \{C_i^2 \mid i = 1, 2, \ldots, c\}$$

We define the maximal intersection of $C_i^1 \in P_1$ and $C_j^2 \in P_2$ as

$$C_i^1 \cap_{\max} C_j^2 = \max\{\, |C_i^1 \cap C_j^2| \;:\; j = 1, 2, \ldots, c \,\} \tag{6}$$

The correspondence mapping $P_1 \to P_2$ can then be defined as the mapping of cluster $C_i^1$ to $C_j^2$ such that $|C_i^1 \cap C_j^2| = C_i^1 \cap_{\max} C_j^2$, for all clusters in $P_1$.

Page 23: Fast accurate fuzzy clustering through data reduction

Experiments - Correspondence With FCM

The algorithm for calculating the cluster correspondence:

Find the correspondence mappings $P_1 \to P_2$ and $P_2 \to P_1$.

Correspondence rate $Corr_1$ is the sum of all maximal intersections in the correspondence mapping $P_1 \to P_2$, divided by the number of examples in $X$:

$$Corr_1 = \frac{\sum_{i=1}^{c} \left( C_i^1 \cap_{\max} C_j^2 \right)}{|X|} \tag{7}$$

Repeat for $Corr_2$ (using $P_2 \to P_1$).

Correspondence rate $CR = \max(Corr_1, Corr_2)$.
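
The correspondence rate can be computed with a short sketch like the one below. It assumes each partition is given as a list of Python sets of example indices (a hardened partition, one cluster per set); the names `corr` and `correspondence_rate` are hypothetical.

```python
def corr(p_from, p_to, n):
    """Sum of maximal intersections for the mapping p_from -> p_to, divided by n."""
    return sum(max(len(a & b) for b in p_to) for a in p_from) / n


def correspondence_rate(p1, p2):
    """CR = max(Corr1, Corr2) over the two mapping directions."""
    n = sum(len(cluster) for cluster in p1)  # clusters partition X, so sizes sum to |X|
    return max(corr(p1, p2, n), corr(p2, p1, n))


# Two partitions of six examples that disagree on example 2 only.
p1 = [{0, 1, 2}, {3, 4, 5}]
p2 = [{0, 1}, {2, 3, 4, 5}]
print(correspondence_rate(p1, p2))
print(correspondence_rate(p1, p1))
```

Identical partitions give a rate of 1.0, and each example assigned to a non-corresponding cluster lowers the rate.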

Page 24: Fast accurate fuzzy clustering through data reduction

Experiments - Correspondence With FCM

How significant are the brFCM-FCM correspondence rates as r increases?

brFCM generally creates partitions very similar to FCM, given the same centroid initializations for this dataset.

Page 25: Fast accurate fuzzy clustering through data reduction

Experiments - Magnetic Resonance Images

The set of MR images consisted of 256 * 256 12-bit images. Each pixel consisted of three features (T1, T2 and PD). 32 MRI slices.

Each MR image has an associated ground truth.

The ground-truth images were created by KNN with k=7, where the training data was chosen by a person who could be labeled a radiology technician.

There are three classes of interest in the magnetic resonance images: cerebro-spinal fluid, gray matter and white matter.

Page 26: Fast accurate fuzzy clustering through data reduction

Experiments - Magnetic Resonance Images

Page 27: Fast accurate fuzzy clustering through data reduction

Experiments - Magnetic Resonance Images

1) Performance Speedups

Page 28: Fast accurate fuzzy clustering through data reduction

Experiments - Magnetic Resonance Images

2) Correspondence With FCM on Ground Truth

Page 29: Fast accurate fuzzy clustering through data reduction

Experiments - Discussion

The brFCM algorithm generates significant speedup over literal FCM in the infrared image dataset and the MRI dataset.

A trade-off exists between the FCM correspondence and speedup (Fig. 2).

Page 30: Fast accurate fuzzy clustering through data reduction

Conclusion

Speedup versus bit reduction: the higher the value of r, the higher the speedup and the lower the accuracy.

This approach to speeding up clustering can be applied equally well to hard c-means and EM clustering, or to optimizations of FCM.

For many image clustering problems, brFCM is a fast alternative to traditional FCM.

Page 31: Fast accurate fuzzy clustering through data reduction

Personal Opinion

A trade-off between accuracy and speedup.
Data reduction:
Numerical data => bit mask.
Categorical data => conceptual hierarchy.

Page 32: Fast accurate fuzzy clustering through data reduction

Review

Fuzzy c-means (FCM).
Data reduction:
Quantization using bit mask.
Aggregation using hashing.
Fuzzy clustering using FCM.
Two experiments:
Infrared images.
Magnetic resonance images of the normal human brain.