A New Initialization Method for Fuzzy C-Means Using Fuzzy Subtractive Clustering
![Page 1: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/1.jpg)
A new initialization method for Fuzzy C-Means using Fuzzy Subtractive Clustering
Thanh Le, Tom Altman
University of Colorado Denver
July 19, 2011
![Page 2: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/2.jpg)
Overview
- Introduction
  - Data clustering: approaches and current challenges
- fzSC: a novel fuzzy subtractive clustering method for FCM parameter initialization
- Datasets
  - Artificial and real datasets for testing fzSC
- Experimental results
- Discussion
![Page 3: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/3.jpg)
Clustering problem
- Data points are clustered based on
  - Similarity
  - Dissimilarity
- Clusters are defined by
  - Number of clusters
  - Cluster boundaries and overlaps
  - Compactness within clusters
  - Separation between clusters
![Page 4: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/4.jpg)
Clustering approaches
- Hierarchical approach
- Partitioning approach
- Hard clustering approach
  - Crisp cluster boundaries
  - Crisp cluster membership
- Soft/fuzzy clustering approach
  - Soft/fuzzy membership
  - Overlapping cluster boundaries
  - Most appropriate for real-world problems
![Page 5: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/5.jpg)
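Fuzzy C-Means algorithm
- Features:
  - Fuzzy membership, soft cluster boundaries
  - Each data point can belong to multiple clusters, so more relationship information is provided
- The model:

$$J(X \mid U, V) = \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ki}^{m} \, \lVert x_i - v_k \rVert^{2} \to \min, \quad m > 1$$

$$\sum_{k=1}^{c} u_{ki} = 1, \quad i = 1..n$$

Here $u_{ki}$ is the membership of data point $x_i$ in cluster $k$, $v_k$ is the center of cluster $k$, and $m$ is the fuzzifier.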
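As a small illustration, the FCM objective J(X | U, V), the sum over all points i and clusters k of u_ki^m times the squared distance between x_i and v_k, can be evaluated directly. The sketch below is not from the slides; the function names and the toy data are assumptions chosen for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm_objective(X, U, V, m=2.0):
    """J(X | U, V) = sum_i sum_k u_ki^m * ||x_i - v_k||^2.

    X: list of n points, V: list of c centers,
    U[k][i]: membership of point i in cluster k.
    """
    return sum(
        (U[k][i] ** m) * sq_dist(X[i], V[k])
        for k in range(len(V))
        for i in range(len(X))
    )


# Toy data (hypothetical): three points on a line, two centers.
X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
V = [(0.0, 0.0), (4.0, 0.0)]
U = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]

# Constraint from the model: memberships of each point sum to 1.
columns_ok = all(
    abs(sum(U[k][i] for k in range(len(V))) - 1.0) < 1e-12
    for i in range(len(X))
)

J = fcm_objective(X, U, V, m=2.0)
print(columns_ok, J)  # only the middle point contributes: 0.25*1 + 0.25*9 = 2.5
```

Only the middle point has non-crisp membership here, so it is the only one that contributes to J.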
![Page 6: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/6.jpg)
Fuzzy C-Means (contd.)
- Possibility-based model
- Fuzzy sets to describe clusters
- Model parameters estimated using an iterative process
- Rapid convergence
- Challenges:
  - Determining the number of clusters
  - Initializing the partition matrix to avoid local optima
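The iterative estimation mentioned here is usually the standard alternating FCM update: recompute the centers from the memberships, then recompute the memberships from the centers. The sketch below uses the textbook update rules with a random initial partition matrix, which is exactly the initialization the slides identify as a challenge. Function names, defaults, and the toy data are assumptions, not the authors' code.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm(X, c, m=2.0, max_iter=200, tol=1e-6, seed=0):
    """Standard FCM iteration; returns (U, V) with U[k][i] and V[k]."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial partition matrix (c x n); each column is normalized to sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for i in range(n):
        col = sum(U[k][i] for k in range(c))
        for k in range(c):
            U[k][i] /= col
    V = []
    for _ in range(max_iter):
        # Center update: v_k = sum_i u_ki^m x_i / sum_i u_ki^m
        V = []
        for k in range(c):
            w = [U[k][i] ** m for i in range(n)]
            total = sum(w)
            V.append(tuple(sum(w[i] * X[i][j] for i in range(n)) / total
                           for j in range(dim)))
        # Membership update: u_ki proportional to ||x_i - v_k||^(-2/(m-1))
        new_U = [[0.0] * n for _ in range(c)]
        for i in range(n):
            d2 = [sq_dist(X[i], V[k]) for k in range(c)]
            hits = [k for k in range(c) if d2[k] == 0.0]
            if hits:
                # Point coincides with one or more centers: split membership among them.
                for k in hits:
                    new_U[k][i] = 1.0 / len(hits)
            else:
                inv = [d2[k] ** (-1.0 / (m - 1.0)) for k in range(c)]
                s = sum(inv)
                for k in range(c):
                    new_U[k][i] = inv[k] / s
        change = max(abs(new_U[k][i] - U[k][i])
                     for k in range(c) for i in range(n))
        U = new_U
        if change < tol:
            break
    return U, V


# Toy data (hypothetical): two well-separated groups of three points.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
     (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
U, V = fcm(X, c=2)
print(sorted(V))
```

On well-separated data like this the random start is harmless; the local-optimum problem shows up when clusters overlap or their number is misjudged.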
![Page 7: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/7.jpg)
Methods for partition matrix initialization
- Based on randomization
  - Problem: different randomization methods depend on different data distributions
- Using heuristic algorithms: Particle Swarm
  - Problem: slow convergence because of velocity adjustment
- Integrated with optimization algorithms
  - Problem: still based on other methods of partition matrix initialization
![Page 8: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/8.jpg)
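Methods for partition matrix… (contd.): using Subtractive Clustering

The parameter symbols are missing from this transcript; $\sigma$, $\beta$, and $\delta$ below are stand-in names.

- Mountain function (the data density), where $\sigma$ is the mountain peak radius:

$$M(x_i) = \sum_{j=1}^{n} e^{-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}}$$

- Mountain amendment (density adjustment), where $\beta$ is the mountain radius and $x^{*}$ is the selected center with mountain value $M^{*}$:

$$M_t(x_j) = M_{t-1}(x_j) - M^{*} e^{-\frac{\lVert x_j - x^{*} \rVert^{2}}{2\beta^{2}}}$$

- Cluster candidate (the most dense data point), where $\delta$ is the threshold to stop the cluster center selection:

$$\frac{M^{*}_{t}}{M^{*}_{0}} < \delta$$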
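The traditional subtractive clustering loop described on this slide (compute a density "mountain" for every point, pick the densest point as a center, subtract its mountain, repeat until the remaining peak is small relative to the first) can be sketched as follows. The parameter names `sigma` (mountain peak radius), `beta` (mountain radius), and `delta` (stopping threshold) are stand-ins, since the slide's own symbols are not recoverable, and the Gaussian form with a 2·radius² denominator is an assumption read from the slide residue.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def subtractive_clustering(X, sigma, beta, delta):
    """Return cluster centers chosen by mountain peaks.

    sigma: mountain peak radius, beta: mountain radius,
    delta: stop when (current peak / first peak) < delta.
    All three names are assumed stand-ins for the slide's symbols.
    """
    n = len(X)
    # Mountain function: pairwise sum, hence O(n^2) time.
    M = [
        sum(math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2)) for j in range(n))
        for i in range(n)
    ]
    centers = []
    first_peak = None
    while True:
        best = max(range(n), key=M.__getitem__)  # the most dense data point
        peak = M[best]
        if first_peak is None:
            first_peak = peak
        elif peak / first_peak < delta:
            break  # remaining density too small to justify another center
        centers.append(X[best])
        # Mountain amendment: remove the selected center's influence.
        M = [
            M[j] - peak * math.exp(-sq_dist(X[j], X[best]) / (2.0 * beta ** 2))
            for j in range(n)
        ]
    return centers


# Toy data (hypothetical): two tight groups.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers = subtractive_clustering(X, sigma=0.5, beta=0.75, delta=0.15)
print(centers)
```

The result depends on all three parameters, which is the tuning burden the following slides attribute to the traditional method.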
![Page 9: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/9.jpg)
Subtractive Clustering method: the problems
- Mountain peak radius?
- Remaining density to be selected?
- Mountain radius?
- Computational time: O(n^2)

[Figure: example parameter choices marked OK / NO]
![Page 10: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/10.jpg)
The proposed method: fzSC for partition matrix initialization
1. Generate a random fuzzy partition
2. Compute cluster density using histogram
3. Use strong uniform fuzzy partition concept
4. Estimate mountain function based on cluster density
5. Amend mountain function:
   1. Update cluster density (step 2)
   2. Re-estimate mountain function (step 4)
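Step 1 of the list above, generating a random fuzzy partition, can be sketched as follows, together with the cluster centers such a partition implies under the standard FCM center formula. The slides do not specify how fzSC computes the histogram-based density or applies the strong uniform fuzzy partition concept, so steps 2 to 5 are not attempted here. All names and the toy data are assumptions.

```python
import random


def random_fuzzy_partition(c, n, seed=None):
    """Return U (c x n) with U[k][i] in (0, 1] and each column summing to 1."""
    rng = random.Random(seed)
    U = [[0.0] * n for _ in range(c)]
    for i in range(n):
        raw = [rng.random() + 1e-12 for _ in range(c)]  # avoid an all-zero column
        total = sum(raw)
        for k in range(c):
            U[k][i] = raw[k] / total
    return U


def partition_centers(X, U, m=2.0):
    """Centers implied by a partition: v_k = sum_i u_ki^m x_i / sum_i u_ki^m."""
    dim = len(X[0])
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append(tuple(
            sum(wi * x[j] for wi, x in zip(w, X)) / total for j in range(dim)
        ))
    return centers


# Toy data (hypothetical).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
     (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
U = random_fuzzy_partition(c=2, n=len(X), seed=1)
centers = partition_centers(X, U)
print(centers)
```

Because every point gets a random share of every cluster, these initial centers tend to fall between the true groups rather than inside them.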
![Page 11: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/11.jpg)
fzSC: Optimal number of clusters
1. The most dense data point is a cluster candidate
   - Data density is not much affected: say, less than 0.05 of the data density is removed by the mountain function amendment process.
   - The number of such points is less than n
2. The mountain peak radius, mountain radius, and stopping threshold are not required
3. Computational time: O(c*n)
![Page 12: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/12.jpg)
Datasets
- Artificial datasets
  - Finite mixture model based datasets
  - A manually created (MC) dataset
    - Data were generated using a finite mixture model
    - Clusters were moved to have different distances among clusters
- Real datasets
  - Iris, Wine, Glass, and Breast Cancer Wisconsin datasets from the UC Irvine Machine Learning Repository
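To make the artificial-dataset construction concrete, here is a minimal sketch of sampling from a finite mixture model and then moving one cluster to change the distances among clusters. The slides do not state the component distribution; isotropic Gaussian components, the function names, and all parameter values below are assumptions for illustration only.

```python
import random


def sample_mixture(n, means, sigmas, weights, seed=0):
    """Draw n points from a mixture of isotropic Gaussians (assumed form).

    means[k]: center of component k, sigmas[k]: its standard deviation,
    weights[k]: its mixing proportion. Returns (points, labels).
    """
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(means)), weights=weights)[0]
        points.append(tuple(rng.gauss(mu, sigmas[k]) for mu in means[k]))
        labels.append(k)
    return points, labels


def shift_cluster(points, labels, k, offset):
    """Move every point of cluster k by offset, leaving other clusters in place."""
    return [
        tuple(p + o for p, o in zip(pt, offset)) if lab == k else pt
        for pt, lab in zip(points, labels)
    ]


# Hypothetical three-component mixture in two dimensions.
points, labels = sample_mixture(
    n=300,
    means=[(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)],
    sigmas=[0.5, 0.5, 0.5],
    weights=[1.0, 1.0, 1.0],
)
moved = shift_cluster(points, labels, k=2, offset=(3.0, 3.0))
print(len(points), sorted(set(labels)))
```

Keeping the labels alongside the points is what makes a per-class correctness ratio computable later.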
![Page 13: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/13.jpg)
Visualization of fzSC result on the manually created (MC) dataset
Rectangles: cluster centers of random fuzzy partition. Circles: cluster centers by fzSC.
![Page 14: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/14.jpg)
A visualization…
Stars: cluster centers of random fuzzy partition. Circles: cluster centers by fzSC.
The utility is available online: http://ouray.ucdenver.edu/~tnle/fzsc/
![Page 15: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/15.jpg)
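Experimental results on manually created dataset

The algorithm performance on the MC dataset (correctness ratio by class)

| Algorithm | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Avg. ratio |
|---|---|---|---|---|---|---|---|
| fzSC | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| k-means | 0.97 | 0.87 | 1.00 | 1.00 | 1.00 | 0.75 | 0.93 |
| k-medians | 0.95 | 0.82 | 1.00 | 1.00 | 1.00 | 0.62 | 0.90 |
| FCM | 0.97 | 1.00 | 0.95 | 1.00 | 1.00 | 0.96 | 0.98 |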
![Page 16: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/16.jpg)
Experimental results on artificial datasets

| Number of clusters generated in the dataset | Dimension 2 | Dimension 3 | Dimension 4 | Dimension 5 |
|---|---|---|---|---|
| 5 | 0.97 | 1.00 | 1.00 | 1.00 |
| 6 | 1.00 | 0.98 | 0.90 | 1.00 |
| 7 | 1.00 | 1.00 | 1.00 | 1.00 |
| 8 | 1.00 | 0.99 | 0.97 | 1.00 |
| 9 | 0.87 | 0.99 | 1.00 | 0.96 |

Correctness ratio in determining cluster number, by dataset dimension
![Page 17: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/17.jpg)
Experimental results on real datasets

| Dataset | # data points | Known # clusters | Predicted # clusters | Ratio |
|---|---|---|---|---|
| Iris | 150 | 3 | 3 | 1.00 |
| Wine | 178 | 3 | 3 | 1.00 |
| Glass | 214 | 6 | 6 | 0.95 |
| | | | 5 | 0.05 |
| Breast Cancer Wisconsin | 699 | 6 | 6 | 0.65 |
| | | | 5 | 0.35 |

Correctness ratio in determining cluster number
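The slides do not define the correctness ratio for cluster-number determination. One plausible reading, assumed here, is the fraction of repeated runs in which the predicted number of clusters equals the known number. The sketch below computes that quantity; the list of run outcomes is made up for illustration and is not the authors' data.

```python
from collections import Counter


def count_ratios(predicted_counts):
    """Fraction of runs that predicted each cluster count."""
    total = len(predicted_counts)
    tally = Counter(predicted_counts)
    return {count: tally[count] / total for count in sorted(tally)}


def correctness_ratio(predicted_counts, known):
    """Fraction of runs whose predicted cluster count equals the known count."""
    hits = sum(1 for count in predicted_counts if count == known)
    return hits / len(predicted_counts)


# Hypothetical outcome of 20 runs on a dataset with 6 known clusters.
runs = [6] * 19 + [5]
ratios = count_ratios(runs)
ratio = correctness_ratio(runs, known=6)
print(ratios, ratio)
```

Under this reading, a table row that lists two predicted counts with ratios summing to 1 is simply the output of `count_ratios` for that dataset.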
![Page 18: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/18.jpg)
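Discussion: The advantages of fzSC
- Compared with traditional subtractive clustering
  - The mountain peak radius, mountain radius, and stopping threshold are not required
  - Computational time O(c*n) vs. O(n^2)
- Compared with heuristic-based approaches
  - Rapid convergence
  - Escapes local optima
- Compared with probability model based approaches
  - Rapid convergence
  - No assumption of data distribution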
![Page 19: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/19.jpg)
Discussion: Future work

Combine fzSC with biological cluster validation methods and optimization algorithms to develop novel clustering algorithms for gene expression data analysis.
![Page 20: A new initialization method for Fuzzy C- Means using Fuzzy Subtractive Clustering](https://reader033.vdocument.in/reader033/viewer/2022052214/568153d8550346895dc1ce5e/html5/thumbnails/20.jpg)
Thank you!
Questions?
We acknowledge the support from the Vietnamese Ministry of Education and Training, the 322 scholarship program.