A novel clustering algorithm based on weighted support and its application
![Page 1: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/1.jpg)
A novel clustering algorithm based on weighted support and its application
Authors: Xiang-Rong Yang, Jun-Yi Shen, Qiang Liu

Graduate: Chien-Ming Hsiao
![Page 2: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/2.jpg)
Outline
- Motivation
- Objective
- Introduction
- Description of some Terms
- Algorithm and Analysis
- Experimental results
- Conclusions
- Personal opinion
![Page 3: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/3.jpg)
Motivation
Many efficient clustering algorithms have been proposed, but most of this work focuses on numerical data.
![Page 4: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/4.jpg)
Objective
To present WeiSC, a novel and efficient algorithm for clustering categorical data.
![Page 5: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/5.jpg)
Introduction
Clustering is an important KDD problem.

Objective: to group data into sets such that
- intra-cluster similarity is maximized
- inter-cluster similarity is minimized

Most existing work focuses on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions between data points.
![Page 6: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/6.jpg)
Introduction
The basic idea of WeiSC
- It reads tuples from the dataset one by one.
- When the first tuple arrives, it forms a cluster by itself.
- Each subsequent tuple is either put into an existing cluster or rejected by all existing clusters, in which case it forms a new cluster. The decision is made with a given similarity function defined between a tuple and a cluster.
- It makes only one scan over the dataset.
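The one-scan idea can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `match_fraction` is a placeholder similarity rather than WeiSC's, and choosing the most similar cluster when several accept a tuple is an assumption the slides do not state.

```python
from typing import Callable, Hashable, List, Sequence

Row = Sequence[Hashable]


def one_pass_cluster(
    rows: Sequence[Row],
    similarity: Callable[[List[Row], Row], float],
    threshold: float,
) -> List[List[Row]]:
    """Assign each row to a cluster in a single scan over `rows`."""
    clusters: List[List[Row]] = []
    for row in rows:
        best_index, best_sim = -1, threshold
        for index, cluster in enumerate(clusters):
            sim = similarity(cluster, row)
            # A cluster accepts the row only if the similarity is above the threshold.
            if sim > best_sim:
                best_index, best_sim = index, sim
        if best_index >= 0:
            clusters[best_index].append(row)
        else:
            # Rejected by every existing cluster (or no cluster yet): start a new one.
            clusters.append([row])
    return clusters


def match_fraction(cluster: List[Row], row: Row) -> float:
    """Placeholder similarity: mean fraction of attributes on which `row`
    agrees with the members of `cluster`."""
    agree = sum(a == b for member in cluster for a, b in zip(member, row))
    return agree / (len(cluster) * len(row))


rows = [("red", "round"), ("red", "round"), ("blue", "flat"), ("blue", "flat")]
clusters = one_pass_cluster(rows, match_fraction, threshold=0.6)
```

With these four toy rows and a threshold of 0.6, the two red tuples end up in one cluster and the two blue tuples in another.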
![Page 7: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/7.jpg)
Description of some Terms
Let $D(A_1, A_2, \ldots, A_m)$ be a set of tuples, where $A_1, A_2, \ldots, A_m$ is a set of categorical attributes with domains $D_1, \ldots, D_m$.

Let $TID$ be the set of unique IDs of every tuple.

For each $tid \in TID$, the value for attribute $A_i$ of the corresponding tuple is represented as $\mathrm{val}(tid, A_i)$.
![Page 8: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/8.jpg)
Description of some Terms
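DEFINITION 1

$\mathrm{Cluster} = \{tid \mid tid \in TID\}$ is a subset of $TID$.

DEFINITION 2

Given a cluster $C$, the set of attribute values on $A_i$ with respect to $C$ is defined as: $\mathrm{VAL}_i(C) = \{\mathrm{val}(tid, A_i) \mid tid \in C\}$.

DEFINITION 3

Let $\mathrm{CONT}(A_i) = |D_i|$, i.e. the count of distinct attribute values of $A_i$, and $\mathrm{SUM\_CONT} = \sum_{i=1}^{m} \mathrm{CONT}(A_i)$. The weight of attribute $A_i$ is $\mathrm{WEI}(A_i) = \mathrm{CONT}(A_i) / \mathrm{SUM\_CONT}$.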
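The attribute weights of Definition 3 can be illustrated with a short Python sketch. It assumes the weight is the ratio CONT / SUM_CONT and estimates each domain size from the distinct values observed in the data, whereas a true one-scan run would need the domains to be known in advance. The function name `attribute_weights` is hypothetical.

```python
from typing import Hashable, List, Sequence


def attribute_weights(rows: Sequence[Sequence[Hashable]]) -> List[float]:
    """Return WEI(A_i) = CONT(A_i) / SUM_CONT for every attribute.

    Assumption: CONT(A_i) is taken as the number of distinct values
    observed for attribute i in `rows`.
    """
    m = len(rows[0])
    cont = [len({row[i] for row in rows}) for i in range(m)]
    sum_cont = sum(cont)
    return [c / sum_cont for c in cont]


rows = [("a", "x"), ("b", "x"), ("c", "y")]
weights = attribute_weights(rows)  # CONT = [3, 2], SUM_CONT = 5
```

Here the first attribute has three distinct values and the second has two, so the weights are 3/5 and 2/5 and they sum to 1.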
![Page 9: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/9.jpg)
Description of some Terms
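DEFINITION 4

Given a cluster $C$, let $a \in D_i$. The weighted support of $a$ in $C$ with respect to $A_i$ is defined as: $\mathrm{wei\_sp}(a) = \mathrm{WEI}(A_i) \cdot |\{tid \in C \mid tid.A_i = a\}|$.

DEFINITION 5

Given a cluster $C$, the summary for $C$ is defined as: $\mathrm{Summary} = \{CID, VS_i \mid 1 \le i \le m\}$, where $VS_i = \{(a, \mathrm{Cont}(a), \mathrm{wei\_sp}(a)) \mid tid.A_i = a,\ tid \in C\}$.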
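A cluster summary in the sense of Definitions 4 and 5 might be built as in the sketch below. It assumes that the weighted support of a value is its attribute weight times the number of tuples in the cluster carrying that value, and that Cont(a) is that same tuple count. Neither reading is confirmed by the slides, and `build_summary` and the dictionary layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def build_summary(
    cid: int,
    cluster: Sequence[Sequence[Hashable]],
    weights: Sequence[float],
) -> Dict[str, object]:
    """Build {CID, VS_1..VS_m}; VS_i maps a value a to (Cont(a), wei_sp(a))."""
    vs: List[Dict[Hashable, Tuple[int, float]]] = []
    for i, weight in enumerate(weights):
        # Count how many tuples of the cluster take each value on attribute i.
        counts = Counter(row[i] for row in cluster)
        vs.append({a: (n, weight * n) for a, n in counts.items()})
    return {"CID": cid, "VS": vs}


cluster = [("red", "round"), ("red", "flat"), ("blue", "flat"), ("red", "flat")]
summary = build_summary(0, cluster, weights=[0.5, 0.5])
```

For this toy cluster, "red" occurs three times on the first attribute, so its entry is `(3, 1.5)` with the assumed weight of 0.5.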
![Page 10: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/10.jpg)
Algorithm and Analysis
Overview

Initially, the first tuple in the database is read and a cluster is constructed from it. The subsequent tuples are then read iteratively.

The similarity between the new tuple and each existing cluster is computed according to

$$\mathrm{sim}(C, tid) = \frac{1}{|C|} \sum_{i=1}^{m} \mathrm{wei\_sp}(a_i),$$

where $a_i = tid.A_i$ is the value of the new tuple on attribute $A_i$ and $|C|$ is the count of tuples in cluster $C$.

For the tuple to join a cluster, the similarity must be above the threshold, denoted as σ.

When computing the similarity, the clusters' summaries are used instead of the clusters themselves, since the information needed is contained in the summaries.
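The similarity computation from a summary can be sketched as below, assuming the formula is the sum of weighted supports of the tuple's values divided by the cluster size. The class `ClusterSummary` and its fields are hypothetical. It stores only per-value counts and applies the weights at query time, which is one possible layout and not necessarily the authors'.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class ClusterSummary:
    """Per-attribute value counts plus the cluster size; no tuples are kept."""

    def __init__(self, m: int) -> None:
        self.size = 0
        self.counts: List[Counter] = [Counter() for _ in range(m)]

    def add(self, row: Sequence[Hashable]) -> None:
        """Update the summary incrementally when a tuple joins the cluster."""
        self.size += 1
        for i, a in enumerate(row):
            self.counts[i][a] += 1


def similarity(summary: ClusterSummary, row: Sequence[Hashable],
               weights: Sequence[float]) -> float:
    """sim(C, tid) = (1 / |C|) * sum_i WEI(A_i) * count of a_i in C."""
    total = sum(w * summary.counts[i][a]
                for i, (w, a) in enumerate(zip(weights, row)))
    return total / summary.size


weights = [0.5, 0.5]  # assumed attribute weights, summing to 1
summary = ClusterSummary(m=2)
for member in [("red", "round"), ("red", "round"), ("red", "flat"), ("red", "flat")]:
    summary.add(member)

sim_close = similarity(summary, ("red", "flat"), weights)   # (0.5*4 + 0.5*2) / 4
sim_far = similarity(summary, ("blue", "flat"), weights)    # (0.5*0 + 0.5*2) / 4
```

With a threshold σ of 0.5, the first tuple (similarity 0.75) would be accepted by this cluster and the second (similarity 0.25) would be rejected. Because the weights sum to 1, the similarity stays between 0 and 1 under these assumptions.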
![Page 11: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/11.jpg)
![Page 12: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/12.jpg)
Computational complexities
The time and space complexities of the WeiSC algorithm depend on
- the size of the dataset, |D|
- the number of attributes, m
- the number of clusters, p = f(σ)
- the size of each cluster, g(σ)

Time complexity: O(|D| * m * f(σ))

Space complexity: O(|D| + m * f(σ) * g(σ))
![Page 13: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/13.jpg)
Experimental results
Experimental results on the performance of WeiSC

Comparison of the clustering results with ROCK's on the same dataset
![Page 14: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/14.jpg)
Quality of clustering results with real-life datasets
- The Mushroom dataset (real-life) was obtained from the UCI Machine Learning Repository.
- It corresponds to 23 species of gilled mushrooms.
- Each species is identified as definitely edible or definitely poisonous.
- It has 21 attributes and 8124 tuples.
- The number of edible tuples is 4208.
- The number of poisonous tuples is 3916.
![Page 15: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/15.jpg)
![Page 16: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/16.jpg)
The effect of σ
The parameter σ
- is the only parameter needed by the WeiSC algorithm
- affects both the clustering results and the speed of the algorithm

The percentage of misclassified tuples can be used as a measure of the effect, since each tuple has been labeled “edible” or “poisonous”.
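One way to compute such a percentage is sketched below. The slides do not define "misclassified", so this sketch assumes a tuple is misclassified when its label differs from the majority label of the cluster it was assigned to. The function name and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def misclassified_percentage(cluster_ids: Sequence[Hashable],
                             labels: Sequence[Hashable]) -> float:
    """Percentage of tuples whose label is not the majority label of their cluster."""
    per_cluster: Dict[Hashable, Counter] = {}
    for cid, label in zip(cluster_ids, labels):
        per_cluster.setdefault(cid, Counter())[label] += 1
    # In each cluster, every tuple outside the majority label counts as misclassified.
    wrong = sum(sum(c.values()) - max(c.values()) for c in per_cluster.values())
    return 100.0 * wrong / len(labels)


cluster_ids = [0, 0, 0, 1, 1]
labels = ["edible", "edible", "poisonous", "poisonous", "poisonous"]
pct = misclassified_percentage(cluster_ids, labels)
```

In this toy assignment one of the five tuples sits in a cluster dominated by the other label, giving 20 percent. Repeating the measurement for several values of σ would show how the threshold affects quality.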
![Page 17: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/17.jpg)
![Page 18: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/18.jpg)
![Page 19: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/19.jpg)
Conclusions
- The WeiSC algorithm is robust and efficient, as shown by both inference and experiments.
- It reads the dataset only once.
- Used in IDS, it is speedy and achieves good efficiency.
![Page 20: A novel clustering algorithm based on weighted support and its application](https://reader036.vdocument.in/reader036/viewer/2022062517/56813698550346895d9e2850/html5/thumbnails/20.jpg)
Personal Opinion
We can compare the WeiSC algorithm with our algorithm.