Post on 15-Jan-2016
University at Buffalo The State University of New York
WaveCluster
A multi-resolution clustering approach
Apply wavelet transformation to the feature space
Both grid-based and density-based
Input parameters: the number of grid cells for each dimension, the wavelet, and the number of applications of the wavelet transform
What are Wavelets
What Is Wavelet Transform?
Decomposes a signal into different frequency subbands
Applicable to n-dimensional signals
Data are transformed to preserve the relative distance between objects at different levels of resolution
Allows natural clusters to become more distinguishable
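As an illustration of subband decomposition, here is a minimal sketch using an averaging variant of the Haar wavelet. The choice of Haar and the function names `haar_step` and `haar_decompose` are assumptions made for this example; the slides do not say which wavelet is meant here.

```python
def haar_step(signal):
    """One level of an averaging Haar transform on an even-length list.

    Returns the low-frequency (approximation) and high-frequency (detail) subbands.
    """
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Apply haar_step repeatedly to the approximation subband."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# A 1-D "density" signal with one dense region in the middle.
signal = [0, 0, 8, 8, 8, 8, 0, 0]
approx, details = haar_decompose(signal, 2)
print(approx)   # coarse view of the signal
print(details)  # detail subbands, finest level first
```

Each level halves the resolution: the approximation keeps the coarse shape of the dense region, while the detail subbands record where the signal changes.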
Intuition Behind Using Wavelet Transform
Wavelet transform filters make clusters more distinct
Effective removal of outliers
The multi-resolution property of the wavelet transform helps detect clusters at different levels of accuracy
Cost-efficiency
Wavelet Transformation
Why Wavelet Transform?
Uses hat-shaped filters: emphasize regions where points cluster and suppress weaker information at their boundaries
Effective removal of outliers: insensitive to noise, insensitive to input order
Multi-resolution: detects arbitrarily shaped clusters at different scales
Efficient: complexity O(N)
Only applicable to low-dimensional data
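To make the effect of a hat-shaped filter concrete, the sketch below convolves a 1-D row of grid-cell counts with a five-tap kernel that has a positive centre and negative tails. The specific coefficients (those of the 5/3 LeGall low-pass filter) and the helper name `smooth` are assumptions chosen for illustration, not something the slides specify.

```python
# Hat-shaped kernel: positive centre, negative tails (assumed coefficients).
HAT = [-1 / 8, 2 / 8, 6 / 8, 2 / 8, -1 / 8]


def smooth(counts, kernel=HAT):
    """Convolve cell counts with the kernel, treating out-of-range cells as empty."""
    half = len(kernel) // 2
    out = []
    for i in range(len(counts)):
        total = 0.0
        for j, weight in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(counts):
                total += weight * counts[k]
        out.append(total)
    return out


# One isolated point at index 2 and a dense cluster centred at index 8.
counts = [0, 0, 1, 0, 0, 0, 4, 9, 10, 9, 4, 0, 0]
smoothed = smooth(counts)
print(smoothed[2], smoothed[8])
```

In this example the cluster centre is amplified while the isolated cell is attenuated, so a density threshold applied afterwards separates them more easily.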
WaveCluster: Method
Summarize the data by imposing a multidimensional grid structure onto the data space
Multidimensional spatial data objects are represented in an n-dimensional feature space
Apply wavelet transform on the feature space to find the dense regions in the feature space
Apply wavelet transform multiple times: results in clusters at different scales from fine to coarse
By Dr. Aidong Zhang’s group
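The steps above can be sketched roughly as follows for 2-D points in the unit square. This is a simplified illustration, not the group's implementation: it assumes the low-pass (approximation) subband of an averaging Haar step as the wavelet, a fixed density threshold, and hypothetical function names (`quantize`, `haar_lowpass_2d`, `dense_cells_by_level`). Grouping the dense cells into clusters is left out here.

```python
def quantize(points, k):
    """Count the points falling in each cell of a k x k grid over [0, 1]^2."""
    grid = [[0] * k for _ in range(k)]
    for x, y in points:
        i = min(int(x * k), k - 1)
        j = min(int(y * k), k - 1)
        grid[i][j] += 1
    return grid


def haar_lowpass_2d(grid):
    """Average each 2 x 2 block (approximation subband of an averaging Haar step)."""
    n = len(grid) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i + 1][2 * j]
             + grid[2 * i][2 * j + 1] + grid[2 * i + 1][2 * j + 1]) / 4
            for j in range(n)
        ]
        for i in range(n)
    ]


def dense_cells_by_level(points, k, levels, threshold):
    """Return, for each application of the transform, the set of dense cells."""
    grid = quantize(points, k)
    result = []
    for _ in range(levels):
        grid = haar_lowpass_2d(grid)
        result.append({
            (i, j)
            for i, row in enumerate(grid)
            for j, value in enumerate(row)
            if value >= threshold
        })
    return result


# Eight clustered points near the origin plus one outlier.
points = [
    (0.05, 0.05), (0.10, 0.10), (0.15, 0.05), (0.05, 0.15),
    (0.20, 0.20), (0.20, 0.10), (0.10, 0.20), (0.15, 0.15),
    (0.90, 0.90),
]
levels = dense_cells_by_level(points, k=8, levels=2, threshold=0.5)
print(levels)
```

The three arguments `k`, the wavelet (fixed to Haar in this sketch), and `levels` correspond to the input parameters listed for WaveCluster; the outlier's cell falls below the threshold after the first transform.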
WaveCluster
Shrinking: Intuition & Purpose
For data points in a data set, what if we could make them move towards the centroid of the natural subgroup they belong to?
Natural sparse subgroups become denser and thus easier to detect; noise is further isolated.
The Concept of Shrinking
A data preprocessing technique
It aims to optimize the inner structure of real data sets
Each data point is “attracted” by other data points and moves in the direction in which the attraction is strongest
Can be applied in different fields
Data Shrinking
Each data point moves along the direction of the density gradient and the data set shrinks towards the inside of the clusters.
Points are “attracted” by their neighbors and move to create denser clusters.
Proceeds iteratively; repeated until the data are stabilized or the number of iterations exceeds a threshold.
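A minimal sketch of this iteration follows. It assumes a brute-force neighbourhood (all points within a fixed `radius`) and moves each point onto the centroid of its neighbours, which resembles a flat-kernel mean-shift step; the function name `shrink` and the parameters `radius`, `max_iter` and `tol` are hypothetical choices for this example.

```python
import math


def shrink(points, radius, max_iter=10, tol=1e-6):
    """Repeatedly move every point to the centroid of its neighbours.

    Stops when no point moves more than tol or after max_iter iterations.
    """
    pts = [tuple(p) for p in points]
    for _ in range(max_iter):
        largest_move = 0.0
        new_pts = []
        for p in pts:
            # A point is always its own neighbour, so nbrs is never empty.
            nbrs = [q for q in pts if math.dist(p, q) <= radius]
            centre = tuple(
                sum(q[d] for q in nbrs) / len(nbrs) for d in range(len(p))
            )
            largest_move = max(largest_move, math.dist(p, centre))
            new_pts.append(centre)
        pts = new_pts
        if largest_move < tol:
            break  # the data have stabilized
    return pts


pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (1.0, 1.0)]
shrunk = shrink(pts, radius=0.2)
print(shrunk)
```

In this example the four nearby points collapse onto their common centroid while the isolated point has no neighbours and stays where it is.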
Applying Shrinking to Clustering
Multi-attribute hyperspace
Shrink the naturally sparse clusters to make them much denser, which facilitates the subsequent cluster detection process.
Overall Structure
Data Shrinking (Cont’d)
Space subdivision: normalization of data space
Given the side length 1/k of grid cells, the normalized data space is subdivided into k^d cells.
Each grid cell g contains the average position (grid point) and the number of data points in it.
The neighboring relationship of points is grid-based.
In each iteration, data points move toward the data centroid of the neighboring grid cells.
Grid scale: apply different grid scales and choose the best clustering results.
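The space subdivision can be sketched as below: points are min-max normalized to the unit hypercube and each non-empty cell stores its point count and average position. The names `normalize` and `build_cells`, and the choice to store only non-empty cells in a dictionary, are assumptions for this illustration.

```python
def normalize(points):
    """Min-max scale every dimension to [0, 1]."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return [
        tuple(
            (p[d] - lo[d]) / (hi[d] - lo[d]) if hi[d] > lo[d] else 0.0
            for d in dims
        )
        for p in points
    ]


def build_cells(points, k):
    """Map each non-empty cell index to (count, average position).

    Cells have side length 1/k, so the full grid has k**d cells.
    """
    sums = {}
    for p in points:
        idx = tuple(min(int(x * k), k - 1) for x in p)
        count, total = sums.get(idx, (0, [0.0] * len(p)))
        sums[idx] = (count + 1, [t + x for t, x in zip(total, p)])
    return {
        idx: (n, tuple(t / n for t in total))
        for idx, (n, total) in sums.items()
    }


raw = [(2.0, 10.0), (4.0, 20.0), (2.5, 10.0), (6.0, 30.0)]
cells = build_cells(normalize(raw), k=4)
print(cells)
```

With k = 4 and d = 2 the grid has 16 cells, of which only three are occupied in this example.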
Data Shrinking (Cont’d)
Multi-scale solution: choose multiple grid scales for data shrinking
1. Determination of a proper cell size
2. Advantages for handling clusters of various densities
Data Shrinking (Cont’d)
Acquiring Multiple Scales
A straightforward solution: use a sequence of grids of exponentially increasing cell sizes.
$S_{\min},\ S_{\min} E_g,\ \ldots,\ S_{\min} (E_g)^{\eta} = S_{\max}$, for some $\eta \in \mathbb{N}$
Disadvantages:
1) $S_{\min}$ depends on the granularity of the data
2) Important grid scale candidates may be lost
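The exponential sequence of cell sizes described above can be generated as in the following sketch. The function name `exponential_scales` and the small relative tolerance used to absorb floating-point rounding at the upper bound are assumptions of this example.

```python
def exponential_scales(s_min, s_max, factor):
    """Return s_min, s_min*factor, s_min*factor**2, ... up to s_max."""
    scales = []
    s = s_min
    # The tolerance keeps the last term when rounding pushes it just past s_max.
    while s <= s_max * (1 + 1e-12):
        scales.append(s)
        s *= factor
    return scales


scales = exponential_scales(1 / 64, 1 / 4, 2)
print(scales)
```

Any cell size that lies between two consecutive terms is never tried, which is the second disadvantage listed above.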
Data Shrinking (Cont’d)
A histogram-based approach to get reasonable grid scales
Get histograms for dimensions: $H = \{h_1, h_2, \ldots, h_d\}$
Density span: a combination of consecutive bins’ segments on a certain dimension in which the amount of data points exceeds a threshold.
Start from the largest bin, get density spans.
Regard density spans with similar sizes as identical ones, and choose those with largest frequencies as grid scale candidates.
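A simplified sketch of the histogram-based idea is given below. It makes several assumptions that the slides do not confirm: a density span is taken to be a maximal run of consecutive bins whose counts exceed the threshold, runs are found by a left-to-right scan rather than by growing from the largest bin, and two spans count as "similar" only when they cover the same number of bins. The names `density_spans` and `scale_candidates` are hypothetical.

```python
from collections import Counter


def density_spans(hist, threshold):
    """Lengths (in bins) of maximal runs of bins whose count exceeds threshold."""
    spans = []
    run = 0
    for count in hist:
        if count > threshold:
            run += 1
        else:
            if run:
                spans.append(run)
            run = 0
    if run:
        spans.append(run)
    return spans


def scale_candidates(histograms, threshold, bin_width, top=2):
    """Pick the most frequent span sizes over all dimensions as grid scales."""
    sizes = Counter()
    for hist in histograms:
        sizes.update(density_spans(hist, threshold))
    return [n * bin_width for n, _ in sizes.most_common(top)]


# One histogram per dimension (d = 2), eight to ten bins each.
H = [
    [0, 5, 6, 7, 0, 0, 8, 9, 7, 0],
    [6, 6, 6, 0, 0, 1, 9, 9, 0, 0],
]
candidates = scale_candidates(H, threshold=4, bin_width=0.125)
print(candidates)
```

Here spans of three bins occur three times and spans of two bins once, so the three-bin width is the first grid scale candidate.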
Data Shrinking (Cont’d)
An example of density span processing
Data Shrinking (Cont’d)
An example of data movement
Solution: treat the points in each cell as a rigid body which is pulled as a unit toward the data centroid of the surrounding cells which have more points.
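One way to picture the rigid-body movement is the single-iteration sketch below. It assumes points already normalized to the unit hypercube, treats the surrounding cells as the immediate neighbours in every direction, and translates a cell's points so that their centroid lands exactly on the centroid of the data in denser neighbouring cells. How far a cell actually moves per iteration is not stated in the slides, so that part, along with the name `shrink_cells_once`, is an assumption.

```python
from itertools import product


def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def shrink_cells_once(points, k):
    """Move the points of each cell as a unit toward denser neighbouring cells."""
    cells = {}
    for p in points:
        idx = tuple(min(int(x * k), k - 1) for x in p)
        cells.setdefault(idx, []).append(p)

    moved = []
    for idx, members in cells.items():
        # Collect the data in surrounding cells that hold more points.
        pulled = []
        for offset in product((-1, 0, 1), repeat=len(idx)):
            nb = tuple(i + o for i, o in zip(idx, offset))
            if nb != idx and len(cells.get(nb, ())) > len(members):
                pulled.extend(cells[nb])
        if not pulled:
            moved.extend(members)  # no denser neighbour: the cell stays put
            continue
        own, target = centroid(members), centroid(pulled)
        shift = tuple(t - o for t, o in zip(target, own))
        # Same translation for every point in the cell (rigid body).
        moved.extend(tuple(x + s for x, s in zip(p, shift)) for p in members)
    return moved


points = [
    (0.30, 0.30), (0.40, 0.40), (0.30, 0.40), (0.40, 0.30),  # dense cell
    (0.55, 0.30),                                            # sparse neighbour
    (0.90, 0.90),                                            # isolated point
]
moved = shrink_cells_once(points, k=4)
print(moved)
```

The result is grouped by cell, so its order can differ from the input order in general. In this example the single point next to the dense cell is pulled onto that cell's centroid, while the dense cell and the isolated point do not move.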
Experiments
2D example (figure panels): original data set; data set after iteration 1; data set after iteration 2; data set after iteration 3
Cluster Detection
Neighboring dense cells are connected and a neighboring graph G of the dense cells is constructed.
Use a breadth-first search algorithm to find the components of graph G. Each component is a cluster.
Label data points with cluster ids.
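The detection step can be sketched as a breadth-first search over the dense cells. The sketch assumes that two cells are neighbours when their indices differ by at most one in every dimension, that points lie in the unit hypercube, and that points in non-dense cells receive the label -1; the names `find_clusters` and `label_points` are hypothetical.

```python
from collections import deque
from itertools import product


def find_clusters(dense_cells):
    """Assign a cluster id to every dense cell via breadth-first search."""
    dense = set(dense_cells)
    labels = {}
    next_id = 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in labels:
                    labels[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return labels


def label_points(points, k, labels):
    """Return the cluster id of each point, or -1 if its cell is not dense."""
    ids = []
    for p in points:
        idx = tuple(min(int(x * k), k - 1) for x in p)
        ids.append(labels.get(idx, -1))
    return ids


dense = {(0, 0), (0, 1), (1, 1), (5, 5), (5, 6)}
labels = find_clusters(dense)
point_ids = label_points([(0.05, 0.05), (0.7, 0.7), (0.4, 0.9)], 8, labels)
print(labels, point_ids)
```

Each connected component of the neighbouring graph receives one id, and every data point inherits the id of the cell it falls in.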