Cluster Analysis

DESCRIPTION
AACIMP 2011 Summer School. Operational Research Stream. Lecture by Erik Kropat.

TRANSCRIPT
Cluster Analysis
Summer School
“Achievements and Applications of Contemporary Informatics,
Mathematics and Physics” (AACIMP 2011)
August 8-20, 2011, Kiev, Ukraine
Erik Kropat
University of the Bundeswehr Munich Institute for Theoretical Computer Science,
Mathematics and Operations Research
Neubiberg, Germany
The Knowledge Discovery Process
Raw Data → [PREPROCESSING: standardizing, missing values / outliers] → Preprocessed Data → [DATA MINING: patterns, clusters, correlations, automated classification, outlier / anomaly detection, association rule learning, …] → Patterns → [PATTERN EVALUATION] → Knowledge → strategic planning
Clustering
Clustering
… is a tool for data analysis that solves classification problems.

Problem: Given n observations, split them into K similar groups.
Question: How can we define “similarity”?
Similarity
A cluster is a set of entities which are alike, and entities from different clusters are not alike.
Distance
A cluster is an aggregation of points such that the distance between any two points in the cluster is less than the distance between any point in the cluster and any point not in it.
Density
Clusters may be described as connected regions of a multidimensional space containing a relatively high density of points, separated from other such regions by a region containing a relatively low density of points.
Min-Max Problem
Homogeneity: Objects within the same cluster should be similar to each other.
Separation: Objects in different clusters should be dissimilar from each other.
similarity ⇔ distance: minimize the distance between objects within a cluster, maximize the distance between clusters.
Types of Clustering
• Hierarchical Clustering (agglomerative or divisive)
• Partitional Clustering
Similarity and Distance
Distance Measures
A metric on a set G is a function d: G × G → R+ that satisfies the following conditions:
(D1) d(x, y) = 0 ⇔ x = y (identity)
(D2) d(x, y) = d(y, x) ≥ 0 for all x, y ∈ G (symmetry & non-negativity)
(D3) d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ G (triangle inequality)
Examples: Minkowski Distance
d_r(x, y) = ( Σ_{i=1}^{n} |x_i − y_i|^r )^{1/r}, r ∈ [1, ∞), x, y ∈ R^n
• r = 1: Manhattan distance
• r = 2: Euclidean distance
Euclidean Distance
d_2(x, y) = ( Σ_{i=1}^{n} (x_i − y_i)^2 )^{1/2}, x, y ∈ R^n
Example: x = (1, 1), y = (4, 3)
d_2(x, y) = ( (1 − 4)^2 + (1 − 3)^2 )^{1/2} = √13
Manhattan Distance
d_1(x, y) = Σ_{i=1}^{n} |x_i − y_i|, x, y ∈ R^n
Example: x = (1, 1), y = (4, 3)
d_1(x, y) = |1 − 4| + |1 − 3| = 3 + 2 = 5
Maximum Distance
d_∞(x, y) = max_{1 ≤ i ≤ n} |x_i − y_i|, x, y ∈ R^n
Example: x = (1, 1), y = (4, 3)
d_∞(x, y) = max(3, 2) = 3
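The three distances above can be sketched in a few lines of plain Python; the point pair x = (1, 1), y = (4, 3) is the running example from the slides:

```python
import math

def minkowski(x, y, r):
    """Minkowski distance d_r(x, y) = (sum_i |x_i - y_i|^r)^(1/r), r >= 1."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def manhattan(x, y):
    """d_1: the Minkowski distance with r = 1."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """d_2: the Minkowski distance with r = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def maximum(x, y):
    """d_infinity: the limit r -> infinity (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1, 1), (4, 3)
print(manhattan(x, y))  # 5
print(euclidean(x, y))  # sqrt(13) = 3.605...
print(maximum(x, y))    # 3
```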
Similarity Measures
A similarity function on a set G is a function S: G × G → R that satisfies the following conditions:
(S1) S(x, y) ≥ 0 for all x, y ∈ G (non-negativity)
(S2) S(x, y) ≤ S(x, x) for all x, y ∈ G (auto-similarity)
(S3) S(x, y) = S(x, x) ⇔ x = y for all x, y ∈ G (identity)
The value of the similarity function is greater when two points are closer.
Similarity Measures
• There are many different definitions of similarity.
• Often the following condition is used in addition:
(S4) S(x, y) = S(y, x) for all x, y ∈ G (symmetry)
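One common way to obtain such a similarity function is to transform a distance, for instance with a Gaussian kernel S(x, y) = exp(−d_2(x, y)² / (2σ²)). This particular kernel is not given on the slides; it is a standard textbook choice that satisfies (S1)–(S4):

```python
import math

def gaussian_similarity(x, y, sigma=1.0):
    """S(x, y) = exp(-d_2(x, y)^2 / (2 * sigma^2)): equals 1 for identical
    points and decreases toward 0 as the Euclidean distance grows."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(gaussian_similarity((1, 1), (1, 1)))  # 1.0 (auto-similarity is maximal)
# Far points are less similar than near points:
print(gaussian_similarity((1, 1), (4, 3)) < gaussian_similarity((1, 1), (2, 1)))  # True
```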
Hierarchical Clustering
Dendrogram
www.isa.uni-stuttgart.de/lehre/SAHBD
(Figure: cluster dendrogram of the gross national product of EU countries – agriculture (1993); Euclidean distance, complete linkage.)
Hierarchical Clustering
Hierarchical clustering creates a hierarchy of clusters of the set G.
• Agglomerative clustering: clusters are successively merged together.
• Divisive clustering: clusters are recursively split.
Agglomerative Clustering
Merge the clusters with the smallest distance between the two clusters:
Step 0: {e1}, {e2}, {e3}, {e4} (4 clusters)
Step 1: {e1, e2}, {e3}, {e4} (3 clusters)
Step 2: {e1, e2, e3}, {e4} (2 clusters)
Step 3: {e1, e2, e3, e4} (1 cluster)
Divisive Clustering
Choose a cluster that optimally splits into two clusters according to a given criterion:
Step 0: {e1, e2, e3, e4} (1 cluster)
Step 1: {e1, e2}, {e3, e4} (2 clusters)
Step 2: {e1, e2}, {e3}, {e4} (3 clusters)
Step 3: {e1}, {e2}, {e3}, {e4} (4 clusters)
Agglomerative Clustering
INPUT
Given n objects G = { e1, ..., en } represented by p-dimensional feature vectors x1, ..., xn ∈ R^p (one component per feature):
x1 = ( x11, x12, x13, ..., x1p )
x2 = ( x21, x22, x23, ..., x2p )
⁞
xn = ( xn1, xn2, xn3, ..., xnp )
Example I
An online shop collects data from its customers. For each of the n customers there exists a p-dimensional feature vector.
Example II
In a clinical trial, laboratory values of a large number of patients are gathered. For each of the n patients there exists a p-dimensional feature vector.
Agglomerative Algorithms
• Begin with the disjoint clustering C1 = { {e1}, {e2}, ..., {en} }.
• Iterate: find the most similar pair of clusters and merge them into a single cluster.
• Terminate when all objects are in one cluster Cn = { {e1, e2, ..., en} }.
This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_{i−1} ⊂ C_i for i = 2, ..., n.
What is the distance d(A, B) between two clusters A and B?
⇒ Various hierarchical clustering algorithms
Agglomerative Hierarchical Clustering
There exist many measures for the distance between clusters. They lead to particular agglomerative clustering methods:
• Single-Linkage Clustering
• Complete-Linkage Clustering
• Average-Linkage Clustering
• Centroid Method
• ...
Single-Linkage Clustering (Nearest-Neighbor Method)
The distance between the clusters A and B is the minimum distance between the elements of each cluster:
d(A, B) = min { d(a, b) | a ∈ A, b ∈ B }
Single-Linkage Clustering
• Advantage: Can detect very long and even curved clusters. Can be used to detect outliers.
• Drawback: Chaining phenomenon. Clusters that are very distant from each other may be forced together due to single elements being close to each other.
Complete-Linkage Clustering (Furthest-Neighbor Method)
The distance between the clusters A and B is the maximum distance between the elements of each cluster:
d(A, B) = max { d(a, b) | a ∈ A, b ∈ B }
Complete-Linkage Clustering
• … tends to find compact clusters of approximately equal diameters.
• … avoids the chaining phenomenon.
• … cannot be used for outlier detection.
Average-Linkage Clustering
The distance between the clusters A and B is the mean distance between the elements of each cluster:
d(A, B) = ( 1 / (|A| ⋅ |B|) ) ⋅ Σ_{a ∈ A, b ∈ B} d(a, b)
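The three linkage rules follow directly from their definitions. A minimal sketch, with two made-up example clusters A and B on a line so the pairwise distances are easy to check by hand:

```python
def euclidean(a, b):
    """Pointwise Euclidean distance d_2."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

def single_linkage(A, B):
    """d(A, B) = min { d(a, b) | a in A, b in B }."""
    return min(euclidean(a, b) for a in A for b in B)

def complete_linkage(A, B):
    """d(A, B) = max { d(a, b) | a in A, b in B }."""
    return max(euclidean(a, b) for a in A for b in B)

def average_linkage(A, B):
    """d(A, B) = mean distance over all |A| * |B| pairs."""
    return sum(euclidean(a, b) for a in A for b in B) / (len(A) * len(B))

A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(single_linkage(A, B))    # 3.0 (closest pair)
print(complete_linkage(A, B))  # 6.0 (furthest pair)
print(average_linkage(A, B))   # 4.5 (mean of 4, 6, 3, 5)
```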
Centroid Method
The distance between the clusters A and B is the (squared) Euclidean distance of the cluster centroids.
Agglomerative Hierarchical Clustering
(Figure: overview of the four cluster distance measures.)
Bioinformatics
Alizadeh et al., Nature 403 (2000): pp.503–511
Exercise
(Map: Paris, Berlin, Kiev, Odessa.)
Exercise
The following table shows the distances between 4 cities:

|        | Kiev | Odessa | Berlin | Paris |
|--------|------|--------|--------|-------|
| Kiev   | –    | 440    | 1200   | 2000  |
| Odessa | 440  | –      | 1400   | 2100  |
| Berlin | 1200 | 1400   | –      | 900   |
| Paris  | 2000 | 2100   | 900    | –     |

Determine a hierarchical clustering with the single-linkage method.
Solution - Single Linkage
Step 0: Clustering {Kiev}, {Odessa}, {Berlin}, {Paris}
Distances between clusters:

|        | Kiev | Odessa | Berlin | Paris |
|--------|------|--------|--------|-------|
| Kiev   | –    | 440    | 1200   | 2000  |
| Odessa | 440  | –      | 1400   | 2100  |
| Berlin | 1200 | 1400   | –      | 900   |
| Paris  | 2000 | 2100   | 900    | –     |

The minimal distance is 440.
⇒ Merge clusters {Kiev} and {Odessa}. Distance value: 440
Solution - Single Linkage
Step 1: Clustering {Kiev, Odessa}, {Berlin}, {Paris}
Distances between clusters:

|              | Kiev, Odessa | Berlin | Paris |
|--------------|--------------|--------|-------|
| Kiev, Odessa | –            | 1200   | 2000  |
| Berlin       | 1200         | –      | 900   |
| Paris        | 2000         | 900    | –     |

The minimal distance is 900.
⇒ Merge clusters {Berlin} and {Paris}. Distance value: 900
Solution - Single Linkage
Step 2: Clustering {Kiev, Odessa}, {Berlin, Paris}
Distances between clusters:

|               | Kiev, Odessa | Berlin, Paris |
|---------------|--------------|---------------|
| Kiev, Odessa  | –            | 1200          |
| Berlin, Paris | 1200         | –             |

The minimal distance is 1200.
⇒ Merge clusters {Kiev, Odessa} and {Berlin, Paris}. Distance value: 1200
Solution - Single Linkage
Step 3: Clustering
{Kiev, Odessa, Berlin, Paris}
Solution - Single Linkage
Hierarchy (dendrogram over Kiev, Odessa, Berlin, Paris):
• distance 0: 4 clusters {Kiev}, {Odessa}, {Berlin}, {Paris}
• distance 440: 3 clusters ({Kiev} and {Odessa} merge)
• distance 900: 2 clusters ({Berlin} and {Paris} merge)
• distance 1200: 1 cluster (all cities merge)
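The whole exercise can be reproduced in a few lines of Python by repeatedly merging the pair of clusters with the smallest single-linkage distance:

```python
# Distance table from the exercise (symmetric, given once per pair).
D = {('Kiev', 'Odessa'): 440, ('Kiev', 'Berlin'): 1200, ('Kiev', 'Paris'): 2000,
     ('Odessa', 'Berlin'): 1400, ('Odessa', 'Paris'): 2100, ('Berlin', 'Paris'): 900}

def dist(a, b):
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def single_linkage(A, B):
    """d(A, B) = min { d(a, b) | a in A, b in B }."""
    return min(dist(a, b) for a in A for b in B)

clusters = [('Kiev',), ('Odessa',), ('Berlin',), ('Paris',)]
merges = []
while len(clusters) > 1:
    # Find the pair of clusters with the smallest single-linkage distance ...
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
    d = single_linkage(clusters[i], clusters[j])
    merged = clusters[i] + clusters[j]
    merges.append((merged, d))
    # ... and merge them.
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

for merged, d in merges:
    print(merged, d)
# Merges at distances 440 ({Kiev, Odessa}), 900 ({Berlin, Paris}), 1200 (all four).
```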
Divisive Clustering
Divisive Algorithms
• Begin with one cluster C1 = { {e1, e2, ..., en} }.
• Iterate: choose a cluster Cf that optimally splits into two clusters Ci and Cj according to a given criterion.
• Terminate when all objects are in disjoint clusters Cn = { {e1}, {e2}, ..., {en} }.
This yields a sequence of clusterings (Ci)_{i=1,...,n} of G with C_i ⊃ C_{i+1} for i = 1, ..., n−1.
Partitional Clustering
Minimal Distance Methods
Partitional Clustering
• Aims to partition n observations into K clusters.
• The number of clusters and an initial partition are given.
• The initial partition is considered “not optimal“ and is iteratively repartitioned.
The number of clusters is given! (Figure: an initial partition and a final partition, each with K = 2.)
Partitional Clustering
Differences to hierarchical clustering:
• the number of clusters is fixed.
• an object can change its cluster.
The initial partition is obtained by
• random assignment, or
• the application of a hierarchical clustering algorithm in advance.
The number of clusters is estimated by
• specialized methods (e.g., Silhouette), or
• the application of a hierarchical clustering algorithm in advance.
Partitional Clustering - Methods
In this course we will introduce the minimal distance methods
• K-Means and
• Fuzzy-c-Means.
K-Means
K-Means
Aims to partition n observations into K clusters in which each observation belongs to the cluster with the nearest mean.
Find K cluster centroids µ1, ..., µK that minimize the objective function
J = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist(µi, x)^2
K-Means - Minimal Distance Method
Given: n objects, K clusters
1. Determine an initial partition.
2. Calculate the cluster centroids.
3. For each object, calculate the distances to all cluster centroids.
4. If the distance to the centroid of another cluster is smaller than the distance to the current cluster centroid, then assign the object to the other cluster (repartition).
5. If clusters were repartitioned: GOTO 2. ELSE: STOP.
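The five steps translate almost directly into code. A minimal sketch; the sample points and the choice of the first K points as initial centroids are illustrative assumptions, not part of the lecture:

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, K, iters=100):
    # 1. Initial partition: here simply the first K points as centroids
    #    (the final partition depends on this choice).
    centroids = [list(p) for p in points[:K]]
    labels = [0] * len(points)
    for _ in range(iters):
        # 3./4. Assign every object to its nearest centroid (repartition).
        new_labels = [min(range(K), key=lambda i: dist2(centroids[i], p))
                      for p in points]
        # 5. Stop as soon as no object changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # 2. Recompute each centroid as the mean of its members.
        for i in range(K):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, K=2)
print(labels)  # the first three and the last three points end up in different clusters
```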
Example
(Figure: initial partition → final partition.)
Exercise
(Figure: initial partition → final partition.)
K-Means
• K-Means does not necessarily find the globally optimal partition.
• The final partition obtained by K-Means depends on the initial partition.
Hard Clustering / Soft Clustering
• Hard clustering: each object is a member of exactly one cluster (e.g., K-Means).
• Soft clustering: each object has a fractional membership in all clusters (e.g., Fuzzy-c-Means).
Fuzzy-c-Means
Fuzzy Clustering vs. Hard Clustering
• When clusters are well separated, hard clustering (K-Means) makes sense.
• In many cases, clusters are not well separated. In hard clustering, borderline objects are assigned to a cluster in an arbitrary manner.
Fuzzy Set Theory
• Fuzzy theory was introduced by Lotfi Zadeh in 1965.
• An object can belong to a set with a degree of membership between 0 and 1.
• Classical set theory is a special case of fuzzy theory that restricts membership values to be either 0 or 1.
Fuzzy Clustering
• Is based on fuzzy logic and fuzzy set theory.
• Objects can belong to more than one cluster.
• Each object belongs to all clusters with some weight (degree of membership) between 0 and 1.
Hard Clustering
• K-Means
  − The number K of clusters is given.
  − Each object is assigned to exactly one cluster.
Partition:

| Cluster | e1 | e2 | e3 | e4 |
|---------|----|----|----|----|
| C1      | 0  | 1  | 0  | 0  |
| C2      | 1  | 0  | 0  | 0  |
| C3      | 0  | 0  | 1  | 1  |
Fuzzy Clustering
• Fuzzy-c-Means
  − The number c of clusters is given.
  − Each object has a fractional membership in all clusters.
Fuzzy partition (there is no strict subdivision into clusters):

| Cluster | e1  | e2  | e3  | e4  |
|---------|-----|-----|-----|-----|
| C1      | 0.8 | 0.2 | 0.1 | 0.0 |
| C2      | 0.2 | 0.2 | 0.2 | 0.0 |
| C3      | 0.0 | 0.6 | 0.7 | 1.0 |
| Σ       | 1   | 1   | 1   | 1   |
Fuzzy-c-Means
• Membership matrix U = (u_ik) ∈ [0, 1]^{c × n}
The entry u_ik denotes the degree of membership of object k in cluster i.

|           | Object 1 | Object 2 | … | Object n |
|-----------|----------|----------|---|----------|
| Cluster 1 | u11      | u12      | … | u1n      |
| Cluster 2 | u21      | u22      | … | u2n      |
| …         | …        | …        | … | …        |
| Cluster c | uc1      | uc2      | … | ucn      |
Restrictions (Membership Matrix)
1. All weights for a given object ek must add up to 1:
Σ_{i=1}^{c} u_ik = 1 (k = 1, ..., n)
2. Each cluster contains at least one object with non-zero weight, but no cluster contains all objects with full weight:
0 < Σ_{k=1}^{n} u_ik < n (i = 1, ..., c)
Fuzzy-c-Means
• Vector of prototypes (cluster centroids): V = (v1, ..., vc), vi ∈ R^p
Remark: The cluster centroids and the membership matrix are initialized randomly. Afterwards they are iteratively optimized.
Fuzzy-c-Means
ALGORITHM
1. Select an initial fuzzy partition U = (u_ik), i.e., assign values to all u_ik.
2. Repeat
3. Compute the centroid of each cluster using the fuzzy partition.
4. Update the fuzzy partition U = (u_ik).
5. Until the centroids do not change.
Other stopping criterion: “change in the u_ik is below a given threshold”.
Fuzzy-c-Means
• K-Means and Fuzzy-c-Means attempt to minimize the sum of the squared errors (SSE).
• In K-Means: SSE = Σ_{i=1}^{K} Σ_{x ∈ Ci} dist(vi, x)^2
• In Fuzzy-c-Means: SSE = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist(vi, xk)^2
m ∈ [1, ∞) is a parameter (fuzzifier) that determines the influence of the weights.
Computing Cluster Centroids
• For each cluster i = 1, ..., c the centroid is defined by
(V)  vi = ( Σ_{k=1}^{n} u_ik^m xk ) / ( Σ_{k=1}^{n} u_ik^m )   (i = 1, ..., c)
• This is an extension of the definition of centroids of K-Means.
• All points are considered, and the contribution of each point to the centroid is weighted by its membership degree.
Update of the Fuzzy Partition (Membership Matrix)
• Minimization of the SSE subject to the constraints leads to the following update formula:
(U)  u_ik = 1 / Σ_{s=1}^{c} ( dist(vi, xk) / dist(vs, xk) )^{2/(m−1)}
Fuzzy-c-Means
Initialization: Determine (randomly)
• the matrix U of membership grades,
• the matrix V of cluster centroids.
Iteration: Calculate updates of
• the matrix U of membership grades with (U),
• the matrix V of cluster centroids with (V),
until the cluster centroids are stable or the maximum number of iterations is reached.
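Putting the update formulas (V) and (U) together gives a compact sketch of the algorithm. Pure Python with made-up sample data; a production implementation would typically use NumPy:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fuzzy_c_means(points, c, m=2.0, iters=100, eps=1e-9, seed=0):
    rng = random.Random(seed)
    n, p = len(points), len(points[0])
    # Initialization: random fuzzy partition, each object's weights sum to 1.
    U = []
    for _ in range(n):
        w = [rng.random() for _ in range(c)]
        total = sum(w)
        U.append([wi / total for wi in w])  # U[k][i] corresponds to u_ik
    V = [[0.0] * p for _ in range(c)]
    for _ in range(iters):
        # (V): centroids as means weighted by u_ik^m.
        for i in range(c):
            den = sum(U[k][i] ** m for k in range(n))
            V[i] = [sum(U[k][i] ** m * points[k][j] for k in range(n)) / den
                    for j in range(p)]
        # (U): u_ik = 1 / sum_s (d(v_i, x_k) / d(v_s, x_k))^(2 / (m - 1)).
        new_U = []
        for k in range(n):
            d = [max(dist2(v, points[k]) ** 0.5, 1e-12) for v in V]
            new_U.append([1.0 / sum((d[i] / d[s]) ** (2.0 / (m - 1.0))
                                    for s in range(c)) for i in range(c)])
        # Stop when the change in the memberships is below the threshold.
        change = max(abs(a - b) for ru, rv in zip(new_U, U)
                     for a, b in zip(ru, rv))
        U = new_U
        if change < eps:
            break
    return U, V

points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
U, V = fuzzy_c_means(points, c=2)
print([round(u, 2) for u in U[0]])  # memberships of (0, 0): one weight close to 1
```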
Fuzzy-c-Means
• Fuzzy-c-Means depends on the Euclidean metric ⇒ spherical clusters.
• Other metrics can be applied to obtain different cluster shapes.
• Fuzzy covariance matrix (Gustafson/Kessel 1979) ⇒ ellipsoidal clusters.
Cluster Validity Indexes
Cluster Validity Indexes
Fuzzy-c-Means requires the number of clusters as input.
Question: How can we determine the “optimal” number of clusters?
Idea: Determine the cluster partition for a given number of clusters. Then, evaluate the cluster partition by a cluster validity index.
Method: For all possible numbers of clusters, calculate the cluster validity index. Then, determine the optimal number of clusters.
Note: Cluster validity indexes usually do not depend on the clustering algorithm.
Cluster Validity Indexes
• Partition Coefficient (Bezdek 1981)
PC(c) = (1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^2,  2 ≤ c ≤ n−1
• Optimal number of clusters c∗:
PC(c∗) = max_{2 ≤ c ≤ n−1} PC(c)
Cluster Validity Indexes
• Partition Entropy (Bezdek 1974)
PE(c) = −(1/n) Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik log2(u_ik),  2 ≤ c ≤ n−1
• Optimal number of clusters c∗:
PE(c∗) = min_{2 ≤ c ≤ n−1} PE(c)
• Drawback of PC and PE: only the degrees of membership are considered. The geometry of the data set is neglected.
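Both indexes are short sums over the membership matrix. A small sketch with two extreme partitions (the example matrices are illustrative):

```python
import math

def partition_coefficient(U):
    """PC = (1/n) * sum_{i,k} u_ik^2; ranges from 1/c (maximally fuzzy) to 1 (hard)."""
    n = len(U[0])
    return sum(u * u for row in U for u in row) / n

def partition_entropy(U):
    """PE = -(1/n) * sum_{i,k} u_ik * log2(u_ik); 0 for a hard partition."""
    n = len(U[0])
    return -sum(u * math.log2(u) for row in U for u in row if u > 0) / n

# Rows are clusters, columns are objects (U[i][k] = u_ik).
hard = [[1, 1, 0, 0],
        [0, 0, 1, 1]]
fuzzy = [[0.5, 0.5, 0.5, 0.5],
         [0.5, 0.5, 0.5, 0.5]]

print(partition_coefficient(hard))   # 1.0
print(partition_coefficient(fuzzy))  # 0.5 (= 1/c)
print(partition_entropy(fuzzy))      # 1.0 (= log2 c, maximal uncertainty)
```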
Cluster Validity Indexes
• Fukuyama-Sugeno Index (Fukuyama/Sugeno 1989)
FS(c) = Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist(vi, xk)^2 − Σ_{i=1}^{c} Σ_{k=1}^{n} u_ik^m dist(vi, v̄)^2, where v̄ = (1/c) Σ_{i=1}^{c} vi
The first term measures the compactness of the clusters, the second their separation.
• Optimal number of clusters c∗:
FS(c∗) = min_{2 ≤ c ≤ n−1} FS(c)
Application
Data Mining and Decision Support Systems for Landslide Events (UniBw, Geoinformatics Group: W. Reinhardt, E. Nuhn)
• Measurements (pressure values, tension, deformation vectors)
• Simulations (finite-element model)
→ Spatial Data Mining / Early Warning Systems for Landslide Events
→ Fuzzy clustering approaches (feature weighting)
Problem: Uncertain data from measurements and simulations
• Hard clustering: data → partition
• Fuzzy clustering: data → fuzzy partition (fuzzy clusters)
Feature Weighting
Nuhn/Kropat/Reinhardt/Pickl: Preparation of complex landslide simulation results with clustering approaches for decision support and early warning. Submitted to the Hawaii International Conference on System Sciences (HICSS-45), Grand Wailea, Maui, 2012.
Thank you very much!