Clustering - based on Loïc Cerf's slides (UFMG)

Marc Plantevit
March 11, 2021
UCBL – LIRIS – DM2L


Clustering Approaches

Partition-based algorithms: build several partitions, then assess them w.r.t. some criteria.

Hierarchy-based algorithms: create a hierarchical decomposition of the objects w.r.t. some criteria.

Density-based algorithms: based on the notions of density and connectivity.

Characteristics

scalability;

ability to handle different data types;

prior knowledge required for parameter settings;

ability to handle noisy data and outliers.


Outline

1 k-means

2 EM

3 Hierarchical Clustering

4 Density-based Clustering: DBSCAN

5 Conclusion


1 k-means

Inductive database vision

Querying a clustering: $\{X \in \mathcal{P} \mid Q(X, \mathcal{D})\}$ where:

$\mathcal{D}$ is a set of objects $\mathcal{O}$ associated with a similarity measure,

$\mathcal{P}$ is $\{(C_1, \dots, C_k) \in (2^{\mathcal{O}})^k \mid \forall i = 1..k,\ C_i \neq \emptyset;\ \forall j \neq i,\ C_i \cap C_j = \emptyset;\ \bigcup_{l=1}^{k} C_l = \mathcal{O}\}$,

$Q$ is a function to optimize. It quantifies how similar the pairs of objects within a same cluster are and how dissimilar the pairs of objects across two different clusters are.

Inductive database vision

Querying a clustering with k-means: $\{X \in \mathcal{P} \mid Q(X, \mathcal{D})\}$ where:

$\mathcal{D}$ is a set of objects $\mathcal{O}$ associated with a similarity measure,

$\mathcal{P}$ is $\{(C_1, \dots, C_k) \in (2^{\mathcal{O}})^k \mid \forall i = 1..k,\ C_i \neq \emptyset;\ \forall j \neq i,\ C_i \cap C_j = \emptyset;\ \bigcup_{l=1}^{k} C_l = \mathcal{O}\}$ where $k \in \mathbb{N} \setminus \{0\}$ is fixed,

$Q$ is the maximization of the sum, over all objects, of the similarities to the centers of the assigned clusters:

$(C_1, \dots, C_k) \mapsto \sum_{i=1}^{k} \sum_{o \in C_i} s(o, \mu_i)$, where $\mu_i = \frac{\sum_{o' \in C_i} o'}{|C_i|}$ is the center of $C_i$.

Exact algorithm

Input: O, D, k ∈ ℕ \ {0}
Output: the clustering of O maximizing f: the sum, over all objects, of the similarities to the centers of the assigned clusters

  Cmax ← ∅
  fmax ← −∞
  for all k-clustering C of O do
    if f(C, D) > fmax then
      fmax ← f(C, D)
      Cmax ← C
    end if
  end for
  output(Cmax)
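A minimal runnable sketch of this exact algorithm, assuming (since the slides leave s abstract) that similarity is the negative squared Euclidean distance to the cluster center; all names are illustrative:

```python
# Brute-force k-clustering: enumerate every assignment of n objects to k
# labels (each clustering is visited up to k! times, which is harmless for
# a sketch) and keep the one maximizing f.
from itertools import product
import numpy as np

def f(clusters):
    """Sum, over all objects, of similarities to the assigned centers,
    with s(o, mu) = -||o - mu||^2 (an assumed similarity)."""
    return sum(-np.sum((np.array(c) - np.mean(c, axis=0)) ** 2)
               for c in clusters)

def exact_k_clustering(objects, k):
    best, f_max = None, -np.inf
    for labels in product(range(k), repeat=len(objects)):
        if len(set(labels)) < k:          # every cluster must be non-empty
            continue
        clusters = [[o for o, l in zip(objects, labels) if l == i]
                    for i in range(k)]
        if f(clusters) > f_max:
            f_max, best = f(clusters), clusters
    return best

objects = [np.array([x]) for x in (1.0, 1.2, 5.0, 5.3, 9.0)]
print(exact_k_clustering(objects, 2))
```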

Number of k-clusterings

Question: How many k-clusterings are enumerated?

Answer: the Stirling number of the second kind, i.e., $\frac{1}{k!} \sum_{t=0}^{k} (-1)^t \binom{k}{t} (k-t)^n = O(k^n)$.
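The formula can be checked directly; a small sketch using only the Python standard library:

```python
# Stirling number of the second kind via the inclusion-exclusion formula above.
from math import comb, factorial

def stirling2(n, k):
    return sum((-1) ** t * comb(k, t) * (k - t) ** n
               for t in range(k + 1)) // factorial(k)

print(stirling2(5, 2))    # 15 2-clusterings of 5 objects
print(stirling2(10, 3))   # 9330: already far too many to enumerate happily
```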

k-means principles

k-means is a greedy iterative approach that always converges to a local maximum of the sum, over all objects, of the similarities to the centers of the assigned clusters.

An iteration consists in two steps:

E Each object is assigned to the cluster whose center is the most similar (thus defining a clustering);

M The center of each cluster is updated to the mean of the objects assigned to it.

Initially, the centers of the clusters are randomly drawn. The procedure stops when, from an iteration to the next one, the centers of the clusters have not changed much (or at all).

2-means with |A| = 1: illustration

2-means clustering of the objects in a one-dimensional space using the Euclidean distance. [Figures: the dataset, then the centers and assignments after iterations 1 to 5; the images are not preserved in this transcript.]

3-means with |A| = 2: illustration

3-means clustering of the objects in a two-dimensional space using the Euclidean distance.

        x    y
  o1   91   70
  o2  129   91
  o3  359  243
  o4  322  254
  o5  100  104
  o6  464  113
  o7  342  297
  o8  410   65
  o9  334  329
  ...  ...  ...

k-means algorithm

Input: O, D, k ∈ ℕ \ {0}
Output: a clustering of O locally maximizing the sum, over all objects, of the similarities to the centers of the assigned clusters

  (μi)i=1..k ← random(D)
  repeat
    (Ci)i=1..k ← assign_cluster(O, D, (μi)i=1..k)
    (c, (μi)i=1..k) ← update_centers(D, (Ci)i=1..k, (μi)i=1..k)
  until c
  output((Ci)i=1..k)

assign_cluster

Input: O, D, (μi)i=1..k ∈ (ℝ^{|A|})^k
Output: (Ci)i=1..k, the clustering of O such that ∀i = 1..k, ∀j ≠ i, ∀o ∈ Ci, s(o, μi) ≥ s(o, μj)

  for all o ∈ O do
    a ← arg max_{i=1..k} s(o, μi)
    Ca ← Ca ∪ {o}
  end for
  return((Ci)i=1..k)
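A vectorized NumPy rendering of assign_cluster, under the assumption (the slides leave s abstract) that the most similar center is the nearest one in Euclidean distance:

```python
import numpy as np

def assign_cluster(X, centers):
    """Assign each row of X (shape |O| x |A|) to its most similar center.
    d2[o, i] holds the squared Euclidean distance from object o to center i."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)    # cluster label per object, in O(k|O x A|)
```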

Complexity of assign_cluster

Question: Assuming the computation of a similarity is linear in the number of attributes |A|, what is the complexity of assign_cluster?

Answer: O(k|O × A|).

update_centers

Input: D, (Ci)i=1..k a clustering of O, (μi)i=1..k ∈ (ℝ^{|A|})^k
Output: c ∈ {false, true} indicating whether the convergence is reached, (μ′i)i=1..k ∈ (ℝ^{|A|})^k such that ∀i = 1..k, μ′i = (∑_{o∈Ci} o) / |Ci|

  c ← true
  for i = 1 → k do
    μ′i ← (∑_{o∈Ci} o) / |Ci|
    if μ′i ≠ μi then
      c ← false
    end if
  end for
  return((c, (μ′i)i=1..k))
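update_centers and the main loop, completing a minimal k-means sketch on top of the assign_cluster code above (same assumptions; random(D) is rendered as drawing k distinct objects):

```python
import numpy as np

def update_centers(X, labels, centers):
    """Recompute each center as the mean of its assigned objects; return
    (converged, new_centers). A cluster left empty keeps its old center."""
    new_centers = centers.copy()
    for i in range(len(centers)):
        members = X[labels == i]
        if len(members) > 0:
            new_centers[i] = members.mean(axis=0)
    return np.allclose(new_centers, centers), new_centers

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]       # random init
    converged = False
    while not converged:                                         # repeat ... until c
        labels = assign_cluster(X, centers)                      # E step
        converged, centers = update_centers(X, labels, centers)  # M step
    return labels, centers
```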

Complexity of k-means

Question: Assuming the computation of a similarity is linear in the number of attributes |A|, what is the complexity of assign_cluster?
Answer: O(k|O × A|).

Question: What is the complexity of update_centers?
Answer: O(|O × A|).

Question: What is the complexity of k-means if t ∈ ℕ iterations are necessary to converge?
Answer: O(tk|O × A|).

Convergence

Worst-case scenarios require $2^{\Omega(|O|)}$ iterations to converge, but a smoothed analysis gives a polynomial complexity.

The low complexity of k-means is its greatest advantage.

Limitations of k-means

Convergence towards a local maximum of the sum, over all objects, of the similarities to the centers of the assigned clusters;

Sensitivity to outliers (k-medoids replaces the means by medians);

Tendency to produce equi-sized clusters;

The number of clusters must be known beforehand.

The elbow method

Plot a measure of the quality of the k clusters (e.g., the sum, over all objects, of the similarities to the centers of the assigned clusters) as k increases. Choose k after a large drop of the growth.

More principled methods exist and can be seen as variants (finding the best trade-off between quality and compression).

If the quadratic time complexity of a hierarchical agglomeration is not prohibitive, the number of clusters can be determined from the dendrogram.
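A sketch of the elbow method, assuming scikit-learn is available; inertia_ is the within-cluster sum of squared distances, whose drop flattens once k passes the right value:

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated blobs, so the elbow should appear at k = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in (0.0, 3.0, 6.0)])

for k in range(1, 8):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))
```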


Tendency to produce equi-sized clusters

[Figure: three scatter plots of the same data, titled "Dataset", "k-means clustering" and "EM clustering"; the images are not preserved in this transcript.]

2 EM

EM assumptions

The dataset D is seen as a random sample from an |A|-dimensional random variable O.

Its probability density function is given as a mixture model of the k ∈ ℕ \ {0} clusters (Ci)i=1..k:

$f(o) = \sum_{i=1}^{k} f_i(o)\, P(C_i)$,

where P(Ci) is the probability to belong to the cluster Ci and fi is the probability density function of this cluster, whose type of distribution is chosen beforehand.

Maximum likelihood estimation

EM searches for a parametrization θ of f (i.e., (P(Ci))i=1..k and the parametrization of the (fi)i=1..k) so that the likelihood that D is indeed a random sample of O is maximized:

$\arg\max_\theta P(\mathcal{D} \mid \theta)$.

Since the dataset is assumed to be a random sample from O (i.e., independent and identically distributed as O), the objective becomes the computation of:

$\arg\max_\theta \prod_{o \in \mathcal{O}} f(o)$.

It usually is hard to analytically compute $\arg\max_\theta \prod_{o \in \mathcal{O}} f(o)$.

EM principles

EM is a greedy iterative approach that always converges to a local maximum of P(D|θ).

An iteration consists in two steps:

E Given θ, the posterior probability of each object to belong to each cluster is computed;

M θ is updated to reflect these probabilities.

Initially, the parametrization of θ is randomly drawn and ∀i = 1..k, P(Ci) = 1/k. The procedure stops when, from an iteration to the next one, the parametrization has not changed much (or at all).

Expectation step

Given θ, the posterior probability of an object o ∈ O to belong to a cluster Ci is:

$P(C_i \mid o) = \frac{P(C_i \wedge o)}{P(o)} = \frac{P(o \mid C_i)\,P(C_i)}{\sum_{a=1..k} P(o \wedge C_a)} = \frac{P(o \mid C_i)\,P(C_i)}{\sum_{a=1..k} P(o \mid C_a)\,P(C_a)} = \frac{f_i(o)\,P(C_i)}{\sum_{a=1..k} f_a(o)\,P(C_a)}$.
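A sketch of this E step for Gaussian clusters, assuming SciPy is available; weights, means and covs stand for (P(Ci))i=1..k, (μi)i=1..k and (Σi)i=1..k:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Posterior P(Ci|o) for every object (row of X) and cluster, via the
    formula above: f_i(o) P(C_i), normalized over the k clusters."""
    joint = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=c)
        for w, m, c in zip(weights, means, covs)
    ])
    return joint / joint.sum(axis=1, keepdims=True)
```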

Maximization step (1/2)

The distribution of a cluster usually is assumed multivariate normal, thus parametrized with a location (the center of the cluster) and a covariance matrix.

Given (P(Ci|o))i=1..k,o∈O, the location of the cluster Ci is updated to the weighted sample mean μi:

$\mu_i = \frac{\sum_{o \in \mathcal{O}} P(C_i \mid o)\, o}{\sum_{o \in \mathcal{O}} P(C_i \mid o)}$.

Maximization step (2/2)

Given (P(Ci|o))i=1..k,o∈O, the covariance of the cluster Ci between the random variables Oa and Ob is updated to the weighted sample covariance:

$\frac{\sum_{o \in \mathcal{O}} P(C_i \mid o)\,(o_a - \mu_{i,a})(o_b - \mu_{i,b})}{\sum_{o \in \mathcal{O}} P(C_i \mid o)}$.

Given (P(Ci|o))i=1..k,o∈O, the prior probability of belonging to the cluster Ci is updated to:

$\frac{\sum_{o \in \mathcal{O}} P(C_i \mid o)}{|\mathcal{O}|}$.
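The corresponding M step as a NumPy sketch; post[o, i] stands for P(Ci|o) as computed by the E step:

```python
import numpy as np

def m_step(X, post):
    """Weighted-sample updates of the locations, covariances and priors."""
    n_i = post.sum(axis=0)                        # soft cluster sizes
    means = (post.T @ X) / n_i[:, None]           # weighted sample means
    covs = []
    for i in range(post.shape[1]):
        d = X - means[i]
        covs.append((post[:, i, None] * d).T @ d / n_i[i])  # weighted covariances
    weights = n_i / len(X)                        # updated priors P(C_i)
    return weights, means, np.array(covs)
```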

EM with |A| = 1 and k = 2: illustration

EM clustering of the objects in a one-dimensional space using the Euclidean distance. [Figures: the dataset, then the fitted mixture after iterations 1 and 5; the images are not preserved in this transcript.]

EM with |A| = 2 and k = 3: illustration

EM clustering of the objects in a two-dimensional space using the Euclidean distance. [Figures: the dataset, then the fitted mixture after iterations 1 and 36; the images are not preserved in this transcript.]

EM algorithm with mixture of Gaussians

Input: O, D, k ∈ ℕ \ {0}
Output: a fuzzy clustering of O corresponding to the posterior probabilities of a locally maximized likelihood of a mixture of Gaussians

  (μi)i=1..k ← random(D)
  (Σi)i=1..k ← (I, ..., I)
  (P(Ci))i=1..k ← (1/k, ..., 1/k)
  repeat
    (P(Ci|o))i=1..k,o∈O ← expectation(O, D, (μi)i=1..k, (Σi)i=1..k, (P(Ci))i=1..k)
    (c, (μi)i=1..k, (Σi)i=1..k, (P(Ci))i=1..k) ← maximization(D, (P(Ci|o))i=1..k,o∈O, (μi)i=1..k)
  until c
  output((P(Ci|o))i=1..k,o∈O)

expectation

Input: O, D, (μi)i=1..k ∈ (ℝ^{|A|})^k, (Σi)i=1..k ∈ (ℝ₊^{|A|×|A|})^k, (P(Ci))i=1..k ∈ [0, 1]^k
Output: (P(Ci|o))i=1..k,o∈O, the fuzzy assignment of the objects in O to the clusters given by the mixture of Gaussians parametrized with (μi)i=1..k, (Σi)i=1..k, (P(Ci))i=1..k

  for all o ∈ O do
    for i = 1 → k do
      P(Ci|o) ← fi(o)P(Ci) / ∑_{a=1..k} fa(o)P(Ca), i.e., since the (2π)^{|A|/2} factors of the Gaussian densities cancel out:
      P(Ci|o) ← (det(Σi) e^{(o−μi)ᵀ Σi⁻¹ (o−μi)})^{−1/2} P(Ci) / ∑_{a=1..k} (det(Σa) e^{(o−μa)ᵀ Σa⁻¹ (o−μa)})^{−1/2} P(Ca)
    end for
  end for
  return((P(Ci|o))i=1..k,o∈O)

Complexity of expectation

Question: What is the complexity of computing (det(Σi))i=1..k?
Answer: O(k|A|³).

Question: What is the complexity of computing (Σi⁻¹)i=1..k once (det(Σi))i=1..k is known?
Answer: O(k|A|²).

Question: What is the complexity of computing one Mahalanobis distance, (o, o′) ↦ (o − o′)ᵀ Σi⁻¹ (o − o′), once Σi⁻¹ is known?
Answer: O(|A|²).

Question: What is the complexity of expectation?
Answer: O(k|A|²(|O| + |A|)).

maximization

Input: D, (P(Ci|o))i=1..k,o∈O ∈ [0, 1]^{k|O|}, (μi)i=1..k ∈ (ℝ^{|A|})^k
Output: c ∈ {false, true} indicating whether the convergence is reached, the new parametrization of the mixture of Gaussians

  c ← true
  for i = 1 → k do
    μ′i ← (∑_{o∈O} P(Ci|o) o) / (∑_{o∈O} P(Ci|o))
    if μ′i ≠ μi then
      c ← false
    end if
    Σ′i ← (∑_{o∈O} P(Ci|o) (o − μ′i)(o − μ′i)ᵀ) / (∑_{o∈O} P(Ci|o))
    P(Ci)′ ← (∑_{o∈O} P(Ci|o)) / |O|
  end for
  return((c, (μ′i)i=1..k, (Σ′i)i=1..k, (P(Ci)′)i=1..k))
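A minimal driver tying the two steps together, reusing the e_step and m_step sketches above (same assumptions; as in the pseudocode, convergence is tested on the locations only):

```python
import numpy as np

def em(X, k, max_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)]  # random(D)
    covs = np.array([np.eye(X.shape[1])] * k)             # (I, ..., I)
    weights = np.full(k, 1.0 / k)                         # (1/k, ..., 1/k)
    for _ in range(max_iter):
        post = e_step(X, weights, means, covs)            # expectation
        weights, new_means, covs = m_step(X, post)        # maximization
        converged = np.allclose(new_means, means, atol=tol)
        means = new_means
        if converged:
            break
    return post    # the fuzzy clustering (P(Ci|o))
```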

Complexity of EM

Question: What is the complexity of expectation?
Answer: O(k|A|²(|O| + |A|)).

Question: What is the complexity of maximization?
Answer: O(k|O||A|²).

Question: What is the complexity of EM if t ∈ ℕ iterations are necessary to converge?
Answer: O(tk|A|²(|O| + |A|)).

Diagonal covariance matrix

A lower complexity is obtained by assuming all attributes independent, i.e., all covariance matrices diagonal. The operations involving such a matrix become linear in |A| and the total time complexity of EM becomes O(tk|O × A|).

However, if the attributes are not really independent, the obtained fuzzy clustering becomes much worse. [Figure: the same data fitted with full covariance matrices vs. diagonal covariance matrices; the images are not preserved in this transcript.]

k-means as specialization of EM

k-means is EM with fi chosen as follows:

fi(o) = 1 if Ci = arg max_{a=1..k} s(o, μa), and 0 otherwise.

3 Hierarchical Clustering

Hierarchical Clustering

Builds a hierarchy of clusters (not a unique partition);

The number of clusters k is not required as input;

Uses a distance matrix as clustering criterion;

An early-termination condition can be used (e.g., a target number of clusters).

Algorithm

Input: a sample of m objects x1, . . . , xm.

1 The algorithm begins with m clusters (1 cluster = 1 object);

2 Merge the 2 clusters that are the closest;

3 Stop if only one cluster remains;

4 Go to step 2.
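A naive Python sketch of these four steps, assuming single-link distance between clusters (the usual alternatives are listed two slides below); it returns the sequence of merges, i.e., the dendrogram read bottom-up:

```python
import numpy as np

def cluster_distance(Ci, Cj):
    """Single link: minimal pairwise Euclidean distance (one possible choice)."""
    return min(np.linalg.norm(x - y) for x in Ci for y in Cj)

def agglomerate(objects):
    clusters = [[x] for x in objects]               # step 1: m singletons
    merges = []
    while len(clusters) > 1:                        # step 3: stop at 1 cluster
        i, j = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[i], clusters[j]))   # step 2: merge closest pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                             # step 4: go to step 2
    return merges
```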

Output: a dendrogram

A hierarchy that can be split at a given level to form a partition:

the hierarchy: a tree called a dendrogram;

the leaves: the objects.

Distance between clusters

Distance between the centers (centroid method).

Minimal distance among the pairs composed of objects from the two clusters (single link method):

$d(i, j) = \min_{x \in C_i, y \in C_j} d(x, y)$

Maximal distance among the pairs composed of objects from the two clusters (complete link method):

$d(i, j) = \max_{x \in C_i, y \in C_j} d(x, y)$

Average distance among the pairs composed of objects from the two clusters (average linkage method):

$d(i, j) = \operatorname{avg}_{x \in C_i, y \in C_j} d(x, y)$
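With SciPy (assuming it is available), these criteria are selected through the method argument of linkage, and the dendrogram can be cut into a partition with fcluster:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])

Z = linkage(X, method='single')   # also: 'complete', 'average', 'centroid'
labels = fcluster(Z, t=2, criterion='maxclust')   # cut into 2 clusters
print(labels)
```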

Pros:

Conceptually simple;

Theoretical properties well-known.

Cons:

The clustering is definitive: erroneous decisions are impossible to modify later;

The method does not scale to large collections of objects ($\Theta(n^2)$).

4 Density-based Clustering: DBSCAN

For this kind of problem, the use of similarity (or distance) measures is less efficient than the use of neighborhood density.

Density-based clustering

Clusters are seen as dense regions separated by regions that are much less dense (noise).

Two parameters:

Eps: the maximum radius of the neighborhood;

MinPts: the minimum number of points within the Eps-neighborhood of a point.

Neighborhood: $V_{Eps}(p) = \{q \in D \mid dist(p, q) \leq Eps\}$

A point p is directly density-accessible from q w.r.t. Eps and MinPts if $p \in V_{Eps}(q)$ and $|V_{Eps}(q)| \geq MinPts$.

Accessibility: p is accessible from q w.r.t. Eps and MinPts if there exist p1, . . . , pn such that p1 = q, pn = p and pi+1 is directly accessible from pi.

Connectivity: p is connected to q w.r.t. Eps and MinPts if there exists a point o such that p and q are both accessible from o.

DBSCAN: Density-Based Spatial Clustering of Applications with Noise

A cluster is a maximal set of connected points;

Cluster shapes are not necessarily convex.

DBSCAN Algorithm

Choose a point p;

Retrieve all points that are accessible from p (w.r.t. Eps and MinPts);

If p is a core point, then a cluster is created;

If p is a border point, then no point is accessible from p: skip to another point;

Repeat until no point remains.
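A compact sketch of this procedure, assuming Euclidean distance; region_query, eps and min_pts are illustrative names, and noise points keep the label -1:

```python
import numpy as np

def region_query(X, p, eps):
    """Indices of the Eps-neighborhood V_Eps(X[p]), point p included."""
    return np.flatnonzero(np.linalg.norm(X - X[p], axis=1) <= eps)

def dbscan(X, eps, min_pts):
    labels = np.full(len(X), -1)              # -1: noise / not yet assigned
    visited = np.zeros(len(X), dtype=bool)
    cluster = 0
    for p in range(len(X)):
        if visited[p]:
            continue
        visited[p] = True
        neighbors = list(region_query(X, p, eps))
        if len(neighbors) < min_pts:          # p is not a core point
            continue
        labels[p] = cluster                   # p is a core point: new cluster
        while neighbors:                      # expand through core points
            q = neighbors.pop()
            if not visited[q]:
                visited[q] = True
                q_nb = region_query(X, q, eps)
                if len(q_nb) >= min_pts:      # q is a core point too
                    neighbors.extend(q_nb)
            if labels[q] == -1:               # core or border point joins
                labels[q] = cluster
        cluster += 1
    return labels
```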

5 Conclusion

Summary

k-means iteratively assigns each object to the cluster whose center is the most similar and recomputes the centers as the means of the objects assigned to them;

Its strongest advantage is its low complexity;

Its worst drawback is its tendency to discover equi-sized clusters;

k-means actually is a specialization of a whole class of algorithms called EM;

They treat the dataset as a random sample of a multivariate random variable whose pdf is given as a mixture model;

They locally maximize the likelihood, i.e., the probability of observing the dataset given the parametrization of the mixture model;

They iteratively compute the expectation of the likelihood and update the parametrization so that this expectation is maximized.


The end.
