introduction to computer science - imada.sdu.dkrolf/edu/dm534/e18/dm534_clustering.pdf · fall 2018...
TRANSCRIPT
Introduction to Computer Science, Fall 2018
DM534
Clustering and Feature Spaces
About Me
▪ Richard Roettger:
  ▪ Computer Science (Technical University of Munich and thesis at the ICSI at the University of California at Berkeley)
  ▪ PhD at the Max Planck Institute for Computer Science in Saarbrücken
  ▪ Since 2014: Assistant Professor at SDU
▪ Research Interests:
  ▪ Bioinformatics
  ▪ Machine Learning
  ▪ Clustering
  ▪ Biological Networks
▪ Parts of the slides are taken from Arthur Zimek
Learning Objectives
▪ Understand the problem of clustering in general
▪ Learn about k-means
▪ Understand the importance of feature spaces and object representation
▪ Understand the influence of distance functions
Lecture Content: Clustering & Feature Spaces
▪ Clustering
  ▪ Clustering in General
  ▪ Partitional Clustering
  ▪ Visualization: Algorithmic Differences
  ▪ Summary
▪ Feature Spaces
  ▪ Distances
  ▪ Features for Images
  ▪ Summary
Purpose of Clustering
▪ Identify a finite number of categories (classes, groups: clusters) in a given dataset
▪ Similar objects shall be grouped in the same cluster, dissimilar objects in different clusters
▪ “similarity” is highly subjective, depending on the application scenario
How Many Clusters?
➢ Image taken from: http://fromdatawithlove.thegovans.us/
➢ Everitt, Brian S., et al. “Cluster Analysis”, 5th Edition (2011).
A Seemingly Simple Problem
➢ Figure from Tan et al. [2006].
▪ Each dataset can be clustered in many meaningful ways
  ▪ Highly problem-dependent
  ▪ Not known by the algorithm a priori
About Clustering
Clustering is the unsupervised machine-learning task of “grouping or segmenting a collection of objects into subsets or ‘clusters’ such that those within each cluster are more closely related to one another than objects assigned to different clusters.”
▪ What does “related” mean? For example, for customers:
  ▪ Age?
  ▪ Behavior?
  ▪ Kinship?
▪ Treatment of outliers?
▪ Ill-posed problem ...
  ▪ That means there exist multiple solutions
  ▪ Which one is the best?
Overview of a Cluster Analysis
[Figure: overview of a cluster analysis; image not preserved in this transcript.]
Steps to Automatization: Cluster Criteria
▪ Cohesion: how strongly are the cluster objects connected (how similar, pairwise, to each other)? Desired: small within-cluster distance.
▪ Separation: how well is a cluster separated from other clusters? Desired: large between-cluster distance.
• There exist many other criteria, e.g., areas with the same density.
• It is important to choose a criterion which fits the data!
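The two criteria above can be made concrete in several ways. The following is a minimal sketch of one possible choice, assuming Euclidean distance, average pairwise distance as the cohesion measure, and average cross-cluster distance as the separation measure. The function names and the example data are hypothetical and not taken from the slides.

```python
import math
from itertools import combinations, product

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cohesion(cluster):
    """Average pairwise distance inside one cluster (smaller = more cohesive)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:  # a single point has no pairs
        return 0.0
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

def separation(cluster_a, cluster_b):
    """Average distance between points of two non-empty clusters (larger = better separated)."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

# Made-up example: two small, well-separated groups.
a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print("cohesion a:", cohesion(a))
print("cohesion b:", cohesion(b))
print("separation a-b:", separation(a, b))
```

For a good clustering under these criteria, the cohesion values should be small compared to the separation value.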
Optimization
▪ Partitional clustering algorithms partition a dataset into k clusters, typically minimizing some cost function (compactness criterion), i.e., optimizing cohesion.
  ▪ no overlaps
  ▪ all points must be part of a cluster
Assumptions for Partitioning Clustering
▪ Central assumptions for approaches in this family are typically:
  ▪ number k of clusters known (i.e., given as input)
  ▪ clusters are characterized by their compactness
  ▪ compactness measured by some distance function (e.g., distance of all objects in a cluster from some cluster representative is minimal)
  ▪ criterion of compactness typically leads to convex or even spherically shaped clusters
Construction of Central Points: Basics
▪ objects are points $x = (x_1, \dots, x_d)$ in Euclidean vector space $\mathbb{R}^d$
▪ dist = Euclidean distance ($L_2$)
▪ centroid $\mu_C$: mean vector of all points in cluster $C$
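As a small illustration of these basics, the sketch below computes the Euclidean distance and the centroid (coordinate-wise mean) for a made-up cluster of three 2-dimensional points. The function names are hypothetical and only serve this example.

```python
import math

def euclidean(p, q):
    """Euclidean (L2) distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(cluster):
    """Mean vector of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]  # made-up example data
mu = centroid(cluster)
print(mu)                         # (3.0, 2.0)
print(euclidean(cluster[0], mu))  # 2.0
```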
Cluster Criteria
▪ Measure for compactness of a cluster $C$ (sum of squares):
$TD^2(C) = \sum_{p \in C} \operatorname{dist}(p, \mu_C)^2$
▪ Measure of compactness for a clustering:
$TD^2(C_1, \dots, C_k) = \sum_{i=1}^{k} TD^2(C_i)$
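The two compactness measures can be computed directly from their definitions. The following is a minimal sketch, assuming Euclidean distance and non-empty clusters given as lists of tuples. The function names and the example data are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    """Mean vector of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def td2_cluster(cluster):
    """TD^2 of one cluster: sum of squared distances to its centroid."""
    mu = centroid(cluster)
    return sum(squared_distance(p, mu) for p in cluster)

def td2_clustering(clusters):
    """TD^2 of a clustering: sum of the TD^2 values of its clusters."""
    return sum(td2_cluster(c) for c in clusters)

c1 = [(0.0, 0.0), (2.0, 0.0)]  # centroid (1, 0), TD^2 = 1 + 1
c2 = [(5.0, 5.0), (5.0, 9.0)]  # centroid (5, 7), TD^2 = 4 + 4
print(td2_cluster(c1))           # 2.0
print(td2_clustering([c1, c2]))  # 10.0
```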
Basic Algorithm: Clustering by Minimization of Variance [Forgy, 1965, Lloyd, 1982]
▪ start with 𝑘 (e.g., randomly selected) points as cluster representatives (or with a random partition into 𝑘 “clusters”)
▪ repeat:
  1. assign each point to the closest representative
  2. compute new representatives based on the given partitions (centroid of the assigned points)
▪ until there is no change in assignment
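The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation. It assumes squared Euclidean distance, random selection of 𝑘 data points as initial representatives, and that an empty cluster simply keeps its old representative. The function names, the `seed` and `max_iter` parameters, and the example data are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Mean vector of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def closest(point, representatives):
    """Index of the representative closest to the given point."""
    return min(range(len(representatives)),
               key=lambda i: squared_distance(point, representatives[i]))

def lloyd_kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    representatives = rng.sample(points, k)  # k randomly selected points
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each point to the closest representative.
        new_assignment = [closest(p, representatives) for p in points]
        if new_assignment == assignment:  # no change in assignment: stop
            break
        assignment = new_assignment
        # Step 2: recompute each representative as the centroid of its points.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old representative
                representatives[i] = centroid(members)
    return assignment, representatives

# Made-up example: two small, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
assignment, representatives = lloyd_kmeans(points, k=2)
print(assignment)
print(representatives)
```

The cluster labels themselves are arbitrary; which group receives label 0 depends on the random initial representatives.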
𝑘-means
𝑘-means [MacQueen, 1967] is a variant of the basic algorithm:
▪ A centroid is immediately updated when some point changes its assignment
▪ 𝑘-means has very similar properties, but the result now depends on the order of data points in the input file
Note:
▪ The name “𝑘-means” is often used indiscriminately for any variant of the basic algorithm, in particular also for the algorithm shown before [Forgy, 1965, Lloyd, 1982].
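The immediate-update idea can be sketched as follows. This is one possible reading of the variant described above, not necessarily the exact procedure from MacQueen's paper. It assumes that the first 𝑘 points of the input serve as initial centroids and that centroids are updated incrementally whenever a point joins or leaves a cluster. The function name, the `max_passes` parameter, and the example data are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def macqueen_kmeans(points, k, max_passes=100):
    # Assumption: the first k points of the input are the initial centroids.
    centroids = [list(p) for p in points[:k]]
    counts = [0] * k                  # number of points currently in each cluster
    assignment = [None] * len(points)
    for _ in range(max_passes):
        changed = False
        for idx, p in enumerate(points):  # the input order matters here
            new = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            old = assignment[idx]
            if new == old:
                continue
            changed = True
            if old is not None:
                # Remove p from its old cluster and update that centroid at once.
                counts[old] -= 1
                if counts[old] > 0:
                    centroids[old] = [(c * (counts[old] + 1) - x) / counts[old]
                                      for c, x in zip(centroids[old], p)]
            # Add p to its new cluster and update that centroid at once.
            counts[new] += 1
            centroids[new] = [c + (x - c) / counts[new]
                              for c, x in zip(centroids[new], p)]
            assignment[idx] = new
        if not changed:  # a full pass without any change: stop
            break
    return assignment, [tuple(c) for c in centroids]

# Made-up example: points from two groups in alternating order.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
assignment, centroids = macqueen_kmeans(points, k=2)
print(assignment)
print(centroids)
```

Because centroids move while the points are still being processed, reordering `points` can lead to a different result, which is the order dependence mentioned above.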
k-means Clustering – Lloyd/Forgy Algorithm
[Figure sequence: visualization of the Lloyd/Forgy algorithm; images not preserved in this transcript.]
k-means Clustering – MacQueen Algorithm
[Figure sequence: visualization of the MacQueen algorithm; images not preserved in this transcript.]
k-means Clustering – MacQueen Algorithm, Alternative Ordering
[Figure sequence: visualization of the MacQueen algorithm with an alternative ordering; images not preserved in this transcript.]
k-means Clustering – Quality
[Figures: quality of k-means clusterings; images not preserved in this transcript.]
k-means: Pros
▪ Efficient: 𝑂(𝑘 ⋅ 𝑛) distance computations per iteration; the number of iterations is usually on the order of 10.
▪ Easy to implement, thus very popular
▪ Only one parameter, easy to understand
▪ Well understood and researched
▪ Different variants exist:
  ▪ Fuzzy clustering
  ▪ Variants without n-dimensional embedding
k-means: Disadvantages
▪ k-means converges only towards a local minimum, not necessarily the global one
▪ k-means (MacQueen variant) is order-dependent
▪ Deteriorates with noise and outliers (all points are used to compute centroids)
▪ Clusters need to be convex and of (more or less) equal extent
▪ Number k of clusters is hard to determine
▪ Strong dependency on initial partition (in result quality as well as runtime)
What to do?
How can we tackle the initialization problem?
▪ Repeated runs
▪ Furthest-first initialization
▪ Subset Furthest-first initialization
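The first option, repeated runs, could look roughly like the sketch below: run a basic k-means several times with different random initial representatives and keep the result with the smallest sum of squared distances. The compact `kmeans` function stands in for any k-means implementation. All names, the `runs` and `seed` parameters, and the example data are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Mean vector of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(points, k, rng, max_iter=100):
    """Compact Lloyd/Forgy-style k-means with random initial representatives."""
    reps = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda i: sq_dist(p, reps[i])) for p in points]
        if new == assignment:
            break
        assignment = new
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its old representative
                reps[i] = centroid(members)
    return assignment, reps

def td2(points, assignment, reps):
    """Sum of squared distances of all points to their representative."""
    return sum(sq_dist(p, reps[a]) for p, a in zip(points, assignment))

def best_of_runs(points, k, runs=10, seed=0):
    """Run k-means several times and keep the clustering with the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        assignment, reps = kmeans(points, k, rng)
        cost = td2(points, assignment, reps)
        if best is None or cost < best[0]:
            best = (cost, assignment, reps)
    return best

# Made-up example: two small, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cost, assignment, reps = best_of_runs(points, k=2, runs=5, seed=1)
print(cost, assignment)
```

Repeated runs do not guarantee the global optimum. They only make it less likely that a single unlucky initialization determines the final result.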
![Page 69: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/69.jpg)
Insertion: Furthest First Initialization
▪ Select a random point as the first center
▪ For each remaining point, maintain the minimum of its distances to the centers selected so far
▪ While fewer than 𝑘 centers are selected, repeat:
▪ Select the point 𝑝 with the maximum stored distance to the existing centers
▪ Remove 𝑝 from the not-yet-selected points and add it to the centers
▪ For each remaining not-yet-selected point 𝑞, replace the distance stored for 𝑞 by the minimum of its old value and the distance from 𝑝 to 𝑞
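The steps above can be sketched in Python as follows. This is a minimal illustration under assumptions: points are tuples of numbers, the distance is Euclidean, and the function name `furthest_first` and the `seed` parameter are hypothetical.

```python
import math
import random


def furthest_first(points, k, seed=0):
    """Pick k initial centers by furthest-first traversal."""
    rng = random.Random(seed)
    remaining = list(points)
    # Select a random point as the first center.
    first = remaining.pop(rng.randrange(len(remaining)))
    centers = [first]
    # min_dist[i]: distance from remaining[i] to its nearest selected center.
    min_dist = [math.dist(q, first) for q in remaining]
    while len(centers) < k and remaining:
        # Select the point with the maximum stored distance.
        i = max(range(len(remaining)), key=min_dist.__getitem__)
        p = remaining.pop(i)
        min_dist.pop(i)
        centers.append(p)
        # Update the stored minimum distances with the new center p.
        min_dist = [min(d, math.dist(p, q))
                    for d, q in zip(min_dist, remaining)]
    return centers
```

Because only the distance to the newest center has to be computed in each round, selecting 𝑘 centers from 𝑛 points needs on the order of 𝑛 · 𝑘 distance computations.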
![Page 70: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/70.jpg)
Furthest First Initialization: Visualization
![Page 71: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/71.jpg)
![Page 72: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/72.jpg)
![Page 73: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/73.jpg)
![Page 74: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/74.jpg)
Update the minimum distances
![Page 75: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/75.jpg)
![Page 76: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/76.jpg)
![Page 77: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/77.jpg)
![Page 78: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/78.jpg)
![Page 79: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/79.jpg)
Learnings of this Section
▪ What is Clustering?
▪ Basic idea for identifying “good” partitions into k clusters
▪ Selection of representative points
▪ Iterative refinement
▪ Local optimum
▪ k-means variants [Forgy, 1965; Lloyd, 1982; MacQueen, 1967]
▪ Different initialization methods
![Page 80: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/80.jpg)
Lecture Content
▪ Clustering
▪ Clustering in General
▪ Partitional Clustering
▪ Visualization: Algorithmic Differences
▪ Summary
▪ Feature Spaces
▪ Distances
▪ Features for Images
▪ Summary
![Page 81: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/81.jpg)
Recall: Clustering as a Workflow
![Page 82: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/82.jpg)
Similarities
▪ Similarity (as given by some distance measure) is a central concept in data mining, e.g.:
▪ Clustering: group similar objects in the same cluster, separate dissimilar objects into different clusters
▪ Outlier detection: identify objects that are dissimilar (by some characteristic) from most other objects
▪ Defining a suitable distance measure is often crucial for deriving a meaningful solution to the data mining task, e.g., for:
▪ Images
▪ CAD objects
▪ Proteins
▪ Texts
▪ . . .
![Page 83: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/83.jpg)
Spaces and Distance Functions
![Page 84: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/84.jpg)
![Page 85: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/85.jpg)
![Page 86: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/86.jpg)
Categories of Feature Descriptors for Images
▪ Distribution of colors
▪ Texture
▪ Shapes (contours)
▪ Many more …
![Page 87: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/87.jpg)
Color Histogram
▪ A histogram represents the distribution of colors over the pixels of an image
▪ Definition of a color histogram:
▪ Choose a color space (RGB, HSV, HLS, . . . )
▪ Choose the number of representatives (sample points) in the color space
▪ Optionally normalize (to account for different image sizes)
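A color histogram along these lines can be sketched as follows. The sketch assumes the RGB color space with 8-bit channels, a uniform grid of `bins_per_channel` bins per channel as the representatives, and normalization by the number of pixels. The function name `color_histogram` and its parameters are illustrative assumptions.

```python
def color_histogram(pixels, bins_per_channel=4, normalize=True):
    """Histogram over a uniform RGB grid.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    Returns a list of bins_per_channel ** 3 bin values.
    """
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    count = 0
    for red, green, blue in pixels:
        # Map each channel value to its bin index using integer arithmetic.
        ri, gi, bi = (c * b // 256 for c in (red, green, blue))
        hist[(ri * b + gi) * b + bi] += 1
        count += 1
    if normalize and count:
        # Relative frequencies make images of different sizes comparable.
        hist = [h / count for h in hist]
    return hist


# Hypothetical 4-pixel "image": three red pixels and one blue pixel.
image = [(255, 0, 0), (255, 0, 0), (255, 0, 0), (0, 0, 255)]
hist = color_histogram(image, bins_per_channel=2)
print(hist[4], hist[1])  # 0.75 0.25
```

A smaller `bins_per_channel` merges similar colors into the same bin, while a larger value separates them; this is the granularity choice named in the second step.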
![Page 88: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/88.jpg)
Impact of Number of Representatives
![Page 89: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/89.jpg)
![Page 90: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/90.jpg)
![Page 91: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/91.jpg)
![Page 92: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/92.jpg)
![Page 93: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/93.jpg)
![Page 94: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/94.jpg)
![Page 95: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/95.jpg)
![Page 96: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/96.jpg)
Distances for Color Histograms
![Page 97: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/97.jpg)
![Page 98: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/98.jpg)
![Page 99: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/99.jpg)
![Page 100: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/100.jpg)
Your Choice of a Distance Measure
▪ There are hundreds of distance functions [Deza and Deza, 2009].
▪ For time series: DTW, EDR, ERP, LCSS, . . .
▪ For texts: cosine and normalizations
▪ For sets: based on intersection, union, . . . (Jaccard)
▪ For clusters: single-link, average-link, etc.
▪ For histograms: histogram intersection, “Earth mover's distance”, quadratic forms with color similarity
▪ For proteins: edit distance, structure, …
▪ With normalization: Canberra, …
▪ Quadratic forms / bilinear forms: $d(x, y) := x^{T} M y$ for some positive definite (usually symmetric) matrix $M$.
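A few of the listed measures can be sketched in plain Python. All function names are illustrative assumptions. The slide gives the bilinear form $x^{T} M y$; the last function assumes the common way of turning it into a distance, namely applying the form to the difference vector $x - y$ and taking the square root, which is not stated on the slide.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cosine_distance(x, y):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    return 1.0 - dot / (norm_x * norm_y)


def histogram_intersection_distance(h, g):
    """1 - sum of bin-wise minima; assumes both histograms sum to 1."""
    return 1.0 - sum(min(hi, gi) for hi, gi in zip(h, g))


def bilinear_form(x, y, M):
    """x^T M y for vectors x, y and a square matrix M (list of rows)."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def quadratic_form_distance(x, y, M):
    """sqrt((x - y)^T M (x - y)); assumes M is positive definite."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return math.sqrt(bilinear_form(diff, diff, M))
```

With the identity matrix as `M`, `quadratic_form_distance` reduces to the Euclidean distance; for color histograms, off-diagonal entries of `M` could encode how similar two color bins are.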
![Page 101: Introduction to Computer Science - imada.sdu.dkrolf/Edu/DM534/E18/dm534_clustering.pdf · Fall 2018 Introduction to Computer Science Learning Objectives 3 Understand the problem of](https://reader030.vdocument.in/reader030/viewer/2022040207/5e0f28bae4e0c93c8316db74/html5/thumbnails/101.jpg)
Learnings of this Section
▪ Distances ($L_p$-norms, weighted, quadratic form)
▪ Color histograms as feature (vector) descriptors for images
▪ Impact of the granularity of color histograms on similarity measures