Machine Learning CS 165B, Spring 2012


Page 1:

Machine Learning
CS 165B

Spring 2012

Page 2:

Course outline

• Introduction (Ch. 1)
• Concept learning (Ch. 2)
• Decision trees (Ch. 3)
• Ensemble learning
• Neural Networks

Page 3:

• Organizing data into classes such that there is
  – high intra-class similarity
  – low inter-class similarity

• Finding the class labels and the number of classes directly from the data (in contrast to classification).

• More informally, finding natural groupings among objects.

What is clustering?

Remember Fisher's discriminant.

Page 4:

What is a natural grouping among these objects?

Page 5:

Possible groupings: School Employees vs. Simpson's Family, or Males vs. Females.

Clustering is subjective.

What is a natural grouping among these objects?

Page 6:

What is similarity?

"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)

Similarity is hard to define, but "we know it when we see it."

The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.

Page 7:

Defining distance measures

Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).

[Figure: the distance function as a black box; e.g., D(Peter, Piotr) returns a number such as 0.23, 3, or 342.7.]

Page 8:

What properties should a distance measure have?

• D(A,B) = D(B,A)   (Symmetry)
• D(A,B) = 0 iff A = B   (Reflexivity)
• D(A,B) ≤ D(A,C) + D(B,C)   (Triangle inequality)
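These properties can be spot-checked numerically for any candidate distance function. A minimal sketch (the helper name, tolerance, and sample points below are illustrative, not from the slides):

import itertools
import math

def check_metric_properties(D, objects, tol=1e-9):
    # Symmetry and "zero distance iff identical" over all pairs.
    for a, b in itertools.product(objects, repeat=2):
        assert abs(D(a, b) - D(b, a)) <= tol          # D(A,B) = D(B,A)
        assert (D(a, b) <= tol) == (a == b)           # D(A,B) = 0 iff A = B
    # Triangle inequality over all triples.
    for a, b, c in itertools.product(objects, repeat=3):
        assert D(a, b) <= D(a, c) + D(b, c) + tol

# Illustrative check with Euclidean distance on a few 2-D points.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
euclidean = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
check_metric_properties(euclidean, points)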

Example: the string edit distance between Peter and Piotr is 3.

d('', '') = 0
d(s, '')  = d('', s) = |s|   (i.e., the length of s)
d(s1 + ch1, s2 + ch2) = min( d(s1, s2) + (0 if ch1 = ch2 else 1),
                             d(s1 + ch1, s2) + 1,
                             d(s1, s2 + ch2) + 1 )
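A minimal dynamic-programming sketch of this recurrence (the function name is mine):

def edit_distance(s1, s2):
    # dp[i][j] = distance between the first i chars of s1 and the first j chars of s2
    dp = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(len(s1) + 1):
        dp[i][0] = i                      # d(s, '') = |s|
    for j in range(len(s2) + 1):
        dp[0][j] = j                      # d('', s) = |s|
    for i in range(1, len(s1) + 1):
        for j in range(1, len(s2) + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,   # substitute (or match)
                           dp[i - 1][j] + 1,          # delete
                           dp[i][j - 1] + 1)          # insert
    return dp[len(s1)][len(s2)]

print(edit_distance("Peter", "Piotr"))   # 3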

When we peek inside one of these black boxes, we see some function of two variables. These functions might be very simple or very complex. In either case, it is natural to ask what properties these functions should have.

Page 9:

Two types of clustering

• Partitional algorithms: construct various partitions and then evaluate them by some criterion.
• Hierarchical algorithms: create a hierarchical decomposition of the set of objects using some criterion.

Page 10:

Desirable properties of a clustering algorithm

• Scalability (in terms of both time and space)

• Ability to deal with different data types

• Minimal requirements for domain knowledge to determine input parameters

• Able to deal with noise and outliers

• Insensitive to order of input records

• Incorporation of user-specified constraints

• Interpretability and usability

Page 11:

Summarizing similarity measurements

In order to better appreciate and evaluate the examples given in the early part of this talk, we will now introduce the dendrogram.

[Figure: anatomy of a dendrogram, labeling the root, internal nodes, internal branches, terminal branches, and leaves.]

The similarity between two objects in a dendrogram is represented as the height of the lowest internal node they share.
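This height is usually called the cophenetic distance. A small sketch assuming SciPy is available (the four 1-D points are illustrative data):

import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

X = np.array([[0.0], [1.0], [5.0], [6.0]])   # four 1-D points (illustrative)
Z = linkage(X, method='single')              # build the dendrogram
coph = squareform(cophenet(Z))               # coph[i, j] = height of the lowest
print(coph)                                  #   internal node shared by i and j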

Page 12:

Pedro (Portuguese): Petros (Greek), Peter (English), Piotr (Polish), Peadar (Irish), Pierre (French), Peder (Danish), Peka (Hawaiian), Pietro (Italian), Piero (Italian Alternative), Petr (Czech), Pyotr (Russian)

Cristovao (Portuguese): Christoph (German), Christophe (French), Cristobal (Spanish), Cristoforo (Italian), Kristoffer (Scandinavian), Krystof (Czech), Christopher (English)

Miguel (Portuguese): Michalis (Greek), Michael (English), Mick (Irish!)

Hierarchical clustering using string edit distance

[Figure: dendrogram of the names above (Piotr, Pyotr, Petros, Pietro, Pedro, Pierre, Piero, Peter, Peder, Peka, Peadar, Michalis, Michael, Miguel, Mick, Cristovao, Christopher, Christophe, Christoph, Crisdean, Cristobal, Cristoforo, Kristoffer, Krystof) clustered by string edit distance.]

Page 13:

(How-to) Hierarchical clustering

The number of dendrograms with n leaves = (2n - 3)! / [2^(n-2) (n - 2)!]

Number of leaves    Number of possible dendrograms
2                   1
3                   3
4                   15
5                   105
...                 ...
10                  34,459,425
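A quick way to reproduce this table (a sketch; the helper name is mine):

from math import factorial

def num_dendrograms(n):
    # (2n - 3)! / [2^(n-2) * (n - 2)!], i.e. the double factorial (2n - 3)!!
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))   # 1, 3, 15, 105, 34459425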

Since we cannot test all possible trees we will have to use heuristics:

Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

Top-Down (divisive): Starting with all the data in a single cluster, consider every possible way to divide the cluster into two. Choose the best division and recursively operate on both sides.
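A minimal from-scratch sketch of the bottom-up (agglomerative) procedure with single linkage (the function name and the choice to merge all the way down to one cluster are illustrative):

def bottom_up_single_linkage(dist, target_clusters=1):
    # dist[i][j] is the distance between objects i and j.
    # Single linkage: the distance between two clusters is that of their closest pair.
    clusters = [[i] for i in range(len(dist))]   # each item starts in its own cluster
    merges = []
    while len(clusters) > target_clusters:
        # Consider all possible merges and choose the best (smallest linkage distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges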

Page 14:

[Figure: example distance matrix over five objects; the slide shows images, labeled A-E here for readability.]

     A   B   C   D   E
A    0   8   8   7   7
B        0   2   4   4
C            0   3   3
D                0   1
E                    0

For example, D(A, B) = 8 and D(D, E) = 1.

We begin with a distance matrix which contains the distances between every pair of objects in our database.

Distance matrix
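As a sanity check, the matrix above can be handed to an off-the-shelf routine. A sketch assuming NumPy and SciPy are available (the A-E labeling is only for readability):

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# The example distance matrix above.
D = np.array([[0, 8, 8, 7, 7],
              [8, 0, 2, 4, 4],
              [8, 2, 0, 3, 3],
              [7, 4, 3, 0, 1],
              [7, 4, 3, 1, 0]], dtype=float)
# linkage() expects the condensed (upper-triangular) form of the matrix.
Z = linkage(squareform(D), method='single')
print(Z)   # each row: the two clusters merged and the distance at which they merged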

Page 15:

Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

…Consider all possible merges…

Choose the best

Bottom-Up (agglomerative)

Pages 16-18:

[Figures: the same procedure continued. At each subsequent step, consider all possible merges again and choose the best, until all clusters are fused together.]

Page 19:

We know how to measure the distance between two objects, but defining the distance between an object and a cluster, or between two clusters, is not obvious.

• Single linkage (nearest neighbor): the distance between two clusters is determined by the distance between the two closest objects (nearest neighbors) in the different clusters.
• Complete linkage (farthest neighbor): the distance between two clusters is determined by the greatest distance between any two objects in the different clusters (i.e., by the "furthest neighbors").
• Group average linkage: the distance between two clusters is calculated as the average distance between all pairs of objects in the two different clusters.

Extending distance measure to clusters
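Written as code over a pairwise distance function D, the three criteria are (a sketch; the function names are mine):

def single_linkage(c1, c2, D):
    # Nearest neighbors: distance of the closest cross-cluster pair.
    return min(D(x, y) for x in c1 for y in c2)

def complete_linkage(c1, c2, D):
    # Furthest neighbors: distance of the farthest cross-cluster pair.
    return max(D(x, y) for x in c1 for y in c2)

def group_average_linkage(c1, c2, D):
    # Average distance over all cross-cluster pairs.
    return sum(D(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))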

Page 20:

• No need to specify the number of clusters in advance.
• The hierarchical nature maps nicely onto human intuition for some domains.
• They do not scale well: time complexity of at least O(n²), where n is the total number of objects.
• Like any heuristic search algorithm, they can get stuck in local optima.

Summary of hierarchical clustering

Page 21:

Partitional clustering

• Nonhierarchical, each instance is placed in exactly one of K nonoverlapping clusters.

• Since only one set of clusters is output, the user normally has to input the desired number of clusters K.

Page 22:

1. Decide on a value for k.

2. Initialize the k cluster centers (randomly, if necessary).

3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.

4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.

5. If none of the N objects changed membership in the last iteration, exit. Otherwise go to step 3.

Algorithm k-means
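A minimal NumPy sketch of steps 1-5 (the function name, the initialization by sampling k data points, and the synthetic data at the end are illustrative choices; for simplicity the sketch assumes no cluster becomes empty):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    # X is an (N, d) array of N objects with d features.
    rng = np.random.default_rng(seed)
    # 2. Initialize the k cluster centers (here: k distinct random data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # 3. Assign each object to the nearest cluster center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # 5. If no object changed membership in the last iteration, stop.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 4. Re-estimate each cluster center as the mean of its members.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

# Example on synthetic 2-D data with three well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
centers, labels = kmeans(X, k=3)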

Pages 23-27:

K-means clustering, steps 1-5 (Algorithm: k-means, Distance metric: Euclidean distance)

[Figures: five snapshots of k-means on a 2-D example with three cluster centers k1, k2, k3; at each step the points are assigned to the nearest center and the centers are re-estimated, until memberships no longer change.]

Page 28:

Evaluation of K-means

• Strengths
  – Relatively efficient: O(tkn), where n is the number of objects, k the number of clusters, and t the number of iterations. Normally k, t << n.
  – Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
• Weaknesses
  – Applicable only when a mean is defined; what about categorical data?
  – Need to specify k, the number of clusters, in advance.
  – Unable to handle noisy data and outliers.
  – Not suitable for clusters with non-convex shapes.

Page 29:

Other clustering methods

• Density-based clustering
• Subspace clustering
• EM clustering
• Gaussian Mixture Models
• Other concepts
  – Dimensionality reduction
  – PCA