
Page 1:

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 1: REVISION LECTURE

Gabriela Ochoa, Nadarajen Veerapen, Fabio Daolio

Page 2:

EXAM

• Date: Thursday 15 December, 14:00 – 15:30 (Room: 2A13); 1.5 hour exam
• Attempt BOTH questions.
  - Q1: Search (25 Marks)
  - Q2: Machine Learning (25 Marks)
• The distribution of marks among the parts of each question is indicated.

Page 3:

SOLVING PROBLEMS BY SEARCHING

• Problem-solving agents decide what to do by finding sequences of actions that lead to desirable states.
• What is a problem and what is a solution?
  - Problem: a goal and a set of means to achieve it
  - Solution: a sequence of actions that achieves that goal
• Given a precise definition of a problem, it is possible to construct a search process for finding solutions.

Page 4:

EXAMPLE: ROMANIA
[Figure: road map of Romania (Google Maps)]

Page 5:

PROBLEM FORMULATION
More formally, a problem is defined by these main components (a small code sketch follows below):
1. Initial state where the agent starts: e.g., "at Arad"
2. Actions available to the agent
   - e.g., Arad → Zerind, Arad → Sibiu, etc.
3. Goal test: determines whether a given state is a goal state
   - explicit, e.g., x = "at Bucharest"
   - implicit, e.g., Checkmate(x)
4. Path cost: an additive function that assigns a numeric cost to each path; it reflects the agent's performance measure
   - e.g., sum of distances, number of actions executed, etc.
   - c(x, a, y) is the step cost, assumed to be ≥ 0
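These components map directly onto a small data structure. A minimal Python sketch of the Romania route-finding problem (a partial map with illustrative distances; the helper names are ours, not the lecture's):

# Partial Romania road map: available actions and their step costs (illustrative distances).
ROADS = {
    "Arad":           {"Zerind": 75, "Sibiu": 140, "Timisoara": 118},
    "Sibiu":          {"Arad": 140, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras":        {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti":        {"Rimnicu Vilcea": 97, "Bucharest": 101},
}

INITIAL_STATE = "Arad"

def actions(state):
    """Actions available in a state: here, the neighbouring cities."""
    return list(ROADS.get(state, {}))

def goal_test(state):
    """Explicit goal test: are we at Bucharest?"""
    return state == "Bucharest"

def step_cost(state, action):
    """c(x, a, y): the road distance, always >= 0."""
    return ROADS[state][action]

def path_cost(path):
    """Additive path cost: the sum of step costs along a sequence of cities."""
    return sum(step_cost(a, b) for a, b in zip(path, path[1:]))

print(path_cost(["Arad", "Sibiu", "Fagaras", "Bucharest"]))  # 140 + 99 + 211 = 450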

Page 6:

SEARCH ALGORITHMS

• Uninformed search strategies can find solutions to problems by systematically generating new states and testing them against the goal (e.g., BFS and DFS).
• Informed search strategies use some problem-specific knowledge.
• That knowledge is given by an evaluation function that returns a number describing the desirability (or lack thereof) of expanding a node. Examples: best-first search, greedy search, A* (a sketch follows below).
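The informed strategies differ only in the evaluation function used to order the frontier. A minimal sketch of generic best-first search over the ROADS map above, keeping the frontier in a priority queue (the structure and names are illustrative assumptions):

import heapq

def best_first_search(initial, actions, step_cost, goal_test, f):
    """Expand the node with the lowest value of the evaluation function f(g, state)."""
    frontier = [(f(0, initial), 0, initial, [initial])]   # (priority, cost so far, state, path)
    explored = set()
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g
        if state in explored:
            continue
        explored.add(state)
        for successor in actions(state):
            g2 = g + step_cost(state, successor)
            heapq.heappush(frontier, (f(g2, successor), g2, successor, path + [successor]))
    return None, float("inf")

# Greedy search orders by f = h(state); A* orders by f = g + h(state),
# assuming some heuristic estimate h (e.g. straight-line distance to the goal).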

Page 7:

Shaded nodes: expanded nodes

Outlined nodes: generated but not expanded


Page 8:

BREADTH-FIRST SEARCH

• Expand shallowest unexpanded node
• Implementation:
  - frontier is a FIFO queue, i.e., new successors go at end (see the sketch below)
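A minimal graph-search sketch of BFS with a FIFO frontier (reusing the ROADS helpers assumed above):

from collections import deque

def breadth_first_search(initial, actions, goal_test):
    """Expand the shallowest unexpanded node; the frontier is a FIFO queue of paths."""
    if goal_test(initial):
        return [initial]
    frontier = deque([[initial]])      # new successors go at the end
    explored = {initial}
    while frontier:
        path = frontier.popleft()      # shallowest path first
        for successor in actions(path[-1]):
            if successor in explored:
                continue
            explored.add(successor)
            new_path = path + [successor]
            if goal_test(successor):
                return new_path
            frontier.append(new_path)
    return None

# e.g. breadth_first_search("Arad", actions, goal_test) returns the path with the fewest actions.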

Pages 9–11:

BREADTH-FIRST SEARCH (continued)
[Figures: successive steps of the breadth-first expansion on the example tree; the bullet text on these slides repeats Page 8.]

Page 12:

OPTIMISATION PROBLEMS ARE EVERYWHERE!

• Logistics, transportation, supply chain management
• Manufacturing, production lines
• Timetabling
• Cutting & packing
• Computer networks and telecommunications
• Software (search-based software engineering, SBSE)

Page 13:

HILL-CLIMBING SEARCH
Like climbing a mountain in thick fog with amnesia.

• Best improvement (gradient descent, greedy hill-climbing): choose the maximally improving neighbour.
• First improvement: choose the first improving move found.
• Local optimum: no other solution in the neighbourhood has better fitness.
(A sketch of both pivoting rules follows below.)
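A minimal sketch of the two rules, assuming a fitness function to maximise and a neighbours() function that returns a list of candidate solutions (both names are illustrative):

import random

def hill_climbing(solution, neighbours, fitness, best_improvement=True, max_iters=10_000):
    """Keep moving to an improving neighbour until a local optimum is reached."""
    for _ in range(max_iters):
        current_fit = fitness(solution)
        candidates = neighbours(solution)
        if best_improvement:
            # Best improvement: evaluate the whole neighbourhood, take the maximally improving move.
            best = max(candidates, key=fitness, default=None)
            if best is None or fitness(best) <= current_fit:
                return solution                    # local optimum: no better neighbour
            solution = best
        else:
            # First improvement: scan the neighbourhood and take the first improving move found.
            random.shuffle(candidates)
            for candidate in candidates:
                if fitness(candidate) > current_fit:
                    solution = candidate
                    break
            else:
                return solution                    # no improving move found: local optimum
    return solution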

Page 14:

HILL-CLIMBING SEARCH
Problem: depending on the initial state, it can get stuck in local maxima.

Page 15:

ITERATED LOCAL SEARCH

Procedure Iterated Local Search (ILS)
    s = initialise()
    s = hill_climbing(s)
    while not termination_criterion():
        r = s                        # remember the current local optimum
        s = perturbation(s)          # diversification: jump out of the current basin
        s = hill_climbing(s)         # intensification: climb to a new local optimum
        if fitness(s) < fitness(r):  # acceptance criterion: keep the better of the two
            s = r
    return s

• Key idea: use two stages
  - local search for reaching local optima (intensification)
  - a perturbation stage for escaping local optima (diversification)
• Acceptance criterion: controls the balance between diversification and intensification.

Page 16:

Artificial Intelligence (CSC9YE) Revision - Machine Learning - Decision Trees

Fabio Daolio

Page 17:

Definition, from (T. Mitchell 1997):

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

Page 18:

Learning Paradigms: what kind of experience, what kind of tasks?

• Supervised Learning: the program is presented with a series of input-output examples and learns a function that maps inputs to outputs.
  - regression
  - classification
• Unsupervised Learning: the program is presented with a series of inputs and learns how they are organised.
  - clustering (or segmentation)
  - dimensionality reduction
• Reinforcement Learning: the program learns to determine the ideal behaviour based on feedback from the environment, rewards or punishments.
  - game playing
  - on-line control

Page 19:

Supervised Learning setting: outcome measurements and predictor measurements are available.

Data: a list of observations of the form L = {<X, y>}

X: n × p feature matrix / design matrix, with rows <x_i1, x_i2, ..., x_ip> for i = 1, ..., n
  - n samples / examples / instances
  - p features / predictors / covariates

y: n × 1 target vector / labels (y_1, ..., y_n)
  - regression: continuous values
  - classification: finite set of types

Problem: learn y = f(X). (A small illustration follows below.)
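In code this is simply a feature matrix paired with a target vector; a minimal illustrative sketch with made-up numbers:

import numpy as np

# n = 4 samples, p = 3 features: the design matrix X and the target vector y.
X = np.array([[0.2, 1.5, 3.0],
              [0.7, 0.3, 2.1],
              [1.1, 2.2, 0.4],
              [0.9, 1.8, 1.7]])      # shape (n, p)
y = np.array([0, 1, 1, 0])           # shape (n,): class labels for a classification task
# For regression, y would instead hold continuous values, e.g. np.array([3.2, 1.1, 0.7, 2.5]).
print(X.shape, y.shape)              # (4, 3) (4,)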

Page 20:

A Binary Classification Task
Features <x1, x2> ∈ R², labels y ∈ {class1, class2}, n = 30.
How to automatically find a mapping f from (x1, x2) to y?
[Figure: scatter plot of the 30 points in the (x1, x2) unit square, coloured by class.]

Page 21:

Base model: predict the majority class (minimise the misclassification error)
[Figure: the same scatter plot; a single root node predicts class1, covering 22 class1 and 8 class2 points.]

Page 22:

Divide and Conquer: recursively partition and assign a base model to each partition
[Figure: the scatter plot split at x2 = 0.43 and x2 = 0.77, with the corresponding tree:]
  - root (22 class1, 8 class2): split on x2 < 0.43
    - yes: leaf class1 (13, 0)
    - no: node (9, 8), split on x2 >= 0.77
      - yes: leaf class1 (7, 0)
      - no: leaf class2 (2, 8)

Page 23:

Divide and Conquer (continued)
[Figure: the same partition as on the previous slide, shown on the scatter plot.]

Page 24:

Divide and Conquer: recursively partition and assign a base model to each partition
[Figure: a further split at x1 >= 0.7 inside the middle region, with the corresponding tree:]
  - root (22 class1, 8 class2): split on x2 < 0.43
    - yes: leaf class1 (13, 0)
    - no: node (9, 8), split on x2 >= 0.77
      - yes: leaf class1 (7, 0)
      - no: node (2, 8), split on x1 >= 0.7
        - yes: leaf class1 (2, 0)
        - no: leaf class2 (0, 8)

Page 25:

Decision Tree: recursive binary splitting (things to notice)

• the target y is approximated by a piecewise constant function
• the feature space X is partitioned into disjoint regions
• the goal is to find partitions that minimise the prediction error
• it is computationally infeasible to consider all possible partitions
• recursive binary splitting is a top-down, greedy procedure:
  - splits are defined by a split variable and a split point
  - at any step, all possible splits in the data are tested
  - the split that yields the most "pure" nodes is chosen
• the splitting could continue until all nodes are "pure"...

Page 26:

Tree Building Algorithm (code from G. Louppe 2014)

function BuildDecisionTree(L)
    Create node t
    if the stopping criterion is met for t then
        Assign a model ŷ_t to node t
    else
        Find the split on L that maximises the impurity decrease:
            s* = argmax_s [ i(t) − p_L · i(t_s^L) − p_R · i(t_s^R) ]
        Partition L into L_tL ∪ L_tR according to s*
        t_L = BuildDecisionTree(L_tL)
        t_R = BuildDecisionTree(L_tR)
    end if
    return t
end function
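In practice a library implementation is normally used rather than hand-rolled recursion; a minimal sketch with scikit-learn (the library and the toy data are assumptions, not part of the lecture):

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy two-feature, two-class data standing in for the lecture's example.
X, y = make_classification(n_samples=30, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# Greedy recursive binary splitting with Gini impurity, grown to a small depth.
tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["x1", "x2"]))  # the learned splits and leaf classes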

Page 27:

Measuring Node Impurity (for a binary classification task; figure from Hastie et al. 2009)

[Figure 9.3 from Hastie et al. (2009): node impurity measures for two-class classification, as a function of the proportion p in class 2; cross-entropy has been scaled to pass through (0.5, 0.5). All three measures are similar, but cross-entropy and the Gini index are differentiable, hence more amenable to numerical optimisation, and more sensitive to changes in the node probabilities than the misclassification rate.]

If p is the proportion of samples of the other class in node t:
  - Misclassification rate: i(t) = 1 − max(p, 1 − p)
  - Gini index: i(t) = 2p(1 − p)
  - Cross-entropy: i(t) = [−p log(p) − (1 − p) log(1 − p)] / (2 log 2)
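A quick numeric sketch of these three measures (following the slide's scaling of the cross-entropy):

import math

def misclassification(p):
    return 1 - max(p, 1 - p)

def gini(p):
    return 2 * p * (1 - p)

def cross_entropy(p):
    if p in (0.0, 1.0):        # define 0 * log(0) = 0
        return 0.0
    return (-p * math.log(p) - (1 - p) * math.log(1 - p)) / (2 * math.log(2))

for p in (0.0, 0.25, 0.5):
    print(p, misclassification(p), gini(p), round(cross_entropy(p), 3))
# At p = 0.5 all three measures equal 0.5; at p = 0 or p = 1 all are 0 (a "pure" node).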

Page 28:

Classification And Regression Trees

By swapping the impurity function and the leaf model, decision trees can be used to solve classification and regression tasks:

classification:
  - y symbolic, discrete, e.g., Y = {class1, class2}
  - ŷ = argmax_{c ∈ Y} p(c | t), i.e. the majority class in node t
  - i(t) = entropy(t) or i(t) = gini(t)

regression:
  - y numeric, continuous
  - ŷ = mean(y | t), i.e. the point average in node t
  - i(t) = (1 / n_t) Σ_{(x, y) ∈ L_t} (y − ŷ_t)², i.e. the mean squared error

Page 29:

A Simple Regression Tree

<x, y> continuous variables, n = 20.

[Figure: scatter plot of the 20 points with the fitted tree:]
  - root: mean 19.5, n=20; split on x < 418
    - yes: node 14.8, n=14; split on x >= 154
      - yes: leaf 11.7, n=9
      - no: leaf 20.5, n=5
    - no: node 30.5, n=6; split on x < 460
      - yes: leaf 24.4, n=3
      - no: leaf 36.5, n=3

Page 30:

Model Selection on tree parameters
[Figure: the regression-tree fit on the same <x, y> data for maximum depth 1, 2, 3 and 4; deeper trees give a finer piecewise-constant approximation.]

Page 31:

Stopping condition: e.g., max depth or min samples
[Figure: the trees grown at maximum depth 1, 2, 3 and 4. At depth 1 there is a single split (x < 418; leaf means 14.8 with n=14 and 30.5 with n=6); each additional level refines the leaves further, down to single-sample leaves at depth 4.]

Page 32:

Recall: Underfitting and Overfitting
The goal of the model is to minimise the prediction error on unseen data.
[Figure: training-set and test-set MSE as a function of the tree's maximum depth (1 to 4).]

• Overly complex trees are likely to overfit the training data:
  - to avoid this, tune the stopping criteria (or post-hoc prune)
  - cross-validation can be used for model selection (a sketch follows below)
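A minimal sketch of cross-validating the maximum depth (scikit-learn and the toy data are assumptions, not the lecture's own code):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 500, size=(20, 1))                      # toy 1-D regression data
y = 15 + 20 * (X[:, 0] > 418) + rng.normal(0, 2, size=20)  # a step plus noise

for depth in (1, 2, 3, 4):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    mse = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
    print(f"max_depth={depth}  cross-validated MSE={mse:.2f}")
# Pick the depth with the lowest cross-validated error rather than the lowest training error.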

Page 33:

Recall: Bias and Variance
Models with low bias and low variance have lower expected prediction error.
[Figure: the classic bull's-eye diagram contrasting low/high bias with low/high variance.]

Page 34:

Bias and Variance of a Regression Tree

• Decision trees have, in general, low bias but high variance:
  - to reduce variance, combine the predictions of several trees!
    (see bagging and ensembles of randomised trees; a sketch follows below)
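A minimal sketch of reducing variance by averaging many trees, each fitted on a bootstrap sample (scikit-learn's bagging ensemble is an illustrative choice):

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Average the predictions of 100 trees, each trained on a bootstrap resample of the data.
bagged_trees = BaggingRegressor(DecisionTreeRegressor(),  # low-bias, high-variance base learner
                                n_estimators=100,
                                bootstrap=True,
                                random_state=0)
# bagged_trees.fit(X, y); bagged_trees.predict(X_new)  -- with X, y as in the earlier sketches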

Page 35:

References

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer.

Louppe, G. (2014). Understanding Random Forests: From Theory to Practice. PhD thesis, Université de Liège, Liège, Belgium.

Page 36:

Artificial Intelligence (CSC9YE) Revision – Machine Learning – Clustering

Page 37:

Unsupervised Learning

• Unsupervised learning: no labeled examples, no training set.
• We want to find interesting things about a set of data. Is there an informative way to visualize the data? Can we discover subgroups among the variables or among the observations?
• This means grouping and separating data points at the same time.
• We need a way to measure how (dis)similar the data points are, for example with the Euclidean distance.
• It is intrinsically more difficult than supervised learning because there is no gold standard (like an outcome variable) and no single objective (like test set accuracy).

Page 38:

Two Clustering Methods

• In K-means clustering, we seek to partition the observations into a pre-specified number of clusters k.
• In hierarchical clustering, we do not know in advance how many clusters we want; in fact, we end up with a tree-like visual representation of the observations, called a dendrogram, that allows us to view at once the clusterings obtained for each possible number of clusters, from 1 to n.

Page 39:

K-means: An Optimisation Problem

• Minimise within-cluster variation.
• Algorithm (see the sketch after this list):
  1. Randomly select k points. These serve as initial cluster centroids for the observations.
  2. Assign each observation to the cluster whose centroid is closest.
  3. Iterate until the cluster assignments stop changing:
     3.1 For each of the k clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
     3.2 Assign each observation to the cluster whose centroid is closest.
• Properties:
  - This algorithm is guaranteed to decrease the value of the objective. However, it is not guaranteed to give the global minimum.
  - The algorithm may get stuck in a local optimum.
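A minimal NumPy sketch of these steps (illustrative code, not the lecture's own):

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """K-means: alternate closest-centroid assignment and centroid recomputation."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # step 1: random initial centroids
    assignment = None
    for _ in range(n_iters):
        # steps 2 / 3.2: assign each observation to the cluster whose centroid is closest
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assignment = distances.argmin(axis=1)
        if assignment is not None and np.array_equal(new_assignment, assignment):
            break                                               # assignments stopped changing
        assignment = new_assignment
        # step 3.1: recompute each centroid as the mean of the observations assigned to it
        for j in range(k):
            if np.any(assignment == j):
                centroids[j] = X[assignment == j].mean(axis=0)
    return assignment, centroids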

Page 40:

K-means Algorithm
K-means with k=2; centroids chosen at random in the initial step (here they coincide with points 6 and 10).

[Figure: the 10 points plotted in the plane with the two initial centroids.]

Distances to the initial centroids:
point    c1     c2
 1      6.08   5.39
 2      5.10   5.10
 3      4.24   3.16
 4      2.24   5.39
 5      1.00   6.40
 6      0.00   7.21
 7      7.28   6.08
 8      6.08   5.00
 9      8.06   3.61
10      7.21   0.00

Page 41:

K-means Algorithm
Last step: no change in centroids.

[Figure: the 10 points with the final centroids c1 and c2.]

Distances to the final centroids:
point    c1     c2
 1      3.34   7.96
 2      2.34   7.30
 3      1.86   4.51
 4      0.69   5.94
 5      2.67   5.77
 6      2.91   6.77
 7      7.47   2.51
 8      6.07   1.68
 9      7.03   1.35
10      5.01   3.58

Page 42:

Hierarchical Clustering

• Hierarchical clustering does not require that we commit to a particular choice of k.
• Bottom-up or agglomerative clustering: a dendrogram (a tree) is built starting from the leaves and combining clusters up to the trunk.
• Algorithm (a short code sketch follows below):
  1. Start with each point in its own cluster.
  2. Repeat until all points are in a single cluster:
     - Identify the closest two clusters and merge them.
• Similarity between clusters: for single/complete/average linkage, compute all pairwise distances between the observations in cluster A and the observations in cluster B, and record the smallest/largest/average of these distances.
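A minimal agglomerative-clustering sketch with SciPy (the library and the example coordinates are assumptions, not the lecture's data):

import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Ten illustrative 2-D points (placeholders for the lecture's example).
X = np.array([[1, 1], [2, 1], [4, 3], [4, 5], [4, 7], [5, 8],
              [9, 8], [8, 7], [9, 4], [7, 1]], dtype=float)

Z = linkage(X, method="single")                    # bottom-up merges with single linkage
labels = fcluster(Z, t=3, criterion="maxclust")    # cut the dendrogram into 3 clusters
print(labels)
# dendrogram(Z) draws the tree of merges (requires matplotlib).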

Page 43:

Hierarchical Clustering
Example using single linkage.
[Figure: the 10 example points, labelled 1–10, plotted in the plane.]

Page 44:

Hierarchical Clustering

Distance matrix (lower triangle):
       1     2     3     4     5     6     7     8     9    10
 1    0.0
 2    1.0   0.0
 3    3.6   2.8   0.0
 4    4.0   3.0   2.2   0.0
 5    6.0   5.0   3.6   2.0   0.0
 6    6.1   5.1   4.2   2.2   1.0   0.0
 7   10.0   9.2   6.4   7.2   6.3   7.3   0.0
 8    8.6   7.8   5.0   5.8   5.1   6.1   1.4   0.0
 9    8.6   8.1   5.4   7.1   7.1   8.1   3.2   2.8   0.0
10    5.4   5.1   3.2   5.4   6.4   7.2   6.1   5.0   3.6   0.0

Clusters: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}

Page 45:

Hierarchical Clustering

Single-linkage merges (d = merge distance, k = number of clusters):
• d = 0.0, k = 10: {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9}, {10}. Start with each observation as one cluster.
• d = 1.0, k = 8: {1,2}, {3}, {4}, {5,6}, {7}, {8}, {9}, {10}. Merge {1} and {2} as well as {5} and {6}, since they are the closest: d(1,2)=1 and d(5,6)=1.
• d = 1.4, k = 7: {1,2}, {3}, {4}, {5,6}, {7,8}, {9}, {10}. Merge {7} and {8}, since they are the closest: d(7,8)=1.4.
• d = 2.0, k = 6: {1,2}, {3}, {4,5,6}, {7,8}, {9}, {10}. Merge {4} and {5,6}, since 4 and 5 are the closest: d(4,5)=2.0.
• d = 2.2, k = 5: {1,2}, {3,4,5,6}, {7,8}, {9}, {10}. Merge {3} and {4,5,6}, since 3 and 4 are the closest: d(3,4)=2.2.
• d = 2.8, k = 3: {1,2,3,4,5,6}, {7,8,9}, {10}. Merge {1,2} and {3,4,5,6} as well as {7,8} and {9}, since 2 and 3 as well as 8 and 9 are the closest: d(2,3)=2.8 and d(8,9)=2.8.
• d = 3.2, k = 2: {1,2,3,4,5,6,10}, {7,8,9}. Merge {1,2,3,4,5,6} and {10}, since 3 and 10 are the closest: d(3,10)=3.2.
• d = 3.6, k = 1: {1,2,3,4,5,6,7,8,9,10}. Merge the remaining two clusters: d(9,10)=3.6.

Page 46:

Hierarchical Clustering
[Figure: the 10 example points plotted in the plane.]

Page 47:

Hierarchical Clustering
[Figure: single linkage cluster dendrogram; height axis from 1.0 to 3.5, leaves ordered 9, 7, 8, 10, 1, 2, 3, 4, 5, 6.]