7.mv - cluster

8/13/2019 7.MV - Cluster


December 25, 2013

Application of Multivariate Statistical Methods in Marketing Research

Industrial Statistics

MS3001 Advanced Marketing Research

Faculty of Science, University of Colombo

Session 4

Cluster Analysis


Illustration

I need to identify groups of target consumers who are similar in buying habits, demographic characteristics, or psychographics.

Can districts of Sri Lanka be grouped based on demographics, socio-cultural parameters, agricultural operations, extent of development in infrastructure, etc.?

Cluster Analysis


Cluster Analysis

In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables.

Cluster Analysis groups objects based on a set of variables.

The groups would be relatively homogeneous within and heterogeneous across.

A range of clustering procedures:

Hierarchical: in the top-down form, each cluster (starting with the whole dataset) is divided into two, then divided again, and so on.

K-Means: the number of clusters is specified subjectively by the researcher.


What is Cluster Analysis?

Cluster: a collection of data objects
  Similar to the objects in the same cluster (intraclass similarity)
  Dissimilar to the objects in other clusters (interclass dissimilarity)

Cluster analysis: a statistical method for grouping a set of data objects into clusters

A good clustering method produces high-quality clusters with high intraclass similarity and low interclass similarity

Clustering is unsupervised classification
  Can be used as a stand-alone tool or as a preprocessing step for other algorithms
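The idea of high intraclass similarity and low interclass similarity can be checked numerically. The sketch below is only an illustration: it assumes Euclidean distance as the (dis)similarity measure, which the slides do not specify, and the function name `mean_within_between`, the toy points, and both labelings are made up for the example.

```python
import math
from itertools import combinations

def mean_within_between(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    # Look at every pair of objects once and file its distance
    # under "same cluster" or "different clusters".
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = math.dist(p, q)  # Euclidean distance (an assumption)
        (within if lp == lq else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)

# Toy data (invented): two tight, well-separated groups.
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4), (8.0, 8.0), (8.3, 7.6), (7.7, 8.4)]
good = [0, 0, 0, 1, 1, 1]   # labeling that follows the two groups
bad = [0, 1, 0, 1, 0, 1]    # labeling that mixes the groups

w_good, b_good = mean_within_between(points, good)
w_bad, b_bad = mean_within_between(points, bad)
print("good labeling:", w_good, b_good)
print("bad labeling: ", w_bad, b_bad)
```

For the labeling that follows the groups, the within-cluster distance is small and the between-cluster distance is large; the mixed labeling loses that contrast.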


Group objects according to their similarity

Cluster: a set of objects that are similar to each other and separated from the other objects.

Example: green/red data points were generated from two different normal distributions.


K-Means Clustering

The meaning of K-means

Why it is called K-means clustering: K points are used to represent the clustering result; each point corresponds to the centre (mean) of a cluster

Each data point is assigned to the cluster with the closest center point

The number K must be specified
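In symbols, one way to write these two ideas is given below. This assumes Euclidean distance, which the slides do not name; the symbols $x_i$ (data object), $\mu_j$ (center of cluster $j$), $c(i)$ (cluster assigned to object $i$), and $C_j = \{ i : c(i) = j \}$ are notation introduced here for illustration.

```latex
c(i) = \arg\min_{j \in \{1,\dots,K\}} \lVert x_i - \mu_j \rVert ,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{i \in C_j} x_i
```

The first expression assigns each object to its closest center; the second says each center is the mean of the objects currently assigned to it.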

Basic algorithm


The K-Means Clustering Method

Given k, the k-means algorithm is implemented in 4 steps:
  1. Partition objects into k non-empty subsets
  2. Arbitrarily choose k points as initial centers
  3. Assign each object to the cluster with the nearest seed point (center)
  4. Calculate the mean of the cluster and update the seed point
  Go back to Step 3; stop when no more new assignments occur

The basic step of k-means clustering is simple: iterate until stable (= no object moves group):
  Determine the centroid coordinates
  Determine the distance of each object to the centroids
  Group the objects based on minimum distance
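The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance, picks the initial centers at random from the data, and keeps an old center if a cluster ever becomes empty. The function name `kmeans`, its parameters, and the toy data are invented for the example.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # arbitrarily choose k points as initial centers
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest center.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:   # stable: no object moved group
            break
        assignment = new_assignment
        # Update each center to the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment

# Toy data (invented): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4), (8.0, 8.0), (8.3, 7.6), (7.7, 8.4)]
centers, assignment = kmeans(points, k=2)
print(sorted(centers))
print(assignment)
```

On data this cleanly separated the result does not depend on the starting centers; on real data it can, so running the procedure from several random starts is a common precaution.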


The K-Means Clustering Results Example

[Figure: sequence of scatter plots (both axes 0 to 10) for K=2. Panel labels: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.]


Weaknesses of the K-Means Method

Handles noisy data and outliers poorly
  Very large or very small values can skew the cluster mean
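A small numeric illustration of how a single extreme value pulls the mean. The values are made up, and the median is shown only for contrast; it is not part of k-means.

```python
from statistics import mean, median

values = [2.0, 3.0, 4.0]           # made-up values of one variable in a cluster
with_outlier = values + [100.0]    # the same cluster after one extreme value joins

print(mean(values), mean(with_outlier))      # 3.0 27.25
print(median(values), median(with_outlier))  # 3.0 3.5
```

One outlier moves the mean from 3.0 to 27.25, far from every ordinary member of the cluster, while the median barely changes.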


Hierarchical Clustering

Start with every data point in a separate cluster

Keep merging the most similar pairs of data points/clusters until we have one big cluster left

This is called a bottom-up or agglomerative method
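The bottom-up procedure can be sketched as follows. The slides do not say how the similarity of two clusters is measured, so this sketch assumes single linkage (the Euclidean distance between the two closest members); the function name `agglomerative` and the toy data are invented for illustration.

```python
import math

def agglomerative(points):
    """Single-linkage bottom-up clustering; returns the list of merges."""
    clusters = [[i] for i in range(len(points))]   # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar (closest) pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (an assumption): closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]    # merge b into a
        del clusters[b]                            # safe because b > a
    return merges

# Toy data (invented): two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.4), (8.0, 8.0), (8.3, 7.6), (7.7, 8.4)]
merges = agglomerative(points)
for left, right, height in merges:
    print(left, "+", right, "at distance", round(height, 3))
```

Each recorded merge distance is the height at which the two branches would join in a tree drawing of the result; the last merge joins the two groups into one big cluster.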


Hierarchical Clustering (cont.)

This produces a binary tree or dendrogram

The final cluster is the root and each data item is a leaf

The height of the bars indicates how close the items are


Hierarchical Clustering Demo


Strengths & Weaknesses of Hierarchical Clustering Methods

Major advantages
  Conceptually very simple
  Easy to implement
  Most commonly used technique


Applications

Market segmentation is usually conducted using some form of cluster analysis to divide people into segments

Other methods such as latent class models or archetypal analysis are sometimes used instead

It is also possible to cluster other items such as products/SKUs, image attributes, brands
