7.mv - cluster

Upload: rochana-ramanayaka

Post on 04-Jun-2018


  • 8/13/2019 7.MV - Cluster

    1/15

    December 25, 2013

    Application of Multivariate Statistical Methods in Marketing Research

    Industrial Statistics

    MS3001 Advanced Marketing Research

    Faculty of Science, University of Colombo

    Session 4

    Cluster Analysis


    Illustration

    I need to identify groups of target consumers who are similar in buying habits, demographic characteristics, or psychographics.

    Can the districts of Sri Lanka be grouped based on demographics, socio-cultural parameters, agricultural operations, extent of infrastructure development, etc.?

    Cluster Analysis


    Cluster Analysis

    In simple terms, Cluster Analysis does to objects or entities what Factor Analysis does to variables: Cluster Analysis groups objects based on a set of variables.

    The groups should be relatively homogeneous within each group and heterogeneous across groups.

    A range of clustering procedures:

    Hierarchical: in the divisive (top-down) form, each cluster (starting with the whole dataset) is divided into two, then divided again, and so on.

    K-Means: the number of clusters is specified subjectively by the researcher.


    What is Cluster Analysis?

    Cluster: a collection of data objects that are similar to the objects in the same cluster (intraclass similarity) and dissimilar to the objects in other clusters (interclass dissimilarity).

    Cluster analysis: a statistical method for grouping a set of data objects into clusters.

    A good clustering method produces high-quality clusters with high intraclass similarity and low interclass similarity.

    Clustering is unsupervised classification: it can be used as a stand-alone tool or as a preprocessing step for other algorithms.
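"High intraclass similarity and low interclass similarity" can be checked numerically. The sketch below assumes Euclidean distance as the (inverse) measure of similarity, so a small distance means high similarity, and summarises each side with an average pairwise distance. The function names and the two small point sets are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product
from math import dist


def mean_within_distance(cluster):
    """Average Euclidean distance between pairs of points in the same cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(a, b):
    """Average Euclidean distance between a point of cluster a and a point of cluster b."""
    pairs = list(product(a, b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical 2-D data: two small, well-separated groups.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
cluster_b = [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]

within = (mean_within_distance(cluster_a) + mean_within_distance(cluster_b)) / 2
between = mean_between_distance(cluster_a, cluster_b)
print(f"average within-cluster distance:  {within:.2f}")
print(f"average between-cluster distance: {between:.2f}")
```

For a good clustering, the within-cluster figure should be much smaller than the between-cluster figure.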


    Group objects according to their similarity

    Cluster: a set of objects that are similar to each other and separated from the other objects.

    Example (figure): the green and red data points were generated from two different normal distributions.
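Data like the figure's green and red points can be simulated by sampling from two normal distributions. This is a sketch: the means (2 and 6), the standard deviation (0.5), the group size and the seed are hypothetical values, not taken from the slide.

```python
import random


def make_two_blobs(n_per_group=50, seed=0):
    """Draw 2-D points from two normal distributions (hypothetical parameters)."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    green = [(rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)) for _ in range(n_per_group)]
    red = [(rng.gauss(6.0, 0.5), rng.gauss(6.0, 0.5)) for _ in range(n_per_group)]
    return green, red


def centroid(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


green, red = make_two_blobs()
print("green centre is roughly", centroid(green))
print("red centre is roughly", centroid(red))
```

Because the two distributions are far apart relative to their spread, each group's centre lands near its own distribution mean, which is exactly the structure a clustering method should recover.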


    K-Means Clustering

    The meaning of K-means: it is called K-means clustering because K points are used to represent the clustering result; each point corresponds to the centre (mean) of a cluster.

    Each data point is assigned to the cluster with the closest centre point.

    The number K must be specified in advance.
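The assignment rule above ("each point goes to the closest centre") can be sketched in a few lines, assuming Euclidean distance. The function name `assign_to_nearest` and the example centres (K = 2, picked by hand) are hypothetical.

```python
from math import dist


def assign_to_nearest(points, centres):
    """Return, for each point, the index of the closest centre (Euclidean distance)."""
    return [min(range(len(centres)), key=lambda j: dist(p, centres[j])) for p in points]


points = [(1, 1), (2, 1), (8, 9), (9, 8)]
centres = [(1.5, 1.0), (8.5, 8.5)]  # K = 2, chosen by hand for illustration
print(assign_to_nearest(points, centres))  # [0, 0, 1, 1]
```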

    Basic algorithm


    The K-Means Clustering Method

    Given k, the k-means algorithm is implemented in 4 steps:
    1. Initialise: partition the objects into k non-empty subsets, or arbitrarily choose k points as the initial centres (seed points).
    2. Assign each object to the cluster with the nearest seed point (centre).
    3. Calculate the mean of each cluster and update its seed point.
    4. Go back to Step 2; stop when there are no new assignments.

    The basic step of k-means clustering is simple. Iterate until stable (no object moves to a different group):
    1. Determine the centroid coordinates.
    2. Determine the distance of each object to the centroids.
    3. Group the objects based on minimum distance.
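The four steps can be sketched as a short standard-library loop. Assumptions: Euclidean distance, initial centres chosen as k random objects (the "arbitrarily choose k points" option), and an empty cluster simply keeps its previous centre, a choice the slides do not address. The names `kmeans`, `max_iter` and `seed` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means sketch: returns (centres, labels)."""
    rng = random.Random(seed)
    # Initialise: arbitrarily choose k distinct objects as the initial centres.
    centres = list(rng.sample(points, k))
    labels = None
    for _ in range(max_iter):
        # Assign each object to the cluster with the nearest centre.
        new_labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        # Stop when no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update: recompute each centre as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels


data = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centres, labels = kmeans(data, k=2)
print("centres:", centres)
print("labels:", labels)
```

On this small, well-separated data set the loop settles with the first three points in one cluster and the last three in the other; on real data the result can depend on the initial centres, which is why k-means is often run with several different seeds.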


    The K-Means Clustering Results Example

    [Figure: k-means with K = 2 on points plotted against axes running from 0 to 10. Panel labels: arbitrarily choose K objects as initial cluster centres; assign each object to the most similar centre; update the cluster means; reassign; update the cluster means; reassign.]


    Weaknesses of the K-Means Method

    Sensitive to noisy data and outliers: very large or very small values can skew the mean.
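A small arithmetic check shows why outliers are a problem: a k-means centre is a mean, and a single extreme value drags it far from the bulk of the cluster, while the median (shown only for comparison) barely moves. The spend values below are hypothetical.

```python
from statistics import mean, median

# Hypothetical 1-D cluster (e.g. monthly spend), with and without one extreme value.
cluster = [10, 12, 11, 13, 12]
with_outlier = cluster + [200]

print("without outlier: mean", mean(cluster), "median", median(cluster))
print("with outlier:    mean", mean(with_outlier), "median", median(with_outlier))
```

The mean jumps from 11.6 to 43 when one value of 200 is added, whereas the median stays at 12.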


    Hierarchical Clustering

    Start with every data point in a separate cluster, then keep merging the most similar pair of data points/clusters until only one big cluster is left.

    This is called a bottom-up or agglomerative method.
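The merging procedure can be sketched naively in Python. The slides do not say how "most similar" is measured between clusters, so this sketch assumes single linkage (the distance between the closest pair of members) with Euclidean distance. The function name and the five points are hypothetical; the search over all pairs is fine for a handful of points but slow for large data.

```python
from itertools import combinations
from math import dist


def agglomerative(points):
    """Naive bottom-up clustering with single linkage (an assumed linkage rule).

    Returns the merges as (cluster_a, cluster_b, distance) tuples, where each
    cluster is a frozenset of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]  # every point starts alone

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > 1:
        # Find the most similar (closest) pair of clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for a, b, d in agglomerative(points):
    print(sorted(a), "+", sorted(b), f"at distance {d:.2f}")
```

Each line of output is one merge; the recorded distances are the heights at which the branches would join in a dendrogram.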


    Hierarchical Clustering (cont.)

    This produces a binary tree, or dendrogram.

    The final cluster is the root and each data item is a leaf.

    The height at which branches join indicates how close the merged items are: the lower the join, the more similar the items.
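Because join heights measure closeness, a common way to turn a dendrogram into a fixed set of clusters is to cut it at a chosen height and keep only the merges below that height. The sketch assumes merge heights never decrease along the merge order (true for single, complete and average linkage); the merge history and the function name `cut_dendrogram` are hypothetical.

```python
def cut_dendrogram(n_items, merges, height):
    """Return the clusters obtained by cutting the tree at the given height.

    Assumes merges are listed in the order they happened, with
    non-decreasing heights.
    """
    clusters = {frozenset([i]) for i in range(n_items)}  # start from the leaves
    for a, b, h in merges:
        if h <= height:  # this join lies below the cut, so apply it
            clusters -= {a, b}
            clusters.add(a | b)
    return sorted(sorted(c) for c in clusters)


# Hypothetical merge history for 5 items (leaf indices 0 to 4), with join heights.
merges = [
    (frozenset({0}), frozenset({1}), 1.0),
    (frozenset({2}), frozenset({3}), 1.0),
    (frozenset({0, 1}), frozenset({2, 3}), 6.4),
    (frozenset({0, 1, 2, 3}), frozenset({4}), 7.1),
]
print(cut_dendrogram(5, merges, height=2.0))  # [[0, 1], [2, 3], [4]]
print(cut_dendrogram(5, merges, height=6.5))  # [[0, 1, 2, 3], [4]]
```

A low cut gives many small, tight clusters; a high cut gives a few broad ones.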


    Hierarchical Clustering Demo


    Strengths & Weaknesses of Hierarchical Clustering Methods

    Major advantages: conceptually very simple and easy to implement; the most commonly used technique.


    Applications

    Market segmentation is usually conducted using some form of cluster analysis to divide people into segments.

    Other methods, such as latent class models or archetypal analysis, are sometimes used instead.

    It is also possible to cluster other items, such as products/SKUs, image attributes, and brands.