Cluster Analysis Measuring latent groups

Uploaded by: shirley-brawn

Posted on 16-Dec-2015

Cluster Analysis - Discussion

Definition
Vocabulary
Simple Procedure
SPSS example
ICPSR and hands-on

Definition

Cluster analysis is a procedure that takes a large number of cases (that is, observations on respondents) and reduces them to a smaller number of mutually exclusive “groups” by “clustering” together respondents who vary in similar ways across the chosen variables. The result is a single “grouping” (cluster membership) for each case, based on all of the variables at once.

Vocabulary and Procedure

There are essentially two steps in Cluster Analysis:

1. Create a table of the relative similarities or differences between all objects. This table is called a proximities matrix.

2. Use this information to combine the objects into groups. The method of combining objects into groups is called a clustering algorithm. The idea is to combine objects that are similar to one another into the same group.
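The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the procedure SPSS uses: step 1 builds a proximities matrix of squared Euclidean distances, and step 2 uses a deliberately simple, hypothetical grouping rule (link any two objects whose distance is at most a chosen `threshold`, and put linked objects in the same group). That rule is equivalent to single-linkage clustering cut at that threshold. The data are the five cases used later in the slides; the function names and the threshold of 400 are assumptions made for illustration.

```python
# Five objects measured on three variables (the values used in the slides).
POINTS = [
    (12, 25, 33),  # C1
    (16, 40, 22),  # C2
    (18, 60, 55),  # C3
    (14, 65, 27),  # C4
    (16, 45, 67),  # C5
]
NAMES = ["C1", "C2", "C3", "C4", "C5"]


def proximity_matrix(points):
    """Step 1: squared Euclidean distance between every pair of objects."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]


def threshold_groups(matrix, threshold):
    """Step 2 (hypothetical rule): link any two objects whose distance is at most
    `threshold`; linked objects end up in the same group."""
    n = len(matrix)
    group = list(range(n))  # every object starts in its own group
    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] <= threshold and group[i] != group[j]:
                old, new = group[j], group[i]
                # Merge the two groups by relabelling every member of the old one.
                group = [new if g == old else g for g in group]
    return group


matrix = proximity_matrix(POINTS)
labels = threshold_groups(matrix, threshold=400)
for name, label in zip(NAMES, labels):
    print(name, "-> group", label)
```

With this threshold, C1 and C2 (distance 362) form one group, C3 and C5 (distance 373) form another, and C4 stays on its own. A different threshold gives a different grouping, which is why real clustering algorithms need a principled way to decide how many groups to keep.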

Vocabulary and Procedure (cont.)

In this respect, cluster analysis is the mirror image of factor analysis: whereas factor analysis reduces the number of variables by grouping them into a smaller set of factors, cluster analysis reduces the number of observations (cases) by grouping them into a smaller set of clusters.
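One way to see the contrast is to look at which matrix each technique starts from. The sketch below is an illustration under simple assumptions (squared Euclidean distance between cases, Pearson correlation between variables, and the five-case data from the slides): cluster analysis compares rows with rows, giving a cases-by-cases matrix, while factor analysis compares columns with columns, giving a variables-by-variables matrix. The helper names are hypothetical.

```python
from math import sqrt

# Five cases (rows) on three variables (columns X1, X2, X3), as in the slides.
ROWS = [
    (12, 25, 33),
    (16, 40, 22),
    (18, 60, 55),
    (14, 65, 27),
    (16, 45, 67),
]


def squared_distance(p, q):
    """Dissimilarity between two cases."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pearson(x, y):
    """Pearson correlation between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Cluster analysis: cases compared with cases (5 x 5).
case_matrix = [[squared_distance(p, q) for q in ROWS] for p in ROWS]

# Factor analysis: variables compared with variables (3 x 3).
columns = list(zip(*ROWS))
variable_matrix = [[pearson(x, y) for y in columns] for x in columns]

print(len(case_matrix), "x", len(case_matrix[0]))
print(len(variable_matrix), "x", len(variable_matrix[0]))
```

The first line printed is `5 x 5` and the second is `3 x 3`: grouping the rows of the first matrix reduces cases, and grouping the rows of the second reduces variables.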

The main challenge is deciding which variables to include and, once they are chosen, how to combine them into a measure of similarity between observations.
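Part of "how to combine such variables" is scale: a variable with a large spread dominates a distance computed from raw values. A common choice, assumed here rather than prescribed by the slides, is to convert each variable to z-scores before building the proximities matrix. The sketch uses the population standard deviation (`statistics.pstdev`); the sample standard deviation would rescale every variable by the same constant factor, so relative distances would not change. The function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev

# The five cases from the slides: (X1, X2, X3).
ROWS = [
    (12, 25, 33),
    (16, 40, 22),
    (18, 60, 55),
    (14, 65, 27),
    (16, 45, 67),
]


def standardize(rows):
    """Convert each column to z-scores: (value - column mean) / column population SD."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]


z_rows = standardize(ROWS)
for raw, z in zip(ROWS, z_rows):
    print(raw, "->", tuple(round(v, 2) for v in z))
```

After standardizing, every variable has mean 0 and standard deviation 1, so each contributes on a comparable scale to any distance computed afterwards.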

Proximities Matrix

Data: five cases (C1 to C5) measured on three variables (X1 to X3)

      X1   X2   X3
C1    12   25   33
C2    16   40   22
C3    18   60   55
C4    14   65   27
C5    16   45   67

Proximities Matrix (cont.)

Squared differences between cases on X1

      C1   C2   C3   C4   C5
C1     0   16   36    4   16
C2    16    0    4    4    0
C3    36    4    0   16    4
C4     4    4   16    0    4
C5    16    0    4    4    0
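The matrix above can be reproduced with a short script. A minimal sketch, assuming the proximity measure is the squared difference (squared Euclidean distance) between cases: restricting the computation to X1 gives the slide's matrix, and using all three variables gives the full squared Euclidean distances. The function names are hypothetical.

```python
# The five cases from the data table, keyed by name: (X1, X2, X3).
DATA = {
    "C1": (12, 25, 33),
    "C2": (16, 40, 22),
    "C3": (18, 60, 55),
    "C4": (14, 65, 27),
    "C5": (16, 45, 67),
}


def proximities(data, columns):
    """Squared Euclidean distances between all pairs of cases, using only the
    variables at the given column positions (0 = X1, 1 = X2, 2 = X3)."""
    names = list(data)
    matrix = [
        [sum((data[a][c] - data[b][c]) ** 2 for c in columns) for b in names]
        for a in names
    ]
    return names, matrix


def show(names, matrix):
    """Print the matrix with case names as row and column labels."""
    print("    " + "".join(f"{n:>6}" for n in names))
    for name, row in zip(names, matrix):
        print(f"{name:<4}" + "".join(f"{v:>6}" for v in row))


names, x1_only = proximities(DATA, columns=[0])
show(names, x1_only)

names, all_three = proximities(DATA, columns=[0, 1, 2])
show(names, all_three)
```

The matrix is symmetric with zeros on the diagonal, so only the values above (or below) the diagonal carry information. Using all three variables changes the picture considerably: C2 and C5 are identical on X1 (distance 0) but far apart overall (distance 2050).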

Clustering – Flat Method

There are two types of clustering methods: flat and hierarchical. If the number of groups is known beforehand, the "flat" method works. In SPSS, this is called K-means clustering. In this method, each object is first assigned to one of the groups based on some initial criterion, and the mean of each group is calculated. Next, the objects are reshuffled: each object is assigned to the group whose current mean it is most similar to, and the group means are recalculated. This process repeats until no objects change groups.
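The K-means loop described above can be sketched in plain Python. This is a minimal illustration, not SPSS's implementation: the initial criterion assumed here is simply to use the first `k` cases as the starting means, similarity is squared Euclidean distance, and the data are the five cases from the slides with `k = 2`.

```python
# The five cases from the slides: (X1, X2, X3) for C1..C5.
POINTS = [
    (12, 25, 33),
    (16, 40, 22),
    (18, 60, 55),
    (14, 65, 27),
    (16, 45, 67),
]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def k_means(points, k, max_iter=100):
    # Initial criterion (an assumption): the first k points are the starting means.
    means = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assign each point to the group whose current mean is closest.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j])) for p in points
        ]
        if new_labels == labels:
            break  # no object changed groups, so stop
        labels = new_labels
        # Recalculate each group's mean (keep the old mean if a group is empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                means[j] = mean_point(members)
    return labels, means


labels, means = k_means(POINTS, k=2)
print(labels)
print([tuple(round(v, 2) for v in m) for m in means])
```

On these data the loop settles after the second pass with C1 and C5 in one group and C2, C3, and C4 in the other. Because the result can depend on the starting means, it is common to try several starting configurations and compare the solutions.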

Clustering – Hierarchical Method

If the number of groups is not known a priori, hierarchical clustering works better. There are two kinds:

Divisive – starts with all observations in one group and repeatedly divides groups into subgroups until no further distinction can be made.

Agglomerative – starts with each observation as a separate group and repeatedly merges the two most similar groups until all observations belong to a single group.
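The agglomerative variant is sketched below; the divisive variant is not shown. The slides do not say how the distance between two groups is measured, so this sketch assumes single linkage (the smallest distance between any member of one group and any member of the other) on squared Euclidean distances, using the five cases from the slides. The function names are hypothetical.

```python
from itertools import combinations

# The five cases from the slides: (X1, X2, X3).
POINTS = {
    "C1": (12, 25, 33),
    "C2": (16, 40, 22),
    "C3": (18, 60, 55),
    "C4": (14, 65, 27),
    "C5": (16, 45, 67),
}


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def single_linkage(a, b, points):
    """Distance between two groups = smallest distance between any pair of members."""
    return min(squared_distance(points[x], points[y]) for x in a for y in b)


def agglomerate(points):
    """Start with every case in its own group and merge the closest pair of
    groups until one group remains. Returns the merge history."""
    groups = [frozenset([name]) for name in points]
    history = []
    while len(groups) > 1:
        a, b = min(combinations(groups, 2), key=lambda pair: single_linkage(*pair, points))
        history.append((sorted(a), sorted(b), single_linkage(a, b, points)))
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return history


history = agglomerate(POINTS)
for left, right, height in history:
    print(f"merge {left} + {right} at distance {height}")
```

The merge history is what a dendrogram draws: reading it from the top, C1 and C2 join first, then C3 and C5, then C4 joins C1 and C2, and finally the two remaining groups merge. Cutting the history at a chosen distance yields a particular number of groups, which is how the hierarchical method avoids fixing that number in advance.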

Steps in the Analysis

1. Input the data.
2. Choose the method for grouping.
3. Generate the output.
4. Interpret the results.
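The last two steps usually mean producing a cluster-membership list and a profile of each cluster (the mean of every variable within the cluster), then comparing the profiles. A minimal sketch, assuming the grouping step has already run: the labels below are illustrative (a plausible two-group solution for the slides' five cases), not output reported in the slides, and `cluster_profiles` is a hypothetical helper.

```python
from statistics import mean

# Step 1, input the data: the five cases from the slides (X1, X2, X3).
NAMES = ["C1", "C2", "C3", "C4", "C5"]
ROWS = [
    (12, 25, 33),
    (16, 40, 22),
    (18, 60, 55),
    (14, 65, 27),
    (16, 45, 67),
]

# Step 2, choose the method: assumed to have been run already.
# These labels are illustrative, one cluster number per case.
LABELS = [0, 1, 1, 1, 0]


def cluster_profiles(rows, labels):
    """Step 3, generate the output: mean of each variable within each cluster."""
    profiles = {}
    for label in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        profiles[label] = tuple(mean(col) for col in zip(*members))
    return profiles


profiles = cluster_profiles(ROWS, LABELS)

# Step 4, interpret: compare the clusters' memberships and variable means.
for label, profile in profiles.items():
    members = [n for n, lab in zip(NAMES, LABELS) if lab == label]
    print(f"cluster {label}: members={members}, means={tuple(round(v, 2) for v in profile)}")
```

Interpretation then rests on how the profiles differ. Here, for example, the second cluster has a much higher mean on X2 and a lower mean on X3 than the first.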

Input the data

Generate the Procedure

Produce the Output

Produce the Output (cont.)