basics of clustering
Cluster Analysis for Segmentation
Clustering
What is it? Why do we use it? How do we do it?
What is it?
• Cluster analysis is the process of grouping a set of data into clusters.
• A cluster is a collection of data points in which each observation is 1) similar to other observations in the same cluster, and 2) dissimilar to observations in other clusters.
What is it?
• Cluster analysis is a statistical tool for discovering hidden patterns in groups of observations, e.g., on what criteria are these “clusters” formed?
• Cluster analysis is still quite subjective in nature. Always ask: do the clusters make sense?
In Marketing…
• Clustering is used to discover distinct groups in a customer base (e.g., segments) and to use this knowledge to develop targeted marketing programs.
• Another example: insurance companies use clustering to determine which types of drivers are risky and which are safe, and charge premiums accordingly!
Good Clusters have:
• High intra-class similarity (observations in the same cluster share qualities)
• Low inter-class similarity (distinct clusters are different from one another)
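One rough way to check these two properties is to compare the average distance between pairs of points in the same cluster with the average distance between pairs in different clusters. The sketch below does this for the six students in this deck's grades and work-hours example. The split into two groups is an assumption made for illustration, and the distances are computed on the raw values, so work hours dominate the result.

```python
from itertools import combinations
from math import dist
from statistics import mean

# (grades, work hours) for the six students in this deck's example.
# ASSUMPTION: the split into two groups below is illustrative only.
groups = {
    "group_1": [(3.5, 0), (3.7, 5)],
    "group_2": [(2.9, 10), (2.0, 12), (3.0, 15), (2.8, 14)],
}

# Distances between every pair of points that share a group.
within = [
    dist(p, q)
    for members in groups.values()
    for p, q in combinations(members, 2)
]

# Distances between every pair of points from different groups.
between = [
    dist(p, q)
    for (_, first), (_, second) in combinations(groups.items(), 2)
    for p in first
    for q in second
]

avg_within = mean(within)
avg_between = mean(between)
print(f"average within-cluster distance:  {avg_within:.2f}")
print(f"average between-cluster distance: {avg_between:.2f}")
```

A small within-cluster average next to a large between-cluster average suggests tight, well-separated clusters. It is only a quick sanity check, not a formal quality criterion.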
Consider two important characteristics

Student   Grades   Work hours
a         3.5      0
b         3.7      5
c         2.9      10
d         2.0      12
e         3.0      15
f         2.8      14
[Figure: scatter plot of grades against work hours for students a-f, with the points grouped into cluster 1 and cluster 2]
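For readers who want to see the grouping computed rather than drawn, here is a minimal sketch that clusters the six students from the table above into two groups. It uses plain k-means, which is not the Two-Step procedure used in SPSS later in this deck. The standardization step and the choice of the first and last students as starting centroids are assumptions made so the example is deterministic.

```python
from math import dist, sqrt

# (grades, work hours) for students a-f, copied from the table above.
students = {
    "a": (3.5, 0), "b": (3.7, 5), "c": (2.9, 10),
    "d": (2.0, 12), "e": (3.0, 15), "f": (2.8, 14),
}


def standardize(points):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*points))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign points to the nearest centroid, then update."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


names = list(students)
scaled = standardize(list(students.values()))
# ASSUMPTION: start from the first and last students as initial centroids.
labels = kmeans(scaled, [scaled[0], scaled[-1]])

clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

With this starting point the sketch puts a and b in one group and c, d, e and f in the other. Standardizing first keeps work hours, which span a much wider range than grades, from deciding the result alone.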
How do we use this information?
We have 2 distinct segments.
Other data we have: age, gender, hometown, grade level, major, hair color.
What is the segment profile of each?
Are these both viable targets?
That depends on …?
Are all of these characteristics useful?
How do we use this information?
We have 2 distinct segments.
Descriptive Statistics

Segment                      Age         Gender     Hometown     Major          Hair color
Segment 1 - works a lot      Mean = 20   57% male   90% NKY      65% Business   50% blonde
Segment 2 - good students    Mean = 20   75% male   66% OH, IN   50% Arts       75% brown
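A profile like the one above can be built by grouping the records by segment, then taking the mean of each numeric variable and the share of the most common value of each categorical variable. The sketch below shows the idea. The records and the `profile` helper are hypothetical, invented only for illustration, and are not the data behind the table.

```python
from collections import Counter, defaultdict
from statistics import mean

# HYPOTHETICAL records, made up for illustration (not the deck's data).
records = [
    {"segment": 1, "age": 19, "gender": "male", "major": "Business"},
    {"segment": 1, "age": 21, "gender": "female", "major": "Business"},
    {"segment": 1, "age": 20, "gender": "male", "major": "Arts"},
    {"segment": 2, "age": 20, "gender": "male", "major": "Arts"},
    {"segment": 2, "age": 22, "gender": "female", "major": "Arts"},
    {"segment": 2, "age": 21, "gender": "male", "major": "Science"},
]


def profile(rows, numeric=("age",), categorical=("gender", "major")):
    """Summarize each segment: means for numbers, top share for categories."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row["segment"]].append(row)

    profiles = {}
    for segment, members in sorted(by_segment.items()):
        summary = {"n": len(members)}
        for col in numeric:
            summary[col] = f"Mean = {mean(r[col] for r in members):.1f}"
        for col in categorical:
            # Most frequent value and the percentage of members who have it.
            value, count = Counter(r[col] for r in members).most_common(1)[0]
            summary[col] = f"{100 * count / len(members):.0f}% {value}"
        profiles[segment] = summary
    return profiles


profiles = profile(records)
for segment, summary in profiles.items():
    print(f"segment {segment}: {summary}")
```

Variables that look the same in every segment, such as a shared mean age, do little to tell the segments apart, which is one way to approach the question of whether all of these characteristics are useful.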
How to do it!
• You need access to SPSS. You can either 1) log in to NKU’s virtual network (VPN) and use the virtual desktop, or 2) use a University computer. (I suggest the VPN.)
• Use this link to learn how to use the virtual desktop. If you want to work off-campus, you first have to install the VPN software: click here.
Steps to follow…
• Open your data set and save it to a portable drive or your NKU “j” drive.
• We will be using “Two-Step” cluster analysis.
• From the SPSS menu: Analyze > Classify > TwoStep Cluster
• Then follow the instructions in the YouTube tutorial. The tutorial is linked here if you need to review it.
Warnings
Don’t use binary variables in the clustering process (e.g., gender, team (yes/no)). These are “swamping variables” and will hijack your clusters.
Solutions with 3-4 clusters are ideal, even if you have to force that number and the criteria are not very good. You only have what you have…
Your data set might never give you “perfect” results based on the criteria discussed in the video tutorial. That’s OK. Do the best you can.
More on Profiles to come…