business data analytics · business data analytics lecture 3 customer segmentation mtat.03.319 the...
TRANSCRIPT
![Page 1: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/1.jpg)
Business Data Analytics
Lecture 3Customer Segmentation
MTAT.03.319
The slides are available under creative common license.
The original owner of these slides is the University of Tartu.
![Page 2: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/2.jpg)
Business Data Analytics
Chapter 3
Repeatable, Decision, Mechanism (Approach), Objective, Segmentation, Classification and Prediction
Data Exploration
Customer (Data) Segmentation
Chapter 2
Chapter 1
Descriptive (numbers) & Visualization
![Page 3: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/3.jpg)
Marketing and Sales
Customer Segmentation
![Page 4: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/4.jpg)
OutlineTechniques for Customer Segmentation
• Intuition Based
• Historical/Behavior basedRFMValue TierLife Cyclestage
• Data DrivenK-MeansHierarchical Clustering
![Page 5: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/5.jpg)
![Page 6: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/6.jpg)
Know your clients
Persona analysis
![Page 7: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/7.jpg)
Persona analysis
identification of customer types based on their preferences
better communication loyalty new customers
![Page 8: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/8.jpg)
The most attractive customers are
usually those
for which there is a big gap between
their needs
and the current satisfaction of these
needs
![Page 9: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/9.jpg)
How’d you do it?
Imagine you have become a business owner of a big company
in the automobile sector and wish to redefine the focus of your production line.
You want to know what and how much to produce (SUV, 4x4, coupe, sedan, wagon, etc..)
You have a big client base.
![Page 10: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/10.jpg)
Customer segmentation
Intuition-based Historical/behavioral-based Data-driven
![Page 11: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/11.jpg)
Intuition-based
Demographic Attitudinal
Working men in 30s from
Tartu with children
grouping customers
based on their demographic
characteristics
grouping customers
based on their needs
Women who wish to increase
sport activity
and need motivation
![Page 12: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/12.jpg)
Customer segmentation
Intuition-based Historical/behavioral-based Data-driven
![Page 13: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/13.jpg)
Behavioral-basedgrouping customers based of what they done in the
past: purchases,
website browsing, comments
![Page 14: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/14.jpg)
Behavioral-basedgrouping customers based of what they done in the
past: purchases,
website browsing, comments
RFM Value tier Lifecycle stage
![Page 15: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/15.jpg)
RFM
![Page 16: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/16.jpg)
RFM
![Page 17: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/17.jpg)
F(h)R(l)M(h)
h - high
n - neutral
l - low
F(h)R(n)M(h)
F(h)R(h)M(h)
F(h)R(n)M(n)
F(n)R(n)M(n)
F(n)R(h)M(n)
F(n)R(l)M(n)
F(n)R(l)M(l)
F(l)R(l)M(l)
F(n)R(h)M(l) F(l)R(h)M(n)
RFM
![Page 18: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/18.jpg)
Value tier
Grouping customers based on the value they deliver to your
business. Top 1%, top 5 % etc of generated revenue.
![Page 19: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/19.jpg)
Lifecycle stage
Grouping customers based on the type of relationships with
the company/brand
new customer regular loyal returning
![Page 20: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/20.jpg)
Customer segmentation
Intuition-based Historical/behavioral-based Data-driven
RFM Value tier Lifecycle stage
![Page 21: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/21.jpg)
Data-driven segmentation
Automated discovery of the new segments
Depends on your data
Ursus Wehrli
![Page 22: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/22.jpg)
Data-driven segmentation
Automated discovery of the new segments
Depends on your data
Ursus Wehrli
Price sensitivityPromotion sensitivity
Product affinityemail responses
![Page 23: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/23.jpg)
Automated segmentation
![Page 24: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/24.jpg)
Automated segmentationDistances inside
clusters are small:
dense cloud
![Page 25: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/25.jpg)
Automated segmentation
Distances between
clusters are maximized
![Page 26: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/26.jpg)
…in multidimensional space
![Page 27: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/27.jpg)
Clustering in U.S Army
![Page 28: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/28.jpg)
![Page 29: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/29.jpg)
3 Important Questions
1. How do you represent a cluster of more than
one point ?
2. How do you determine the “nearness” of
clusters?
3. When to stop combining clusters
Source: Stanford University – Hierarchical clustering
![Page 30: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/30.jpg)
3 Important Questions
1. How do you represent a cluster of more than one
point?
centroid
2. How do you determine the “nearness” of clusters?
Some Distance metric
3. When to stop combining clusters
convergence
![Page 31: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/31.jpg)
K-means clustering
![Page 32: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/32.jpg)
K-meansfixed number of clusters -
you need to choose it
yourself
![Page 33: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/33.jpg)
K-meansfixed number of clusters -
you need to choose it
yourself
based on the calculation
of averages
![Page 34: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/34.jpg)
Initialization
Pick randomly K
nodes from the all the
nodes as initiators
32
1
K2
K3
K1
![Page 35: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/35.jpg)
Assignment step:
distance calculation
d
d
d d
d(a, b) =
321
K2
K3
K1
4
X
Y
(𝑥4 − 𝑥1)2+(𝑦4 − 𝑦1)
2=d(4, 1)
𝑖=1
𝑛
(𝑖𝑎 − 𝑖𝑏)2
![Page 36: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/36.jpg)
Assignment step:
distance calculation
=
K2
K3
K14
4 goes to the basket
with min distance
321
4d(a, b)
𝑖=1
𝑛
(𝑖𝑎 − 𝑖𝑏)2
![Page 37: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/37.jpg)
Assignment step:
distance calculation
d
d
d
d
=
321
K2
K3
K1
5
5
4
d(a, b)
𝑖=1
𝑛
(𝑖𝑎 − 𝑖𝑏)2
![Page 38: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/38.jpg)
Assignment step:
distance calculation
d
dd
=
3
21
6
K2
K3
K14
5
6
d(a, b)
𝑖=1
𝑛
(𝑖𝑎 − 𝑖𝑏)2
![Page 39: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/39.jpg)
Assignment step:
distance calculation
d
dd
=
3
21
7 K2
K3
K14
5
6
, 7
d(a, b)
𝑖=1
𝑛
(𝑖𝑎 − 𝑖𝑏)2
![Page 40: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/40.jpg)
Assignment step:
distance calculation
Make three K clusters
(once you finish all the
data points)
![Page 41: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/41.jpg)
Update step:
calculation of average
7
Calculate centroid
of 3 clusters
1
𝑛
𝑖=1
𝑖=𝑛
𝑦𝑖C = 1
𝑛
𝑖=1
𝑖=𝑛
𝑥𝑖 ,
![Page 42: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/42.jpg)
Update step:
assignment recalc
K2
K3
K1
7
![Page 43: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/43.jpg)
Update step:
assignment recalc
d
K2
K3
K1
7
7
Datapoint 7 moved from K2 to K1
![Page 44: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/44.jpg)
Exercise
You have a dataset with two dimensions, customer loyalty score and
customer value score, and the following dataset for 6 customers:
loyalty value
1 2.3 8.0
2 5.6 6.7
3 4.2 7.1
4 3.1 6.0
5 4.6 11.0
6 6.0 9.0
Find 3 clusters using k-means
![Page 45: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/45.jpg)
kmeans(dt, 3)
R code:
Your result depends
on K
![Page 46: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/46.jpg)
DrawbacksThe number of clusters k is an input parameter:
an inappropriate choice of k may yield poor results.
How to select initial data points ?
Pick the first seed randomly
Pick the second seed as far as possible from the first seed.
Pick the third seed as far as possible from the first two
:
:
![Page 47: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/47.jpg)
Best value of K ?
Drawbacks
![Page 48: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/48.jpg)
Drawbacks
Best value of K ?
![Page 49: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/49.jpg)
DrawbacksConvergence to a local minimum may produce
counterintuitive ("wrong") results
Sensitive to noise and outliers
![Page 50: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/50.jpg)
Drawbacks
Can handle only numerical features
Male Female
Do you know
what is our
Euclidean
distance?
![Page 51: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/51.jpg)
Cannot detect clusters of various shapes
Drawbacks
![Page 52: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/52.jpg)
Hierarchical clustering
![Page 53: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/53.jpg)
Why?No need to choose number of clusters and worry about
the initialization. Creates a tree, where lower levels are
subclusters of higher levels
![Page 54: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/54.jpg)
Why?Any distance metric can be used (not only Euclidean)
![Page 55: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/55.jpg)
Why?
Easy to visualize, provides
a good summary of the
data structure in terms of
clusters
The plot of hierarchical
clustering is called
dendrogram
![Page 56: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/56.jpg)
Hierarchical clustering
• Agglomerative (bottom up)
Initially each point is a cluster
Repeatedly combine the two “nearest” clusters into one
• Divisive (Top Down)
Start with one cluster and recursively split it
![Page 57: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/57.jpg)
Hierarchical Clustering Agglomerative (bottom up)
(0,0)
(1,2)
(2,1) (4,1)
(5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid
We have 6 clusters. Each data point is a
cluster.
Step 1: Calculate the distance (Euclidean)
among very pair of points
Dis
tance
![Page 58: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/58.jpg)
(0,0)
(1,2)
(2,1) (4,1)
(5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid
Step 2: Pick two points having shortest
distance.
This is first cluster. To represent this cluster,
calculate the centroid (average of two data
points), that is ((1+2)/2, (2+1)/2)
(1.5,1.5)X1
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 59: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/59.jpg)
(0,0)
(4,1)
(5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X CentroidNow we have 5 clusters.
Calculate again the distance among all the data points, but
now cluster of (1,2) and (2,1) is represented by X1(1.5, 1.5)
Pick with min distance.
(1.5,1.5)X1
X2 (1.5,1.5) Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 60: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/60.jpg)
(0,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X CentroidNow we have 4 clusters.
Distance between (0,0) and centroid X1 is the minimum
Pay Attention: How to calculate the new centroid ?
(1.5,1.5)X1
X2 (4.5,0.5) Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 61: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/61.jpg)
(0,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid New centroid: is calculated using original
points (and NOT using centroid).
X2 (4.5,0.5)
X3 (1,1)
(1,2)
(2,1)
(1.5,1.5)X1
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 62: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/62.jpg)
(0,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X CentroidNow 3 clusters
Cluster 1: (5,3)
Cluster 2: [(1,2) , (2,1), (0,0)]
Cluster 3: [ (4,1), (5,0)]
X2 (4.5,0.5)
X3 (1,1)
(1,2)
(2,1)
(1.5,1.5)X1
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 63: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/63.jpg)
(0,0) (5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid
X2 (4.5,0.5)
X3 (1,1)
(1,2)
(2,1)
X1(1.5,1.5)
(4,1)
Now 3 clusters
Cluster 1: (5,3)
Cluster 2: [(1,2) , (2,1), (0,0)]
Cluster 3: [ (4,1), (5,0)]
NOTE: To calculate distance: Always use recent
centroids or data points, whichever applicable
Calculate Distance between
(5,3) and X2
X2 and X3
X3 and (5,3)
Dis
tance
![Page 64: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/64.jpg)
(0,0) (5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid Now we have two clusters
X2 (4.5,0.5)
X3 (1,1)
(1,2)
(2,1)
(1.5,1.5)X1
(4,1)
(5,0)
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 65: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/65.jpg)
(0,0) (5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X CentroidMight not make any sense to make the last
hierarchical cluster (everything into one) ?
X2 (4.5,0.5)
X3 (1,1)
(1,2)
(2,1)
(1.5,1.5)X1
(4,1)
(5,0)
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 66: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/66.jpg)
(0,0) (5,0)
(5,3)
Source: Stanford – Hierarchical clustering
Dendrogram
Datapoint
X Centroid Select a threshold to select the clusters.
(1,2)
(2,1) (4,1)
(5,0)
Dis
tance
Hierarchical Clustering Agglomerative (bottom up)
![Page 67: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/67.jpg)
Single Linkage
Complete Linkage
Average Linkage
Centroid Linkage
Produces chains
Pick clusters with minimum distance but
how to calculate distance ?
Distance between closest
elements in clusters
Distance between furthest
elements in clusters
Forces “spherical”
clusters with consistent
diameter
Distance between centroids
(means) of two clusters
Less affected by outliersAverage of all pairwise
distances
Ward’s methodConsider joining two
clusters, how does it
changes the total distance
(variance) from centroids
Variance is the key
Source: http://homepages.inf.ed.ac.uk/vlavrenk/
![Page 68: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/68.jpg)
Non Euclidean space
How to represent a cluster of many points ?
Clustroid: data (point) closest to other points
How do you determine the “nearness” of clusters ?
Treat clustroid as if it were centroid when computing intercluster distances
Possible meaning of closest
Smallest maximum distance to other points
Smallest average distance to other points
Smallest sum of squares of distance to other points
![Page 69: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/69.jpg)
Adv: Clustering products by customer preferences
![Page 70: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/70.jpg)
Drawbacks
Sensitivity to outliers
A hierarchical structure is created even when
such structure is not appropriate.
Sometimes the dendrogram is too huge huge to
infer anything meaningful
![Page 71: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/71.jpg)
Other clustering methods
source:wikipedia
![Page 72: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/72.jpg)
Additional examples:http://inseaddataanalytics.github.io/INSEADAnalytics/BoatsSegmentationCaseSlides.pdf
![Page 73: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/73.jpg)
![Page 74: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/74.jpg)
Summary
Techniques for Customer Segmentation
• Historical/Behavior based
RFM
• Data Driven
K-Means: How to Pick K ? Elbow Method
Hierarchical Clustering: Different Distance metrics
![Page 75: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/75.jpg)
Demo time!
https://courses.cs.ut.ee/2019/bda/spring/Main/Practice
![Page 76: Business Data Analytics · Business Data Analytics Lecture 3 Customer Segmentation MTAT.03.319 The slides are available under creative common license. The original owner of these](https://reader030.vdocument.in/reader030/viewer/2022040807/5e49ad65458c4d34e13f7f07/html5/thumbnails/76.jpg)
Books and links
http://labs.openviewpartners.com/files/2012/10/Cu
stomer-Segmentation-eBook-FINAL.pdf
https://www.putler.com/rfm-analysis/About RFM:
http://proquestcombo.safaribooksonline.com.ezproxy.utlib.ut.e
e/book/databases/business-intelligence/9780470650936
About segmentation in B2B setting: