Page 1: Identifying Customer Segments Through K-Means Cluster

TEAM 7
Hana Keiningham, Madhuri Pawar, Qianhe Zhao, Bingchen Wang

Identifying Customer Segments Through K-Means Cluster Analysis

Table of Contents

Executive Summary
1. Introduction
2. Background
2.1 Market Structure
2.2 Industry Trends
2.3 Products & Services
2.4 SWOT analysis
2.5 Google Trend analysis
3. Methodology
3.1 Overview
3.2 Data
3.3 Variables
3.4 K-Means Cluster Analysis
4. Post-Hoc Analysis
4.1 Latest Purchase Analysis
4.2 Purchase Frequency and Volume Analysis
4.3 Payment Method Analysis
4.4 Channel Analysis
4.5 Geographic Analysis
5. Discussion and Recommendation
5.1 Overview
5.2 Recommendations
6. Limitation and Future Research
References
Appendixes

Executive Summary

Once a popular destination for American customers shopping for casual clothing, the Gap brand has experienced a continuous decrease in sales in recent years. With emerging fast-fashion brands eating into Gap’s share of the apparel industry, Gap is also perceived as losing its identity (Monllos, 2015). In this research, we conducted a K-Means Cluster Analysis to identify meaningful segments of customers who shop at Gap, in order to provide insights for making effective marketing decisions that retain those customers and increase their purchases. The analysis was conducted on a transactional dataset covering 100,000 customers and 226,129 transactions. We applied RFM variables in performing the analysis and finally divided the customers into seven segments, which cover all the meaningful segments.

In the analysis, we identified the “Champion” and “Potential Loyalist” customer segments, which generate the highest revenues and profits for Gap. These segments include valuable customers in whom Gap should invest more marketing resources and effort. We also identified a “New Customer” segment that represents the customers who shopped at Gap most recently. Gap needs to spend time and energy on this segment as well, to develop these customers into loyal customers. In terms of payment method, Visa, MasterCard, and American Express are the top three payment methods used by Gap customers, so Gap should continue its cooperation with these credit card companies. Customers across all segments tend to make both online and in-store purchases; however, the most profitable ones prefer to shop in stores.

1. Introduction

Gap Inc. is a well-known international specialty retailer, offering clothing, accessories, and other products for all age groups under the Gap, Banana Republic, and Old Navy brand names. Gap Inc. has seen its share of the U.S. apparel market drop from 5.1% to 4.7% over the past five years, which reflects the diverse nature of the U.S. apparel industry. Higher shares are held by multi-brand retail chains such as Macy’s (9%) (Forbes, 2015). The remaining share belongs to numerous specialty and private-label brands, fast-fashion players, department stores, and pure-play online retailers such as Amazon (Forbes, 2015). The shares are illustrated in the graph below:

Figure 1 U.S. Apparel Market Share (Forbes, 2015)

Among fast-fashion retailers, which are Gap’s key competitors, Zara and H&M lead the list with global sales of $19.7 billion and $20.2 billion respectively, followed by Uniqlo with $16.6 billion and Gap with $16.4 billion (Fast Fashion Retailing, annual report 2016). In the U.S. market, Gap Inc.’s sales were $11,989 million in 2016. The drop in sales over the last three years is shown in the table below:

Table 1 Gap Annual Sales in 2014-2016 (Gap Annual Report, 2016)

Gap Annual Sales in the US ($ in Millions)
Year    Overall    Gap       Old Navy    Banana Republic
2016    $11,989    $3,113    $6,051      $2,052
2015    $12,213    $3,303    $5,987      $2,211
2014    $12,672    $3,575    $5,967      $2,405

Additionally, Gap has been losing its footing over the past few years amid growing competition. It started losing its touch after the recession, when U.S. buyers gradually moved to fast-fashion players in search of relatively fashion-forward merchandise. The biggest problems for the brand are: 1) its lack of a clear, unique identity, because of its casual and basic offering at relatively high prices, and 2) the rise of fast-fashion competitors like H&M, Zara, and Forever 21. Millennial shoppers are not as brand-loyal as shoppers were in previous cycles; they are more fashion-forward and are continuously hunting for inexpensive options (Forbes, 2015).

Hence, the reasons behind Gap Inc.’s falling market share are:

• The growing buyer affinity for fast-fashion brands in the search for trendy fashion;

• The move towards department stores, as customers hunt for discounts on top brands;

• The ongoing online shift, which drives the growth of the online stores of branded companies and of companies like Amazon.

Study Objective

However, Gap Inc. remains among the most preferred destinations for casual apparel in the market, where buyers continue to avoid American Eagle, Abercrombie, and Aeropostale. To arrest the decline in market share, the company needs to focus on finding the best marketing strategies for attracting and retaining different categories of customers. Therefore, the objective of the study is to explore the different customer segments that shop at Gap. To understand the characteristics of the key segments, we perform a K-Means Cluster Analysis on RFM variables (recency, frequency, and monetary value of purchases) and then examine the channel used and the profit and revenue generated by each segment. Identifying meaningful customer segments will provide insights that help the company make better business decisions to regain market share and increase sales.

2. Background

The apparel industry in the U.S. has grown slowly in the past year and Gap has witnessed

a drop in sales especially in the past three years. The following section provides an overview of

the industry and Gap’s background by introducing the market structure, current trends in the

apparel industry, the SWOT analysis and the Google Trend Analysis.

2.1 Market Structure

Table 2 below shows how the apparel industry in the U.S. is divided into five different sectors (Stone, 2015). The apparel industry is mainly driven by fast-fashion brands, multi-brand chains, and off-price retailers, because consumers are more fashion-forward and look for inexpensive options.

Table 2 Five Divisions of Apparel Industries

Sectors                           Examples
Luxury Brands                     Saks Fifth Avenue, Ralph Lauren, Calvin Klein, Anne Klein
Middle-Market (private labels)    Gap, Guess, Fast fashion like Zara, H&M, Uniqlo
                                  * Multi-brand chains which sell private labels such as Macy's, Century 21
Downscale                         J. C. Penney, Kohl's, Sears
Discount Stores                   Kmart, Meijer, Target, Walmart
Off-Price Retailers               Burlington Coat Factory, Marshalls, Ross Dress for Less, and T.J. Maxx are
                                  stores that sell designer goods at lower prices, often on a surplus basis.

2.2 The Apparel Industry Trends

Retail sales in the U.S. went up by 5.1% year-on-year to US$1.4 trillion in the first quarter of 2017. The apparel industry in the U.S. experienced slow growth in 2016; sales of women’s, men’s, and children’s apparel grew by three percent to reach $218.7 billion (Mergent Industry Report, 2017). The apparel retail landscape is highly developed and competitive, with numerous designer brands, fast-fashion brands, department stores, and multi-brand chains competing against each other on design, variety, and price. Over the last few years, affordable fast-fashion chains such as Zara, H&M, and Forever 21 have taken considerable market share from other fashion retailers (Euromonitor International, 2017). Additionally, the apparel industry in the U.S. is seeing a major shift from in-store purchases to digital purchases, supported mainly by the increase in digitally savvy consumers (Euromonitor International, 2017).

2.3 Products and Services

Gap offers “optimistic,” casual, American-style apparel and accessories to its customers (Gap, 2017). Gap’s products are available online through the company-owned official website, and offline from the company-operated franchise stores or third-party retailers. For customers in the U.S., Gap also offers omnichannel services, such as order-in-store, reserve-in-store, and ship-from-store, which connect the brand’s digital and physical stores (Reuters, 2017).

2.4 SWOT Analysis

Table 3 shows the SWOT analysis result of Gap brand in the U.S. market:

Table 3 Gap Brand SWOT Analysis

Strength
1. Strong product portfolio and brand recognition
2. Presence of timeless iconic products
3. Strategic supplier relationship

Weakness
1. Failure to utilize online sales channels efficiently
2. Declining sales and profits
3. Heavily rely on vendors to sell product

Opportunity
1. Increasing efficiency of online sales
2. Celebrity endorsement
3. Growing use of technology

Threat
1. Rapid change in fashion
2. Increased production and operational costs
3. Strong competition in apparel segment

2.5 Google Trends

Figure 2 “Gap” Google Trends Chart

The numbers on this chart show the search popularity of Gap from January 2004 to November 2017, where “100” marks the peak popularity over that period. According to the Google Trends chart, Gap’s popularity peaked in December 2005 and December 2006. After December 2006, Gap’s popularity declines significantly, even at its seasonal highs. The highest point in the past year was November 2016, when it reached 63 out of 100, which is still 37% lower than its all-time peak. Although Gap’s popularity has been dropping, its strongest months remain the same each year: popularity peaks seasonally around October, November, or December because of the holiday season, while January and February tend to be Gap’s lowest-performing months. Gap hit its lowest point ever in January 2017.

3. Methodology

3.1 Overview

In this research, K-Means Cluster Analysis is the main method used to identify meaningful customer segments for Gap. We first aggregated all the transactional data into a customer-level dataset for use in our analysis. To prepare the seeding data for the K-Means Cluster Analysis, multiple Hierarchical Cluster Analyses (HCA) were conducted on four randomly selected subsets, each including 10% of the entire data. For each subset, we conducted HCA with both the Furthest Neighbor Method and Ward’s Method to find the number of clusters and the cluster centers. We then used the seeding data to perform K-Means Analysis on the entire sample, assigning each customer to one segment. The results of these analyses suggest that the most appropriate segmentation, covering all meaningful customer segments, includes seven segments. The complete analysis design is presented in Figure 3.

Figure 3 Overview of the Methodology

3.2 Data

This research includes the transactional data of 100,000 random Gap customers in the

U.S. The original dataset presents all orders, revenues, items, order lines, and returns for those

customers in transactions for all channels (stores, web and mobile app). The total 226,129

transactions cover the time from December 16, 2009 to September 17, 2017. The original dataset

was organized by Customer Identification Number with each item purchased listed within each

customer ID. To have a clear understanding of the customer transactions, we aggregated the

original transactional dataset into a new customer-level dataset, where each customer ID has only

one row of its total transactional information.
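The aggregation described above could be sketched as follows. This is only an illustration in Python, not the procedure we ran; the field names (`customer_id`, `order_id`, `order_date`, `revenue`, `profit`, `quantity`) and the sample rows are hypothetical, and the way months since the last purchase are counted is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order lines: one row per item purchased (all field names are assumptions).
order_lines = [
    {"customer_id": "C1", "order_id": "O1", "order_date": date(2016, 3, 1),
     "revenue": 40.0, "profit": 20.0, "quantity": 1},
    {"customer_id": "C1", "order_id": "O1", "order_date": date(2016, 3, 1),
     "revenue": 25.0, "profit": 10.0, "quantity": 2},
    {"customer_id": "C1", "order_id": "O2", "order_date": date(2017, 6, 15),
     "revenue": 60.0, "profit": 30.0, "quantity": 1},
    {"customer_id": "C2", "order_id": "O3", "order_date": date(2012, 7, 4),
     "revenue": 30.0, "profit": 15.0, "quantity": 1},
]


def aggregate_by_customer(lines, reference_date):
    """Collapse order lines into one record per customer ID."""
    totals = defaultdict(lambda: {"revenue": 0.0, "profit": 0.0, "quantity": 0,
                                  "order_ids": set(), "last_purchase": None})
    for line in lines:
        t = totals[line["customer_id"]]
        t["revenue"] += line["revenue"]
        t["profit"] += line["profit"]
        t["quantity"] += line["quantity"]
        t["order_ids"].add(line["order_id"])
        if t["last_purchase"] is None or line["order_date"] > t["last_purchase"]:
            t["last_purchase"] = line["order_date"]

    customers = {}
    for customer_id, t in totals.items():
        last = t["last_purchase"]
        # Difference in calendar months (the day of the month is ignored).
        months = (reference_date.year - last.year) * 12 + (reference_date.month - last.month)
        customers[customer_id] = {
            "revenue": t["revenue"],
            "profit": t["profit"],
            "months": months,
            "orders": len(t["order_ids"]),  # distinct orders, not order lines
            "quantity": t["quantity"],
        }
    return customers


# Reference date: the last day covered by the data (September 17, 2017).
customer_level = aggregate_by_customer(order_lines, reference_date=date(2017, 9, 17))
print(customer_level["C1"])
```

Counting distinct order IDs rather than rows keeps a multi-item order from being counted as several purchases.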

3.3 Variables

In order to identify the meaningful customer segments for marketing purposes, we chose

to use the RFM model in the analysis. After aggregating the data, we added recency, frequency

and monetary value variables to the dataset. We calculated and chose variables that reflect each

customer’s recency of last purchase, frequency of purchase, and monetary value. Based on the

twenty-three variables that are given, we chose the five variables that are most related to the

RFM model, which are listed below:

Table 4. Five Chosen Variables

Variables    RFM          Explanation
Revenue      Monetary     The total revenue that the customer generated
Profit       Monetary     The total profit that the customer brought
Months       Recency      The number of months that have elapsed since the customer’s last purchase
Orders       Frequency    The number of the orders the customer made
Quantity     Frequency    The number of the items the customer purchased

We standardized the above five input variables as z-scores to minimize the influence of

different scales on the result.
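As a small illustration of the standardization, the sketch below converts two hypothetical variables on very different scales to z-scores. The function name and the sample values are made up, and the use of the sample standard deviation (n - 1) is an assumption about how the z-scores were computed.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - mu) / sigma for v in values]


# Hypothetical customer-level values on very different scales.
revenue = [100.0, 200.0, 300.0]
months = [10.0, 20.0, 30.0]

z_revenue = z_scores(revenue)
z_months = z_scores(months)
print(z_revenue, z_months)
```

After standardization both variables have mean 0 and standard deviation 1, so neither dominates the distance calculation simply because it is measured in larger units.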

3.4 Analysis

3.4.1 Step 1: Hierarchical Cluster Analysis

We ran multiple Hierarchical Cluster Analyses to identify the initial seeds for the K-Means Cluster Analysis. Since HCA requires extensive computational power, we ran it on randomly selected samples of 10% of the actual data. In total, we created four random subsets, each containing 10% of the entire data, referred to as Subset 1, Subset 2, Subset 3, and Subset 4.

To see which method would provide the most appropriate result, we used both Ward’s Method and the Furthest Neighbor Method for each subset, with squared Euclidean distance as the distance measure. After running HCA on Subset 1, we identified a 6-cluster solution as more appropriate for Ward’s Method (Table 5), and a 5-cluster solution as more appropriate for the Furthest Neighbor Method (Table 6).

Table 5 The 6-Cluster Solution in Ward’s Method (Subset 1)

Subset 1 Ward’s Method
Cluster            1         2         3         4         5         6
                Mean      Mean      Mean      Mean      Mean      Mean
ZREVENUE     1.65744   0.19961  -0.13703  -0.28055   6.91241  38.72547
ZPROFIT      1.74353   0.22499  -0.14755  -0.28159   6.99852  32.35639
ZMONTHS     -0.56402  -0.81831   0.86331  -0.86787  -0.82083   0.04509
ZORDERS      2.31266   0.38460  -0.20031  -0.36093   7.45791  11.64167
ZQUANTITY    2.03333   0.31712  -0.17605  -0.34999   8.07491   9.80997

Table 6 The 5-Cluster Solution in Furthest Method (Subset 1)

Subset 1 Furthest Method
Cluster            1         2         3         4         5
                Mean      Mean      Mean      Mean      Mean
ZREVENUE    -0.01267  10.67675  12.21464  41.99720  35.45373
ZPROFIT     -0.00956  10.32092  12.85875  32.41938  32.29339
ZMONTHS      0.01226  -0.89887  -1.04348  -1.50140   1.59158
ZORDERS     -0.00117  20.28354   5.78440  23.64426  -0.36093
ZQUANTITY   -0.01350  13.67901  16.57238  14.01545   5.60449
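To illustrate how a hierarchical cluster analysis on a random subset yields cluster centers such as those in Tables 5 and 6, the following is a minimal sketch of the Furthest Neighbor (complete linkage) method with squared Euclidean distance. It is a naive implementation written for readability, not the statistical package routine used in the study; Ward’s Method is not shown, and all function names and sample points are hypothetical.

```python
import random
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def draw_subset(rows, fraction, seed):
    """Randomly select a fraction of the rows (e.g. 0.10 for a 10% subset)."""
    return random.Random(seed).sample(rows, int(len(rows) * fraction))


def cluster_distance(points, first, second):
    """Furthest-neighbor linkage: the largest distance between any two members."""
    return max(sq_euclidean(points[i], points[j]) for i in first for j in second)


def furthest_neighbor_clusters(points, k):
    """Merge the two closest clusters repeatedly until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(points, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


def cluster_centers(points, clusters):
    """Mean of each variable within each cluster; these serve as K-Means seeds."""
    return [[sum(col) / len(members) for col in zip(*(points[i] for i in members))]
            for members in clusters]


# Hypothetical standardized data with two variables per customer.
sample = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [50.0, 50.0]]
clusters = furthest_neighbor_clusters(sample, k=3)
seeds = cluster_centers(sample, clusters)
print(clusters, seeds)
```

The naive merge loop compares every pair of clusters at every step, which is one reason HCA is run on a 10% subset rather than on the full data.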

We conducted the same analysis on the other three subsets and obtained three more pairs of HCA tables, analogous to those for Subset 1 (see Appendix A). After running all the HCAs, we had eight segmentation tables in total from the four randomly selected data subsets.

3.4.2 Step 2: K-Means Cluster Analysis

Based on Table 5 and Table 6, we created two new seeding data files, each including the number of clusters and the initial cluster centers for the subsequent K-Means Cluster Analyses. We then performed the K-Means Analyses on the entire customer-level sample, aggregated from the more than 200,000 transactions, using the two seeding data files from Subset 1, and generated two K-Means tables (see Table 7 and Table 8). K-Means Analyses were also conducted on 100% of the data using the initial seeds from the other three subsets, which generated the other six K-Means tables (see Appendix B).

Table 7 The 6-Cluster K-Means Solution (Initial Seeds from Subset 1)

K-Means Cluster (Subset 1: Ward's)
Cluster            1         2         3         4         5         6
                Mean      Mean      Mean      Mean      Mean      Mean
REVENUE      1159.33    450.87    121.35    121.61   3717.28  38622.38
PROFIT        616.43    243.31     63.90     67.01   1880.86  20278.65
MONTHS            25        31        62        19        21        37
ORDERS             5         3         1         1        10         4
QUANTITY       12.67      5.34      1.62      1.60     31.64    165.00
COUNT           2277     11250     45947     40218       203         2

Table 8 The 5-Cluster K-Means Solution (Initial Seeds from Subset 1)

K-Means Cluster (Subset 1: Furthest)
Cluster            1         2         3         4         5
                Mean      Mean      Mean      Mean      Mean
REVENUE       129.79    626.74    136.07  38622.38   2303.81
PROFIT         68.35    337.02     74.92  20278.65   1189.97
MONTHS            62        29        19        37        22
ORDERS             1         3         1         4         7
QUANTITY        1.71      7.26      1.76    165.00     21.67
COUNT          47513      8360     43313         2       709
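The seeded K-Means step can be sketched as below: instead of random starting points, the iteration starts from cluster centers supplied by the hierarchical analysis. This is a plain implementation of the standard assign-and-update loop under that assumption, not the exact routine used in the study; the function names, seeds, and data points are hypothetical.

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_euclidean(p, centers[c]))
            for p in points]


def k_means(points, seeds, max_iter=100):
    """K-Means started from the given seeds rather than from random centers."""
    centers = [list(seed) for seed in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical standardized data and seeds taken from a hierarchical solution.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
seeds = [[1.0, 1.0], [9.0, 9.0]]

labels, centers = k_means(points, seeds)
print(labels, centers)
```

Because the number of seeds fixes the number of clusters, each seeding file determines both k and the starting centers of the resulting solution.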

Of the above tables, the 6-cluster solution (Table 7) appears more meaningful to us than the 5-cluster solution (Table 8), since it identifies a separate segment of customers (Segment 6) with high revenue (38,622), high profit (20,279), and a large quantity of items purchased (165). This segment has the highest profitability, which distinguishes it from all the other segments, so Gap should include it in the marketing plan. Thus, comparing the two K-Means solutions with initial seeds from Subset 1, we chose the 6-cluster solution (Solution 1).

We then did the same comparison within each pair of K-Means tables using the initial

seeds from the same data subset and got the final four choices (See Appendix C). We chose the

5-cluster solution using Ward’s Method for the second subset pair (Solution 2), the 7-cluster

solution using Furthest Method for the third subset pair (Solution 3) and the 7-cluster solution

using Ward's Method for the fourth subset pair (Solution 4).

3.4.3 Step 3: Choosing the Final Segmentation

Among the final four K-Means solutions, we first compared Solution 1 (Table 9) with Solution 2 (Table 10).

Table 9 Solution 1 (Ward’s Method)

Solution 1
Cluster            1         2         3         4         5         6
                Mean      Mean      Mean      Mean      Mean      Mean
REVENUE      1159.33    450.87    121.35    121.61   3717.28  38622.38
PROFIT        616.43    243.31     63.90     67.01   1880.86  20278.65
MONTHS            25        31        62        19        21        37
ORDERS             5         3         1         1        10         4
QUANTITY       12.67      5.34      1.62      1.60     31.64    165.00
COUNT           2277     11250     45947     40218       203         2

Table 10 Solution 2 (Ward’s Method)

Solution 2
Cluster            1         2         3         4         5
                Mean      Mean      Mean      Mean      Mean
REVENUE       535.99    126.17    129.05   1784.40  15364.78
PROFIT        288.89     66.44     71.09    931.61   7752.80
MONTHS            29        62        19        24        20
ORDERS             3         1         1         6        18
QUANTITY        6.33      1.67      1.68     17.54     89.45
COUNT           9907     46872     41866      1241        11

We found that Solution 1 performs better in identifying meaningful customer segments, since it identifies the most profitable customer segment (Segment 6), with an average profit of 20,279, while Solution 2 fails to do so. In addition, Solution 1 divides the high-profit customers more finely, as seen in Segments 1, 5, and 6. In Solution 2, by contrast, the highly profitable customers are all gathered in Segment 4 (profit=932, count=1241) and Segment 5 (profit=7753, count=11). In sum, we consider Solution 1 the more appropriate segmentation.

We compared Solution 3 and Solution 4 through the same procedure. The two 7-cluster

solutions have similar segmentation results. However, we chose Solution 3 using the Furthest

Neighbor Method initial seeds, since we believe the Furthest Neighbor Method can better tell the

differences among the customer segments.

Finally, we compared Solution 1 (see Table 9) with Solution 3 (Table 11).

Table 11 Solution 3 (Furthest Method)

Solution 3
Cluster            1         2         3         4         5         6         7
                Mean      Mean      Mean      Mean      Mean      Mean      Mean
REVENUE       118.32    400.67    117.09    891.64   2114.33   6937.52  38622.38
PROFIT         62.28    216.55     64.49    477.20   1103.71   3406.11  20278.65
MONTHS            62        32        19        26        22        24        37
ORDERS             1         2         1         4         7        12         4
QUANTITY        1.58      4.79      1.54     10.08     21.41     42.95    165.00
COUNT          45246     11835     39091      3119       562        42         2

The tables show that segments 1, 2, 3, 4, and 6 from Solution 1 (Table 9) are similar to segments 4, 2, 1, 3, and 7, respectively, from Solution 3 (Table 11). In addition to these segments, Solution 3 identifies more highly profitable customers, in segment 5 (profit=1104, months=22, count=562) and segment 6 (profit=3406, months=24, count=42), a total of 604 customers. In Solution 1, by contrast, there are only 203 customers in segment 5 (profit=1881, months=21). Customer segments with high profit and high recency are more valuable in Gap’s marketing plan. Thus, based on the chosen RFM variables, we finally divided the customers into seven groups using the initial seeds from the Furthest Neighbor Method on the third subset.

4. Post-Hoc Analysis

Table 12 shows our final segmentation of Gap customers. We conducted five post-hoc analyses on this 7-cluster solution: the Latest Purchase Analysis, Purchase Frequency and Volume Analysis, Payment Method Analysis, Channel Analysis, and Geographic Analysis.

Table 12 Final Segmentation

Final Solution
Cluster            1         2         3         4         5         6         7
                Mean      Mean      Mean      Mean      Mean      Mean      Mean
REVENUE       118.32    400.67    117.09    891.64   2114.33   6937.52  38622.38
PROFIT         62.28    216.55     64.49    477.20   1103.71   3406.11  20278.65
MONTHS            62        32        19        26        22        24        37
ORDERS             1         2         1         4         7        12         4
QUANTITY        1.58      4.79      1.54     10.08     21.41     42.95    165.00
COUNT          45246     11835     39091      3119       562        42         2
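A segment profile like Table 12 reports, for each segment, the mean of every variable in its original units together with the number of customers. The sketch below shows one way such a profile could be computed from customer records and their segment labels; the records, labels, and function name are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

VARIABLES = ["revenue", "profit", "months", "orders", "quantity"]


def profile_segments(customers, labels, variables=VARIABLES):
    """Return the mean of each variable and the customer count per segment."""
    members = defaultdict(list)
    for customer, label in zip(customers, labels):
        members[label].append(customer)
    return {
        label: {**{v: mean([c[v] for c in group]) for v in variables},
                "count": len(group)}
        for label, group in sorted(members.items())
    }


# Hypothetical customers (original units) and their K-Means segment labels.
customers = [
    {"revenue": 110.0, "profit": 60.0, "months": 62, "orders": 1, "quantity": 1},
    {"revenue": 130.0, "profit": 64.0, "months": 60, "orders": 1, "quantity": 2},
    {"revenue": 2100.0, "profit": 1100.0, "months": 22, "orders": 7, "quantity": 21},
]
labels = [1, 1, 5]

profile = profile_segments(customers, labels)
for segment, row in profile.items():
    print(segment, row)
```

Profiling in the original units, rather than in z-scores, is what makes the segments interpretable in terms of dollars, months, and items.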

4.1 Latest Purchase Analysis (Recency)

Segment 1 (months=62) and segment 3 (months=19) identify the least recent and most recent Gap customer groups, respectively (see Table 12). Nearly half of the customers fall into segment 1 (count=45246), yet it has the lowest recency and generates little profit. The customers in this segment have not shopped at Gap for more than five years. By contrast, although segment 3 (count=39091) is also large and generates little profit, it identifies the most recent shoppers, the group of people who shopped at Gap within the last two years. Segments 4 (months=26), 5 (months=22), and 6 (months=24) contain customers who last shopped at Gap around two years ago; they are the second most recent shoppers. The segments with high recency are the ones on which the company should focus more of its marketing efforts.

4.2 Purchase Frequency and Volume Analysis (Frequency)

Table 12 also shows the number of orders and the quantity of items that each segment purchased during the period from December 2009 to September 2017. Segment 6, one of the two most profitable customer segments, contains the most frequent purchasers, with an average of 12 orders and 43 items over the period. It is followed by segment 5, the third most profitable segment, with 7 orders and 21 items. The most profitable segment (segment 7) bought the largest number of items (165), but its purchase frequency (4 orders) is lower than that of segments 5 and 6. The other segments ordered less: segment 1 (1 order and about 2 items), segment 2 (2 orders and about 5 items), segment 3 (1 order and about 2 items), and segment 4 (4 orders and about 10 items).

4.3 Payment Method Analysis

Among the 14 payment methods used by Gap customers, we kept the four major methods (American Express, Discover, Mastercard, and Visa) and recoded the other, less frequently used methods into a new category named “Others.” Based on the data, nearly half of Gap’s customers use Visa to make purchases. Mastercard, American Express, and Discover are the second, third, and fourth most common payment methods, respectively. The percentage of each payment method remains relatively constant across the segments (see Table 13).

Table 13 Payment Method Used by Gap Customers

Payment Method Structure
Segment            1           2           3           4           5           6           7
          Count    %  Count    %  Count    %  Count    %  Count    %  Count    %  Count    %
AX         7847  17%   6163  16%   2247  19%    618  19%    146  24%     18  41%      0   0%
DI         2267   5%   2050   5%    674   6%    183   6%     22   4%      2   5%      0   0%
MC        12586  28%   9156  24%   3049  25%    826  25%    147  24%     10  23%      0   0%
VI        21153  47%  17931  46%   5388  45%   1457  44%    261  43%     12  27%      1  50%
OTHERS     1219   3%   3518   9%    696   6%    221   7%     26   4%      2   5%      1  50%

(AX=American Express; DI=Discover; MC=MasterCard; VI=Visa; Others=Amazon Pay Method; Bill Me Later; Diner’s Club; Money Order; Multi Credits; Open Account; Personal Check; Prepaid Exchange; PayPal; Invalid CC Number)
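The recoding and the per-segment percentages in Table 13 can be illustrated with a short sketch. The four retained codes (AX, DI, MC, VI) come from the table; the function names, the spelling of the minor payment methods, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

MAJOR_METHODS = {"AX", "DI", "MC", "VI"}


def recode_payment(method):
    """Keep the four major methods and fold everything else into OTHERS."""
    return method if method in MAJOR_METHODS else "OTHERS"


def payment_shares(records):
    """Turn (segment, payment method) pairs into percentages within each segment."""
    counts = defaultdict(Counter)
    for segment, method in records:
        counts[segment][recode_payment(method)] += 1
    return {
        segment: {method: 100.0 * n / sum(c.values()) for method, n in c.items()}
        for segment, c in counts.items()
    }


# Hypothetical (segment, payment method) records.
records = [
    (1, "VI"), (1, "VI"), (1, "MC"), (1, "PayPal"),
    (2, "AX"), (2, "Bill Me Later"),
]

shares = payment_shares(records)
print(shares)
```

Computing the percentages within each segment, rather than over all customers, is what allows the payment mix to be compared across segments of very different sizes.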

4.4 Channel Analysis

Table 14 shows that each cluster has different preferences for purchasing channels. In all 7 clusters, the app channel generates the least customer traffic, and its proportion of all purchases remains at a relatively low level. There is an inverse relationship between the proportions of the store channel and the website channel across the clusters: apart from segment 2, the percentage of store purchases increases from segment 1 (35%) to segment 7 (100%), while the percentage of website purchases decreases from 62% to 0%.

Table 14 Purchasing Channels of Gap Customers

CHANNEL
Segment            1           2           3           4           5           6           7
              #    %      #    %      #    %      #    %      #    %      #    %      #    %
APP        1118   2%   1574   4%    584   5%    199   6%     26   4%      1   2%      0   0%
STORE     15834  35%  10818  28%   6496  54%   2227  67%    431  72%     34  77%      2 100%
WEBSITE   28120  62%  26426  68%   4974  41%    879  27%    145  24%      9  20%      0   0%

(# = count, % = percentage)

4.5 Geographic Analysis (Zip Codes)

We first looked at the geographical location of the most profitable customers. The customers in Segment 6 turned out to be scattered across the entire country (see Figure 4). The two customers in Segment 7, our most profitable segment, were from Mt Vernon, Illinois and Conroe, Texas (see Appendix D).

Figure 4 Map of the Segment 6

Since most of our final segments contained a large number of zip codes, we used zip code demographic data from the U.S. Census to analyze these geographical locations in more depth. For each zip code, this provided the population, the average income, the median income, the average age, and the average female percentage. The U.S. Census data was appended to the existing SPSS file using the Match Files syntax command. The zip codes were matched using the TABLE subcommand, so that the same zip code record could be matched to multiple customers.
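The matching step is a many-to-one lookup: several customers can share a zip code, and each of them receives the same demographic fields. The sketch below is a Python analogue of that lookup, not the SPSS syntax itself; the zip codes, field names, and demographic values are all made up for illustration.

```python
# Hypothetical lookup table keyed by zip code (all values are made up).
zip_demographics = {
    "11111": {"population": 15000, "mean_income": 80000},
    "22222": {"population": 42000, "mean_income": 65000},
}

# Hypothetical customer rows; two of them share the same zip code.
customers = [
    {"customer_id": "C1", "segment": 3, "zip": "11111"},
    {"customer_id": "C2", "segment": 6, "zip": "11111"},
    {"customer_id": "C3", "segment": 1, "zip": "22222"},
    {"customer_id": "C4", "segment": 1, "zip": "99999"},  # no census record
]


def append_demographics(customers, lookup):
    """Many-to-one match: every customer row receives the fields of its zip code."""
    merged = []
    for customer in customers:
        fields = lookup.get(customer["zip"], {})  # unmatched rows stay unchanged
        merged.append({**customer, **fields})
    return merged


enriched = append_demographics(customers, zip_demographics)
for row in enriched:
    print(row)
```

Treating the census file as a lookup table keeps one row per customer, so segment-level averages of the demographic fields are weighted by the number of customers in each zip code.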

Through the analysis, we found that mean income seems to be the most scattered variable (see Appendix D). Since the mean household income according to the U.S. Census Bureau is $72,641, all segments except Segment 7 were above the mean household income. Segment 7 also happens to be the most profitable segment, which could mean that its customers are “aspirational buyers,” but since the segment contains only 2 customers, our understanding of the group is limited. Through our analysis, we were able to see that every segment except Segment 6 was from an upper-middle class area.

5. Discussions and Recommendations

5.1 Overview

Through cluster analysis, we divided the customers into 7 segments. Segment 1 generated the lowest profit (profit=62.28), bought one of the smallest quantities of items (quantity=1.58), and had the lowest recency (months=62). Implementing strategies for customers in segment 1 would cost a great amount of resources, so we named it the “Lost Customers”. In segment 2, each customer generates about $216.55 in profit and buys 4.79 items, yet the recency of those customers is low (months=32). We named segment 2 the “Hibernating Customers” since there is a large gap between purchases. Segment 3, the “New Customers”, has the highest recency among all segments. Segment 4 represents the “Customers that Need Attention”: although its profit and quantity are relatively high, its recency is relatively low (months=26). Segments 5, 6, and 7 are our highly valuable customers, who bring the most profit to Gap. Although segment 5 has high profit and quantity, they are not as high as those of segments 6 and 7, so we named it the “Potential Loyalist”. Segments 6 and 7 generate the highest profits and the highest volumes of purchased products. As a result, we grouped segment 6 and segment 7 together and named them the “Champion Customers”.

5.2 Recommendations

5.2.1 Segment 3 (Revenue=117, Profit=64, Month=19, Order=1, Quantity=2, count=39091)

This is a meaningful segment to focus on, as it represents the largest group of most recent purchasers. Segment 3 is the “New Customer” segment. Customers who have purchased from a company recently are more likely to buy from that company again than customers who have not shopped there for a while. Gap should engage this segment in order to retain these customers for more purchases. We recommend the following actions to attract and retain this segment:

1) Collect feedback from these customers on their recent purchase to avoid any post-purchase dissonance and to understand their intent for future purchases.

2) Send customers special promotion codes by email or mail to encourage them to shop more in-store or online.

3) Offer special discounts on their second and third purchases from Gap. Customers can receive a discount on their next purchase if it falls within 6 months of their first purchase.

4) Provide Gap memberships that customers can use to collect points on every purchase and redeem for a monetary discount once the points reach a certain value.

5.2.2 Segment 5 (Revenue=2114, Profit=1104, Month=22, Order=7, Quantity=21, count=562)

Segment 5, the “Potential Loyalist”, is also a meaningful segment that is worth investing in. This segment identifies a relatively recent and highly profitable group of shoppers. On average, the customers in this segment have shopped with Gap within the past two years (months=22) and have placed around 7 orders for 21 items. We recommend the following actions to increase their loyalty:

1) Provide a “Silver Card” membership program for customers in this segment with special privileges, such as sale previews, free standard shipping on all orders, and a birthday discount (extra 10% off) on purchases.

2) Create a Gap community especially for this segment and organize activities for these customers, inviting them to events such as “Yoga with Gap”, “Gap Street Snap Competition”, and “Gap Runway Show”.

5.2.3 Segments 6 & 7

Segments 6 and 7, named the “Champion Customers”, are the most valuable customers because they generate the most profit for Gap. However, their purchases are less recent than those of some other segments (months=24 and 37). Along with providing benefits similar to those given to the previous two segments, we recommend the following additional actions to encourage more purchases:

1) Provide a platinum membership to this group, which they can upgrade to a gold card depending on future purchases. The points collected on the card can be exchanged for coupons and discounts. Additionally, all platinum members will be sent the latest lookbook and will be able to buy products before seasonal product launches.

2) Offer customization options in certain stores. Customers can print their own logo or slogan on clothes if they buy many products at once.

5.2.4 Payment Method & Shopping Channel

In terms of payment method, Visa is the most popular because nearly 50% of customers use it to shop at Gap. Gap can take advantage of this trend by introducing more promotional and discount campaigns for Gap Visa card holders and potential credit card openers.

With regard to channel choice, the percentage of app usage remains relatively low, and we would like to see more Gap app usage in the future. Combining the in-store experience with the app would create a livelier experience for the consumer. For example, the consumer could scan codes or take photos with the app to earn discount codes or in-store perks.

6. Limitations and Future Research

We recognize several limitations in this research and identify directions for further research. For this research, we only used Gap’s internal data, including frequency, recency, and monetary value, to segment Gap customers. Future research could include more customer data from social media platforms, syndicated data, and other external sources. Future researchers could also focus on psychological factors that affect customers’ purchasing behavior. Combining the RFM model with psychological data analysis could provide a more comprehensive understanding of customer needs. Through the RFM model, researchers can identify customer segments from the least profitable to the most profitable, while further consumer behavior research could help them identify more effective and practical strategies. Additionally, certain moderating factors, such as store location, may influence the RFM variables (recency and frequency of purchase). Gap should study these factors in more detail by conducting a consumer survey, in order to strategize on those that are of value.
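
As a minimal sketch of how the RFM variables discussed above could be derived, the following Python example aggregates a transaction list into recency, frequency, and monetary value per customer. The row layout, the `rfm_table` and `months_between` helpers, the sample rows, and the use of profit as the monetary value are all assumptions made for illustration; they are not taken from Gap's data or from the procedure used in this report.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, order_date, profit) rows into RFM features."""
    summary = {}
    for customer_id, order_date, profit in transactions:
        row = summary.setdefault(
            customer_id,
            {"last_order": order_date, "frequency": 0, "monetary": 0.0},
        )
        row["last_order"] = max(row["last_order"], order_date)  # latest purchase
        row["frequency"] += 1                                   # number of orders
        row["monetary"] += profit                               # total profit
    return {
        cid: {
            "recency_months": months_between(row["last_order"], as_of),
            "frequency": row["frequency"],
            "monetary": round(row["monetary"], 2),
        }
        for cid, row in summary.items()
    }


# Hypothetical transactions, not taken from the Gap data set.
transactions = [
    ("A", date(2016, 1, 15), 40.0),
    ("A", date(2017, 6, 2), 25.5),
    ("B", date(2012, 9, 30), 62.0),
]
rfm = rfm_table(transactions, as_of=date(2017, 11, 15))
for customer_id, features in rfm.items():
    print(customer_id, features)
```

A table of this shape could then be joined with external or survey data per customer, which is the kind of extension suggested above.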

Appendix A: HCA Tables

Table 1 Subset 2- Ward’s Method

Subset 2 Ward Method

1 2 3 4 5

Mean Mean Mean Mean Mean

ZREVENUE 0.94205 -0.19971 -0.14097 4.42495 30.06372

ZPROFIT 0.97994 -0.21004 -0.13650 4.46652 26.43433

ZMONTHS -0.20332 0.79516 -0.92219 -0.46824 0.38652

ZORDERS 1.41663 -0.24457 -0.14724 3.61405 3.96001

ZQUANTITY 1.46958 -0.23597 -0.20537 4.40964 10.06230

Table 2 Subset 2- Furthest Neighbor Method

Subset 2 Furthest Method

1 2 3 4 5

Mean Mean Mean Mean Mean

ZREVENUE -0.04674 5.89913 7.91898 29.30347 32.34450

ZPROFIT -0.04559 6.08823 7.11438 25.07690 30.50664

ZMONTHS -0.01019 -0.74564 -0.03391 1.02922 -1.54157

ZORDERS -0.03848 7.81862 1.17540 0.91935 13.08198

ZQUANTITY -0.04419 7.78511 1.99338 4.70732 26.12723

Table 3 Subset 3- Ward’s Method

Subset 3 Ward’s Method

1 2 3 4 5 6

Mean Mean Mean Mean Mean Mean

ZREVENUE 1.61787 -0.27738 -0.15669 0.28409 9.68240 127.35973

ZPROFIT 1.67043 -0.29111 -0.15252 0.28549 9.11473 130.10660

ZMONTHS -0.69649 0.92380 -0.88315 0.61350 -0.81577 0.14551

ZORDERS 2.39766 -0.36093 -0.17674 0.38067 4.87055 1.55949

ZQUANTITY 2.24517 -0.33771 -0.20573 0.40711 8.45841 71.20998

Table 4 Subset 3- Furthest Neighbor Method

Subset 3 Furthest Method

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

ZREVENUE -0.05878 7.40703 2.39440 10.86609 21.18797 35.45373 127.35973

ZPROFIT -0.05976 7.37253 2.50112 10.68498 15.20536 32.29339 130.10660

ZMONTHS 0.01557 -0.80631 -1.01619 -1.47462 -1.13989 1.59158 0.14551

ZORDERS -0.06972 3.39641 5.81495 17.88302 4.92021 -0.36093 1.55949

ZQUANTITY -0.05564 7.65237 3.19972 15.80979 8.12778 5.60449 71.20998

Table 5 Subset 4- Ward’s Method

Subset 4 Ward’s Method

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

ZREVENUE -0.22654 -0.18702 0.38868 1.03357 3.75694 8.39797 32.34450

ZPROFIT -0.24046 -0.18219 0.38853 1.11095 3.77223 8.29956 30.50664

ZMONTHS 0.89809 -0.86659 0.45040 -0.78704 -0.68087 -0.90557 -1.54157

ZORDERS -0.36093 -0.25660 0.61904 1.97579 4.26777 8.70770 13.08198

ZQUANTITY -0.29427 -0.21981 0.39961 1.60970 3.64912 9.49222 26.12723

Table 6 Subset 4- Furthest Neighbor Method

Subset 4 Furthest Method

1 2 3 4 5 6

Mean Mean Mean Mean Mean Mean

ZREVENUE -0.04825 7.65181 5.41134 9.59474 32.34450 13.97667

ZPROFIT -0.04634 7.62888 5.17289 9.58430 30.50664 12.20633

ZMONTHS 0.01684 -0.78088 -0.11902 -0.99260 -1.54157 -1.42107

ZORDERS -0.02678 7.26072 0.78664 15.32246 13.08198 -0.36093

ZQUANTITY -0.04029 8.92682 1.52620 15.24906 26.12723 -0.45140
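
The Z-prefixed variables in the tables above are standardized scores. A minimal sketch of that standardization follows, assuming the usual z-score with the sample standard deviation (the report does not state which variant was used) and using made-up profit values rather than the Gap data.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mu = mean(values)
    sigma = stdev(values)  # sample SD (n - 1): an assumption, see the note above
    return [(v - mu) / sigma for v in values]


# Hypothetical per-customer profits, for illustration only.
profits = [62.0, 64.0, 216.0, 477.0, 1104.0]
z_profit = z_scores(profits)
print([round(z, 3) for z in z_profit])
```

Standardizing each variable this way keeps a large-valued variable such as revenue from dominating the distance calculations when customers are clustered.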

Appendix B:

The K-Means tables using initial seeds from Subset 2, Subset 3 and Subset 4.

Table 1 The 5-Cluster K-Means Solution (Subset 2-Ward’s)

K-means Cluster ( Subset 2: Ward’s)

1 2 3 4 5

Mean Mean Mean Mean Mean

REVENUE 535.99 126.17 129.05 1784.40 15364.78

PROFIT 288.89 66.44 71.09 931.61 7752.80

MONTHS 29 62 19 24 20

ORDERS 3 1 1 6 18

QUANTITY 6.33 1.67 1.68 17.54 89.45

COUNT 9907 46872 41866 1241 11

Table 2 The 5-Cluster K-Means Solution (Subset 2: Furthest)

K-means Cluster ( Subset 2: Furthest)

1 2 3 4 5

Mean Mean Mean Mean Mean

REVENUE 129.79 626.74 136.07 2303.81 38622.38

PROFIT 68.35 337.02 74.92 1189.97 20278.65

MONTHS 62 29 19 22 37

ORDERS 1 3 1 7 4

QUANTITY 1.71 7.26 1.76 21.67 165.00

COUNT 47513 8360 43313 709 2

Table 3 The 6-Cluster K-Means Solution (Subset 3: Ward’s)

K-means Cluster (Subset 3: Ward’s)

1 2 3 4 5 6

Mean Mean Mean Mean Mean Mean

REVENUE 1159.33 121.35 121.61 450.87 3717.28 38622.38

PROFIT 616.43 63.90 67.01 243.31 1880.86 20278.65

MONTHS 25 62 19 31 21 37

ORDERS 5 1 1 3 10 4

QUANTITY 12.67 1.62 1.60 5.34 31.64 165.00

COUNT 2277 45947 40218 11250 203 2

Table 4 The 7-Cluster K-Means Solution (Subset 3: Furthest)

K-means Cluster (Subset 3: Furthest)

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38

PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65

MONTHS 62 32 19 26 22 24 37

ORDERS 1 2 1 4 7 12 4

QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00

COUNT 45246 11835 39091 3119 562 42 2

Table 5 The 7-Cluster K-Means Solution (Subset 4: Ward’s)

K-means Cluster (Subset 4: Ward’s)

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

REVENUE 117.65 115.97 392.91 864.41 2053.97 6807.42 38622.38

PROFIT 61.92 63.90 212.27 463.10 1073.45 3342.76 20278.65

MONTHS 62 19 32 26 22 23 37

ORDERS 1 1 2 4 7 12 4

QUANTITY 1.57 1.53 4.70 9.78 20.98 43.11 165.00

COUNT 45072 38818 12054 3305 602 44 2

Table 6 The 6-Cluster K-Means Solution (Subset 4: Furthest)

K-means Cluster (Subset 4: Furthest)

1 2 3 4 5 6

Mean Mean Mean Mean Mean Mean

REVENUE 122.83 474.71 123.84 4562.60 38622.38 1294.70

PROFIT 64.68 256.23 68.23 2306.44 20278.65 684.10

MONTHS 62 30 19 21 37 24

ORDERS 1 3 1 10 4 5

QUANTITY 1.64 5.61 1.62 35.57 165.00 13.96

COUNT 46255 10878 40715 121 2 1926
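
The idea of starting K-means from seeds supplied by a hierarchical solution can be sketched as follows. This is a plain-Python illustration of Lloyd's algorithm with fixed starting centroids, not the software procedure used for the tables above; the function names, the two-variable points, and the seed values are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_from_seeds(points, seeds, max_iter=100):
    """Lloyd's algorithm started from the given seed centroids."""
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep the old centroid if a cluster is empty
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, labels


# Hypothetical standardized (profit, months) points and seed centroids.
points = [(-0.3, 0.9), (-0.2, 0.8), (-0.2, -0.9), (-0.1, -0.8), (4.0, -0.5), (4.4, -0.7)]
seeds = [(-0.25, 0.8), (-0.15, -0.9), (4.5, -0.5)]
centroids, labels = kmeans_from_seeds(points, seeds)
print(labels)
print(centroids)
```

Because the result of K-means depends on where it starts, running it from several sets of seeds and comparing the resulting cluster means and counts is one way to judge how stable a solution is.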

Appendix C: Final Four Solutions

Table 1 K-Means Solution 1

Solution 1

1 2 3 4 5 6

Mean Mean Mean Mean Mean Mean

REVENUE 1159.33 450.87 121.35 121.61 3717.28 38622.38

PROFIT 616.43 243.31 63.90 67.01 1880.86 20278.65

MONTHS 25 31 62 19 21 37

ORDERS 5 3 1 1 10 4

QUANTITY 12.67 5.34 1.62 1.60 31.64 165.00

COUNT 2277 11250 45947 40218 203 2

Table 2 K-Means Solution 2

Solution 2

1 2 3 4 5

Mean Mean Mean Mean Mean

REVENUE 535.99 126.17 129.05 1784.40 15364.78

PROFIT 288.89 66.44 71.09 931.61 7752.80

MONTHS 29 62 19 24 20

ORDERS 3 1 1 6 18

QUANTITY 6.33 1.67 1.68 17.54 89.45

COUNT 9907 46872 41866 1241 11

Table 3 K-Means Solution 3

Solution 3

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

REVENUE 118.32 400.67 117.09 891.64 2114.33 6937.52 38622.38

PROFIT 62.28 216.55 64.49 477.20 1103.71 3406.11 20278.65

MONTHS 62 32 19 26 22 24 37

ORDERS 1 2 1 4 7 12 4

QUANTITY 1.58 4.79 1.54 10.08 21.41 42.95 165.00

COUNT 45246 11835 39091 3119 562 42 2

Table 4 K-Means Solution 4

Solution 4

1 2 3 4 5 6 7

Mean Mean Mean Mean Mean Mean Mean

REVENUE 117.65 115.97 392.91 864.41 2053.97 6807.42 38622.38

PROFIT 61.92 63.90 212.27 463.10 1073.45 3342.76 20278.65

MONTHS 62 19 32 26 22 23 37

ORDERS 1 1 2 4 7 12 4

QUANTITY 1.57 1.53 4.70 9.78 20.98 43.11 165.00

COUNT 45072 38818 12054 3305 602 44 2

Appendix D: Geographic Analysis

Figure 1. The Map of Segment 7

Table 1 The Demographic Information of the Final K-Means Segmentation

Final Cluster  Population in Zip Code  Mean Income  Median Income  Average Age in Zip Code  Female Percent  White Percent
1              26,358.66               $92,212.70   $71,484.54     39.423                   51.1%           78.6%
2              25,806.89               $91,291.07   $70,331.39     39.784                   51.1%           78.6%
3              25,956.50               $89,054.38   $69,230.06     39.484                   51.1%           78.4%
4              25,474.86               $92,658.61   $70,819.11     39.727                   51.1%           78.1%
5              24,874.73               $98,961.14   $74,112.55     40.260                   51.2%           77.7%
6              26,831.28               $99,722.23   $74,611.73     39.799                   51.6%           78.6%
7              22,711.00               $66,574.44   $49,861.57     40.105                   51.9%           84.3%
Total          26,099.66               $90,915.90   $70,456.50     39.504                   51.1%           78.5%

[Figure: Summary of the final Gap customer segments (“Lost Customers”, “Hibernating Customers”, “New Customers”, “Customers that Need Attention”, “Potential Loyalist”, “Champion Customers”), showing each segment's profit, months, quantity of items, and recommendations, together with the preferred method of purchase and preferred method of payment.]