
Ⅰ. Introduction

Segmentation, targeting, and positioning (STP) are the most important keys to a successful marketing strategy in the hospitality industry. STP means dividing customers with varied characteristics into similar groups and implementing a marketing strategy tailored to the needs and desires of each group. As the first step of STP, a marketer should identify the various customer groups (Kotler, Bowen, & Makens, 2017). Marketers believe that they should be aware of the customer segments they can satisfy better than their competitors can, provide them with direct

* Professor, College of Hospitality and Tourism, Sejong University,

e-mail: [email protected]

† (Corresponding author) Professor, Sogang Business School, Sogang

University, e-mail: [email protected]

marketing activities, and then provide products or services that can capture the target segments. The most carefully considered step in this strategic approach is choosing which segments the company will focus on. Many hospitality companies use several methods for this approach, and it is important to know which method to use for this purpose (Bowen, 2000). For example, while male business people often enjoy hotels with outlets such as bars and clubs, families or housewives may prefer hotels that include large restaurants or bakeries. Therefore, knowing who can be satisfied with the products and services a company offers is a necessary first step in the hospitality business.

Prior to 1950, direct marketers used ‘mail orders’ to

accomplish mass marketing. The purpose of mass marketing

in the traditional approach was to reach a larger number of

International Journal of Tourism and Hospitality Research
Volume 31, Number 10, pp. 85-97, 2017
ISSN (Print): 1738-3005
Homepage: http://www.ktra.or.kr
DOI: http://dx.doi.org/10.21298/IJTHR.2017.10.31.10.85

Grouping hotel restaurant customers based on a behavioral scoring model: An exploratory study

Yukyeong Chong* ⋅ Gunhee Lee†

College of Hospitality & Tourism, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul 05006, Republic of Korea

Sogang Business School, Sogang University, 35 Baekbeom-ro, Mapo-gu, Seoul 04107, Republic of Korea

Abstract

Segmentation, targeting, and positioning are the most important keys to a successful marketing strategy in the

hospitality industry. Among these three keys, segmentation is the first step for a marketer to identify various customer needs and desires. Hospitality operators have been trying to increase customer satisfaction and corporate profits by utilizing mass marketing, database marketing, and individual marketing. Despite the increased interest in scoring consumer behavior, applications of the score remain difficult. The lack of understanding and utilization of scores has been an important issue in the hospitality industry. Analysis of customer behavior is not an easy problem to solve because dynamic modeling is required due to changes to customers’ records over time. The current study explores customer data in hotel restaurants and proposes an individual behavior scoring model (BSM) based on the traditional RFM (recency, frequency, monetary) concept. By comparing it with the traditional profiling scoring model (PSM), it is shown that BSM provides a high prediction power of future consumers' behavior. However, PSM has an important role in a complementary sense to identify potential customers who have low behavior scores. This research proposes how to build and validate BSM and PSM with a focus on the utilization of the two models to identify future potential customers efficiently.

Key words: Segmentation, Customer scoring, Behavior scoring model (BSM), Profiling scoring model (PSM), RFM measure


customers and to reach a wider customer base. The traditional mass marketing processes have been challenged by new one-to-one marketing approaches (Rygielski, Wang, & Yen, 2002). Although the purpose of direct marketing or mass marketing has not changed, current practice has shifted toward database or relationship marketing, which emphasizes the individual customer and focuses on customers' needs and wants (Petrison, Blattberg, & Wang, 1993). In other words, the recent marketing approach seeks to improve the satisfaction of individual customers by building a deep relationship that fills each customer's needs, rather than by reaching a wide customer base. A deep relationship with customers can be achieved through a more customized approach that utilizes individual customer data.

Database marketing is sometimes called integrated marketing, relationship marketing, or even maxi-marketing. Regardless of the name, all of these techniques seek to build information on customer behavior (Nash, 2000). To build a sound relationship with customers, scoring techniques based on historical transaction data are important for differentiating customers and developing relationship marketing strategies. The most common scoring method is to sort customers from the most profitable to the least profitable. Typical customer data available for this purpose are recency, frequency, and monetary data (Miglautsch, 2000). A more advanced form of customer data is the customer's transaction data.

The current research is an exploratory study that investigates customers' transaction data in hotel restaurants, checks whether the data can be applied to predict future customer behavior, and aims to provide a view of scoring modeling in the context of the hospitality industry. In particular, predicted expenditure estimates were used to assign a score to each individual. Several scoring techniques are proposed, and a segmentation of customers based on the scores is suggested. This paper is divided into three sections. First, traditional relationship marketing concepts and several scoring techniques are reviewed. In the next section, an empirical study of the behavior scoring model (BSM) and the profiling scoring model (PSM) is conducted, investigating prediction power and customer segmentation. Finally, conclusions that can provide a new approach to quantifying customer behavior are addressed.

Ⅱ. Literature review

It is important to understand data-driven relationship marketing. In general, four aspects of relationship marketing should be considered: the statistical model produced by quantitative analysis, customer information collected at the individual level, the design of the linkage between analytic results and marketing activities to increase the effectiveness of customer contact, and the time and effort required to build relationships (Roberts, 1992). Well-designed sets of customer records that can track historical patterns of buying products or services are required in order to use scientific statistical methods that help relationship marketers keep strong relationships. Identifying profitable customers in order to expand relationships with them is vital. Building strong relationships with loyal customers is also a key reason for marketing activities (Berson, Smith, & Thearling, 1999).

Recent studies have dealt with customer behavior scores. One study identified profitable customers based on RFM (recency, frequency, and monetary) behavior using the SOM (self-organizing map) technique in the u-Commerce industry (Cho, Moon, & Ryu, 2014). In that study, the customer behavior score was shown to support recommendation services effectively. Another interesting study comes from the banking industry. It proposed a customer behavior score based on mobile banking transaction history and divided customers into six groups in order to attract and retain customers while keeping customer satisfaction high (Noori, 2015).

In the hospitality industry, data-driven marketing has recently been emphasized as an important marketing issue. The integrated data sets and analytical techniques used by hospitality marketers have been stressed as the means of discovering how to tailor service to each customer (Dev, Buschman, & Bowen, 2010). Nearly three decades ago, building a customer database for micro-marketing in a hotel was already practiced and was reported to produce far better financial performance (Francese & Renaghan, 1990). However, as Dev et al. (2010) described, marketing communication has changed as


mobile marketing technologies have appeared. Individualized personal attention, incentives, and recognition have been especially important factors in cultivating brand loyalty in the hotel business (Francese & Renaghan, 1990). Building customer loyalty is an essential factor in creating relationships (Bowen & Shoemaker, 1998; Dube & Renaghan, 1999), and customer information carries great weight in building such relationships between the hospitality business and the customer.

Researchers around the early 1990s were aware of how important frequent guest programs are in the hotel and airline industries (McCleary & Weaver, 1991; Toh, Rivers, & Withiam, 1991; Tou & Hu, 1988). Frequent guest profiles, that is, the demographic and psychographic characteristics of frequent customers, were used as pivotal sources for developing marketing strategies and targeting promotions in the restaurant business (Wilbourn, McCleary, & Phadeesuparit, 1997). Bowen (1990) indicated that using information available through existing databases prepares managers to be competitive in a radically changing industry, especially when such valuable customer information is provided through effective loyalty programs. Managers, including hoteliers and restaurateurs, should use databases for strategic purposes and not just for tactical ones (Bowen, 2000).

Database-driven marketing approaches to customer relationship management (CRM) (Berson et al., 1999) can be used interchangeably with relationship marketing or one-to-one marketing (Peppers et al., 1999). A database is a prerequisite of CRM, which requires a managerial philosophy that allows a company to become familiar with its customers. CRM also needs to work with data-driven activities in order for a CRM system to run effectively (Piccoli, O'Connor, Capaccioli, & Alvarez, 2003). The major attraction of data mining is its capability to build predictive rather than retrospective models (Shmueli, Bruce, & Patel, 2016). In other words, data mining uses well-established statistical and machine learning techniques to build models that predict customer behavior and help marketers target campaigns more accurately. It also aligns campaigns more closely with the needs, wants, and attitudes of customers and prospects. Therefore, data mining aims to create decision-making models that predict future behavior based on analyses of past activity (Berson et al., 1999; Magnini, Honeycutt, & Hodge, 2003). Although the technical terms are phrased differently, the ultimate goal of direct marketing, database marketing, CRM, and data mining is to differentiate customers, that is, to identify who are and will be the profitable customers with whom a company should try to build a strong relationship (Berson et al., 1999).

To make better decisions and identify more profitable customers, direct marketers have paid attention to both relationship strength and relationship quality (Schijns & Schröder, 1996). Relationship strength has frequently been measured by behavioral or descriptive indicators (e.g., RFM) that can easily be captured in a database. Such behavioral differences, drawn from transaction information, have been used as segmentation variables among different customer groups (Fader, Hardie, & Lee, 2005; Sarvari, Ustundag, & Takci, 2016). It is important to distinguish the worst customers from the best in order to provide customized service to the best customers. Predictive scoring models are one way to determine who will receive an upcoming personal marketing contact, such as a telephone call or an email, by predicting the likelihood of response or the expected sales from a prospective customer (Schijns & Schröder, 1996).

By using data from previous contacts, namely recency, frequency, and monetary value, scoring models can predict future revenue, and these predictions serve as scores (Malthouse & Derenthal, 2008). The RFM code and customer lifetime value (CLV) have been studied as ways to quantify customer behavior (Miglautsch, 2000; Borle, Singh, & Jain, 2008). According to Hughes' calculation (1996), customers are broken down by frequency (e.g., the number of visits to a store), and frequency is categorized into five quintile groups. Customers who visit the store many times are much more likely to visit again than those who seldom visit. Customers are also grouped by their monetary value. Similarly, a quintile categorization of customers by how much each customer spent can be used. After customers are broken down by recency, frequency, and monetary value, each customer is assigned a three-digit RFM code. Since each digit of the RFM code is a quintile number from 1 to 5, the total number of possible three-digit codes is 5 × 5 × 5 = 125. That is, the RFM code is a single cell from 125 possible cells such as 555, 554, …, 445, 444, …, 355, 354, …, 113, 112, and 111. The RFM code is not an ordinal scale, which has the property of order, but a nominal scale that is assigned for the sole purpose of differentiating one object from another. By nature, therefore, the RFM code itself cannot be treated as a score that has an order of high and low (Qiasi et al., 2012).
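As an illustration of the quintile coding described above, the following Python sketch assigns each customer a three-digit RFM code. It is a minimal sketch under stated assumptions, not the procedure used in the cited studies: the function names, the dictionary keys (recency_days, visits, spend), the made-up customers, and the rank-based way of cutting quintiles are all hypothetical choices made for the example.

```python
from itertools import product


def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst fifth) to 5 (best fifth) by rank."""
    n = len(values)
    # Order customer indices from worst to best for this measure.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for position, idx in enumerate(order):
        # Equal-sized groups: the first fifth of the ordering gets 1, the last fifth gets 5.
        scores[idx] = position * 5 // n + 1
    return scores


def rfm_codes(customers):
    """Return a three-digit RFM code (as a string) for each customer."""
    # Assumption: recency is measured as days since the last visit, so smaller is better.
    r = quintile_scores([c["recency_days"] for c in customers], higher_is_better=False)
    f = quintile_scores([c["visits"] for c in customers])
    m = quintile_scores([c["spend"] for c in customers])
    return [f"{ri}{fi}{mi}" for ri, fi, mi in zip(r, f, m)]


# All possible cells, listed from 555 down to 111: 5 x 5 x 5 = 125 codes.
ALL_CELLS = ["".join(map(str, cell)) for cell in product(range(5, 0, -1), repeat=3)]

# Ten made-up customers, ordered from best to worst on every measure.
customers = [
    {"recency_days": 10 * (i + 1), "visits": 10 - i, "spend": 100 * (10 - i)}
    for i in range(10)
]
codes = rfm_codes(customers)
print(len(ALL_CELLS), codes[0], codes[-1])
```

Because the code only labels a cell, sorting the code strings does not rank customers; turning the code into a score requires a further step.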

Miglautsch (2000) discussed two common RFM scoring methods: customer quintile scoring and behavior quintile scoring. In customer quintile scoring, customers are sorted in descending order and broken into five equal groups on each RFM measure, generating 125 equal-sized segments. In contrast, the behavior quintile scoring method assigns the monetary score so that each quintile generates an equal amount of sales, instead of placing an equal number of individuals in each group as customer quintile scoring does. However, these scoring methods still only define individual cells such as 435 or 233 and fail to assign a score to each individual customer within a cell. He discussed different weighting methods to convert RFM values to a single score: adding up the three actual numbers, adding the three RFM codes, and multiplying each RFM value by a certain number.
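The difference between the two quintile methods, and the idea of collapsing an RFM code into one number, can be sketched in Python as follows. This is an illustrative sketch only: the function names, the spending figures, the cumulative-sales cut rule, and the example weights are assumptions made here and are not taken from Miglautsch (2000).

```python
def customer_quintiles(spend):
    """Equal head-count groups: 5 marks the top fifth of customers by spend."""
    n = len(spend)
    order = sorted(range(n), key=lambda i: spend[i], reverse=True)
    scores = [0] * n
    for position, idx in enumerate(order):
        scores[idx] = 5 - position * 5 // n
    return scores


def behavior_quintiles(spend):
    """Equal-sales groups: each quintile holds roughly one fifth of total spend."""
    total = sum(spend)
    order = sorted(range(len(spend)), key=lambda i: spend[i], reverse=True)
    scores = [0] * len(spend)
    running = 0.0
    for idx in order:
        # The quintile is set by the cumulative sales reached before this customer.
        scores[idx] = 5 - min(4, int(running * 5 / total))
        running += spend[idx]
    return scores


def weighted_score(r, f, m, weights=(1, 1, 1)):
    """Collapse three RFM digits into one number; equal weights simply add them."""
    return weights[0] * r + weights[1] * f + weights[2] * m


# Made-up monetary values for 20 customers (total 1000, skewed toward a few big spenders).
spend = [200, 100, 100, 50, 50, 50, 50, 40, 40, 40, 40, 40] + [25] * 8
by_customer = customer_quintiles(spend)
by_behavior = behavior_quintiles(spend)
print([by_customer.count(q) for q in (5, 4, 3, 2, 1)])
print([by_behavior.count(q) for q in (5, 4, 3, 2, 1)])
print(weighted_score(4, 3, 5), weighted_score(4, 3, 5, weights=(3, 2, 1)))
```

With skewed spending, customer quintiles put the same number of customers in every group, while behavior quintiles put very few customers in the top group and many in the bottom group. The weights in the last line are arbitrary, which reflects the subjectivity of such weighting schemes.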

In Rhee and McIntyre's study (2008), a marketing firm's contact efforts were considered to be the essential variable in scoring modeling. Such an approach is recognizable in some industries; in hospitality operations, however, the contact efforts of a promotion campaign would largely be determined by which customers are valuable. Scoring methods will vary considerably because they involve subjectivity, leaving aside whether the methods are right or wrong.

Another type of information, in addition to behavioral transaction data, is customers' demographic data, which can be used for understanding the current market situation. Sheth (1977) criticized using demographic factors as determinants or correlates of consumption behavior because of their lack of relevance and poor predictive power. However, many researchers have used demographic profiles in various academic fields. Topics include the correlation of demographic variables with consumer alienation in the marketplace (Lambert, 1981), the effect of demographic variables in modeling for determining segment membership using panel data (Gupta & Chintagunta, 1994), the influence of demographic characteristics on consumers' decisions about usage frequency in the banking industry using adoption theory (Branca, 2008), the role and effect of demographic and socioeconomic variables on travel choice (Kattiyapornpong & Miller, 2008), and how demographic profiles affect consumers' online shopping behavior (Hashim, Ghani, & Said, 2009), among many others. Although marketers need much more information than demographic variables to comprehend customers' behavior in the marketplace, demographic profiles serve a basic yet important role in interpreting the characteristics of clusters, groups, or segments of customers (Yeh, Plante, & Agrawal, 2011).

In studies that use customer data, demographic profiles are essential, yet for the most part their use is limited to descriptive analysis. Demographic profiles could be used for far more than describing group characteristics. They could also serve, like RFM, as vehicles for scoring individual customers. This paper introduces how to build a one-dimensional scoring model (BSM) that reflects customers' longitudinal behavior data and a scoring model (PSM) based on demographic profiles. These scoring models are intended to differentiate segments of customers and predict customers' future contribution to a restaurant. In the following empirical study, after the data and the cleaning process are described, the monetary value analysis is presented based on expenditures and number of visits. The next part compares the BSM and the PSM in terms of model fitting and prediction power. Finally, customers are distinguished by predicted expenditure estimates of their future contribution.

Ⅲ. Methodology

1. Data description

This research uses customer data from a hotel in Seoul, Korea. The hotel is a globally franchised five-star hotel that, when the data set was obtained, operated 10 different restaurant outlets: Italian Restaurant, Lounge, Bakery and Beverage, Banquet, Club, French Restaurant, Bar, Chinese Restaurant, Japanese Restaurant, and Buffet. The hotel records the types of restaurants that a customer visits, the gender and occupation of the customer, the month in which the customer visits a restaurant, and how much money the customer spends on each visit. Most of the customers who have a membership reside domestically, so tourists are not included.

Of the hotel restaurant customers who have memberships, information on 959 customers (11,466 transactions) was collected to identify customers' behavior. The data for the current research include longitudinal information that may reveal a customer's behavior pattern, in contrast to cross-sectional data that describe only a single transaction. To facilitate analysis, the individual expenditure data were transformed to an average expenditure per month. The frequency of visits to each restaurant and the purchasing expenditures were obtained. The frequency of restaurant visits is a discrete variable, and the monetary value is a continuous variable. Gender and occupation are the demographic variables available for this study.

2. Data cleaning process

Data cleaning is the next step after gathering data and refers to the process of removing noise, errors, and incorrect input from a database (Adriaans & Zantinge, 1996). These are inevitable problems that analysts encounter as they begin to use a new data set. To some degree, any database system may have inconsistent, incomplete, or erroneous data. As much as 80 percent of the time associated with the data mining process is spent dealing with these problems (Westphal & Blaxton, 1998). In this study, some fields, such as birth date, contain very little customer data, while other fields, such as joining date, have no data recorded at all even though a field exists for them. After non-usable fields were removed during the discovery stage, gender and occupation were selected as the usable demographic variables. The frequency of restaurant visits and the expenditures of the 959 customers were also collected.

For the scoring modeling analysis, customers who did not indicate their occupation were removed. Because this unidentified group might belong to any of the other occupation groups, it was not considered in the study. In total, 30 customers were deleted because of a missing occupation value (23 cases) or unusual transactions (7 cases), leaving 929 customers. Next, 340 customers who visited fewer than four times during the 12-month period were identified as a dormant group and were not included in the subsequent analysis. In addition, a month in which a customer did not visit any restaurant in this hotel was recorded as a frequency of zero rather than treated as a missing value. Therefore, 589 active customers who had four or more visits during the 12-month observation period were used for further analysis. Table 1 presents the proportions of removed, active, and dormant customers in this study.
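The cleaning rules above can be written down as a short Python sketch. The analyses in this paper were run in SAS, so the code is only an illustration: the function names, the argument layout, and the use of None for a month without a record are assumptions made for the example. Only the group counts (589, 340, 30) come from the text.

```python
def classify_customer(occupation, monthly_visits, unusual=False, min_visits=4):
    """Label one customer as 'removed', 'dormant', or 'active'."""
    # Customers without an occupation or with unusual transactions are dropped first.
    if occupation is None or unusual:
        return "removed"
    # A month with no recorded visit counts as zero, not as a missing value.
    total_visits = sum(v or 0 for v in monthly_visits)
    return "active" if total_visits >= min_visits else "dormant"


def shares(counts):
    """Percentage share of each group, rounded to one decimal place."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


# Group sizes reported in the text: 959 = 589 active + 340 dormant + 30 removed.
reported = {"active": 589, "dormant": 340, "removed": 30}
print(sum(reported.values()), shares(reported))

# Hypothetical customers, twelve monthly visit counts each.
print(classify_customer("Doctors", [1, 0, None, 2, 1] + [0] * 7))
print(classify_customer("Lawyers", [1, 1, 1] + [None] * 9))
print(classify_customer(None, [5] * 12))
```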

3. Analysis

The monetary value is defined as how much a customer spends during a specified time interval. Unlike frequency, which represents the number of visits, monetary value can be treated as a continuous random variable. There are two types of analytic models in this study: the behavioral scoring model (BSM) and the profiling scoring model (PSM). Both the BSM and the PSM provide an individual customer score, which is equivalent to the expected expenditure of a customer for the next month (December in this case). In the BSM, 589 individual scoring models are estimated, while one aggregated scoring model is used in the PSM, based on the past 12-month

Data group (N=959)    Active customers    Dormant customers    Removed customers
Frequency (%)         589 (61.4%)         340 (35.5%)          30 (3.1%)

Table 1. Distribution of active, dormant, and removed customers


transaction data. All statistical analyses were performed using SAS (Statistical Analysis System).

In the BSM for expenditure analysis, 589 regression models, one per customer, are fitted to the transaction data for the 11 months from January through November. The regression model for each customer is $y_t = \alpha + \beta_1(\text{time}) + \varepsilon_t$, where $y_t$ indicates expenditures per restaurant visit per month. In this model, $\alpha$ and $\beta_1$ represent an intercept and a slope for changes of $y_t$ over time (11 months), respectively. The term $\varepsilon_t$ represents individual variability treated as random error, assumed to have mean zero and constant variance $\sigma^2$.
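To make the per-customer regression concrete, the following Python sketch fits the intercept and slope by ordinary least squares for one customer and extrapolates to month 12 to obtain a score. It is a minimal sketch, not the SAS procedure used in this study: the function names fit_line and bsm_score and the monthly figures are hypothetical, and coding time as 1 to 11 is an assumption.

```python
def fit_line(y):
    """OLS intercept and slope for y observed at time = 1, 2, ..., len(y)."""
    n = len(y)
    t = range(1, n + 1)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


def bsm_score(y, next_time=12):
    """Predicted expenditure for the next month, used as the customer's score."""
    intercept, slope = fit_line(y)
    return intercept + slope * next_time


# Hypothetical customer: expenditure per visit for January through November.
monthly_spend = [55.0, 60.0, 0.0, 72.0, 65.0, 0.0, 80.0, 78.0, 90.0, 0.0, 95.0]
intercept, slope = fit_line(monthly_spend)
print(round(intercept, 2), round(slope, 2), round(bsm_score(monthly_spend), 2))
```

Repeating this fit for each of the 589 active customers gives one intercept, one slope, and one score per customer, which is what allows the estimates to be averaged by gender or occupation.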

Means and standard deviations of slopes and intercepts for each gender are shown in Table 2. The negative mean slope for females implies that their expenditures per restaurant visit decreased during the period from January through November, while the positive mean slope for males indicates increased expenditures over time. The high standard deviations of slopes and intercepts indicate that large variability exists among individuals. The average intercept for males is also higher than that for females. Descriptively, male customers appear to spend more money than female customers at restaurants in this hotel, with expenditures increasing over time. However, because of the large variability among individuals, the differences in slope (p=0.1765) and intercept (p=0.1219) between males and females are not statistically significant at the 5% significance level.

Figure 1 presents the average slope and intercept estimates for food purchases by occupation. The slopes of four occupations (housewives, doctors, business owners, and professors) are below zero, implying that expenditure per restaurant visit decreased from January through November. The slopes of the four other occupations (government officers, lawyers, presidents/chairmen, and businessmen) are above zero.

Table 3 summarizes the mean values of the slope and intercept estimates. Although slopes and intercepts

Note: Each position represents averages of estimates by each occupation

Figure 1. Means of slope and intercept of expenditure by occupation

Parameter estimate (N=589)    Male (N=445)     Female (N=144)
Intercept (Mean ± SD)         62.31 ± 84.49    52.50 ± 73.19
Slope (Mean ± SD)             0.35 ± 10.33     -0.86 ± 7.41

Table 2. Mean differences of intercept and slope of expenditures between male and female


for each occupation appear to differ, because of the large variability among subjects the differences among occupations are not statistically significant at the 5% significance level (p-value=0.8951 for intercept; p-value=0.8275 for slope).

In the PSM, the eleven months of data for the 589 customers are used to build one predictive model. At first, an analysis of variance (ANOVA) model with two factors, gender and occupation, and one time covariate was fitted to expenditures, including two interaction effects: (time × gender) and (gender × occupation). Since neither interaction effect is significant at the 5% significance level, only the main effects are retained. Therefore, the final PSM is $y_t = \alpha + \beta_1(\text{time}) + \beta_2(\text{gender}) + \beta_j(\text{occupation})_i + \varepsilon_t$, where $y_t$ indicates expenditures per month, $\alpha$ represents an intercept, and $\beta_1$ represents a slope for changes of $y_t$ over time (11 months). The term $(\text{occupation})_i$ represents seven dummy variables. The term $\varepsilon_t$ represents individual variability treated as random error, assumed to have mean zero and constant variance $\sigma^2$. According to the summary of the final PSM presented in Table 4, the main effects of gender and occupation are statistically significant at the 5% significance level. The results confirm that gender and occupation play a major role in building the PSM. The PSM is compared with the BSM in terms of model fitting and prediction power.
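The structure of the main-effects PSM can be illustrated by how one row of its design matrix is built and scored. The Python sketch below is an assumption-laden illustration rather than the fitted model: the source does not state which occupation is the reference level or how gender is coded, so the baseline (the first occupation) and the male = 1 coding are assumptions, and the coefficient values are invented placeholders, not estimates from Table 4.

```python
OCCUPATIONS = [
    "Businessmen", "Housewives", "Doctors", "Business owners",
    "Government officers", "Presidents / Chairmen", "Lawyers", "Professors",
]
# Assumption: the first occupation is the baseline; the paper does not say which one is.
REFERENCE = OCCUPATIONS[0]


def design_row(month, is_male, occupation):
    """Intercept, time covariate, gender indicator, and seven occupation dummies."""
    dummies = [1.0 if occupation == name else 0.0
               for name in OCCUPATIONS if name != REFERENCE]
    return [1.0, float(month), 1.0 if is_male else 0.0] + dummies


def psm_score(coefficients, month, is_male, occupation):
    """Expected expenditure: the dot product of the coefficients and the design row."""
    row = design_row(month, is_male, occupation)
    return sum(c * x for c, x in zip(coefficients, row))


# Invented coefficients for illustration only: intercept, time, gender, then 7 dummies.
coefficients = [50.0, 0.5, 10.0, -5.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(design_row(12, True, "Doctors"))
print(psm_score(coefficients, 12, True, "Doctors"))
print(psm_score(coefficients, 12, False, "Housewives"))
```

Every customer with the same gender and occupation receives the same score for a given month, which is why the PSM predictions vary much less across customers than the individual BSM predictions do.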

Ⅳ. Results

1. Model assessment and validation

The performance of the individual behavior models and the aggregate profiling model is evaluated in two ways: model fitting and assessment of prediction power on a test data set (December data). Validation is a way to evaluate how well a model predicts data. The validation process is important because the results of data mining are often used for strategic issues throughout an organization. In data mining, there is a danger of over-fitting the model. That is, a model can be highly predictive for a training set but less accurate with data not used in building it (Groth, 1998). Therefore, the validation process required for data mining is to build the model on some historical data and then apply it to similar historical data that were not used to build it (Berson et al., 1999).

For the training and test method, the entire data set is

divided into two data sets: a training set and a test set (or

holdout sample). After the model is fitted using the training

data set, the test set is applied to evaluate the model. In using

the training and test method, it is known that the results of

Occupation (N=589)               Intercept (Mean ± SD)    Slope (Mean ± SD)
Businessmen (n=128)              59.43 ± 81.95            0.18 ± 0.58
Housewives (n=108)               49.88 ± 72.48            -0.65 ± 7.32
Doctors (n=20)                   63.99 ± 53.54            -1.08 ± 6.57
Business owners (n=6)            56.14 ± 86.05            -3.63 ± 7.99
Government officers (n=2)        21.33 ± 20.17            4.02 ± 6.91
Presidents / Chairmen (n=296)    63.87 ± 88.37            0.42 ± 10.39
Lawyers (n=17)                   55.46 ± 62.34            1.16 ± 9.43
Professors (n=12)                63.61 ± 69.08            -2.55 ± 7.18

Table 3. Means and standard deviation of slope and intercept of expenditure by occupations

Source        df    Sum of squares    Mean squares    F value    p value
Month         1     200.70            200.70          0.03       0.8552
Gender        1     58859.15          58859.15        9.77       0.0018**
Occupation    7     139520.39         19931.48        3.31       0.0016**
Note: **p<0.05

Table 4. Analysis of variance of PSM


model assessment are sensitive to how a small data set is split. To overcome this problem, Malthouse & Derenthal (2008) recommend stratified sampling to reduce the variation across splits. In cross validation, one case is excluded from the original sample, and the model is trained on the remaining sample. The trained model then predicts the excluded case. This procedure is repeated for each case, and the accuracy for each case is summed over the entire sample. The cross validation method may provide nearly unbiased estimators of prediction accuracy (Sung, Chang, & Lee, 1999).
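The leave-one-out form of cross validation described above can be sketched in a few lines of Python. The sketch is generic and hypothetical: the function names are invented, and a simple mean predictor stands in for whatever model is being validated, since this study itself uses a training and test split rather than cross validation.

```python
def leave_one_out_errors(xs, ys, fit, predict):
    """Hold out each case in turn, train on the rest, and record the prediction error."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append(ys[i] - predict(model, xs[i]))
    return errors


def fit_mean(xs, ys):
    """Stand-in model: ignore the inputs and remember the mean of the training targets."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    """Predict the stored training mean for any input."""
    return model


# Tiny made-up sample: three cases with targets 1, 2, and 3.
xs = [1, 2, 3]
ys = [1.0, 2.0, 3.0]
errors = leave_one_out_errors(xs, ys, fit_mean, predict_mean)
mean_abs_error = sum(abs(e) for e in errors) / len(errors)
print(errors, mean_abs_error)
```

Any pair of fit and predict functions with the same signatures can be passed in, so the same loop applies to a regression model as well.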

Figure 2 illustrates how to evaluate model fitting and

prediction power in this study. We used the data from

January through November to build predictive models of

both BSM and PSM. Using the predictive model, the

performance of December is predicted and compared to the

actual value. In this case, the data for 11 months acts as a

training set and the rest of the data in December works as a

test set.

Assessment of model fitting for the frequency of restaurant visits is performed using deviance and Pearson’s chi-square. Both are goodness-of-fit measures that indicate the discrepancy between the actual frequency and the frequency predicted by the model built on the training set. For monetary model fitting, root MSE (mean square error), R², and adjusted R² are used as goodness-of-fit measures. Prediction power of the models for frequency of restaurant visits and monetary value is evaluated using MAE (mean absolute error) and MSE.

2. Assessment of prediction power

Three statistics, RMSE (root mean square error), R², and adjusted R², are employed to compare model fitting between the BSM and the PSM. Table 5 shows that the BSM generates a smaller RMSE and larger R² and adjusted R² than the PSM does. Therefore, the BSM outperforms the PSM in model fitting. The prediction power of the two models is investigated in the next phase to detect potential over-fitting as well as to validate the individual models.

Two estimated models based on eleven months of data

were used to predict the expenditures per restaurant visit in

December to assess prediction power. Each predicted value

of expenditure is compared with the actual expenditure per

restaurant visit in December. The results are displayed in

Model    RMSE             R-square       Adjusted R-square
PSM      77.64            0.012          0.011
BSM      38.15 ± 35.65    0.28 ± 0.25    0.19 ± 0.28

Table 5. Assessment of model fitting between PSM and BSM

Figure 2. An example of model fitting and prediction power


Tables 6, 7, and 8. Table 6 shows that the BSM produces a pattern of expenditures similar to the actual expenditures in December. The correlation coefficient between the true value and the predicted value is higher for the BSM (0.5904) than for the PSM (0.1516). Table 7 provides the mean values and standard deviations of MAE and MSE for the PSM and the BSM. The BSM outperforms the PSM in prediction, with lower mean values of MAE and MSE. Table 8 summarizes the distributions of MAE and MSE for both models. The five-number summary statistics indicate that the BSM is superior to the PSM.
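The prediction-power measures compared above can be computed as in the following Python sketch. It is illustrative only: the function names and the four-customer example are invented, and the figures reported in Tables 6 to 8 were produced in SAS from the December holdout data, not by this code.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


# Made-up December expenditures for four customers and one model's predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 2.0, 2.0, 6.0]
print(mae(actual, predicted), mse(actual, predicted), math.sqrt(mse(actual, predicted)))
print(round(pearson_r(actual, predicted), 4))
```

A lower MAE or MSE and a higher correlation on the holdout month both point to better prediction, which is the sense in which the two models are compared.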

3. Customer segmentation by predictive expenditures

Market segmentation describes the division of a market into homogeneous groups that will respond differently to promotions, communications, advertising, and other marketing mix variables. Direct marketers want to move away from mass-marketing campaigns and use a more consumer-oriented approach. This is done based on the behaviors exhibited by customers, such as using similar services and products (Westphal & Blaxton, 1998). Segmenting techniques look for similarities and differences within a data set and group similar rows together into segments or clusters. Similarities are expected to be high within a segment and differences high between segments. There have been two traditional approaches to specifying market segments. The first is to classify customers by objective variables such as sex, age, life cycle stage, and personality. The second is based on situation-specific events, such as purchases and use of specific products, brand-loyal versus non-brand-loyal users, and attitude toward the brand (Frank, Massy, & Wind, 1972). The optimal number of segments is a subject of continuing research, although many approaches to segmentation allow the user to decide the number of segments (Groth, 1998).

Table 6. Mean and standard deviation of predicted and true value

| Variable | N | Mean ± SD | Correlation coefficient¹ |
|---|---|---|---|
| Predicted value by PSM | 589 | 65.55 ± 8.69 | 0.1516 (p-value = 0.0002) |
| Predicted value by BSM | 589 | 62.90 ± 74.50 | 0.5904 (p-value < 0.0001) |
| True value | 589 | 68.91 ± 86.94 | - |

Note: ¹ Correlation between predicted value and true value.

Table 7. Evaluation of prediction power with December

| Model | N | MAE ± SD | MSE ± SD |
|---|---|---|---|
| PSM | 589 | 60.26 ± 61.95 | 7462.39 ± 23933.84 |
| BSM | 589 | 44.50 ± 59.27 | 5487.17 ± 16670.38 |

Table 8. Five number summary of prediction power with December

| Measure | Model | Maximum | 75% (Q3) | 50% (Q2) | 25% (Q1) | Minimum |
|---|---|---|---|---|---|---|
| MAE | PSM | 591.64 | 67.57 | 46.48 | 28.03 | 0.65 |
| MAE | BSM | 468.84 | 59.45 | 24.65 | 6.56 | 0 |
| MSE | PSM | 350042.89 | 4545.63 | 2160.67 | 785.58 | 0.42 |
| MSE | BSM | 21980.95 | 3534.84 | 607.47 | 43.05 | 0 |

Customers are first scored by their predicted expenditure for the next month. Segmentation is then performed on the scores with three groups: high value (the top 25% of scored customers), middle value (between the top 25% and 50%), and low value (below 50%). The distribution of the 589 scores is summarized in Table 9. The average score is 153.04, implying that the expected expenditure for the next month (December in this case) of the 589 active customers is $153.04 on average. Table 9 also shows that the distribution of scores is highly skewed to the right, with a few extremely high scores. It is interesting to note that about 16% of customers (148 of the 929 customers, dormant customers included) can be treated as high value customers, expected to spend at least $150 in the next month.

In fact, dormant and low value customers, about 68% of all customers, can be regarded as groups that hardly contribute to sales, spending less than $35 in the next month. As shown in Figure 3, the scores are validated through the relationship between the segments and both the average expenditure per visit and the number of restaurant visits per month. High value customers clearly have both a high average expenditure per visit and a high number of visits per month.
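The three-group split described in this section can be sketched as a rank-based assignment on the behavior scores. The sketch below is an illustration under stated assumptions: the function name `segment_by_score`, the customer identifiers, and the scores are hypothetical, dormant customers are assumed to have been set aside beforehand, and ties are broken by input order because the text does not specify how cut-offs or ties were handled.

```python
import math


def segment_by_score(scores, high_share=0.25, middle_share=0.50):
    """Map customer id -> 'high', 'middle' or 'low' by score rank.

    scores: dict of customer id -> predicted next-month expenditure.
    The top `high_share` of customers are 'high', those between
    `high_share` and `middle_share` are 'middle', the rest are 'low'.
    """
    # Highest score first; Python's sort is stable, so ties keep input order.
    ranked = sorted(scores, key=scores.get, reverse=True)
    high_cut = math.ceil(len(ranked) * high_share)
    middle_cut = math.ceil(len(ranked) * middle_share)

    segments = {}
    for rank, customer_id in enumerate(ranked):
        if rank < high_cut:
            segments[customer_id] = "high"
        elif rank < middle_cut:
            segments[customer_id] = "middle"
        else:
            segments[customer_id] = "low"
    return segments


# Hypothetical scores in US dollars for eight active customers.
example_scores = {
    "c1": 500.0, "c2": 160.0, "c3": 150.0, "c4": 40.0,
    "c5": 35.0, "c6": 10.0, "c7": 3.0, "c8": 0.0,
}
segments = segment_by_score(example_scores)
print(segments)
```

Because the score is itself an expenditure, the cut-offs can be read directly in dollars, which is what makes statements such as "at least $150" for the high value group possible.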

Table 9. Distribution of customer score by purchasing pattern (Unit: US dollar)

| Mean ± SD | Maximum | 75% (Q3) | 50% (Q2) | 25% (Q1) | Minimum |
|---|---|---|---|---|---|
| 153.04 ± 355.30 | 4466.24 | 149.48 | 35.07 | 3.16 | 0 |

Figure 3. Relationship of segmentations with expenditures per visit and the number of visits. Segments shown (total N = 929): dormant customers (340 cases, 36.60%), high value customers (148 cases, 15.93%), middle value customers (154 cases, 16.58%), and low value customers (287 cases, 30.89%).

Ⅴ. Conclusions

The purpose of this research was to demonstrate an efficient use of customers' historical transaction data through a scoring model in the context of hotel restaurants. Purchasing history serves as the source for the BSM, while demographic information such as gender and occupation provides the main factors of the PSM. Unlike traditional behavior scores such as the RFM measure, we proposed a behavior score defined as the predicted expenditure for the next month. The score incorporates all historical information, with emphasis on recent transactions. It is easy to understand because the score itself is an expenditure. In particular, the proposed behavior score is a powerful index for predicting existing customers' future behavior.

In the BSM, a customer's transactions over the 11 months can be summarized by the intercept and slope of a regression model. A high intercept with a negative slope indicates that the customer is leaving over the given time period, so churn analysis is required for further understanding. If a customer has a medium or high intercept with a positive slope, cross-selling or up-selling promotion campaigns might be appropriate to increase expenditure. Figure 1 and Table 3 show the average slopes and intercepts for each occupation. High standard deviations of both intercept and slope are observed because of the large variability among individual customers within the same occupation. This variability in customer behavior explains the poor prediction performance of the PSM. It is therefore natural that behavior scores from the BSM have higher prediction power.
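The reading of intercept and slope given above can be made concrete with a per-customer least-squares line. The following is a minimal sketch that assumes a plain ordinary least squares fit of monthly expenditure on the month index; it does not reproduce the paper's exact specification, which places emphasis on recent transactions. The names `fit_line`, `behavior_score`, and `suggest_action`, the `min_intercept` threshold, the clipping of negative predictions to zero, and the example series are all hypothetical.

```python
def fit_line(months, spend):
    """Ordinary least squares fit of spend on month; returns (intercept, slope)."""
    n = len(months)
    if n != len(spend) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(months) / n
    mean_y = sum(spend) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    if sxx == 0:
        raise ValueError("months must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, spend))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def behavior_score(months, spend, next_month=12):
    """Predicted expenditure for next_month, clipped at zero (an assumption)."""
    intercept, slope = fit_line(months, spend)
    return max(0.0, intercept + slope * next_month)


def suggest_action(intercept, slope, min_intercept=100.0):
    """Map (intercept, slope) to an action; min_intercept is a hypothetical cut-off."""
    if intercept >= min_intercept and slope < 0:
        return "churn analysis"
    if intercept >= min_intercept and slope > 0:
        return "cross-sell or up-sell"
    return "no specific action"


# Hypothetical monthly expenditures for months 1 to 11 (US dollars).
months = list(range(1, 12))
declining = [200.0 - 10.0 * m for m in months]  # high start, falling spend
growing = [120.0 + 5.0 * m for m in months]     # medium start, rising spend

for label, series in (("declining", declining), ("growing", growing)):
    intercept, slope = fit_line(months, series)
    print(label, round(behavior_score(months, series), 2),
          suggest_action(intercept, slope))
```

In this sketch the declining customer still receives a positive score for month 12, yet the negative slope flags the account for churn analysis, which is the distinction the intercept and slope summary is meant to capture.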

However, the BSM study has several limitations. First, handling personally identifiable data when analyzing individual behavior raises an important privacy issue. To comply with privacy protection laws, all personally identifiable information was deleted during data handling. Because the members of the restaurant studied belong to a particular class of customers, we decided not to name the restaurant, to prevent any possibility of personal identification, and to limit the use of personal behavior data to research purposes only. Consequently, a limitation of this study is that the source of the data cannot be disclosed in detail.

The second limitation is that the BSM cannot be applied to new customers who have no historical transaction data. In other words, the BSM is applicable only to existing customers. Lastly, it cannot identify potential customers in the low value segment. In these cases, the profile score rather than the behavior score plays an important role in overcoming the difficulties. For example, according to Figure 1 and Table 3, government officers, lawyers, presidents/chairmen, and businessmen have high profile scores, so these groups can be targeted as new or potential customers. Although this study compares the prediction power of the BSM and PSM as competitors, the PSM is an excellent complement to the BSM in distinguishing customers. In managing new and existing customers, marketers should consider how to efficiently combine the BSM, based on individual transaction data, with the PSM, based on aggregated demographic data, as a powerful tool for understanding customers and implementing strategies. In practice, the BSM can be used to identify and retain loyal customer groups and prevent churn.

However, the BSM is difficult to apply to new customers with no historical behavior data. In this case, the PSM is a useful tool for identifying potential customers who have poor historical records. Promotion or up-selling campaigns might then be applied to turn them into valued customers.
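One simple way to combine the two models along the lines suggested above is to use the behavior score whenever a customer has transaction history and to fall back on the occupation-level profile score otherwise. The sketch below is only one possible arrangement, not a method proposed in this study. The function name `customer_score`, the record layout, and all numeric values are hypothetical and are not taken from Table 3.

```python
def customer_score(customer, behavior_scores, profile_scores, default=0.0):
    """Return (score, source) for one customer record.

    customer: dict with an 'id' key and, optionally, an 'occupation' key.
    behavior_scores: customer id -> BSM score (existing customers only).
    profile_scores: occupation -> PSM score (group-level estimate).
    """
    customer_id = customer["id"]
    if customer_id in behavior_scores:
        # Existing customer: individual transaction history is available.
        return behavior_scores[customer_id], "BSM"
    # New customer: fall back on the profile score of the occupation group.
    occupation = customer.get("occupation")
    return profile_scores.get(occupation, default), "PSM"


# Hypothetical scores in US dollars (illustrative values only).
behavior_scores = {"c1": 180.0, "c2": 12.5}
profile_scores = {"lawyer": 95.0, "government officer": 90.0}

existing = customer_score({"id": "c1", "occupation": "lawyer"},
                          behavior_scores, profile_scores)
newcomer = customer_score({"id": "n7", "occupation": "lawyer"},
                          behavior_scores, profile_scores)
unknown = customer_score({"id": "n8"}, behavior_scores, profile_scores)
print(existing, newcomer, unknown)
```

Returning the source of each score alongside its value keeps it visible which customers were scored from individual behavior and which from group-level demographics.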

References

Adriaans, P., & Zantinge, D. (1996). Data mining. New York, NY: Addison-Wesley.

Berson, A., Smith, S., & Thearling, K. (1999). Building data mining applications for CRM. New York, NY: McGraw-Hill.

Borle, S., Singh, S. S., & Jain, D. C. (2008). Customer lifetime value measurement. Management Science, 54(1), 100-112.

Bowen, J. T. (1990). Electronic information: Scanning the environment. Hospitality Research Journal. Annual Conference Proceedings, 14(2), 95-101.

Bowen, J. T., & Shoemaker, S. (1998). Loyalty: A strategic commitment. Cornell Hotel and Restaurant Administration Quarterly, 39(1), 12-25.

Bowen, J. T. (2000). A strategic approach to capturing and using customer information. Journal of Restaurant and Foodservice Marketing, 4(1), 77-81.

Branca, A. S. (2008). Demographic influences on behavior: An update to the adoption of bank delivery channels. International Journal of Bank Marketing, 26(4), 237-259.

Cho, Y. S., Moon, S. C., & Ryu, K. H. (2014). SOM Clustering method using user’s features to classify profitable customer for recommender service in u-Commerce. In Park, J. J., Pan, Y., Kim, C., & Yang, Y. (Eds.), Future Information Technology, 273-281, Springer, Dordrecht.

Dev, C. S., Buschman, J. D., & Bowen, J. T. (2010). Hospitality marketing: A retrospective analysis (1960-2010) and predictions (2010-2020). Cornell Hospitality Quarterly, 51(4), 459-469.

Dube, L., & Renaghan, L. M. (1999). Building customer loyalty. Cornell Hotel and Restaurant Administration Quarterly, 40(5), 78-88.

Fader, P. S., Hardie, B. G., & Lee, K. L. (2005). RFM and CLV: Using iso-value curves for customer base analysis. Journal of Marketing Research, 42(4), 415-430.

Francese, P. A., & Renaghan, L. M. (1990). Data-base marketing: Building customer profiles. Cornell Hotel and Restaurant Administration Quarterly, 31(1), 60-63.

Frank, R. E., Massy, W. F., & Wind, Y. (1972). Market segmentation. Englewood Cliffs, NJ: Prentice Hall.

Groth, R. (1998). Data mining: A hands-on approach for business professionals. Upper Saddle River, NJ: Prentice Hall.

Gupta, S., & Chintagunta, P. K. (1994). On using demographic variables to determine segment membership in logit mixture models. Journal of Marketing Research, 31(1), 128-136.

Hashim, A., Ghani, E. K., & Said, J. (2009). Does consumers’ demographic profile influence online shopping?: An examination using Fishbein's theory. Canadian Social Science, 5(6), 19-31.

Hughes, A. M. (1996). The complete database marketer. New York, NY: McGraw-Hill.

Kattiyapornpong, U., & Miller, K. E. (2008). Socio-demographic constraints to travel behavior. International Journal of Culture, Tourism and Hospitality Research, 3(3), 246-258.

Kotler, P. T., Bowen, J. T., & Makens, J. (2017). Marketing for hospitality and tourism (7th ed.). England: Pearson Education.

Lambert, Z. V. (1981). Profiling demographic characteristics of alienated consumers. Journal of Business Research, 9(1), 65-86.

Magnini, V. P., Honeycutt, E. D., Jr., & Hodge, S. K. (2003). Data mining for hotel firms: Use and limitations. Cornell Hotel and Restaurant Administration Quarterly, 44(2), 94-105.

Malthouse, E. C., & Derenthal, K. M. (2008). Improving predictive scoring models through model aggregation. Journal of Interactive Marketing, 22(3), 51-68.

McCleary, K. W., & Weaver, P. A. (1991). Are frequent-guest programs effective? Cornell Hotel and Restaurant Quarterly, 32(2), 39-45.

Miglautsch, J. R. (2000). Thoughts on RFM scoring. Journal of Database Marketing, 8(1), 67-72.

Nash, E. (2000). Direct marketing: Strategy, planning, execution (4th ed.). New York, NY: McGraw-Hill.

Noori, B. (2015). An analysis of mobile banking user behavior customer segmentation. International Journal of Global Business, 8(2), 55-64.

Peppers, D., Rogers, M., & Dorf, B. (1999). Is your company ready for one-to-one marketing? Harvard Business Review, 151-160.

Petrison, L. A., Blattberg, R. C., & Wang, P. (1993). Database marketing - past, present, and future. Journal of Direct Marketing, 7(3), 27-43.

Piccoli, G., O'Connor, P., Capaccioli, C., & Alvarez, R. (2003). Customer relationship management-A driver for change in the structure of the U.S. lodging industry. Cornell Hotel and Restaurant Administration Quarterly, 44(4), 61-73.

Qiasi, R., Baqeri, D. M., Minaei, M. B., & Amooee, G. (2012). Developing a model for measuring customer’s loyalty and value with RFM technique and clustering algorithms. The Journal of Mathematics and Computer Science, 4(2), 172-181.

Rhee, S., & McIntyre, S. (2008). Including the effects of prior and recent contact effort in a customer scoring model for database marketing. Journal of the Academy of Marketing Science, 36(4), 538-551.

Roberts, M. L. (1992). Expanding the role of the direct marketing database. Journal of Direct Marketing, 6(2), 51-60.

Rygielski, C., Wang, J. C., & Yen, D. C. (2002). Data mining techniques for customer relationship management. Technology in Society, 24(4), 483-502.

Sarvari, P. A., Ustundag, A., & Takci, H. (2016). Performance evaluation of different customer segmentation approaches based on RFM and demographics analysis. Kybernetes, 45(7), 1129-1157.

Schijns, J. M. C., & Schröder, G. J. (1996). Segment selection by relationship strength. Journal of Direct Marketing, 10(3), 69-79.

Sheth, J. N. (1977). Demographics in consumer behavior. Journal of Business Research, 5(2), 129-138.

Shmueli, G., Bruce, P. C., & Patel, N. R. (2016). Data mining for business analytics: Concepts, techniques, and applications with XLMiner. New York, NY: John Wiley and Sons.

Sung, T. K., Chang, N., & Lee, G. (1999). Dynamics of modeling in data mining: Interpretive approach to bankruptcy prediction. Journal of Management Information Systems, 16(1), 63-85.

Toh, R. S., Rivers, M-J., & Withiam, G. (1991). Frequent-guest programs: Do they fly? Cornell Hotel and Restaurant Quarterly, 32(2), 46-52.

Toh, R. S., & Hu, M. Y. (1988). Frequent-flier programs: Passenger attributes and attitudes. Transportation Journal, 28(2), 11-22.

Westphal, C., & Blaxton, T. (1998). Data mining solutions: Methods and tools for solving real-world problems. New York, NY: John Wiley and Sons.

Wilbourn, L. C., McCleary, K. W., & Phadeesuparit, A. (1997). Demographic and psychographic determinants of coupon users at pizza restaurants. Journal of Restaurant and Foodservice Marketing, 2(1), 45-61.

Yeh, R. S., Plante, R. D., & Agrawal, D. (2011). Consumer data analysis and its managerial application for the grocery industry. Journal of Promotion Management, 17(1), 96-113.

Received March 10, 2016

Revised September 4, 2017

Accepted September 18, 2017
