Data Mining II, C.A. Brebbia & N.F.F. Ebecken (Editors) © 2000 WIT Press, www.witpress.com, ISBN 1-85312-821-X

Data Mining for Database Marketing at Garanti Bank

Omer Faruk Alis¹, Ertan Karakurt¹ & Piero Melli²

¹Garanti Technology, Turkey.

Abstract

This paper summarises the data mining applications implemented on Garanti Bank's customer database. There has been an upsurge of interest in using data mining applications to convert historical data into actionable business information, and Garanti's experience bears this out. Garanti Bank carried out two major applications that are presented in this paper: customer segmentation and database scoring for marketing purposes. A well-maintained central datawarehouse made it possible to prepare the data efficiently for these applications. Customer segmentation was carried out with the demographic algorithm, while database scoring was implemented with various classification and predictive modelling techniques: decision trees, neural networks and radial-basis-functions were used to score the database. A business unit at the bank (Customer Relationship Management) exploited these scores for marketing purposes: several pilot campaigns were launched and the response rates obtained in these campaigns were highly satisfactory. Encouraged by the promise of the initial data mining applications, the bank has decided to pursue a more aggressive marketing strategy supported by the results of the analysis.

1 Introduction

The last two decades have brought together three drivers of what is today called knowledge discovery in databases (of which data mining is a subset): 1) advances in hardware, giving the ability to process large amounts of data inexpensively, 2) the collection, storage and management of large amounts of data, and 3) advances in data analysis algorithms. The convergence of these three drivers created the resurgence of interest in the set of technologies known as data mining. Another important factor contributing to the popularity of data mining in the business world is the emergence of a collective awareness: business data contains actionable information that needs to be unearthed to create competitive advantage.

The classical definition of data mining from a business perspective is "sifting through large volumes of databases to extract valid, authentic and actionable information" [1]. Information should be valid to be of any value; it should be authentic, since information that is already known does not add value; and it should be actionable, since information that is valid and authentic may still not be useful enough to act upon.

The divergence of data mining from statistical data analysis is seen in a few fundamental respects [2]. The first difference concerns scale: data mining deals with fairly large data sets (10^5 to 10^6 samples are typical in banking), whereas a considerable part of statistics deals with the small-sample characteristics of a dataset. The second difference is that statistical algorithms are more hypothesis-based than data mining algorithms. Some data mining algorithms do not fit into the statistical framework, as in the case of the demographic algorithm included in Intelligent Miner [3], the data mining software package developed by IBM. This clustering algorithm, proprietary to IBM, was inspired by a voting system proposed in the 18th century by the Marquis de Condorcet. Neural network algorithms (although they are a family of semi-parametric regression techniques) have their own jargon. Most statistical algorithms work on numerical datasets, whereas a business database contains many important categorical fields carrying valuable information.

Data mining algorithms can be broadly classified as classical statistical algorithms, artificial neural networks, multivariate-function-approximation algorithms (radial-basis-functions, multivariate splines), pattern recognition algorithms (k-means clustering algorithm, IBM Condorcet algorithm, etc.), genetic algorithms, classification and regression trees (CART) and other rule-based methods. Different algorithms can be exploited to achieve the same task; e.g. neural networks and CARTs can both be used to classify a dataset into distinct patterns. This abundance of competing algorithms is often useful in selecting the best model for a specific application.

A business data mining project can be decomposed into four stages: 1) identifying the business problem and defining the business requirements, 2) data preparation, 3) model building, and 4) interpretation of the results. In our own experience, a healthy collaboration between the business people and the data mining team is essential for a successful data mining project. The business people's hypotheses about a problem should be sought before delving into the machinery of a data mining application.

The data preparation stage consists of data selection, data cleansing and data transformation steps [4]. Data selection consists of selecting the right data for the specific application. For example, lifestyle segmentation uses mainly demographic variables, whereas behavioural segmentation relies heavily on historical transaction data to segment the customer database. Data cleansing is an important and often neglected part of the data preparation step. In our own experience, transactional or billing data is quite clean (since it directly affects how business is done), whereas demographic data rarely reaches that quality. The data transformation step consists of creating new variables meaningful for the analysis. Summarising the transactional data into monthly, quarterly or yearly dimensions is an example. Although some data mining algorithms are insensitive to the statistical distribution of the fields included in the dataset, others perform best when the input distribution is normal or symmetrical. A discretisation scheme or numerical transformation of the numeric variables can create well-distributed input variables for the algorithms.
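As a minimal sketch of such a transformation, the Python fragment below applies a log transform to a skewed amount field and then assigns each value to an equal-frequency bin. The sample values, the number of bins and the helper names (`log_transform`, `quantile_bin_edges`, `assign_bin`) are illustrative assumptions, not the scheme actually used at the bank.

```python
import math

def log_transform(values):
    """Compress a right-skewed, non-negative field with log(1 + x)."""
    return [math.log1p(v) for v in values]

def quantile_bin_edges(values, n_bins):
    """Cut points chosen so that each bin holds roughly the same number of records."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def assign_bin(value, edges):
    """Index of the bin a value falls into (0 .. len(edges))."""
    return sum(value >= e for e in edges)

# Hypothetical account balances with a long right tail.
balances = [0, 5, 10, 20, 50, 100, 500, 10000]
edges = quantile_bin_edges(balances, n_bins=4)
bins = [assign_bin(b, edges) for b in balances]
logged = log_transform(balances)
```

Equal-frequency bins are only one possible choice; bins can equally be set by hand after inspecting the distribution of each field.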

Model building consists of feeding the relevant input data into an algorithm to produce results. The result of a data mining algorithm takes different forms depending on the application. Interpretation of the results is the stage where mining results are mapped into business results. A cluster obtained as the result of a segmentation model is simply a set of samples with a certain multivariate distribution in its fields. Summarising the cluster in business terms and adjectives is the first step towards taking business action.

In this paper, two standard applications of data mining in the banking industry are illustrated: behavioural segmentation and cross selling to existing customers. In Section 2, we describe the data used for the analysis. The datawarehouse at Garanti Bank is managed by the IBM DB2 Relational Database Management System on an IBM mainframe running under IBM's OS/390 operating system; the same machine is used as an application server for the data mining work. We used IBM's Intelligent Miner data mining software package to carry out the different mining tasks. Intelligent Miner can carry out the following tasks [3]. Clustering: two different clustering algorithms can be used for segmenting the database; a) the demographic algorithm, an algorithm proprietary to IBM which accepts both categorical and continuous fields as inputs, but gives very accurate results from a business point of view when appropriately discretised input fields are fed into it, b) a neural network clustering algorithm (employing a Kohonen self-organising feature map neural network). The other tasks include classification with CARTs and neural networks, prediction with neural networks and radial-basis-functions, finding associations, linear and logistic regression, and various data processing tasks (filtering records, getting a random sample, transforming fields to create new ones, running SQL queries, etc.).

Section 3 presents the behavioural segmentation that we carried out on the retail-banking customer database by using the demographic algorithm. Cluster characteristics are also summarised in Section 3. In Section 4, we describe the predictive models built on the clusters obtained from the clustering explained in Section 3. The aim of the models is to select the best customer group in a cluster for selling a certain product (in our case, four products were chosen for the initial study: credit card, investment products as a bundle, overdraft account and loans). Previously, the marketing strategy at the bank was a mass marketing strategy, i.e. it was based on random sampling of the customer database and sending mails or using the call centre to sell the product. The predictive models we present in Section 4 are intended to replace this with targeted selection. Concluding remarks and future perspectives are given in Section 5.


2 Data

The data used for the analysis was brought together in a single database table. Throughout the rest of the paper, we shall call this table the "analytical datamart". The analytical datamart was gathered from 29 separate tables residing in the Garanti Bank datawarehouse, which is currently refreshed weekly. It was built from data belonging to the 12-month period from January 1999 through December 1999, and was pivoted on the customer-identifier field in order to have a fully customer-centric view. The analysis was carried out for the retail-banking customers of the bank. 1,150,879 retail-banking customers were active as of Dec. 31 1999 (activity is defined as having at least one product, hence it does not mean usage activity). Each record in the datamart corresponds to a retail-banking customer and each field represents an attribute of the customer. Raw fields are taken directly from the tables in the datawarehouse; marital-status and product-holding data are examples of raw fields. Derived fields are calculated from the raw fields to better reflect the behaviour of the customer. The analytical datamart consists of four distinct categories of data:
1) Demographics: age, marital status, education level, etc.
2) Product holding: status of ownership of several product categories (current account, investments, loans, etc.)
3) Product usage data (number of transactions, amounts, etc.)
4) Global variables: total activity, profit, cost, risk, and segmentation data.
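The customer-centric pivot described above can be illustrated with a small Python sketch. It is not the bank's DB2 procedure: the three input structures, the field names and the function `build_datamart` are hypothetical, and only two derived usage fields are computed.

```python
def build_datamart(demographics, holdings, transactions):
    """Return one record per customer, keyed by customer identifier."""
    datamart = {}
    for cust_id, demo in demographics.items():
        record = dict(demo)                                      # raw demographic fields
        record["products"] = sorted(holdings.get(cust_id, []))   # raw product-holding field
        record["n_transactions"] = 0                             # derived usage fields
        record["total_amount"] = 0.0
        datamart[cust_id] = record
    for cust_id, amount in transactions:
        if cust_id in datamart:                                  # skip unknown customers
            datamart[cust_id]["n_transactions"] += 1
            datamart[cust_id]["total_amount"] += amount
    return datamart

# Hypothetical source tables.
demographics = {101: {"age": 34, "marital_status": "married"},
                102: {"age": 61, "marital_status": "single"}}
holdings = {101: ["credit_card", "current_account"], 102: ["term_deposit"]}
transactions = [(101, 250.0), (101, 40.0), (102, 1000.0), (999, 5.0)]
datamart = build_datamart(demographics, holdings, transactions)
```

Each customer ends up with exactly one record, so every later step (clustering, scoring) can treat a row as a customer.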

Demographic fields were fairly clean, though for some fields 30 % to 40 % of the values were missing. Since the operation of the bank depends on the transactional data, it is believed to be clean; however, some inconsistencies were found and corrected. Product holding data was drawn from the datawarehouse as of Dec. 31 1999. In Table 1 below, we present the list of products modelled in the analytical datamart along with the usage rates by Garanti Bank's customers:

PRODUCT                     USAGE RATE
CURRENT ACCOUNT             [illegible]
TERM DEPOSIT                13.66%
DEBIT CARD                  56.74%
CREDIT CARD                 12.75%
INVESTMENT PRODUCTS         4.50%
OVERDRAFT                   2.33%
LOANS                       4.73%
INSURANCE                   1.33%
UTILITY BILL PAYMENT        2.11%
INTERNET & PHONE BANKING    11.49%

TABLE 1: Product categories


The current account serves as the medium to which money is credited and from which it is debited. In most banks every customer must have a current account, but at Garanti Bank it is possible to open an account for some products (like term deposit) without having a current account. Overseas accounts were not modelled since very few customers used them. Recent years have witnessed high inflation and high interest rates in Turkey, and banks have concentrated on sales of passive products rather than active products, i.e. they acted more like a borrower than a lender. This explains the low densities in active products (loans, overdraft). The credit card business is very recent in Turkey and the market has not saturated yet. With falling inflation and interest rates, there will be much more emphasis on selling active products, and early 2000 has already witnessed increases in sales of these products. Insurance is a recent product sold mainly through a sister company of the bank; this explains the low density reported above. Utility bill payment was included since the Bank's marketing has focused strongly on this typical loyalty product. The same focus applies to Internet banking and phone banking, which represent innovative low-cost channels to which the bank wants to lead its customers.

3 Customer segmentation

Garanti Bank had a previous segmentation of its retail banking customers according to the sum of their monthly balances in various accounts. This segmentation lacked the ability to model the behaviour of the customer with the bank, since account balances represent only a single dimension contributing to the overall transactional behaviour of the customer. Besides this, it does not reflect any demographic information about the customer. The bank needed a more complete view of the customer reflecting several dimensions of customer behaviour. There are two distinct methods of implementing customer segmentation: user-defined partitioning with subsequent extraction of clusters by SQL queries, or use of an automatic clustering algorithm. In the latter, the data speaks for itself, producing the number and type of clusters that are optimal according to some specified criterion. We preferred this approach to segment the customer database. Twelve months of data from 1999 was used for the clustering. The specific clustering algorithm we used is IBM's demographic clustering algorithm included in the IBM Intelligent Miner package. Demographic clustering provides fast and natural clustering of very large databases. Similarity between a pair of records is determined by comparing their field values; every coincidence increases similarity whereas different values for a field decrease similarity. The clusters are defined so that the Condorcet criterion [3] below is maximised:

(Sum of all record similarities of pairs in the same cluster) - (Sum of all record similarities of pairs in different clusters)
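The criterion can be made concrete with a short Python sketch. The pairwise similarity used here (+1 for every field on which two records agree, -1 for every field on which they differ) is an assumption chosen to match the verbal description; the exact weighting used inside Intelligent Miner is not given in this paper, and the records are invented.

```python
from itertools import combinations

def similarity(a, b):
    """Assumed scoring: +1 per matching field, -1 per differing field."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))

def condorcet_criterion(records, labels):
    """Similarities of same-cluster pairs minus similarities of cross-cluster pairs."""
    within = between = 0
    for i, j in combinations(range(len(records)), 2):
        s = similarity(records[i], records[j])
        if labels[i] == labels[j]:
            within += s
        else:
            between += s
    return within - between

# Four hypothetical customers described by three discretised fields.
records = [
    ("young", "credit_card", "low"),
    ("young", "credit_card", "low"),
    ("retired", "term_deposit", "high"),
    ("retired", "term_deposit", "high"),
]
good = condorcet_criterion(records, [0, 0, 1, 1])   # like grouped with like
bad = condorcet_criterion(records, [0, 1, 0, 1])    # like records split apart
```

The partition that groups identical records together scores higher than the one that separates them, which is the behaviour the criterion is meant to reward. The exhaustive pair loop is quadratic in the number of records and serves only to illustrate the definition.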

This algorithm requires the binning of numerical variables into categories. We discretised numerical inputs by inspecting their univariate distributions and defining the bins accordingly. Inactive customers were filtered out prior to clustering. An inactive customer was defined as one who had no activity in his or her current account during 1999 and held no other product. The number of inactive customers meeting this criterion was 544,897. These customers were not included as input to the algorithm and were classified as Inactive.

The demographic clustering algorithm is an unsupervised algorithm and it takes the number of clusters as an input parameter. Obtaining the optimum number of clusters is an iterative process, and the optimality criterion in our case is interpretability from a business perspective. The initial run with an unrestricted number of clusters produced seven big clusters, with less than 1% of the customer base contained in the remaining clusters. A second run with the number of clusters set to seven produced the final clustering.
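The observation that guided the second run can be expressed as a simple rule of thumb: keep the largest clusters until what is left over falls below a small share of the customer base. The Python sketch below is an illustration of that rule only; the function name, the 1% threshold as a parameter and the cluster sizes are assumptions, and the final choice in practice also rests on business interpretability.

```python
def clusters_to_keep(cluster_sizes, residual_share=0.01):
    """Smallest number of largest clusters leaving less than residual_share of records."""
    total = sum(cluster_sizes)
    covered = 0
    for k, size in enumerate(sorted(cluster_sizes, reverse=True), start=1):
        covered += size
        if total - covered < residual_share * total:
            return k
    return len(cluster_sizes)

# Hypothetical cluster sizes from an unrestricted first run.
first_run_sizes = [300, 250, 200, 100, 80, 40, 25, 2, 2, 1]
k = clusters_to_keep(first_run_sizes)
```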

We named the clusters according to their most relevant characteristics. This naming is especially useful in communicating results to the marketing people at the bank. Table 2 shows the name of each cluster and its percentage of the population.

CLUSTER               DESCRIPTION                                      SHARE
Advanced Investors    Transactors with large volumes                   5.97%
Needies               Customers interested in active products          6.13%
Credit Card Users     Exclusively credit cards                         15.72%
Debit Card Users      Current account transactors                      32.82%
Old Style TD Users    Traditional customers using only term deposit    18.14%
New Investors         Recent investment products users                 16.54%

TABLE 2: Clusters

A brief summary of the characteristics of the clusters is given below, with the appropriate cross-selling strategy specified for each cluster. The Condorcet index [3] is also given for each cluster; it reflects the purity or homogeneity of the cluster. The Condorcet index for a categorical field in a cluster is given by the quantity:

where π represents the distribution of each category of a field in the specific cluster. The overall Condorcet index for a cluster is the arithmetic average of the Condorcet indexes of the fields used. A Condorcet index of 1 indicates that the cluster is entirely homogeneous, i.e. each record in the cluster has exactly the same values in its fields. Condorcet indexes of 0.5 to 0.8 are realistically good values in practice.

Top-Top: The most valuable customer group in terms of profit per customer, the total amount of money they bring to the bank, a high number of transactions and high balances on most products. Their product usage is very complex: they own many products and are highly active in using them. All customers in this cluster have a term deposit account. A significant percentage uses investment products (32 % as opposed to the 4.5 % population average). 43 % of the cluster uses credit cards, and half of the credit card users have a gold card. The total amount of money they bring to the bank is the highest. The average time-as-customer is 3.26 (the population average is 2.3). The Condorcet index for the cluster is 0.56. Targeted products for cross selling: investment products, consumer loans (mostly house and car loans), overdraft account, credit cards and insurance. Promotion to gold card for selected customers and increases in credit card limits. Possible promotion to portfolio for selected customers (25.3 % of the customers are not in the portfolio).

Advanced Investors: Another group of elite customers. The main distinction from Top-Top is that they do not use term deposit and they have a higher number of transactions. They use many products with a high amount of activity, as determined by the number of transactions and balances. The amount of money they bring is second only to Top-Top. Nobody in this cluster has a term deposit account. Almost half of the cluster (48 % as opposed to the 4.5 % population average) has at least one investment product. 63 % use credit cards and 44 % of the credit card users have gold cards. This cluster is also characterised by high activity in the current account (as determined by the number and amount of debit and credit transactions). The Condorcet index for the cluster is 0.60. Targeted products for cross selling: investment products, credit cards, overdraft account, insurance and loans. Gold card promotion and increases in credit card limits. Possible promotion to portfolio for selected customers (22.2 % of the customers are not in the portfolio).

Old Style Term-Deposit Users: Customers in this cluster use only term deposit accounts. Although they have current accounts and debit cards, there is very little activity on the current account.
These customers are very traditional and avoid using plastic (debit and credit cards). A significant portion are retired people who deposit their retirement compensation into term deposit accounts (portfolio managers made this fact known to us). Customers in this cluster are not very active in terms of transactions, but they bring a significant amount of money to the bank and, since they do not have many transactions, the transaction cost per customer in this cluster is very low. Usage of other products is negligible. The overall Condorcet index is 0.78, which indicates that this cluster is very homogeneous. Targeted products for cross selling: no product is targeted here since the customers are very traditional. Since there is no product penetration other than term deposit, there is not enough historical data to build predictive models. However, there is potential for investment products in a low interest-rate era, and mass marketing campaigns can be designed to sell investment products in this cluster. These campaigns can supply new insight into the buying behaviour of this segment.

Needies: As the name implies, customers in this cluster are interested mostly in the active products of the bank, i.e. they use the bank only to borrow money. There is no usage of term deposit accounts or investment products. 94 % of the customers use credit cards and 60 % of the customers made at least one rollover in 1999. 25 % of the customers have overdraft accounts as opposed to 2.33 % in the overall population. 29 % of the customers have their salaries paid through the bank. In summary, this cluster is characterised by above-average usage of credit cards, overdraft, loans and insurance, in that order. Targeted products for cross selling: there is potential for credit card, overdraft and consumer loan campaigns. Gold card promotion to selected customers and increases in credit card limits can be attempted to increase the revenue from credit card spending.

Debit Card Users: All customers in this group have debit cards, and it is the main product they use to manage the money in their current accounts. There are three distinct groups of customers (with some overlap) in this cluster. 27 % are paid their salary through Garanti Bank and use debit cards to withdraw it. 20 % of the cluster borrowed very small loans in 1998 and 1999 (there was a campaign run jointly with a cellular phone company, and the amounts owed did not exceed USD 500). The third group (the rest of the cluster) uses the bank to keep its money in foreign currency and occasionally withdraws it to use elsewhere (there has been a 70 %-80 % yearly devaluation of the Turkish lira against the US dollar and the Deutsche Mark). The Condorcet index for the cluster is 0.76. Targeted products for cross selling: we decided to target this cluster for loan promotion, because there is a noticeable usage of this product in the cluster.

Credit Card Users: Customers in this cluster show a predominant usage of credit cards. 75 % use credit cards and 30 % of the credit card users have gold cards. 13 % of the cluster uses a term deposit account, but the balances they keep are low. Current account activity is low. 48 % of the customers rolled over their credit card balances at least once in 1999. Overall Condorcet index is 0.78. Targeted products for cross selling: there is potential for credit card and loan campaigns.
Promotion to gold card for qualified customers and increases in credit card limits are candidate strategies to increase the credit card revenue. Overall Condorcet index is 0.73.

New Investors: Customers who joined the bank mostly in the last two years and who use investment products or keep their money in a foreign currency current account. They bring a relatively high amount of money to the bank. Targeted products for cross selling: there is potential for campaigns on investment products.

4 Building predictive models for targeted cross-selling

Clustering partitions the customer database into homogeneous subgroups; however, this differentiation is at a coarse level. For effective targeted cross selling, we would like to know more about the individual customer. Computing the propensity of each customer to buy a certain product is the first step for targeted cross selling. Ideally one needs previous campaign data (a did buy / did not buy split) to build the models. Unfortunately, there was no campaign data available and we had to train the models by using historical data. Historical data reflects the customers' usage of several products, and customers can be classified into categories in terms of their usage of a certain product. For example, we can classify credit card users into revolvers, transactors and convenience users.

The ansatz we make in building the predictive models is that these categories (defined for each product) can be predicted to a certain accuracy from the amount of activity in other products, the demographics of the customer and global variables (risk, revenue etc.). For example, before building any model to predict the likelihood of a customer buying an overdraft account, we can expect that this likelihood will be correlated with credit card activity or loan activity, since these variables might indicate the customer's need for money; and one would like to sell the overdraft account to customers who need money and who can repay it without defaulting.

The methodology of building predictive models can be summarised in the following steps [3][5]:
1) Clustering the database into homogeneous sub-groups. Predictive models built in each cluster are more accurate than predictive models built on the whole dataset. The machinery of this step has been explained in Section 3.
2) For each model, the cluster is separated into two disjoint sets: a learning dataset and an application dataset. The learning dataset is also separated into two samples:
   i) Training sample: Several models are built using the training sample. This is the learning phase of predictive modelling. In our case, the aim is to construct a multivariate relationship between a set of input variables and an output (response) variable.
   ii) Validation sample: Data contains two components: information and noise. While learning a model, the best one can do is to extract the information and filter out the noise. However, models usually learn noise as well. A model is said not to generalise well when it does not do a good job of predicting new outcomes. There are two reasons for this phenomenon: either the model is too simple to learn the structure inside the data, or it is too complex and also learns the noise in the data (this is also called the overfitting problem). The validation sample is used to find the best model among several competing models of various complexities. The complexity of a model is determined by different parameters depending on the algorithm; e.g. the depth of the tree and the minimum leaf and/or node size determine the complexity of a classification tree.
   We usually keep a 70%-30% split for the training sample and validation sample, respectively.
3) The models are built using the data in the training sample and validated using the data in the validation sample. The best model is the one with the best generalisation as obtained on the validation sample. The best model is then applied to new records to predict new outcomes. Depending on the application, the outcome may be the propensity of new customers to buy a product, the expected expenditure on credit card, or the likelihood of leaving the bank. The "goodness" of a model can be evaluated via a performance criterion. The RMS (root-mean-square) of the error vector (computed on the validation sample) can be used as the accuracy of the model if the aim is to predict a numeric variable. If the model is used for a classification problem, the misclassification rate, Gini index or entropy index (computed on the validation sample) can be used to measure the accuracy of the model. From a marketing point of view, the steepness of the lift chart for the model is another criterion for the "goodness" of the model.
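The steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the Intelligent Miner workflow: the function names, the fixed random seed, the toy candidate models and all data values are hypothetical. It shows a 70%-30% split, the RMS error and misclassification rate computed on a validation sample, the selection of the candidate with the lowest validation error, and a single point of a lift chart.

```python
import math
import random

def split_learning_set(records, train_share=0.7, seed=0):
    """Shuffle and split the learning dataset into training and validation samples."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def rms_error(actual, predicted):
    """Root-mean-square of the error vector, for numeric targets."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def misclassification_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)

def select_model(candidates, validation):
    """candidates: name -> predict(x); validation: list of (x, label) pairs."""
    def error(predict):
        return misclassification_rate([y for _, y in validation],
                                      [predict(x) for x, _ in validation])
    return min(candidates, key=lambda name: error(candidates[name]))

def lift_at(scores, responded, top_share):
    """Response rate among the top-scored share divided by the overall response rate."""
    ranked = sorted(zip(scores, responded), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_share))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    return top_rate / (sum(responded) / len(responded))

train, valid = split_learning_set(list(range(100)))
validation = [(1, 0), (2, 0), (3, 1), (4, 1)]
candidates = {"always_1": lambda x: 1, "threshold": lambda x: int(x >= 3)}
best = select_model(candidates, validation)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at(scores, responded, top_share=0.2)
```

In this toy run the top 20 % of scored customers respond at more than three times the overall rate; a steep lift of this kind at the top of the ranked list is what makes a model useful for a targeted campaign.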

We have built predictive models for four products chosen by the customer relationship management (CRM) team at the bank: loans, credit cards, overdraft account and investment products. Since there was not enough penetration of products in some clusters (not enough training samples), we have not built models in these clusters. In addition, customers in some clusters are not thought to be high-value customers, and although these clusters had enough product penetration we have not built models for them. A cross in a cell of the following product-by-cluster matrix indicates that a model was created for the corresponding product and cluster. We have not built models for empty cells.

             OVERDRAFT   CREDIT CARD   INVESTMENT   LOAN
TOP-TOP          X            X             X         X
ADV. INV.        X            X             X         X
NEW INV.                                    X         X
NEEDIES          X            X                       X
CC USERS                      X
DBC USERS                                             X

TABLE 3: Products vs. clusters

Since product penetration and customer profile are very high in the clusters Top-Top and Advanced Investors, we computed models for both of these clusters in every product category. New Investors are not interested in credit cards and overdraft; however, there are enough training samples for investment products and loans. As Needies are interested mostly in active products, we have computed models for the cross selling application of these products. Since the cluster of credit card users is exclusively interested in credit cards, we planned to sell credit cards to the customers who still do not have one. There is a significant penetration of consumer loans in the cluster of debit card users (though the majority of them are small amounts). The CRM unit has decided to sell loans to qualified customers in this cluster.

As pointed out before, the aim of our study is to provide a ranking of customers in each cluster for each product mentioned. Customers at the top of each list are interpreted as the best candidates for a cross selling campaign. We now explain the ranking variable for each product (this variable is also the response or output variable of each model).

OVERDRAFT: The dataset for the overdraft users consisted of overdraft owners who owned the account as of Dec. 31 1999. We classified overdraft users into two classes: 1) Customers with overdraft revenue of more than USD 20 in the last four months of 1999 (revenue data was available only for the last four months of 1999), and 2) Customers with overdraft revenue of less than USD 20 in the same period. We oversampled the data to create an almost equal ratio of these two classes, and we composed the training sample with this procedure. Classification trees and neural classification algorithms were used to build several models on the training sample. We manually pruned the classification trees (Intelligent Miner allows this) to select the model with the best performance on the validation dataset (we considered the misclassification rate as the performance index).
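The oversampling step can be sketched in Python as follows. The paper does not state how the oversampling was carried out, so random replication of the smaller class is an assumption here, as are the function name, the record layout and the `revenue` field.

```python
import random


def oversample_to_balance(records, label_of, seed=0):
    """Replicate records of the smaller classes at random until all classes
    are as large as the largest one.

    Assumption: oversampling by random replication with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)

    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw the missing records with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training records: overdraft revenue in USD over four months.
training = [{"revenue": r} for r in (5, 8, 12, 3, 15, 9, 35, 60)]
balanced = oversample_to_balance(
    training, label_of=lambda rec: "high" if rec["revenue"] > 20 else "low"
)
```

In this sketch only the training sample is oversampled, so that the misclassification rate measured on the validation sample still reflects the original class proportions.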

For the neural networks, we tried different network architectures (as determined by the number of hidden layers and the number of nodes in each layer) to select the model with the best performance on the validation sample. Though in some clusters neural networks did a better job of classification (lower misclassification rate than classification trees), we have chosen the corresponding trees as the final model. The reason for choosing classification trees is their transparency: the mapping between the inputs and the output is expressed in terms of rules, and it is easier for business users to understand the correlations hidden inside the data. By contrast, neural networks are close to being black boxes, with the relationship between the inputs and the output being opaque.

The models, as applied to the validation sample, classify the samples into either of the two classes with a certain confidence. The marketing strategy here is to predict the prospects with high overdraft revenue and sell the product to this targeted segment. In Table 4 below, we present, for each cluster, the depth and node size of the final classification tree model together with the misclassification rate obtained on the validation sample.
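One way to turn a predicted class and its confidence into a ranked target list is sketched below in Python. The record layout, the class labels and the rule for converting a confidence into a score are assumptions for illustration; the paper does not describe how the ranking was derived from the classifier output.

```python
def rank_prospects(predictions):
    """Order customers by their estimated chance of being in the high-revenue class."""

    def high_revenue_score(prediction):
        # Assumption: the confidence refers to the predicted class, so a
        # confident "low" prediction maps to a small score for "high".
        if prediction["predicted_class"] == "high":
            return prediction["confidence"]
        return 1.0 - prediction["confidence"]

    return sorted(predictions, key=high_revenue_score, reverse=True)


# Hypothetical classifier output for four customers.
predictions = [
    {"customer": "A", "predicted_class": "low", "confidence": 0.90},
    {"customer": "B", "predicted_class": "high", "confidence": 0.70},
    {"customer": "C", "predicted_class": "high", "confidence": 0.95},
    {"customer": "D", "predicted_class": "low", "confidence": 0.60},
]
ranking = [p["customer"] for p in rank_prospects(predictions)]
```

Customers at the top of `ranking` would be the first candidates for the campaign.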

CLUSTER          DEPTH   NODE SIZE   MISC. RATE
Top-Top            8        50         25.00%
Advanced Inv.     10        50         21.00%
Needies            8       100         18.00%

TABLE 4: Tree model for overdraft revenue prediction

CREDIT CARD: The dataset for the credit card users consisted of credit card owners as of Dec. 31 1999. We discussed with the CRM team the quantity to predict for the credit card holders. Although revolvers (who roll over their balances) are by far the most profitable group of credit card customers, the team chose another variable to predict, which is not entirely correlated with this quality. We built models predicting the average monthly expenditure on all credit cards the customer has. We split the dataset into training and validation samples (70%-30% split) and built several models on the training sample. Neural networks and radial-basis-functions were used to build the models. For each of the algorithms we selected the best models as determined by their performance on the validation dataset. The specific performance index we considered is the root-mean-squared-error (RMSE) value divided by four standard deviations in the predicted quantity [3]. This index is defined so that the RMSE value is comparable across different predictions. The complexity of the neural networks in this case is determined by the number of hidden layers, the number of nodes in each layer and the maximum number of passes through the data; the complexity of the radial-basis-functions is determined by the maximum number of nodes, the maximum region size and the maximum number of passes through the data.
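The normalised index can be written out as a small Python sketch. We assume here that the standard deviation is that of the actual values of the target on the validation sample and that the population form is used; the text above does not specify either point, and the function name is ours.

```python
import math
import statistics


def normalised_rmse(actual, predicted):
    """RMSE divided by four standard deviations of the target values.

    Assumption: the population standard deviation of the actual values
    on the validation sample is used as the scale.
    """
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return rmse / (4 * statistics.pstdev(actual))


# Hypothetical monthly expenditures (USD) and model predictions.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
index = normalised_rmse(actual, predicted)
```

Because the error is divided by a multiple of the spread of the target, models for quantities with very different scales (for example expenditures and balances) can be compared on the same footing.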

As the final prediction, we took a linear combination of the quantities predicted by the neural networks and the radial-basis-functions. In Table 5, for each cluster, we present the normalised RMSE value for the prediction of the expected average monthly spending on credit cards. The marketing strategy here is to rank the prospects according to the predicted quantity, and to target the customers at the top of the list.
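A linear combination of two predictors can be sketched in Python as below. The weights actually used are not reported, so the convex blend with a single weight and the grid search on the validation sample are assumptions, as are all function names.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def blend(nn_pred, rbf_pred, weight):
    """Weighted average of the neural-network and RBF predictions."""
    return [weight * n + (1 - weight) * r for n, r in zip(nn_pred, rbf_pred)]


def best_blend_weight(actual, nn_pred, rbf_pred, steps=10):
    """Pick the weight in [0, 1] with the lowest RMSE on the validation sample.

    Assumption: a coarse grid search; the paper does not say how the
    coefficients of the linear combination were chosen.
    """
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(actual, blend(nn_pred, rbf_pred, w)))


# Hypothetical validation data: one model overshoots, the other undershoots.
actual = [100, 200, 300]
nn_pred = [120, 220, 320]
rbf_pred = [80, 180, 280]
weight = best_blend_weight(actual, nn_pred, rbf_pred)
final = blend(nn_pred, rbf_pred, weight)
```

Sorting customers by `final` in descending order then gives the ranking used to select campaign targets.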

CLUSTER          Normalised RMSE
Top-Top               0.12
Advanced Inv.         0.14
Needies               0.15
CC Users              0.12

TABLE 5: Normalised error for credit card models

LOANS: The dataset for loan users consisted of the loan owners who owned the contract as of Dec. 31 1999. We classified loan users into two classes: 1) Customers whose contract amount exceeds USD 5000 and 2) Customers whose contract amount is less than USD 5000. This split was discussed with the CRM team and was found to be sound from a business point of view. We oversampled the data to create an equal ratio of these two classes, and we composed the training sample with this oversampling procedure. As in building the overdraft models, classification trees and neural classification algorithms were used to build several models on the training sample. We then manually pruned the classification trees to select the model with the best performance on the validation dataset. For the neural networks, we tried different network architectures to select the best model. We have again preferred classification-tree models as final models for their interpretability. In Table 6, we present the depth, minimum node size, and misclassification rates of the final classification tree models in each cluster. The target list for loan campaigns will consist of customers who are predicted to belong to the first class defined above.


CLUSTER             DEPTH   NODE SIZE   MISC. RATE
Top-Top               8        25          18%
Advanced Inv.         8        25          20%
Needies               7        50          21%
New Investors         8        50          27%
Debit Card Users      8       100          20%

TABLE 6: Tree models for loans

INVESTMENT PRODUCTS: The dataset for the investment product users consisted of the customers who have any one of the investment products (treasury bonds and bills, mutual funds and REPOs) as of Dec. 31 1999. After discussion with the CRM team, we decided to predict the overall sum of the monthly balances in investment products. We again split the dataset into training and validation samples (70%-30% split) and built several models on the training sample. Neural networks and radial-basis-functions were used to build the models. For each of the algorithms we selected the best models and took a linear combination of the results they produced to obtain a final prediction. In Table 7 below, we present the normalised RMSE value of the predictions in each cluster for which we have computed the expected average monthly balance.

CLUSTER          Normalised RMSE
Top-Top               0.11
Advanced Inv.         0.15
New Investors         0.09

TABLE 7: Normalised error for investment-products models

Lastly, we present the response rates obtained in two pilot campaigns that used the data mining results. The mutual funds campaign was implemented at ten branches, and customers were chosen according to their ranking, which is determined by their expected monthly average balances. 1120 customers were contacted this way, and 424 of them bought the product, indicating a response rate of 37.9% (the mutual fund usage rate at the bank is less than 4.5%). The credit card campaign was carried out at four branches. 98 customers were contacted, and 60 of them bought the product, corresponding to a response rate of 61% (the credit card usage rate at the bank is 12.75%).
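The arithmetic behind these figures, and the ratio of each response rate to the bank-wide usage rate, can be checked with a few lines of Python. The ratio is only a rough indication of lift: the usage rate is not a controlled baseline, and the variable names are ours.

```python
def response_rate(contacted, buyers):
    """Fraction of contacted customers who bought the product."""
    return buyers / contacted


# Figures reported for the two pilot campaigns.
mutual_funds_rate = response_rate(contacted=1120, buyers=424)
credit_card_rate = response_rate(contacted=98, buyers=60)

# Ratio to the bank-wide usage rate. For mutual funds the usage rate is
# "less than 4.5%", so this ratio is a lower bound.
mutual_funds_ratio = mutual_funds_rate / 0.045
credit_card_ratio = credit_card_rate / 0.1275

print(f"mutual funds: {mutual_funds_rate:.1%}, ratio >= {mutual_funds_ratio:.1f}")
print(f"credit cards: {credit_card_rate:.1%}, ratio = {credit_card_ratio:.1f}")
```

Under these assumptions, the targeted customers responded at more than eight times the bank-wide usage rate for mutual funds and at almost five times the rate for credit cards.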


5 Conclusion

In this paper, we presented two standard applications of data mining as applied to Garanti Bank's retail-banking customer database: behavioural segmentation and cross selling of four banking products (overdraft account, credit card, loans, investment products) to existing customers. We summarised each stage of the project: data preparation, analysis, and interpretation of the results. We believed at the outset that there had to be strong teamwork between the analysis team and the business people at the bank (CRM team) in order for the project to succeed. The contribution of business people (domain experts) to a business data-mining project cannot be overstated. Most often, they know which input fields are likely to affect a certain output, or, as in the segmentation study, they have an a priori understanding of the possible customer sub-groups. Their contribution is most valuable in the interpretation of the results, since at the end of the project the marketing department of the bank will use the results.

The lists (and the rankings associated with them) that we produced for the business people will next be used in a pilot project with a marketing effort aimed at customers who have internet banking and phone banking products. Rolling out a campaign to these classes of customers is relatively easy, since there is no need to reach the customer via a portfolio manager. A pop-up window opening when the customer accesses the bank's webpages, or banners embedded inside the webpages, will attract the interest of the customer. The next stage is the rollout of this effort through all the channels of the bank. Branch staff will be in charge of calling the prospective customers in their list and trying to sell the pre-chosen product to the customer according to his/her ranking in the list. The infrastructure for all these processes, including contact management and campaign management, is under development, and a full launch is expected to take place in 2000.

Another ongoing data mining project is the attrition project, i.e. understanding why a customer leaves the bank and designing an early warning system to prevent attrition. This project entails understanding the time-series behaviour of the customer rather than looking at his behaviour as a snapshot.

The initial reaction to the data mining philosophy at the bank was far more optimistic than the analysis team had forecast. There is strong executive sponsorship for future data mining applications. In a broader context, executives are very eager to exploit the benefits of business intelligence projects.

Acknowledgements

We would like to express our gratitude to the management of Garanti Technology, Mr. Husnu Erel and Mr. Rustu Karaca, for allowing us to develop this work. Sincere thanks are also expressed to Mr. Arif Isfendiyaroglu of Garanti Bank and to the CRM team for their continuous support and advice.


References

[1] Cabena, P., Choi, H.H., Kim, I.S., Otsuka, S., Reinschmidt, J. & Saarenvirta, G., Intelligent Miner for Data Applications Guide, IBM Redbooks: SG24-5252-00, 1999.
[2] Hosking, J.R.M., Pednault, E.P.D. & Sudan, M., A statistical perspective on data mining. Future Generation Computer Systems, 13, pp. 117-134, 1997.
[3] Using the IBM Intelligent Miner for Data, Version 2 Release 1.
[4] Pyle, D., Data Preparation for Data Mining, Morgan Kaufmann Publishers, Inc.: San Francisco, California, 1999.
[5] Berry, M.J.A. & Linoff, G.S., Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley & Sons, Inc., 2000.
