
The Enhancement of Credit Card Fraud Detection Systems using Machine Learning Methodology

BY

Soheila Ehramikar

Center for Management of Technology and Entrepreneurship Faculty of Applied Science and Engineering

University of Toronto 4 Taddle Creek Road

Toronto ON M5S 1A4

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science

Department of Chemical Engineering and Applied Chemistry

© Copyright by Soheila Ehramikar, 2000

Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

The Enhancement of Credit Card Fraud Detection Systems using Machine Learning Methodology

Master of Applied Science, 2000

Soheila Ehramikar

Department of Chemical Engineering and Applied Chemistry

University of Toronto

In Canada, credit card fraud occurrences rose sharply in 1998, causing $147 million in losses. To address this problem, financial institutions (FIs) are employing preventive measures and fraud detection systems, one of which is called FDS. Although FDS has shown good results in reducing fraud, the majority of cases flagged by this system are False Positives, resulting in substantial investigation costs and cardholder inconvenience.

The objective of this research is to explore the possibilities of enhancing the current operation by introducing a post-processing system. The data used for the analysis was provided by one of the major Canadian banks. Based on variations and combinations of features and training class distributions, different sets of experiments were performed to explore the influence of these parameters on the performance of the prototype developed. The results indicate that the employed approach has very good potential to improve on the existing system. However, further research is required, including the development of prototype systems, which should be enhanced by more extensive and informative data.


Acknowledgements

I would like to express my deepest gratitude, above all else, to God for all His mercy, grace, and support through my entire life. With His infinite help and grace everything is possible.

I am sincerely grateful to Professor Joseph C. Paradi for his supervision, time, support, motivation, and teachings. I would like to further thank him for providing me with all the opportunities that have helped me to learn so much. I would also like to thank Dr. B. McCabe for taking the time to comment on my thesis.

Special thanks go to Debbie, Bessie, Elford, Thomas, Lony, Herman, Dan, and Bob from the collaborating bank. They were all helpful and informative. I greatly appreciate their time, help, and understanding.

Sincere thanks to all the members of the CMTE and special thanks to Taraneh, Behnak, Rinku, Heather, Ramez, Claudio, Karima, and Oscar for their supportive friendship, encouragement, and moral support throughout the period of my study at the University of Toronto.


To the memory of a great sage and friend for his invaluable teachings


Table of Contents

Abstract
Acknowledgements
List of Tables
List of Figures

CHAPTER 1.0 Introduction
1.1 Background
1.2 Problem Definition
1.3 Research Motivation
1.4 Outline

CHAPTER 2.0 Literature Review
2.1 Credit Cards
    2.1.1 History
    2.1.2 Convenient method of payment
2.2 Credit Card Transaction Process
    2.2.1 Parties involved in a transaction
    2.2.2 Overview of transaction processing flow
        2.2.2.1 The card
        2.2.2.2 The swipe machine
        2.2.2.3 The Tandem
        2.2.2.4 The mainframe
2.3 Credit Card Fraud
    2.3.1 General statistics
    2.3.2 Fraud schemes
        2.3.2.1 Lost and stolen
        2.3.2.2 Never received issued
        2.3.2.3 Counterfeit
        2.3.2.4 Telemarketing and mail-order fraud
        2.3.2.5 Fraudulent applications
    2.3.3 New technologies and card counterfeiting
    2.3.4 The counterfeiting process
2.4 Credit Cards in Canada
    2.4.1 Interest rate base
    2.4.2 Statistics on credit card fraud
2.5 Summary

CHAPTER 3.0 Fraud Solution Approaches
3.1 The Future of Bank Cards
    3.1.1 Smart cards
        3.1.1.1 Implementation issues in North America
3.2 Fraud Detection Systems
    3.2.1 Rule-based systems
    3.2.2 In house detection software
    3.2.3 Neural networks
        3.2.3.1 The advantages of neural networks
        3.2.3.2 The disadvantages of neural networks
        3.2.3.3 Neural networks and FIs
        3.2.3.4 FDS and credit card fraud
3.3 Fraud Investigation Process
3.4 Fraud Detection Dilemma
3.5 Summary

CHAPTER 4.0
4.1 Classification
    4.1.1 Reasons for classification
4.2 Overview of Learning Systems
    4.2.1 The classification model
    4.2.3 Hypothesis space in supervised learning
4.3 Perspectives on Classification
    4.3.1 Statistical approaches
    4.3.2 Neural networks
    4.3.3 Machine learning
4.4 Learning Decision Trees
    4.4.1 Domain application of decision tree learning
    4.4.2 Overview of decision tree learning method
    4.4.3 An illustration of decision trees induction
        4.4.3.1 Induction of decision trees from examples
    4.4.4 Boosting
    4.4.5 Cross-validation

CHAPTER 5.0
5.1 Data Requirements
5.2 Data Collection
    5.2.1 Labeling the transactions
    5.2.2 Preprocessing the databases
5.3 Learning Requirements
    5.3.1 Features and classes
5.4 Concept Learning and Search Space
    5.4.1 Selected software
        5.4.1.1 Trees into rules
5.5 See5
    5.5.1 See5 construction options
        5.5.1.1 Decision trees
        5.5.1.2 Rulesets
        5.5.1.3 Boosting
        5.5.1.4 Cross validation trials
5.6 Design of Experiments
    5.6.1 Data set design
    5.6.2 The notion of class distribution design
    5.6.3 Distribution design for training and testing sets
    5.6.4 Experiments
5.7 Interpretation of TN/FP/FN/TP/Error Rates

CHAPTER 6.0 Results and Discussions
6.1 Structuring the Results
6.2 Performance Analysis
    6.2.1 Classifier performance
6.3 Prediction of New Cases
    6.3.1 Prediction of boosted decision trees
        6.3.1.1 Class distribution of 25:75
        6.3.1.2 Class distribution of 50:50
6.4 Concluding Remarks

CHAPTER 7.0 Conclusions and Recommendations
7.1 Conclusions
7.2 Recommendations for Future Research

Glossary
References
Appendix A A sample of datasets
Appendix B Output Summary of See5
Appendix C Summary of Results
Appendix D Classifier Prediction
    D.1 Making Prediction using boosted decision trees (25:75)
    D.2 Making Prediction using boosted decision trees (50:50)

List of Tables

Table 2-1  General statistics on credit cards in Canada
Table 2-2  Canadian interest rates and annual fees
Table 4-1  A small training set for the restaurant domain
Table 5-1  Credit card observations
Table 5-2  Design distribution for training and testing sets
Table 5-3  Confusion matrix for two classes
Table 5-4  Two class classification performance
Table 6-1  BDT evaluation on training data (All features are considered)
Table 6-2  BDT evaluation on training data (Card type is disregarded)
Table 6-3  BDT evaluation on training data (POS & card type are disregarded)
Table 6-4  BDT evaluation on testing data (All features are considered)
Table 6-5  BDT evaluation on testing data (POS & card type are disregarded)
Table 6-6  BDT evaluation on testing data (POS & card type are disregarded)
Table 6-7  DT evaluation on training data (All features are considered)
Table 6-8  DT evaluation on training data (Card type is disregarded)
Table 6-9  DT evaluation on training data (POS & card type are disregarded)
Table 6-10  DT evaluation on testing data (All features are considered)
Table 6-11  DT evaluation on testing data (Card type is disregarded)
Table 6-12  DT evaluation on testing data (POS & card type are disregarded)
Table 6-13  The summary results of prediction on new cases by BDT
Table 6-14  The summary results of prediction on new cases by BDT
Table A-1  A small sample from legitimate files
Table A-2  A small sample from fraudulent file
Table C-1  The summary results of classifiers (All features are considered)
Table C-2  The summary results of generated classifiers (Card type is disregarded)
Table C-3  The summary results of generated classifiers (POS & card type are disregarded)
Table C-4  The summary results of generated classifiers (All features are considered)
Table C-5  The summary results of generated classifiers (Card type is disregarded)
Table C-6  The summary results of generated classifiers (POS & card type are disregarded)
Table C-7  The summary results of generated classifiers (All features are considered)
Table C-8  The summary results of generated classifiers (Card type is disregarded)
Table C-9  The summary results of generated classifiers (POS & card type are disregarded)
Table D-1  Prediction of new cases using BDT classifier (Legitimate accounts)
Table D-2  Prediction of new cases using BDT classifier (Fraudulent accounts)
Table D-3  Prediction of new cases using BDT classifier (Legitimate accounts)
Table D-4  Prediction of new cases using BDT classifier (Fraudulent accounts)

List of Figures

Figure 2-1  Overview of Visa processing transaction
Figure 2-2  Fictitious transaction using counterfeit cards
Figure 2-3  Credit cards in circulation in Canada
Figure 2-4  Cards fraudulently used in Canada
Figure 2-5  Statistics on different types of fraud in Canada
Figure 4-1  An illustration of several hypotheses in a learning algorithm
Figure 4-2  An illustration of a decision tree
Figure 4-3  Partitioning the examples by testing on attributes
Figure 4-4  Decision tree learning algorithm partitions the universe
Figure 4-5  The decision tree learning algorithm
Figure 4-6  The induced tree from the 12 training examples
Figure 5-1  Decision tree representation for (A=1 and B=1) or (C=1 and D=1)
Figure 5-2  Output summary of See5 (Decision tree option)
Figure 5-3  Output summary of See5 (Rulesets)
Figure 5-4  Output summary of See5 (Boosting)
Figure 5-5  Output summary of See5 (Cross validation)
Figure 6-1  Evaluation of BDT and DT classifiers on the training data
Figure 6-2  Evaluation of BDT and DT classifiers on the testing data
Figure 6-3  Prediction of BDT classifier trained on 50:50 distribution on a legitimate case
Figure 6-4  Prediction of BDT classifier trained on 50:50 distribution on a fraudulent case

CHAPTER 1 Introduction

1.1 Background

The first use of today's style of credit cards started at the end of World War I in the United States (U.S.). In 1956, BankAmericard (now VISA) entered the market, followed by Master Charge (now MasterCard). In 1968, four Canadian banks introduced Chargex (now VISA) credit cards to meet the market demand [MACD83].

Credit cards are one of the most popular methods of payment worldwide, and particularly in North America, due to the existence of a widespread point of sale (POS) network. Millions of people around the world use credit cards to purchase goods and services by having access to credit for a period of several weeks. Any convenient system can be abused, and credit cards are no exception to this rule. Along with the rise of credit card use, fraud is on the rise. Financial Institutions (FIs) suffer sophisticated fraudulent activities and bear millions of dollars in losses each year. Based on statistics [SEND99], fraud represents more than $1 billion annually for Visa and MasterCard worldwide.



Credit card issuers and their member banks try to find new ways to prevent fraud. Some of the preventive measures on the cards are magnetic stripes, three-dimensional holograms, and card validation codes (CVC). These institutions are also looking at replacing credit cards with Smart Cards; however, based on estimates, this replacement will be very expensive due to the widespread POS network in North America and the huge number of credit cards in circulation in this part of the world. FIs also make extensive use of a variety of technologies, mainly Neural Networks (NNs), to track and identify suspicious transactions and flag them for further investigation.

1.2 Problem Definition

In the early days of credit cards, fighting fraud was rather simple. Every week a bulletin called a 'hot list' was passed out to the merchants. This bulletin listed the numbers of lost or stolen cards so that the merchants were able to check the customer's credit card number against these numbers. With the expansion of the credit card market, criminals have devised numerous ways to get around advanced security measures, such as magnetic stripes and holograms. In fact, cardholders are the real victims of fraud occurrences, as they will pay for the cost of fraud losses borne by either card issuers or merchants. To compensate for these losses, card issuers raise interest rates or annual fees and merchants raise the price of their merchandise.

Despite the best efforts of the FIs, law enforcement agencies, and the government, card fraud continues to rise. In addition to significant financial losses, the main concern of the law enforcement agencies is that this money is also used to support other criminal activities worldwide.

In the past few years, a neural network based software we call FDS (so as not to mention the real name) has been adopted by a large number of FIs for fraud detection. FDS scores the transactions for the likelihood of fraud in real time. When these scores hit a threshold set by the FIs, a case is created and those accounts are passed to the fraud analysts for further follow up. Fraud analysts are security officers trained to examine the cardholder's historical behavior and, by considering different factors, determine the potential risk associated with the flagged accounts.
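The score-and-threshold step described above can be illustrated with a minimal sketch. It is not the FDS implementation, whose internals are not described here; the score scale, the field names, and the function `create_cases` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredTransaction:
    account_id: str
    amount: float
    fraud_score: int  # hypothetical scale: a higher score means more suspicious


def create_cases(transactions, threshold):
    """Return the account ids whose score reaches the FI's threshold.

    Each account is listed once, in the order of its first flagged transaction.
    """
    flagged = []
    seen = set()
    for tx in transactions:
        if tx.fraud_score >= threshold and tx.account_id not in seen:
            seen.add(tx.account_id)
            flagged.append(tx.account_id)
    return flagged


# Illustrative data only; none of these values come from the bank.
transactions = [
    ScoredTransaction("A-100", 42.50, 120),
    ScoredTransaction("B-200", 980.00, 870),
    ScoredTransaction("A-100", 15.00, 910),
    ScoredTransaction("C-300", 310.00, 905),
]

cases = create_cases(transactions, threshold=900)
print(cases)
```

In this sketch, lowering the threshold sends more accounts to the fraud analysts, which is the trade-off between missed fraud and investigation workload.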

Fraud investigation is a difficult task and FIs are reluctant to block an account without making sure that the transaction is, indeed, fraudulent. Very often an 'unusual' transaction is legitimate and issuers are anxious not to inadvertently offend a cardholder by acting too hastily and blocking her/his account, especially in cases where the fraud officer is unable to find the cardholders and verify the transactions with them.

FDS has shown good results in detecting fraudulent transactions; however, the majority of transactions (approximately 9W) flagged by this system as potentially fraudulent are in fact legitimate. It should be noted that although fraud analysts, based on their experience and evaluation of the customer's history, might come to the conclusion that the activity of the flagged account is legitimate, bank policy requires them to call every individual cardholder for the verification of transactions [WEAN99]. The process of calling cardholders results in three major problems:


1. Not all the suspicious transactions are necessarily fraudulent. This type of error is referred to as a false positive (FP), which means that the case was not fraud although it was flagged as being potentially fraudulent. The process of confirming every transaction that deviated from the cardholder's usual behavior results in potential customer dissatisfaction.

2. The costs associated with investigating a large number of false positives are very high.

3. Currently, a substantial amount of time is being spent on investigating a large number of legitimate cases (FPs). If the number of investigations of FPs could be lowered, fraud analysts could spend more time on real fraud cases, preventing more losses to the industry.
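To make the notion of a false positive concrete, the following sketch counts confirmed frauds and false positives among a set of flagged cases. The function name and the sample outcomes are hypothetical and are not taken from the bank's data.

```python
def flagged_case_summary(outcomes):
    """Summarize investigated cases.

    `outcomes` holds one boolean per flagged case: True if the
    investigation confirmed fraud, False if the case was legitimate.
    """
    total = len(outcomes)
    true_positives = sum(1 for is_fraud in outcomes if is_fraud)
    false_positives = total - true_positives
    fp_share = false_positives / total if total else 0.0
    return {
        "flagged": total,
        "TP": true_positives,
        "FP": false_positives,
        "FP_share": fp_share,
    }


# Ten made-up flagged cases, two of which turned out to be fraud.
sample = [False, False, True, False, False, False, False, True, False, False]
summary = flagged_case_summary(sample)
print(summary)
```

The `FP_share` value is the fraction of the analysts' calls that were spent on legitimate cardholders in this made-up sample.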

1.3 Research Motivation

As pointed out earlier, the verification of suspicious transactions with the cardholder is a major part of fraud investigation and cannot be eliminated. Therefore, any solution that refines the investigation selection process by reducing the number of unnecessary calls is welcomed by the FIs.

Collaboration with one of the major Canadian banks was established to examine the potential ways of enhancing the current system. Based on information obtained from this bank, for the current threshold, FDS flags close to 50,000 accounts per month all across Canada. The main objective of this research is to improve the process of personal follow up on a large number of suspicious transactions; in other words, to find a way to preprocess the flagged transactions and identify the most probable legitimate transactions from the stream of legitimate/fraudulent transactions. If this goal could be achieved, the volume of unnecessary investigations would be reduced, leading to significant savings for the FI. Moreover, the current FDS threshold could also be lowered and a number of fraudulent cases, being missed under this level, could be detected. As a result, the fraud would be discovered earlier and the overall losses may be reduced.
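The idea of screening the flagged stream before it reaches the analysts can be sketched as a second-stage filter. This is only an illustration: the rule `toy_rule` is a hand-written stand-in for a learned classifier, and the fields `amount` and `merchant_known` are hypothetical features, not those used in this research.

```python
def post_process(flagged_cases, looks_legitimate):
    """Split flagged cases into those to investigate and those set aside.

    `looks_legitimate` is any callable that returns True when a case is
    judged most probably legitimate.
    """
    investigate, set_aside = [], []
    for case in flagged_cases:
        if looks_legitimate(case):
            set_aside.append(case)
        else:
            investigate.append(case)
    return investigate, set_aside


def toy_rule(case):
    # Hypothetical stand-in for a trained classifier.
    return case["amount"] < 100 and case["merchant_known"]


flagged = [
    {"id": 1, "amount": 35.0, "merchant_known": True},
    {"id": 2, "amount": 2200.0, "merchant_known": False},
    {"id": 3, "amount": 60.0, "merchant_known": False},
]

investigate, set_aside = post_process(flagged, toy_rule)
print([c["id"] for c in investigate], [c["id"] for c in set_aside])
```

Keeping the classifier as a pluggable callable means the same filter could, in principle, be driven by a decision tree or any other model.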

To address these challenges, decision tree learning methodology, one of the most commonly used applications of Artificial Intelligence (AI) for pattern recognition and classification problems, was considered.

1.4 Outline

Chapter 2 gives an overview of the history of credit cards, the transaction and authorization processing operation of the FIs, and the ways that this convenient method of payment has been endangered by criminals. It proceeds to explore the types of fraud and concludes with statistics and some facts regarding credit card fraud occurrences in Canada.

Chapter 3 introduces the existing fraud solution approaches and gives a brief introduction to the existing Fraud Detection Systems (FDS). It touches on neural network technology, its advantages and disadvantages, and briefly describes its application to credit card fraud detection. It further elaborates on fraud investigation processes and the associated issues.



Chapter 4 introduces the notion of classification and the main strands of research in this area. It gives an overview of learning systems and their requirements, and describes learning decision trees and their applications.

Chapter 5 provides a detailed explanation of how the data was acquired and the steps required for processing the data to make it ready for the analysis. It introduces the learning software and touches on its capabilities. It further elaborates on the data set and class distribution designs for training and testing sets, and the variations of experiments performed.

Chapter 6 presents the results of the experiments and discusses the effects of the training class distributions on the results. It proceeds to compare the performance of different classifiers with each other, and presents the efficient classifier based on the criteria defined. It also examines the predictions of two classifiers on new cases.

Chapter 7 presents the conclusions of this research and offers suggestions for further study.


CHAPTER 2 Literature Review

2.1 Credit Cards

2.1.1 History

It is believed that credit is as old as trade. An Assyrian tablet, which dates to 2000 B.C., has been discovered that depicts a credit transaction. However, the closest approximation to the modern credit card was probably the signet ring carried by Teutonic knights in the Middle Ages. Each ring, bearing the knight's coat of arms, was registered by the artisan who had engraved it. The lists of the registered rings were then circulated to business establishments located around the castle. When the knight purchased something, he signed the bill with his signet ring pressed into hot wax. Artisans and tradesmen would submit their bills to the castle for later settlement [MACD83].

In the United States, oil companies and hotel chains started to introduce limited use of travel cards at the end of World War I to favoured customers. These cards introduced the person as a good customer of the issuing firm and stated that the customer could pay for the purchased goods and services after they returned home. In 1950, an American entrepreneur named Frank X. MacNamara introduced the idea of credit card use in more places than just local restaurants. From this idea the Diners Club card for travel and entertainment emerged. Soon after, Carte Blanche and American Express appeared in the market. In 1951, Franklin National Bank of New York was the first financial institution (FI) to enter the credit card market by issuing its bank card. Within four years about 100 other financial institutions (FIs) introduced their own cards. In 1956, the BankAmericard (now VISA) entered the market, followed by Master Charge (now MasterCard). In 1968, four Canadian banks introduced Chargex (now VISA) credit cards to the market, and on the first day of their use two cases of fraud were committed [MACD83].

2.1.2 Convenient Method of Payment

A credit card is a special product with the following characteristics [CBA99a]:

• It provides millions of people around the world with the opportunity to purchase goods and services with access to credit for up to 51 days, depending on the posted date of the purchase, at no cost, provided that the amount owing is paid back by the statement due date.

• Cardholders do not have to put up collateral against the amount they spend; therefore, it is unsecured.

In North America, credit cards are widely used in purchasing goods and services. The main reasons for this popularity are:

• The existence of a widespread point of sale (POS) network.

• Reducing the risk of carrying cash and the advantage of several weeks of free credit plus optional services and benefits such as Air Miles, free insurance plans, and a number of other rewards.

• Security of funds, that is, in case of card loss or theft, the cardholder's liability is at most $50, provided that the cardholder reports a lost or stolen card in a timely manner.

The credit card system facilitates commercial transactions and provides profits for the participating parties. The source of income for card issuers (CIs) may come from: (1) merchant user fees, (2) cardholder user fees, and (3) interest charged on unpaid balances.

In purchasing goods and services, the buyer pays for a purchase by using a line of credit from the credit card issuer (CI). The CI pays the seller for the purchase, and the buyer then pays the balance on the credit card back to the CI. Since the claim presented in payment is considered a liability of the credit card issuer, this type of transaction transfers much of the risk of insufficient funds in the transaction from the seller to the credit card issuer. In order to make up for these losses, CIs determine annual fees and interest rates based on the unrecoverable amount of money incurred by these losses. It is worthwhile to point out that most of the CIs are FIs even though there are many which are not. In this work, we use an FI-owned credit card operation as our "laboratory".


2.2 Credit Card Transaction Process

The following information on the credit card transaction processes was collected through personal meetings with the staff of the sponsoring financial institution [CLAR98] and review of a Master's thesis [SEETMI on this subject.

2.2.1 Parties Involved in a Transaction

Four parties are involved in processing a credit card transaction: (1) the cardholder, (2) the merchant, (3) the Financial Institution (FI), and (4) the VISA center.

• The cardholder uses the card for a purchase and, provided that the statement amount is paid back by the due date, interest charges will not occur.

• The merchant, by accepting the card for payment, has the advantage of security of payment by the FIs.

• The FIs issue the cards, settle other FIs' cardholder and merchant transactions with VISA, process the incoming transactions, and provide the cardholders with monthly statements.

• VISA standardizes the transaction process and settles the interaction between cardholders, merchants, and the FIs. It also keeps track of transactions, and markets the card.

For every transaction, one or two FIs are involved: the cardholder's and the merchant's. The cardholders of one FI might go to the merchants of the same FI or another FI. Therefore, depending on the situation, the FI could assume two roles, being an agent for both the cardholder and the merchant, or being an agent for either one of them. Hence, the transaction processing system must be able to separate the incoming transactions of a particular FI from the other FIs and route each transaction to the appropriate place for authorization and record keeping.
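The roles an FI can assume in a transaction, and the separation of its own transactions from those of other FIs, can be sketched as follows. The names `roles_of` and `transactions_for`, the role labels, and the sample records are illustrative assumptions, not part of any actual processing system.

```python
def roles_of(fi, cardholder_fi, merchant_fi):
    """Return the set of roles `fi` plays in a single transaction."""
    roles = set()
    if fi == cardholder_fi:
        roles.add("cardholder_agent")
    if fi == merchant_fi:
        roles.add("merchant_agent")
    return roles


def transactions_for(fi, transactions):
    """Keep only the transactions in which `fi` plays at least one role."""
    return [
        tx for tx in transactions
        if roles_of(fi, tx["cardholder_fi"], tx["merchant_fi"])
    ]


# Made-up transactions between three hypothetical institutions.
transactions = [
    {"id": 1, "cardholder_fi": "BankA", "merchant_fi": "BankA"},
    {"id": 2, "cardholder_fi": "BankA", "merchant_fi": "BankB"},
    {"id": 3, "cardholder_fi": "BankC", "merchant_fi": "BankB"},
]

print(roles_of("BankA", "BankA", "BankA"))
print([tx["id"] for tx in transactions_for("BankA", transactions)])
```

Transaction 1 shows the case where one FI is the agent for both parties, while transaction 3 does not concern BankA at all and is filtered out.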

2.2.2 Overview of Transaction Processing Flow

In purchasing goods or services through credit cards, in on-line processing systems, the authorization is the essential element of the transaction processing system. The authorization process is the first level of protection against fraudulent activities and it also maintains control over the cardholder's credit limit. It should be noted that the authorization is kept as a temporary file for up to five days and when the transaction is recorded in the cardholder file, the cardholder account balance is updated. In addition to on-line authorizations, there are merchants who have floor limits. If the amount of the transaction is below that limit, the authorization does not need to go through the FI's system and the merchant has the right to authorize the transaction locally. In fact, due to a widespread POS network in North America, many merchants have a 'zero' floor limit and almost every transaction has to be authorized on-line by the related FI.
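The floor limit rule described above can be expressed as a one-line check. The function name and the dollar amounts are hypothetical; the only rule taken from the text is that a transaction below the merchant's floor limit may be authorized locally.

```python
def needs_online_authorization(amount, floor_limit):
    """Return True when the transaction must be authorized by the FI.

    A transaction strictly below the merchant's floor limit may be
    authorized locally by the merchant.
    """
    return amount >= floor_limit


# A merchant with a hypothetical $100 floor limit.
print(needs_online_authorization(50.00, 100.00))
print(needs_online_authorization(100.00, 100.00))

# A 'zero' floor limit sends every transaction on-line.
print(needs_online_authorization(0.01, 0.00))
```

With a floor limit of zero no amount can fall below the limit, which is why such merchants authorize every transaction on-line.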

The authorization process begins when a cardholder uses his/her card for a transaction. The POS machine reads the magnetic stripe on the back of the card, which encodes the cardholder's name, account number, credit limit, and the expiry date. The authorization is completed when the transaction is approved and the cardholder signs the transaction slip. The next step is the submission of transaction slips to the FI, which is done either electronically or manually. In electronic transfer, the POS machine keeps track of all the authorizations and sends them electronically to the FI. In this case the merchants do not need to submit the transaction slips. The manual option is when the merchant sends the actual slips to the FI and the FI's operators enter the records manually into the system. In both cases, after the submission of transactions, the FI credits the merchant's account by that amount.

To handle a large number of cardholders' monthly statements, FIs have set up several billing cycles during the month. A certain number of cardholders are associated with each of these cycles and the date for each billing cycle is different from the other cycles. At the end of each cycle a new statement is processed and mailed to the cardholders of that cycle along with a due date for payment. The statement contains information on the transactions such as the date, the reference number, the description, and the amount. Other information such as the previous and the new balance, the minimum payment, and the available credit is also included. The cardholder is required to pay the total or part of the balance. If the balance is paid back in full there are no interest charges. If the cardholder's payment is less than the minimum amount, the credit rating of the cardholder could be affected and the cardholder may be considered delinquent. Another type of transaction possible with credit cards is obtaining cash advances. In this type of transaction the interest is charged from the day when the money is withdrawn, even if the balance is paid back in full by the statement due date.
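The statement rules above can be illustrated with a small sketch. The function names, the 18% annual rate, and the simple daily interest formula are assumptions for illustration only; actual issuers may compute interest differently.

```python
def statement_outcome(balance, minimum_payment, payment):
    """Apply the two statement rules described in the text.

    Interest is charged unless the balance is paid in full, and a payment
    below the minimum may lead to the cardholder being considered delinquent.
    """
    return {
        "interest_charged": payment < balance,
        "possibly_delinquent": payment < minimum_payment,
    }


def cash_advance_interest(amount, annual_rate, days_since_withdrawal):
    """Simple daily interest accrued from the day of withdrawal (assumed formula)."""
    return round(amount * annual_rate / 365 * days_since_withdrawal, 2)


print(statement_outcome(balance=1000.00, minimum_payment=30.00, payment=1000.00))
print(statement_outcome(balance=1000.00, minimum_payment=30.00, payment=10.00))

# Hypothetical $500 cash advance at an assumed 18% annual rate, repaid after 20 days.
print(cash_advance_interest(500.00, 0.18, 20))
```

The cash advance accrues interest in this sketch even when it is repaid before the due date, in contrast to a purchase balance that is paid in full.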

To handle a huge number of daily transactions, FIs and VISA have implemented a real-time, non-stop system of computer hardware and software. This system includes the communications network among the FIs and the VISA network, and also handles the data processing and the record keeping tasks. Figure 2-1 depicts an overview of the VISA transaction processing system. The main components of the system are described below.

2.2.2.1 The Card

A credit card is a standard plastic card with a magnetic stripe on its back which is read by a POS machine at the point of purchase. The front side of the card has the cardholder's name, account number, the expiry date, and a hologram. Currently, there are a variety of cards in the market. In general, the main categories of cards are: (1) classic, (2) gold, and (3) platinum. Classic cards do not normally have annual fees and are not associated with rewards programs. Very often the credit limit on gold and platinum cards is much higher than on the classic types, but these cards have annual fees as well. Services and rewards, such as insurance coverage for car rental, are mostly associated with these types of cards.

2.2.2.2 The Swipe Machine

POS terminals or swipe machines are very common in North America and are used for on-line authorization. After the card is swiped through the machine and the amount of purchase is entered on the keypad, the POS terminal reads the card's magnetic stripe information and places a call to the merchant FI's computer. This information, along with the merchant number, is transmitted via a modem to the on-line authorization system. This service is typically processed by non-stop Tandem computers.

Figure 2-1 Overview of VISA processing transaction system

[Figure not reproduced. Recoverable labels: Tandem functions (1. route the transaction to the proper place; 2. not authorized, e.g., blocked, over limit, etc.); International; High score; Fraud investigation.]

2.2.2.3 The Tandem

The Tandem is a non-stop computer used to process all the incoming electronic transactions regardless of the merchant's institution and country. Every FI has its own Tandem which is connected to the VISA network. The functionality of the Tandem is summarized below.

1. Keeping a record of all incoming transactions for further referral in case of any system malfunction. The incoming transactions are categorized as follows:

   • A transaction by the merchant and the cardholder from the same FI.

   • A transaction by the FI's merchant with a cardholder from another FI. These transactions will be routed to the VISA network and from there they will be sent to the cardholder FI's Tandem.

   • A transaction by the FI's cardholder with another FI's merchant. These transactions are sent to the VISA network and from there they are routed to the cardholder's FI Tandem for authorization.

2. There are occasions when the FI's mainframe is not able to do the authorization processing due to: (1) a system breakdown, (2) when the FI's computer system is down for different reasons (e.g., maintenance). In these circumstances, the Tandem does 'stand-in' authorization processing, that is, it authorizes a transaction on behalf of the mainframe. This process is described below.

The Tandem authorizes the transactions based on a 'negative file' and an assigned floor
limit. The negative file includes all the card numbers that have been considered fraudulent
internationally. This list is provided by Visa International and is updated quite frequently
with the occurrence of new fraud cases. Before any authorization, the Tandem checks that
the card number is not on that list. The Tandem does not have the cardholder's account
information and, therefore, it cannot do any credit limit checking for the cardholder's
account, but there is a set credit limit for the incoming transactions that the Tandem will
check and will not exceed. When the cardholder FI's mainframe becomes available again,
the Tandem will send the approved authorized transactions to the mainframe either in real
time or as a batch file, depending on the circumstances.
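The stand-in logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Transaction`, `stand_in_authorize` and `pending_for_mainframe`, the card numbers and the floor limit of 100.00 are all hypothetical, and the source does not say whether an amount exactly equal to the floor limit is accepted (the sketch accepts it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    card_number: str
    amount: float


def stand_in_authorize(tx, negative_file, floor_limit):
    """Decide a transaction while the FI's mainframe is unavailable."""
    # The Tandem holds no account data, so it first consults the negative file.
    if tx.card_number in negative_file:
        return "declined"
    # It then compares the amount with the assigned floor limit.
    if tx.amount > floor_limit:
        return "declined"
    return "approved"


# Hypothetical data for illustration only.
negative_file = {"4000000000000002"}
floor_limit = 100.00
incoming = [
    Transaction("4000000000000001", 45.00),
    Transaction("4000000000000002", 20.00),
    Transaction("4000000000000003", 250.00),
]

# Approved stand-in transactions are held until the mainframe is available
# again, then forwarded in real time or as a batch file.
pending_for_mainframe = [
    tx for tx in incoming
    if stand_in_authorize(tx, negative_file, floor_limit) == "approved"
]
```

Only the first transaction would be queued for the mainframe here: the second card is on the negative file and the third amount exceeds the floor limit.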

2.2.2.4 The Mainframe

The Tandem sorts the incoming transactions and transmits only those transactions which are
from the FI's cardholders or merchants to the FI's mainframe computer for authorization.
The mainframe, as the main component of the system, processes all the incoming
authorization requests. For authorization the mainframe performs a series of checks to
ensure that the customer is eligible for making purchases. Some of these checks are listed
here [CLAR97]:

Card Expiry Date: If the card is expired the authorization is not allowed and the transaction
is declined.

Excessive Authorizations: Under normal circumstances a client will not exceed a certain
number of transactions in a 24-hour period. This check limits the number of authorizations
that a customer can make during that period. If an account goes over the allowed number for
the day, the authorization will either be declined (D) or referred (R) (i.e., the case is
referred to the FI staff).



Blocked: Block codes are used to put conditions on accounts. An account is checked for
being fraudulent, delinquent or blocked. If any of these checks are positive, the
authorization will either be declined or referred and the transaction is refused.

Credit Limit Check: This check verifies that the cardholder has not exceeded his/her credit
limit. If the sum of the current transaction amount and the current balance is under the
credit limit, the authorization will be approved; otherwise it will be declined.
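The four checks listed above can be read as a short decision sequence. The Python sketch below is one possible reading, not the FI's actual implementation: the `Account` fields, the limit `MAX_AUTHS_PER_DAY = 10`, the order of the checks, and the choice of 'R' for excessive authorizations and 'D' for blocked accounts are assumptions, since the source allows either outcome in both cases.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_AUTHS_PER_DAY = 10  # hypothetical limit; the source gives no figure


@dataclass
class Account:
    expiry: date               # last day on which the card is valid (assumed)
    block_code: Optional[str]  # None when no block code is set
    balance: float
    credit_limit: float
    auths_today: int           # authorizations in the current 24-hour period


def authorize(account, amount, today):
    """Return 'A' (approved), 'D' (declined) or 'R' (referred to FI staff)."""
    if today > account.expiry:                      # card expiry date
        return "D"
    if account.auths_today >= MAX_AUTHS_PER_DAY:    # excessive authorizations
        return "R"
    if account.block_code is not None:              # blocked
        return "D"
    # Credit limit check: approve only if balance plus amount is under the limit.
    if account.balance + amount < account.credit_limit:
        return "A"
    return "D"
```

For an account with a balance of 900 and a credit limit of 1000, a purchase of 50 would be approved, while a purchase of 100 would be declined because the sum is no longer under the limit.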

In the checking process, if one of the required checks for the transaction fails, the
authorization is declined and this refusal will be sent to the Tandem and from there it will be
sent back to the merchant. When the transaction passes all the required checks, the
approved authorization goes back to the Tandem and from there is sent back to the
merchant. To make all these activities happen, FIs have implemented several sophisticated
software packages. The databases required to track the aforementioned activities are as
follows:

1. Merchant File: Information on the FI's merchants is included in this database.

2. Cardholder File: Information on the FI's cardholder accounts, such as name, address,
   account number, current balance, credit limit, expiry date, and so on, is contained in
   this file.

3. Authorization log: All the authorized transactions done by the FI for its own
   cardholders are included in this file.


4. Posted TX file: This file keeps a record of all the transactions that have been
   received from the Tandem but have not yet been posted to the cardholders' monthly
   statement.

5. Statement File: At the end of the cardholder's cycle, the accumulated transactions in
   the posted TX file will be sent to this file and the monthly statement for the
   cardholder is printed out of this file.

When an authorization is approved, the account's available credit and amount/number of
authorizations are updated and the mainframe sends this information to the authorization log
database and the posted TX file. At the end of each cycle date, all the posted transactions of
each account from the posted TX file will be sent to the statement file processor. This is
used in printing out the monthly statements of the cardholders. When the cardholder pays
the total or the minimum due amount, the cardholder's file is updated by this payment and
the current balance is adjusted. To save computer disk space, the statement file keeps a
record of the last three statements and, with the production of a new statement, the oldest
statement is archived.
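The retention policy of the statement file (keep the last three statements, archive the oldest when a new one is produced) can be illustrated with a small Python sketch. The class name `StatementFile` and its attributes are hypothetical, and the in-memory list used for the archive merely stands in for whatever archival storage the FI actually uses.

```python
from collections import deque


class StatementFile:
    """Keeps the most recent statements on-line and archives older ones."""

    def __init__(self, keep=3):
        self.keep = keep
        self.recent = deque()  # statements still held in the statement file
        self.archive = []      # stand-in for archival storage

    def add(self, statement):
        self.recent.append(statement)
        # When a new statement is produced, move the oldest ones to the archive.
        while len(self.recent) > self.keep:
            self.archive.append(self.recent.popleft())


statements = StatementFile()
for month in ["Jan", "Feb", "Mar", "Apr", "May"]:
    statements.add(month)
```

After five monthly cycles the file would hold the March, April and May statements, with January and February archived.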


2.3 Credit Card Fraud

Plastic card based payment systems are booming and being used more extensively by
organizations and individuals. Obviously industries with this pace of growth are
vulnerable to attacks by fraudsters. In one survey [SMIT97] conducted in the United
States (U.S.) in 1993, a group of 14 credit card fraudsters admitted to employing over 100
different ways of using credit cards to obtain funds illegally.

2.3.1 General Statistics

Bank card fraud losses to Visa and MasterCard alone have increased from $110 million in
1980 to an estimated amount of $1.63 billion in 1995 worldwide. The U.S. has suffered the
bulk of these losses, approximately $875 million for 1995 alone. This is not surprising
because 71 percent of all worldwide revolving credit cards in circulation are issued in that
country. In 1994, approximately 124 million of the 193 million adults in the U.S. had at
least one credit card [SLOT97]. While precise figures are not available for the credit card
industry as a whole, based on credit card use of $879 billion for 1995, the estimates imply a
fraud rate of between 0.1 and 0.2 percent. In the case of bank cards (MasterCard and Visa),
a study done by the American Bankers Association in 1996 estimated total gross fraud
losses for 1995 at $812 million versus purchases of $451 billion, implying a loss rate of 0.18
percent [ROBE98]. In 1997, credit card fraud losses for Visa, MasterCard, American
Express and Discover were estimated to be around $2 billion whereas this amount in 1990
was $440 million mF981.
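As a check on the figures quoted above, the bank card loss rate follows directly from the two reported amounts. The dollar range in the second line is not stated in the source; it is simply what a fraud rate of 0.1 to 0.2 percent would imply for $879 billion of credit card use.

```latex
\[
\text{loss rate}_{1995}
  = \frac{\$812\ \text{million}}{\$451\ \text{billion}}
  = \frac{812}{451{,}000} \approx 0.0018 = 0.18\%
\]
\[
0.001 \times \$879\ \text{billion} \approx \$0.88\ \text{billion},
\qquad
0.002 \times \$879\ \text{billion} \approx \$1.76\ \text{billion}
\]
```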


2.3.2 Fraud Schemes

Unauthorized use of credit cards for acquiring goods or services is fraud. Visa and
MasterCard constitute about 65 percent of all outstanding revolving credit worldwide and
the substantial number of fraud occurrences is centered on one or both of these cards
[SLOT97]. Most credit card fraud schemes fall into the following categories [CBA99b]:

1. Lost / Stolen

2. Never Received Issued (NRI) (Mail theft)

3. Counterfeit

4. Telemarketing and mail-order

5. Fraudulent applications

2.3.2.1 Lost and Stolen

Lost and stolen cards account for the majority of fraud cases. Fifty-five percent of Visa
and forty-nine percent of MasterCard losses are based on lost/stolen cards. The average
loss incurred by this kind of fraud is $700 [ANON98a]. When a card is lost or stolen the
opportunity for fraud starts. Workplaces, glove compartments of cars, and sporting
facilities are the main sources of stolen cards. Very often these losses are caused by a
relative's or friend's unauthorized use of the card without the cardholder's knowledge
[CBA99b]. Sometimes cardholders might sell their card to criminals and then report the card
as lost or stolen, or they might do shopping and then repudiate the event and report the card
as lost or stolen.

2.3.2.2 Never Received Issued (NRI)

An average of 439,000 new, renewal and replacement cards are mailed every day. Never-received
cards are cards stolen from the mail, either internally or externally, while in
transit from the card issuer to the legitimate customer. The card may be used and then be
sold on the black market [ANON98a]. The average losses for this type of fraud are
significant because the cardholder is not aware of the theft and by the time the fraud is
detected, a substantial amount of purchases has been made. Very often, only when the
cardholder receives her/his monthly statement does s/he realize that the card has been stolen
in the mail.

Visa's never-received card losses leapt 68 percent in 1997. The average loss from a
never-received card is about $1,500, about double that of a lost or stolen card [ANON98a].
One of the new ways to prevent this type of fraud is to send the card to the cardholder as a
worthless piece of plastic (electronically blocked). On receipt of the card, the
customers have to call the bank to activate their card.

2.3.2.3 Counterfeit

The fastest growing type of bank card fraud is the illegal counterfeiting of credit cards,
mainly Visa and MasterCard. By employing new technologies criminals are able to
produce exact replicas of existing cards. The average reported losses due to this type of
fraud, estimated at about $4,500, are higher than for any other fraud category [ANON98a].



2.3.2.4 Telemarketing and Mail-order Fraud

There are occasions when a fraudulent merchant or telemarketer calls to sell a non-existent
product over the phone and, by acquiring the cardholder's card number, processes a
fraudulent charge against the account. It should be noted that this type of fraud, due to
customer awareness, is on the decline.

2.3.2.5 Fraudulent Applications

In this kind of fraud, fraudsters provide FIs with false information and identities to acquire a
credit card illegally. Unlike stolen cards, these cards are not signed and it takes a longer time
before the fraud is detected. This kind of fraud, due to the awareness of the FIs, is on the
decline.

2.3.3 New Technologies and Card Counterfeiting

Card counterfeiting, in terms of frequency and severity, is on the rise. The basic principle
underlying this kind of fraud is an account number, which could be obtained from different
sources such as legitimate records made in hotels, restaurants, retailers, discarded drafts or
computer software.

In order to issue credit cards, financial institutions generate a series of numbers. From
these numbers, a certain number (e.g., 500 of them) may be selected by a process known as
skipping and used for issuing credit cards. To defraud the FIs, fraudsters may use
account generating software such as Creditmaster and Credit Wizard to determine the
skipping code and reveal the valid credit card account numbers. Fraudsters may also use
another type of software called Sniffers to find credit card numbers that individuals are
sending online. This software searches the networks for 16 digit numbers, records them and
sends them to the fraudster [ANON98b].

One of the latest methods of counterfeiting credit cards is 'skimming' or 'bit copying'. This
is a process by which the magnetic stripe encoding from one card is copied to the stripe of
another card. This method is one of the common methods of counterfeiting credit cards and
is drastically on the rise in Canada. Public places such as particular restaurants and gas
stations are major sources of these fraudulent activities [DEAN99].

The acquired number will then be embossed or encoded on a piece of plastic designed for
this purpose. Whenever the number is embossed on an ordinary (blank) plastic card, it is
called a 'white plastic' fraud. When the number is embossed and/or encoded on an expired
or stolen credit card (from which the original data have been removed) the result is an
'altered' or 'falsified' credit card. In the case of card alteration, magnetic stripes are
altered or manufactured using equipment that can be purchased at electronic stores. When
the number is affixed onto a totally counterfeit credit card, it is called a 'pure counterfeit'
card [MATI97]. Figure 2-2 depicts the card counterfeiting process schematically.

2.3.4 The Counterfeiting Process

To understand the complexity and the nature of card counterfeiting, it is important to
introduce the methodology used by counterfeiters in their operations. With improvements
in technology, counterfeiting a credit card is often done by using desktop computer systems
with peripherals, including embossers, laminators, and tipping foil in order to produce a
more realistic looking card complete with a hologram and fully encoded magnetic strip.
Often the examination of the hologram is the key to the identification of a counterfeit card.
On legitimate cards, the hologram is embedded in the plastic at the time of
manufacturing whereas counterfeit credit cards commonly contain a hologram affixed to the
top of the card rather than embedded in the card. Thus, it can be seen or felt to rise slightly
above the card face upon close examination [SLOT97].

Figure 2-2 Fraudulent transactions using counterfeit cards [MATI97]

[Figure not reproduced. Recoverable labels: Type of counterfeit credit card; Actual theft of
credit card; Fencing market; Embossing machines; Encoding machines; Theft of credit card;
White plastic fraud; Counterfeit credit card; Transaction.]

The magnetic stripes and holograms used to counterfeit bank cards have a distinct sub-market
within the criminal communities. Smugglers bring holograms into the U.S. and
Canada regularly. During April 1994, the Canadian Combined Forces Special Enforcement
Unit arrested members of a group that produced approximately 300,000 counterfeit
holograms of which 250,000 had already been distributed. Based on the reported figures
and an estimated loss of $3,000 per card, Visa and MasterCard anticipated losses of $750
million incurred by this organized activity [SLOT97].
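The anticipated loss quoted above is consistent with the number of holograms already distributed multiplied by the estimated loss per card; this reading of how the figure was obtained is an inference from the numbers, not something the source spells out.

```latex
\[
250{,}000\ \text{holograms} \times \$3{,}000\ \text{per card}
  = \$750{,}000{,}000 = \$750\ \text{million}
\]
```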

Card counterfeiting in Canada is mainly an organized crime activity. This criminal
activity started in Vancouver, where fraudsters imported the technology of pure counterfeit
credit cards from abroad, and then it spread to the East, mainly Toronto [MATI97].
In mid-December 1998, police discovered a factory in the Toronto area that could produce
cards from any financial institution including foreign ones. Police arrested a group of
criminals who were charged with the production of counterfeit credit cards and Canadian
cash. The number of charges associated with this criminal activity was so high (307 credit card
related charges) that police announced this operation as the largest ever to have happened in Canada.
One of the major concerns is that this information can also be sold to overseas groups who
can then produce more counterfeit cards. In addition to the losses imposed on the industry,
the money obtained can also be used to buy more sophisticated equipment in order to
produce more counterfeit cards and to expand criminal activities worldwide [LEMA98].



2.4 Credit Cards in Canada

There are over 600 institutions in Canada that issue VISA or MasterCard. Among these
CIs, the number of major institutions that issue VISA or MasterCard is 18: ten banks, one
trust company, three credit unions, and four other financial institutions. The other CIs are
affiliated issuers, such as the Bay, General Motors, University of Toronto, Petro-Canada,
Eaton's, Canadian Tire, and so on. The number of credit cards issued by financial
institutions (FIs), all across the country, is approximately 35.3 million [CBA99a]. Figure
2-3 and Table 2-1 illustrate general statistics on Visa & MasterCard cards in Canada,
respectively. Figure 2-4 was plotted based on statistics obtained from the Canadian Bankers
Association (CBA) web site.


Figure 2-3 Credit cards in circulation in Canada [CBA99c]

[Figure not reproduced. Horizontal axis: Year, 1975 to 2000.]

Table 2-1 General statistics on credit cards in Canada [CBA99a]

VISA & MASTERCARD
Columns: October 31, 1998 (fiscal year end); October 31, 1997 (fiscal year end)
[Cell values not recoverable from the source. Row labels:]

Number of cards in circulation (million)
Outstanding balances ($ billion)
Retail sales volume ($ billion)
Delinquency ratios (90 days and over)
Sales slips processed (million)
Average sale ($)
Sales and cash advance volume ($ billion)
Fraud losses ($ million)


2.4.1 Interest Rate Base

The number of outlets that accept VISA and/or MasterCard in Canada is approximately
620,000. Based on information obtained from the CBA web site, average credit card interest
rates for standard cards, issued by Canada's six largest banks, have dropped by 3.4 percent
since their peak in October 1990. Many banks now issue special low rate cards designed to
benefit cardholders who usually do not pay off their balances every month. Compared to
other credit card types, low rate cards have significantly lower interest rates but slightly
higher annual fees. On average, the interest on low rate cards is more than six percent lower
than the standard card rates. A number of factors such as cost of funds, losses due to
fraud, level of fees and the volume of outstanding balances determine the base for interest
rates. These factors are pointed out below [CBA99a]:

• Total losses due to credit card fraud were estimated at $147 million in 1998.

• As bankruptcies have become more acceptable, they have become more frequent, leading
  to increased losses in the credit card area.

• As the result of market pressures, the annual fees on standard credit cards have been
  eliminated.

• A higher percentage of Canadian cardholders pay off their balances in full or they
  have been carrying lower balances, resulting in less interest income for the CIs from
  these sources.

Table 2-2 provides information on the interest rates in Canada.


Table 2-2 Canadian interest rates & annual fees (December 31, 1998) [CBA99a]

Issuers/Cards                        Interest Rate      Annual Fee
Banks (e.g., VISA, MasterCard)       8.99% - 18.9%      $0 - $39
Retailers (e.g., Sears, Eaton's)     24% - 28.8%        $0

2.4.2 Statistics on Credit Card Fraud

Based on statistics reported by the CBA, credit card fraud occurrences rose sharply in fiscal
1998 (141,274) compared to 1997 (113,264). Based on the information obtained from this
report, 34 percent of all credit card fraud occurrences and 50 percent of the $147 million
written off in 1998 were due to counterfeit card fraud. This report also indicates that
approximately 50 percent of Canadian credit cards which were used fraudulently were used
outside of Canada. Figures 2-3 and 2-4 are plotted based on the statistics obtained from
the CBA web site [CBA99b]. Figure 2-4 shows the number of cards used fraudulently in
Canada. Figure 2-5 illustrates the statistics on different types of fraud in Canada.
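To put the reported counts in proportion, the year-over-year increase and the counterfeit share of the write-off can be worked out directly from the figures above. Neither derived value is stated in the source; both are simple arithmetic on the quoted numbers.

```latex
\[
\frac{141{,}274 - 113{,}264}{113{,}264}
  = \frac{28{,}010}{113{,}264} \approx 0.247 = 24.7\%
\]
\[
0.50 \times \$147\ \text{million} = \$73.5\ \text{million}
\]
```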



Figure 2-4 Cards reported fraudulently used in Canada [CBA99b]

[Figure not reproduced. Horizontal axis: Year.]


Figure 2-5 Statistics on different types of fraud in Canada [CBA99b]

[Figure not reproduced. Title: Number of Fraud Files (Jan 98 - Dec 98) (VISA & MasterCard).
Horizontal axis: Type of Fraud.]

2.5 Summary

This Chapter presents an overview of the history of credit cards, the transaction and
authorization processing operation of the FIs and the ways this convenient method of
payment has been endangered by criminals. It proceeds to explore the types of fraud and
concludes with statistics and some facts regarding credit card fraud occurrences in Canada.


CHAPTER 3 Fraud Solution Approaches

3.1 The Future of Bank Cards

FIs employ various technologies to detect and prevent credit card fraud. One of these is
special security numbers embedded in the magnetic stripe. The Card Verification Value
(CVV), Card Verification Code (CVC), and Card Identification (CID) are the security
numbers being used by VISA, MasterCard, and American Express, respectively. This
number, along with the account number and expiration date, is used in a verification algorithm
during the authorization process. If any part of this input is missing or incorrect, the authorization
at the point of sale (POS) will be declined [CBA98b]. For this reason, fraudsters not only
need to have a valid account number but also need to know the mathematical formula used
to create the code and the method of its encryption to be able to produce a counterfeit card.

However, there are many situations where preventive techniques (e.g., holograms, validation
codes such as CVV) are not effective. For instance, in placing a telephone-order
transaction or using the card over the Internet, these security features cannot be checked.



3.1.1 Smart Cards

To address the problem, credit card manufacturers plan to employ a series of security
features, most of which are designed to enhance customer identification and authorization
requirements. Due to the shortcomings of holograms as a fraud preventive, the next
generation of credit cards, called smart cards, has computer chips instead of holograms.
Each card contains a microprocessor memory chip as well as data encoded on the magnetic
stripe. For an authorization the cardholder is required to enter the personal identification
number (PIN) encoded on the microchip. The industry foresees a time when bank
customers will be able to use a single card to administer all their financial needs
[ANON94b]. Since the late 1980s, French banks, with about 25 million smart cards in
circulation (about half of the world's total of smart cards), have already made use of this
technology and, based on the reports, their fraud volume has been cut drastically [DEME98].

3.1.1.1 Implementation Issues in North America

Although the idea of smart cards seems very appealing, getting from the idea to practice in
the North American market is another matter. Smart cards have become common in
France and some other European and Asian markets due to the lack of widespread
communication networks and the relatively costly telephone lines.

Although by shifting to smart cards the card issuing institutions could save more than a
billion dollars per year, this conversion would be very costly. The main reasons
are [DEME98]:


• Thanks to the fairly cheap telecommunication systems in North America, more than 90%
  of card transactions are authorized on-line. In the case of this implementation, POS
  terminals would need to be replaced or retrofitted for the current card use.

• New cards have to be manufactured and distributed. The cost associated with issuing a
  smart card is up to six times higher than for magnetic stripe cards. The cost is determined
  by the number of chips being mounted on the card.

• An agreement on a new system of fees has to be established between the card issuers
  and their card organizations.

In the long term, however, smart cards will lead to significant cost savings. Although
advancements in security technology are encouraging, smart cards are unlikely to become
widespread until after the year 2000. In 1994, the cost of the infrastructure required to
issue smart cards worldwide was estimated at 7.4 billion dollars. Neither Visa nor
MasterCard has yet been able to justify these costs [ANON94b].

3.2 Fraud Detection Systems

Along with the rise of credit card fraud, FIs are employing various methodologies and
strategies to detect and prevent fraud. The main technologies used are pointed out in the
following Sections.

3.2.1 Rule-Based Systems

Rule-based systems are computer programs, in the category of expert systems, consisting of
a set of "If A then B" rules (where A is an assertion and B can be either an action or another
assertion) designed to monitor transactions and flag unusual behavior such as high valued
purchases or rapidly reaching the cardholder's credit limit. The result is a list of suspicious
transactions which will be handed in to fraud analysts for investigation.
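A rule-based monitor of this kind can be sketched as a list of named conditions applied to each transaction. The two rules below mirror the examples in the text (high valued purchases and rapidly reaching the credit limit), but the thresholds of 2000.0 and 90 percent, the dictionary fields and the function names are hypothetical choices made only for illustration.

```python
def high_value(tx, account):
    # "If the amount is unusually large then flag" (hypothetical threshold).
    return tx["amount"] > 2000.0


def near_limit(tx, account):
    # "If the purchase brings the balance close to the credit limit then flag".
    return account["balance"] + tx["amount"] > 0.9 * account["credit_limit"]


RULES = [
    ("high-value purchase", high_value),
    ("rapidly approaching credit limit", near_limit),
]


def flag_transactions(transactions, accounts):
    """Return (transaction id, fired rule names) for every suspicious transaction."""
    suspicious = []
    for tx in transactions:
        account = accounts[tx["account_id"]]
        fired = [name for name, condition in RULES if condition(tx, account)]
        if fired:
            suspicious.append((tx["id"], fired))
    return suspicious


# Hypothetical data for illustration only.
accounts = {
    "a1": {"balance": 100.0, "credit_limit": 5000.0},
    "a2": {"balance": 900.0, "credit_limit": 1000.0},
}
transactions = [
    {"id": "t1", "account_id": "a1", "amount": 50.0},
    {"id": "t2", "account_id": "a1", "amount": 2500.0},
    {"id": "t3", "account_id": "a2", "amount": 50.0},
]
suspicious = flag_transactions(transactions, accounts)
```

With this data, `t2` is flagged by the high-value rule and `t3` by the credit limit rule, while `t1` passes both; the resulting list is what would be handed to the fraud analysts.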

3.2.2 In House Detection Software

There are occasions where the FIs devise their own systems based on their account histories
and typical transactions. An example is the system used by American Express. It should
be noted that due to proprietary issues, there is not much information available on in-house
detection systems; otherwise, it would be interesting to compare these systems to the Visa
system.

3.2.3 Neural Networks

Neural Networks (NNs) are a subdivision of Artificial Intelligence (AI) designed to address
classification and pattern recognition problems. The term 'neural' is somewhat misleading.
Although the technology was inspired by the way neurons in the brain interact with each
other, in reality there is no thinking in a neural network. Klimasauskas, Director of
Financial Services at NeuralWare, a Pittsburgh-based neural network vendor, has
commented on this fact: 'The important thing to realize is that neural networks, as a
technology, have nothing to do with the brain. It is called neural because many of the
techniques were first introduced by people who were studying the human brain but it is
really a set of mathematical techniques for clustering information and finding curves for the
data.' [PURC94]



3.2.3.1 The Advantages of Neural Networks

Neural networks are able to capture associations or discover regularities within a set of
variables. The application domain of NNs very much depends on the nature of the problem
being modeled, but these systems are specifically suitable for domains where the
relationships are dynamic and non-linear. In general, NNs are designed to address the
following situations [PURC94]:

• The number of variables or the volume of data is very diverse.

• The relationship among variables is inherently complex and cannot easily be
  identified.

• There is a need for modeling diverse behavior by finding patterns among cases.

3.2.3.2 The Disadvantages of Neural Networks

While NNs have been used successfully for classification, they do suffer from the fact that
the network is viewed as a black box and there is no explanation of the result. Due to the
fact that the result is a completed network with layers and nodes linked together with
nonlinear functions whose relationship cannot easily be described, neural networks are
generally difficult to understand. Moreover, they suffer from long learning times which
become worse as the volume of data grows. Another major weakness of NNs is the lack of
diagnostic help. If something goes wrong, it is difficult to pinpoint the problem from the
mass of inter-related nodes and links in the network. These problems, along with their
inability to interpret the output, are major disadvantages of these systems [PURC94]
[TAYL97].
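The 'black box' character described above can be seen even in a very small network. The Python sketch below computes the output of a generic feed-forward network with one hidden layer and sigmoid nodes; it is a textbook illustration, not the architecture of any system discussed in this thesis, and the function names and weights are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Output of a one-hidden-layer network with sigmoid nodes, in (0, 1)."""
    # Each hidden node applies a nonlinear function to a weighted sum of inputs.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node combines the hidden activations in the same way.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical weights: two inputs, two hidden nodes, one output.
score = forward(
    inputs=[0.3, 0.9],
    hidden_weights=[[1.5, -2.0], [-0.7, 0.4]],
    hidden_biases=[0.1, -0.2],
    output_weights=[2.2, -1.3],
    output_bias=0.05,
)
```

The result is a single number between 0 and 1, and no individual weight explains why it came out as it did, which is the interpretability problem noted above.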


3.2.3.3 Neural Networks and FIs

In the past few years, NNs have received extensive attention and exploration from the FIs.
The reason for this attention is the dynamic and evolving nature of the fraud detection
application. Overall, neural networks have shown effective results in areas such as fraud
detection by looking at massive quantities of data which have a number of independent
variables. These systems have been trained to find patterns and correlation among the
incoming transactions.

3.2.3.4 FDS and Credit Card Fraud

As discussed, FIs make extensive use of NN based software to spot and flag transactions
inconsistent with the cardholder's usual behavior. The focus of attention in this research is
FDS, NN based software being used by 40 of the top 50 WNG97 large credit card issuers
worldwide including our collaborating FI. Historically, the first version of this software
entered the market in 1992 [STEW94].

FDS is real-time customized software designed to determine the likelihood of card fraud.
By using legitimate and fraudulent transactions, FDS has built an individual behavior profile
for each account. To the knowledge of the author, there is no documentation on the
software, due to the proprietary and business concerns of the software provider. Therefore,
it is not clear how this profile is established but the conjecture is that the account profile file
includes the type of merchant at which the cardholder typically shops, the time of the day
that the cardholder normally makes purchases, the geographic locations along with many
more characteristics that only software developers are aware of. FDS inspects and
evaluates the incoming transactions to see if they fit into the customer's established profile.
Any deviation from the usual cardholder's behavior is monitored and scored by this system.
Based on the changes that FDS detects in the customer's pattern of behavior, it assigns
scores between 1 and 1000 to each transaction. The higher the score, the higher the
likelihood of fraud.

Bank authorities set a threshold value and all transactions scored above this threshold are
considered suspicious, so that when these scores hit the set threshold, a case is created and is
flagged for further investigation. Inherently FDS makes no assumption about the
suspicious transactions and transmits the flagged accounts, in real time, to the FI's fraud
department for further follow up and investigation [DEAN99].
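The thresholding step can be sketched as follows. Because the internals of FDS are undocumented, the scores here are simply given as input; the function name `create_cases`, the tuple layout and the threshold of 700 are hypothetical. The source is ambiguous about whether a score exactly at the threshold is flagged, and this sketch assumes that it is.

```python
def create_cases(scored_transactions, threshold):
    """Group suspicious transactions by account.

    scored_transactions: iterable of (account_id, transaction_id, score),
    with scores between 1 and 1000 as assigned by the scoring system.
    """
    cases = {}
    for account_id, tx_id, score in scored_transactions:
        if not 1 <= score <= 1000:
            raise ValueError(f"score out of range: {score}")
        # Assumption: a score that reaches the threshold creates a case.
        if score >= threshold:
            cases.setdefault(account_id, []).append(tx_id)
    return cases


# Hypothetical scores for illustration only.
scored = [
    ("a1", "t1", 120),
    ("a1", "t2", 870),
    ("a2", "t3", 700),
    ("a1", "t4", 990),
]
flagged = create_cases(scored, threshold=700)
```

Here account `a1` would be flagged with transactions `t2` and `t4`, and account `a2` with `t3`; raising the threshold reduces the number of cases passed to the fraud department at the cost of missing lower-scored fraud.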

3.3 Fraud Investigation Process

To prevent more losses due to credit card fraud, FIs have set up groups or departments
responsible for following up on the potentially suspicious transactions identified by the FDS.
The flagged accounts with their associated transactions are presented on the fraud analyst's
computer screen in real time. Fraud analysts examine the flagged transactions with the
client's history and from their experience determine the potential risk associated with these
transactions. This judgement is based on different criteria such as the type of the
merchandise (e.g., jewelry, high price electronic items), the unusual number of transactions
or large amount of charges in a given day, and the credit limit variations.


Depending on bank policy, whether a transaction is considered legitimate or fraudulent, the fraud analyst has to call the cardholder for transaction verification. In general, the following possibilities might occur in a fraud investigation process:

• The cardholder can be reached

- The cardholder confirms the transaction, referred to as a 'false positive'. Approximately 90 percent of the cases flagged by FDS are false positives.

- The cardholder denies the transaction, which results in two possibilities: (1) the card is lost, stolen or counterfeit, or (2) the cardholder has made the purchase but repudiates the event by reporting the card as lost or stolen. In both cases the fraud analyst will block the account.

• The cardholder cannot be reached

- The investigator will leave the customer a message to call the bank back as soon as possible; s/he may block the account temporarily and makes a note on the system for further follow-up.

- The analyst is not able to find the cardholder due to a wrong address or telephone number. Such a case has a high potential for fraud; therefore, the account will be blocked.

This procedure is repeated for all flagged accounts.
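The branching in the investigation process above can be sketched as a small lookup. This is only an illustration of the four outcomes listed in the text; the names `Outcome` and `follow_up`, and the wording of the returned labels, are assumptions and do not describe any actual bank system.

```python
from enum import Enum


class Outcome(Enum):
    CONFIRMED = "cardholder confirms the transaction"
    DENIED = "cardholder denies the transaction"
    MESSAGE_LEFT = "cardholder not reached, message left"
    NOT_FOUND = "wrong address or telephone number"


def follow_up(outcome: Outcome) -> dict:
    """Map an investigation outcome to the analyst's action (illustrative only)."""
    if outcome is Outcome.CONFIRMED:
        # A legitimate transaction that FDS flagged: a false positive.
        return {"label": "false positive", "block": "none"}
    if outcome is Outcome.DENIED:
        # Lost, stolen or counterfeit card, or a repudiated purchase.
        return {"label": "fraud", "block": "blocked"}
    if outcome is Outcome.MESSAGE_LEFT:
        # The text says the analyst *may* block temporarily, so it is optional here.
        return {"label": "pending", "block": "optional temporary"}
    # Cardholder cannot be located: treated as high fraud potential.
    return {"label": "high risk", "block": "blocked"}
```

Since roughly 90 percent of flagged cases end in the first branch, most of the analyst's calls in this sketch would return the "false positive" label.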



3.4 Fraud Detection Dilemma

Credit card fraud detection is a pattern recognition problem. Every cardholder has a shopping behavior which establishes a profile for her/him. As a result of personal needs or seasonal factors, patterns of behavior change over time, so that s/he may develop new patterns of behavior which are not yet known to the Fraud Detection System (FDS). Very often an 'unusual' transaction is legitimate. Note that the terms legitimate and non-fraud are equivalent and are used interchangeably throughout this thesis. Currently, FDS identifies many legitimate accounts as fraudulent, resulting in a large number of false positives (FPs). As every cardholder has a huge number of possibilities for developing new patterns of behavior, the types of transactions are widely variable. Hence, it is almost impossible to identify consistent and stable patterns for all the transactions. In fact, the variations of behavior for each individual are exponential in combination, and the complexity of enumerating all combinations of cases is enormous. This ever-changing pattern of behavior, along with the mixture of legitimate and fraudulent cases, has left the FIs with a large number of FPs (approximately 90% of flagged accounts) that have to be investigated.

The motivation of this research is to address these challenges. In brief, the task is to post-process the FDS output and to identify the legitimate transactions (True Negatives, TNs) from the stream of flagged transactions. This identification is a classification task; that is, the system we develop has to be able to extract the TNs from the pool of data while not missing fraudulent transactions. If this goal could be achieved, the bank staff may not need to call these legitimate customers for transaction verification.

Pattern recognition for these occurrences is inherently complex, and one has to understand the underlying system as much as possible and use this knowledge in the design of the required system. Investigation of some of the AI methodologies and their applications revealed that learning is the appropriate approach for addressing this type of classification problem. In fact, learning is well suited to cases where patterns of behavior in real-world problems are complex and there is little or no knowledge of the semantics of the application domain. A further survey of some of the learning methodologies and their applications led to the choice of decision tree learning for this research topic.

3.5 Summary

This chapter introduces the existing fraud solution approaches and gives a brief introduction to the existing Fraud Detection Systems (FDS). It touches on neural network technology, its advantages and disadvantages, and briefly describes its application to credit card fraud detection. It further elaborates on the fraud investigation process and the associated issues.

CHAPTER 4 Methodology

4.1 Classification

The task of classification occurs in a wide range of human activity. In a broad sense, the term could relate to any context in which some decision or forecast is made on the basis of currently available information; a classification procedure is then applied for repeatedly making such decisions in new situations. In a restricted interpretation, the task concerns the construction of a procedure that will be applied to a continuing sequence of cases, in which each new case must be assigned to one of a set of pre-defined classes on the basis of observed attributes or features. Here the aim is to establish a rule whereby one can classify a new observation into one of the existing classes. Such problems are often referred to as classification problems. The construction of a classification procedure from a set of data for which the true classes are known has also been variously termed pattern recognition, discrimination, or supervised learning (in order to distinguish it from unsupervised learning or clustering, in which the classes are inferred from the data). Supervised learning is defined as the establishment of the classification rule from the given correctly classified samples. A much more difficult problem is that of unsupervised learning or clustering, where solved cases are not known, so no classifications can be given, and the samples consist only of observations. In that situation the goal is to identify clusters of patterns that are similar, thus identifying potential classes. This type of problem is far less structured and its potential for success is much more limited, because it involves much more guessing [WEIS91] [MICH94].

4.1.1 Reasons for Classification

There are many reasons why one may wish to set up a classification procedure. Some examples of classification problems are: (1) mechanical procedures for sorting letters on the basis of machine-read postal codes, (2) assigning individuals to credit status on the basis of financial and other personal information, (3) the preliminary diagnosis of a patient's disease to select immediate treatment while waiting for definitive test results, and (4) credit card fraud detection [MICH94]. In fact, some of the most urgent problems arising in science, industry and commerce can be considered as classification or decision problems, which often require complex and extensive data for evaluation.

4.2 Overview of Learning Systems

Pattern recognition is an area of science concerned with discriminating between objects on the basis of information available about those objects. To be able to build a recognition system, prior knowledge about the problem is necessary. This knowledge is usually available in the form of a dataset called the learning set.

A learning system is a computer program that makes decisions based on the accumulated information contained in the available known samples. A typical learning system is designed to work with some general model, such as a decision tree, a discriminant function, or a neural net. Different learning systems use a variety of techniques to extract the knowledge from the learning set. These techniques include many highly mathematical methods that can search systematically over large numbers of possibilities to find the closest fit to the data [WEIS91].

4.2.1 The Classification Model

In statistics the classification problem is sometimes called the prediction problem, and in the field of machine learning it is often called concept learning. The fundamental goal of empirical learning is to extract a decision rule from sample data where the outcomes are known, such that the results can be applied to new data where the outcomes are not known. The learning system uses a set of examples, called the training set, to find generalized decision rules and build a decision-making system called the classifier. The simplest way of representing a classifier is to consider it as an algorithm which produces a decision for every pattern of data presented to it: the system accepts a pattern of data as input and produces a decision as output [WEIS91]. The potential observations relevant to a particular problem are referred to as features or attributes. Throughout this thesis, features and attributes are used interchangeably.

To train and evaluate a learning system, the available data should be divided into three parts: (1) the training set, (2) the testing set, and (3) the case set. The training set is used to extract the maximum amount of information from the samples. The testing set is used to estimate the accuracy of the trained system; this is the stage where the trained system is validated. The case set is used to evaluate the prediction accuracy of the classifier on future cases.
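A minimal sketch of such a three-way division follows. The 60/20/20 proportions, the fixed random seed and the function name `three_way_split` are assumptions made for illustration; the text does not state how the split was sized.

```python
import random


def three_way_split(cases, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle the cases and cut them into training, testing and case sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)   # reproducible shuffle
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    training_set = shuffled[:n_train]
    testing_set = shuffled[n_train:n_train + n_test]
    case_set = shuffled[n_train + n_test:]  # whatever remains
    return training_set, testing_set, case_set


train, test, case = three_way_split(range(100))
```

Keeping the case set untouched until the very end is what lets it stand in for future, unseen cases.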

Although extensive computer processing is required by any learning system, the analyst has a very important role in the design of any classifier. For a given problem, at the very least s/he must describe and define the relevant set of observations and objectives. All the observations are symbols that are manipulated by the computer. Thus, while the computer can carry out different forms of analysis, much of the potential for success lies with the analyst, who selects the real-world data with the required accuracy [WEIS91].

4.2.2 Hypothesis Space in Supervised Learning

In supervised learning, the learning program is given training examples of the form $\{(x_1, y_1), \ldots, (x_m, y_m)\}$ for some unknown function $y = f(x)$. Each $x_i$ typically represents discrete or real-valued components such as color, height, or age with their associated values. The $y$ values typically come from a discrete set of classes $\{1, \ldots, k\}$. The task of the learning program is: given a set of training examples of $f$, return a function $h$ that approximates $f$. This function is a hypothesis about the true function $f$. Any preference for one hypothesis over another, beyond mere consistency with the examples, is called a bias. Because there are usually a large number of possible consistent hypotheses, all learning algorithms exhibit some sort of bias [RUSS95].


A simple example, shown in Figure 4-1, is used to clarify the meaning of hypothesis and bias in this context. As this figure shows, the (x, y) points represent the training examples, where $y = f(x)$. The task of the learning algorithm is to find a function $h(x)$ that fits these points as closely as possible. As Figure 4-1 shows, the function used in (b) is a piecewise linear function, in (c) it is a more complicated function (e.g., quadratic), and in (d) a least squares function is used to fit the data points. As discussed above, the true $f$ is unknown, and the different functions $h$ try to approximate $f$ by finding a function that is a good fit to the available data samples. Any preference for (b) over (c), (d) or any other possibility is considered a bias.

Figure 4-1 An illustration of several hypotheses in learning algorithms [RUSS95]
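Two of the hypothesis families named for Figure 4-1 can be contrasted in a few lines. The sketch below fits a piecewise linear function and a least squares line to the same points; the four training points are invented for illustration and are not the points of the figure.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the line minimising squared vertical error."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def piecewise_linear(points):
    """Return h(x) that joins consecutive training points with straight segments."""
    pts = sorted(points)

    def h(x):
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise ValueError("x lies outside the training range")

    return h


# Hypothetical training examples (x, y).
training = [(0.0, 1.0), (1.0, 3.5), (2.0, 4.0), (3.0, 7.5)]
slope, intercept = least_squares_line(training)


def h_line(x):
    """Least squares hypothesis: smoother, but need not hit every example."""
    return slope * x + intercept


h_segments = piecewise_linear(training)  # passes through every example
```

Both functions are legitimate hypotheses for the same data; choosing the segments because they are consistent with every example, or the line because it is simpler, is exactly the kind of preference the text calls a bias.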


4.3 Perspectives on Classification

Historically, the strands of research on classification can be grouped into three main and distinct categories: (1) statistical, (2) neural networks (NNs), and (3) machine learning (ML). As explained before, the goal of classification is to derive rules or procedures that would be able [MICH94]:

• To equal, if not exceed, a human decision-maker's behavior, but with the advantage of consistency.

• To handle a wide variety of problems and, given enough data, to generalize.

For this purpose there are different algorithms that search a hypothesis space defined by some underlying representation (e.g., linear functions, neural networks, logical descriptions, or decision trees). For each of these hypothesis representations, the corresponding learning algorithm takes advantage of a different underlying structure to organize the search through the hypothesis space.

Statistical methods are considered parametric, whereas NN and ML methods are categorized as non-parametric. Parametric methods assume a certain form for the underlying model, such as a normal (bell-shaped) curve for the classifier. Non-parametric methods make no assumption about the functional form of the underlying model. These methods employ the power of computers to search and iterate until they find a good fit to the sample data [MICH94].


4.3.1 Statistical Approaches

Statistical approaches are generally characterized by having an underlying probability model. This model provides the probability of an event or object being in each class, rather than simply giving the classification of the case. These methods attempt to provide an estimate of the joint distribution of the features within each class, which can, in turn, provide a classification rule. It is usually assumed that statisticians will use these techniques; therefore, some human intervention is assumed with regard to variable selection and transformation, and overall structuring of the problem [MICH94].

4.3.2 Neural Networks

Neural networks consist of layers of interconnected nodes, each node producing a non-linear function of its input. The input to a node may come from other nodes or directly from the input data. Some nodes are also identified with the output of the network. The complete network, therefore, represents a very complex set of interdependencies, which may incorporate different degrees of nonlinearity, allowing very general functions to be modeled [MICH94].
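The description of nodes and layers can be made concrete with a very small forward pass. This is a sketch under stated assumptions: the sigmoid is one common choice of non-linear node function (the text names none), and the weights, biases and layer sizes are hypothetical.

```python
import math


def sigmoid(z):
    """A common non-linear node function (an assumption; the text names none)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, bias):
    """One node: a non-linear function of the weighted sum of its inputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def forward(inputs, hidden_layer, output_node):
    """Feed the input data through one hidden layer and a single output node."""
    hidden = [node_output(inputs, w, b) for w, b in hidden_layer]
    weights, bias = output_node
    return node_output(hidden, weights, bias)


# Hypothetical weights: two inputs, two hidden nodes, one output node.
hidden_layer = [([0.5, -0.4], 0.1), ([-0.3, 0.8], 0.0)]
output_node = ([1.2, -0.7], 0.05)
score = forward([1.0, 0.0], hidden_layer, output_node)
```

The hidden nodes take their input directly from the data, while the output node takes its input from other nodes, which mirrors the two input sources mentioned above.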

4.3.3 Machine Learning

Machine learning is inherently a multidisciplinary field. It draws on results from artificial intelligence (AI), probability and statistics, computational complexity theory, control theory, information theory, philosophy, psychology, neurobiology, and other fields [MITC97]. Machine learning is aimed at generating classifying expressions simple enough to be understood by humans. Unlike statistical approaches, this operation is carried out without human intervention. The term machine learning is generally used to encompass automatic computing procedures, based on logical or binary operations, that learn a task from a series of examples. Thus, ML is a method of data analysis where the classifiers, obtained from a training set of pre-classified cases, are used to predict the classes of new cases [MICH94].

Machine learning methods have been applied to a variety of large databases to learn general regularities implicit in the data. For instance, decision tree learning algorithms have been used by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky Survey. This system is now being used to automatically classify all objects in the Sky Survey, which consists of three terabytes of image data [MITC97].

4.4 Learning Decision Trees

Decision trees, a machine learning method, are perhaps the oldest and one of the most popular ways to represent the outcome of a classification learning procedure. Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree [CUNN97].

Decision trees are capable of representing the most complex problems given sufficient data, and they are one of the most highly developed techniques for partitioning samples into a set of decision rules. Learned trees can also be represented as sets of if-then rules to improve human readability. These learning methods are very popular and have been successfully applied to a broad range of tasks, from learning to diagnose medical cases to learning to assess the credit risk of loan applicants [MITC97].

4.4.1 Domain Application of Decision Tree Learning

Although a variety of decision tree learning algorithms have been developed with somewhat different capabilities and requirements, decision tree learning is generally best suited to problems with the following characteristics [MITC97]:

• The target function has discrete output values. For instance, a decision tree assigns a 'yes' or 'no' to each classified example.

• The training data may contain errors. Decision tree learning methods are robust to errors found in the attribute values that describe the input examples.

• The training data may contain missing attribute values. Even though the values of some attributes might be unknown for some training examples, decision tree learning methods can still be employed.

Because of these characteristics, decision tree learning was judged a suitable fit for this research topic.

4.4.2 Overview of Decision Tree Learning Method

A decision tree consists of nodes and branches. The starting node is usually referred to as the root node. Each node is labeled with a feature name, and each branch leading out of it is labeled with one or more possible values for that feature. Each node has just one incoming branch, except for the root, which is designated as the starting point. Each internal node in the tree corresponds to a test of the value of one of the features. Branches from the node are labeled with the possible values of the test. Leaves are labeled with the values of the classification feature and specify the value to be returned if that leaf is reached. A decision tree takes as input a set of features and their associated values and classifies the case by traversing the tree. Depending on whether the result of a test is true or false, the tree will branch to one node or another. The feature of the instance corresponding to the label of the root of the tree is compared to the values on the root's outgoing branches, and the matching branch is selected. This node label matching and branch selection process continues until a terminal node, referred to as a leaf, is reached, at which point the case is classified according to the label of the leaf and a decision is made on the class assignment of the case [CUNN97].
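The traversal just described can be sketched with a nested dictionary standing in for the tree. The feature names (`merchandise`, `over_limit`), their values and the class labels below are hypothetical and chosen only to make the example readable; they are not taken from the thesis data.

```python
# A leaf is a plain string (the class label); an internal node is a dict
# holding the feature it tests and one subtree per branch value.
tree = {
    "feature": "merchandise",
    "branches": {
        "groceries": "legitimate",
        "jewelry": {
            "feature": "over_limit",
            "branches": {"yes": "suspicious", "no": "legitimate"},
        },
    },
}


def classify(node, case):
    """Follow matching branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        value = case[node["feature"]]      # look up the tested feature
        node = node["branches"][value]     # select the matching branch
    return node                            # the leaf's label is the class


label = classify(tree, {"merchandise": "jewelry", "over_limit": "yes"})
```

Only the features on the path actually taken are inspected, so a case routed to the `groceries` leaf never has its `over_limit` value examined.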

4.4.3 An Illustration of Decision Tree Induction

To visualize how a decision tree learning algorithm learns from the training set, the following example has been adapted from Russell [RUSS95]. The task is to decide whether or not to wait for a table at a restaurant. The aim is to learn a decision for the concept WillWait by employing decision tree methodology. As the first step, the features that describe the examples are as follows:

1. Alternate: whether there is a suitable alternative restaurant nearby.

2. Bar: whether the restaurant has a comfortable bar area to wait in.

3. Fri/Sat: true on Fridays and Saturdays.

4. Hungry: whether we are hungry.

5. Patrons: how many people are in the restaurant (values are None, Some, and Full).

6. Raining: whether it is raining outside.

7. Reservation: whether we made a reservation.

8. Type: the kind of restaurant (French, Italian, Thai, or Burger).

9. WaitEstimate: the wait estimated by the host (0-10, 10-30, 30-60, >60 minutes).

The decision tree that can represent this task is shown in Figure 4-2. The tree can be described as a conjunction of individual implications corresponding to the paths through the tree ending in Yes/No nodes. As an example, the path for a restaurant full of patrons with an estimated wait of 10-30 minutes when the person is not hungry can be expressed by the following logical sentence:

$\forall r \; Patrons(r, Full) \land WaitEstimate(r, 10\text{-}30) \land Hungry(r, N) \Rightarrow WillWait(r)$

The notation employed is defined below:

r: a general indicator representing a person or object

N: means 'No'

$\forall$: for every person or object

$\land$: and

$\Rightarrow$: then

Figure 4-2 A decision tree for deciding whether to wait for a table in a restaurant [RUSS95]

4.4.3.1 Induction of Decision trees from examples

In this section, a set of 12 examples (X1, ..., X12), along with their feature values and the value of the class associated with these features, is shown in Table 4-1. Examples for which the goal is true are called positive examples, and those for which it is not true are called negative examples. As Table 4-1 shows, the positive examples are the ones that have the value Yes for the goal WillWait (e.g., X1, X3, ...), and the negative examples are the ones that have the value No for this goal (e.g., X2, X5, ...). The complete set of 12 examples is called the training set.

The task is to find a decision tree that agrees with all the examples. A trivial solution to this problem is to construct a decision tree that has one path to a leaf for each example, where the path tests each feature in turn and follows the value for the example, and the leaf has the classification of the example. Presented with the same examples again, such a tree will obviously come up with the right classification without any errors. But it is not able to classify other cases correctly, because this trivial tree has just memorized the observations and has not extracted any pattern from the examples. If a learning algorithm does not extract general rules from the data, it will not be able to extrapolate to new cases. That is why the learning algorithm looks at the examples, not at the correct function. Pondering this simple example, one can understand why the learning algorithm has errors in the process of training even though the true class of the examples is presented to it.


Table 4-1 A small training set for the restaurant domain

Example  Alt  Bar  Fri  Hun  Patron  Rain  Reserve  Type     Goal: WillWait
X1       Yes  No   No   Yes  Some    No    Yes      French   Yes
X2       Yes  No   No   Yes  Full    No    No       Thai     No
X3       No   Yes  No   No   Some    No    No       Burger   Yes
X4       Yes  No   Yes  Yes  Full    No    No       Thai     Yes
X5       Yes  No   Yes  No   Full    No    Yes      French   No
X6       No   Yes  No   Yes  Some    Yes   Yes      Italian  Yes
X7       No   Yes  No   No   None    Yes   No       Burger   No
X8       No   No   No   Yes  Some    Yes   Yes      Thai     Yes
X9       No   Yes  Yes  No   Full    Yes   No       Burger   No
X10      Yes  Yes  Yes  Yes  Full    No    Yes      Italian  No
X11      No   No   No   No   None    No    No       Thai     No
X12      Yes  Yes  Yes  Yes  Full    No    No       Burger   Yes

To find a pattern from the examples means to find some regularities in the training set and to be able to describe a large number of examples in a concise way. The whole point of the decision tree is to find ways in which only parts of the input need to be incorporated in the structure of the tree to reach a decision. In other words, the decision tree algorithm tries to find a 'small' tree that correctly classifies most of the training examples. This is an example of a general principle of inductive learning often called Occam's Razor: "The most likely hypothesis is the simplest one that is consistent with all observations." The basic idea behind the decision tree learning algorithm is to test the most important attribute first. The most important attribute is the one that makes the most difference to the classification of an example. This approach may lead to the correct classification with a small number of tests, meaning that all paths in the tree will be short and the tree as a whole will be small [RUSS95].

Figure 4-3 illustrates how a simplified version of the algorithm starts. In the first step, the 12 training examples are classified into positive and negative sets. Then the algorithm decides which attribute to use as the first test in the tree. As Figure 4-3 (a) shows, Patrons is a fairly important attribute, because if its value is None or Some, it leads to example sets for which the classification is definitely No or Yes, respectively. If the value of this test is Full, then further tests are required. As Figure 4-3 (b) shows, Type is a poor attribute, because it has four possible outcomes, each of which has the same number of positive and negative examples. All possible attributes are considered in this way, and the most important one is selected for the root of the tree. Supposing that in this example the most important attribute is Patrons, it is used as the root test. After Patrons splits up the examples, each outcome is a new decision tree learning problem in itself, with fewer examples and one fewer attribute (Patrons has already been picked up).
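The comparison between Patrons and Type can be checked directly by counting positive and negative examples on each branch. The sketch below uses only the Patrons, Type and WillWait columns of Table 4-1; the names `EXAMPLES`, `COLUMN` and `partition` are introduced here for illustration.

```python
from collections import defaultdict

# (Patrons, Type, WillWait) for X1..X12, copied from Table 4-1.
EXAMPLES = [
    ("Some", "French", "Yes"), ("Full", "Thai", "No"),
    ("Some", "Burger", "Yes"), ("Full", "Thai", "Yes"),
    ("Full", "French", "No"), ("Some", "Italian", "Yes"),
    ("None", "Burger", "No"), ("Some", "Thai", "Yes"),
    ("Full", "Burger", "No"), ("Full", "Italian", "No"),
    ("None", "Thai", "No"), ("Full", "Burger", "Yes"),
]
COLUMN = {"Patrons": 0, "Type": 1}


def partition(attribute):
    """Return {value: [positives, negatives]} for one attribute."""
    counts = defaultdict(lambda: [0, 0])
    for row in EXAMPLES:
        counts[row[COLUMN[attribute]]][0 if row[2] == "Yes" else 1] += 1
    return dict(counts)


by_patrons = partition("Patrons")  # Some: 4 and 0, None: 0 and 2, Full: 2 and 4
by_type = partition("Type")        # every type: equal positives and negatives
```

Two of the three Patrons branches are already pure, while no Type branch tells positives from negatives, which is why Patrons is the better first test.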

Figure 4-3 Partitioning the examples by testing on attributes [RUSS95]

In (a) Patrons is a good attribute to test first; in (b) Type is a poor one; and in (c), given that Patrons is the first test, Hungry is a fairly good second test.


Figure 4-4 illustrates this partitioning. This figure shows that decision tree learning algorithms can be seen as a method for partitioning the universe into successively smaller rectangles, with the goal that each rectangle contains only objects of one class, that is, positive or negative [MICH94]. In Figure 4-4, the dashed line shows the real division of examples in the universe. The solid lines show a decision tree approximation.

Figure 4-4 Decision tree learning algorithms partition the universe into successively smaller rectangles [MICH94]

In general, three possibilities can be considered for decision tree problems [RUSS95]:

1. If there are some positive and some negative examples, choose the best attribute to split them. In Figure 4-3 (c) this is illustrated by using Hungry to split the remaining examples.

2. If all the remaining examples are positive (or all negative), then the search is over. For instance, in Figure 4-3 (c), the answers No and Yes are assigned to None and Some, respectively.

3. If there are no examples left, the algorithm returns a default value, which is calculated from the majority classification at the node's parent.

The decision tree learning algorithm applied to this problem is shown in Figure 4-5. This algorithm continues until the tree shown in Figure 4-6 is constructed. As can be seen, this tree is distinctly different from the original tree shown in Figure 4-2, despite the fact that the same sample data were used to generate it. One might conclude that the learning algorithm is not learning the correct function. This conclusion is not correct because, as mentioned before, the learning algorithm looks at the examples, not at the correct function; in fact, its hypothesis not only agrees with all the cases but is considerably simpler than the original tree. The learning algorithm has not considered any test for Raining and Reservation because it has been able to classify all the examples without them. The algorithm has also detected an interesting regularity in the data, that is, the person will wait for Thai food on weekends [RUSS95].

Figure 4-5 The decision tree learning algorithm [RUSS95]

function Decision-Tree-Learning (examples, attributes, default) returns a decision tree

  inputs: examples, set of examples
          attributes, set of attributes
          default, default value for the goal

  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attributes is empty then return Majority-Value (examples)
  else
    best ← Choose-Attribute (attributes, examples)
    tree ← a new decision tree with root test best
    for each value v_i of best do
      examples_i ← {elements of examples with best = v_i}
      subtree ← Decision-Tree-Learning (examples_i, attributes − best, Majority-Value (examples))
      add a branch to tree with label v_i and subtree subtree
    end
    return tree
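The pseudocode of Figure 4-5 can be turned into a short runnable sketch. Several points are assumptions rather than the thesis's method: Choose-Attribute is implemented here with information gain (lowest remaining entropy), which is one common reading of "most important attribute"; only the eight attribute columns that appear in Table 4-1 are used; and branches are created only for values seen at a node. The resulting tree is therefore not claimed to match Figure 4-6.

```python
import math
from collections import Counter

ATTRS = ["Alt", "Bar", "Fri", "Hun", "Patrons", "Rain", "Res", "Type"]
ROWS = """Yes No No Yes Some No Yes French Yes
Yes No No Yes Full No No Thai No
No Yes No No Some No No Burger Yes
Yes No Yes Yes Full No No Thai Yes
Yes No Yes No Full No Yes French No
No Yes No Yes Some Yes Yes Italian Yes
No Yes No No None Yes No Burger No
No No No Yes Some Yes Yes Thai Yes
No Yes Yes No Full Yes No Burger No
Yes Yes Yes Yes Full No Yes Italian No
No No No No None No No Thai No
Yes Yes Yes Yes Full No No Burger Yes"""
# Each example is (attribute dict, class label), taken from Table 4-1.
EXAMPLES = [(dict(zip(ATTRS, r.split()[:-1])), r.split()[-1])
            for r in ROWS.splitlines()]


def majority_value(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def entropy(examples):
    total = len(examples)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(label for _, label in examples).values())


def choose_attribute(attributes, examples):
    """Assumed importance measure: information gain (lowest remaining entropy)."""
    def remainder(a):
        groups = Counter(x[a] for x, _ in examples)
        return sum(n / len(examples) *
                   entropy([e for e in examples if e[0][a] == v])
                   for v, n in groups.items())
    return min(attributes, key=remainder)


def decision_tree_learning(examples, attributes, default):
    if not examples:
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:
        return labels.pop()
    if not attributes:
        return majority_value(examples)
    best = choose_attribute(attributes, examples)
    tree = {"test": best, "branches": {}}
    # Only values seen at this node get a branch (a simplification).
    for v in sorted({x[best] for x, _ in examples}):
        subset = [e for e in examples if e[0][best] == v]
        rest = [a for a in attributes if a != best]
        tree["branches"][v] = decision_tree_learning(
            subset, rest, majority_value(examples))
    return tree


def classify(tree, case):
    while isinstance(tree, dict):
        tree = tree["branches"][case[tree["test"]]]
    return tree


tree = decision_tree_learning(EXAMPLES, ATTRS, "No")
```

On these 12 examples the sketch selects Patrons as the root test and classifies every training example correctly; which attribute is tested below the Full branch depends on how ties in the gain are broken.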

Figure 4-6 The resulting decision tree from the 12 training examples

4.4.4 Boosting

Boosting is a technique for generating and combining multiple classifiers, either decision trees or rule sets. This technique is used to improve the prediction accuracy of the classifiers. Boosting may lead to a reduction in error rate, but this effect is not guaranteed, and in some cases it might have no effect at all. The effectiveness of boosting is not deterministic and is not known beforehand. Only after employing this technique on the data and comparing the results can one see whether the prediction accuracy has improved or not. In boosting, instead of one classifier, several classifiers are constructed, and the combination of their outcomes determines the final class assigned to the case. Boosting may give higher predictive accuracy at the expense of increased classifier construction time [QUIN99].
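To make the idea of combining several classifiers concrete, the following is a minimal AdaBoost-style sketch using one-threshold "stumps" as the weak classifiers. It is an assumption that this variant resembles the boosting meant in the text; the tool cited as [QUIN99] may use a different scheme. The ten labelled points are a toy dataset, not thesis data.

```python
import math

X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]   # toy labels, not thesis data


def stump_predict(stump, x):
    threshold, left_label = stump
    return left_label if x < threshold else -left_label


def best_stump(weights):
    """Weak learner: the threshold test with the lowest weighted error."""
    best, best_err = None, float("inf")
    for threshold in [v + 0.5 for v in X[:-1]]:
        for left_label in (1, -1):
            stump = (threshold, left_label)
            err = sum(w for w, x, y in zip(weights, X, Y)
                      if stump_predict(stump, x) != y)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def boost(rounds):
    """Build `rounds` weighted stumps, re-weighting misclassified cases."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(weights)
        err = min(max(err, 1e-10), 1 - 1e-10)       # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)      # vote strength
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for w, x, y in zip(weights, X, Y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Final class: sign of the weighted vote of all stumps."""
    vote = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if vote >= 0 else -1


def training_errors(ensemble):
    return sum(predict(ensemble, x) != y for x, y in zip(X, Y))
```

On this toy data a single stump misclassifies three of the ten cases, while the weighted vote of three stumps classifies all ten correctly. This illustrates the possible gain on a constructed example only; as the text notes, an improvement on real data is not guaranteed.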

4.4.5 Cross-Validation

One of the techniques for obtaining more reliable estimates of the predictive accuracy of classifiers is f-fold cross-validation (CV). The basic idea of cross-validation is to estimate how well the current system will predict unseen data. Instead of using one sample to build a tree and another sample to test the tree, the algorithm forms several pseudo-independent samples from the original samples and uses these to form a more accurate estimate of the error. For this purpose, the program splits the data into a chosen number of folds (splits). Experience on a large number of datasets has shown that 10 folds achieve good results, which is why many learning algorithms use 10 folds as the default option. Each fold contains approximately the same number of examples and the same class distribution. For each fold in turn, a classifier is constructed from the examples in all the other folds, and its accuracy is then tested on the examples in the hold-out fold. In this way, each case is used just once as a test case. The error rate of a classifier produced from all the samples is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases [QUIN99].
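The procedure can be sketched as follows. The stand-in learner `majority_classifier`, which ignores the features and always predicts the commonest training class, is a hypothetical placeholder for a real tree learner; the round-robin fold assignment is one simple way to keep the class distribution similar across folds, and the 8/4 class mix is invented.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, f):
    """Assign each case index to one of f folds, class by class, round-robin."""
    folds = [[] for _ in range(f)]
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % f].append(index)
    return folds


def majority_classifier(train_labels):
    """Stand-in learner (hypothetical): always predicts the commonest class."""
    commonest = Counter(train_labels).most_common(1)[0][0]
    return lambda case: commonest


def cross_validated_error(cases, labels, f, build=majority_classifier):
    folds = stratified_folds(labels, f)
    errors = 0
    for holdout in folds:
        held = set(holdout)
        # Train on every fold except the hold-out fold.
        train_labels = [l for i, l in enumerate(labels) if i not in held]
        classifier = build(train_labels)
        errors += sum(classifier(cases[i]) != labels[i] for i in holdout)
    # Total hold-out errors divided by total number of cases.
    return errors / len(cases)


labels = ["legit"] * 8 + ["fraud"] * 4
cases = list(range(len(labels)))          # placeholder feature values
error = cross_validated_error(cases, labels, f=4)
```

Because every case lands in exactly one fold, each case is tested exactly once, and the returned value is the ratio described in the text.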

4.6 Summary

The goal of a learning system is to extract decision rules from sample data. Machine learning addresses the problem of how to build computer programs that improve their performance at some task through experience. Major points of this section include:

• An introduction to the notion of classification and the main strands of research in this area, along with an overview of learning systems and their requirements.

• Designing a machine learning approach involves a number of design choices, including choosing the type of training experience, the target function to be learned, a representation for this target function, and an algorithm for learning the target function from the training examples.

• Learning involves searching through a space of possible hypotheses to find the hypothesis that best fits the available training examples.

• A description of decision tree learning and its domain of application.

CHAPTER 5 Application

5.1 Data Requirements

The first step is to choose the type of training exarnples h m which the system will lem.

The learning algorithm wiil extract the panems of behavior h m the exampies fed into it,

therefore, the whole application is dependent on the information containeâ in the data set.

In general, data collection for a real world systern will have limitations, that is, there will be

some information lost or not provided and we are restricted to what is available.

The operation of the FI transaction authorization and tracking system was described in
Chapter 2. As Figure 2-1 showed, the transaction tracking system of the FI communicates
with FDS in real time and transmits all the incoming authorizations from the POS to this
system to be scored. When the transactions get to a point where their FDS score hits the
current threshold, a case is created and passed on to the fraud department for further
investigation. In this way FDS detects the suspicious transactions from the stream of



transactions. This output includes both the real fraudulent transactions (True Positives, TPs)
hit by the system and false positives (FPs). It is important to note that there are fraudulent
transactions that are missed by the FDS because their score is below the threshold.
This category of transactions is known as False Negatives (FNs), which means that the case
was fraud but the system missed it. As discussed, FDS has already identified some, perhaps
most, of the TPs, but in the meantime it has created a lot of FPs as well. The task that
remains for us is to process these cases and to identify as many real legitimate transactions
as possible while trying not to miss fraudulent cases (FNs).

5.2 Data Collection

To perform the analysis, the accounts flagged by FDS were used as the input of the learning
system. The data was provided by the collaborating FI. The transactions flagged by FDS
were taken over 45 days (June, July, and part of August 99) and are related to a limited region
of Toronto. Together, ten separate files were provided. The first nine files were related
to flagged confirmed legitimate accounts, which together consisted of 4,919 accounts with
69,182 transactions. Due to the volume of data for the legitimate accounts, they were
divided into nine separate files. The tenth file included 707 fraudulent accounts that
contained 6,725 transactions. It should be noted that the fraudulent accounts have a
combination of fraud/non-fraud transactions. Hence, the fraudulent accounts consisted of
1,743 legitimate and 4,982 fraudulent transactions. Due to the confidentiality of real
account numbers and in order to have all the transactions from each account together, a


substitute but unique number was assigned to the transactions of each account by the FI. A
very small sample of the data in raw format is available in Appendix A. All ten files had
the same fields and each transaction had the following information:

• A replacement account number
• Date/Time of transaction
• Transaction amount
• Merchant country code
• Merchant category code (SIC)
• Decision code
• POS
• Type of card
• Case creation
• First action

Although the scores assigned to each transaction by the FDS were of great importance for
the analysis by the learning system, due to the proprietary and business concerns of the
software provider, the FI was not able to provide this information. In the meantime it was
essential to identify which transactions deviated from the normal behavioral pattern of the
legitimate cardholders, which caused the system to flag them as potentially fraudulent. The
FDS scores could show this trend; nonetheless, to make up for this data shortcoming, the
case creation date was provided as a proxy for each transaction. Lack of scores not only
may have a serious impact on the precision of the classifier, but also, due to the high volume of



data, it caused uncertainty and a substantial amount of 'manual' work in selecting the
transactions that occurred close to the case creation date.

5.2.1 Labeling the Transactions

To use the data for training, it was necessary to distinguish the fraudulent transactions from the
legitimate ones. Currently, labeling the fraudulent transactions is done manually, and the
fraud investigation department keeps conventional paper-based fraud files on which they
mark the transactions that were identified as fraud. Due to this manual process, there is no
mechanism to migrate this information back into the transaction tracking system and,
therefore, there is no record keeping of them on the system. Fraudulent accounts normally
have a mixture of fraudulent and legitimate transactions; therefore, the confirmed fraud
transactions in the fraud file were labeled by the bank with an asterisk (*).

5.2.2 Preprocessing the Databases

Raw data contains the information that must be extracted, but in the meantime it contains too
much non-essential information. The raw data provided by the FI required substantial
preprocessing to weed out the irrelevant information and to prepare the data set in a suitable
form for the learning system. The original data files were in text format; therefore, Excel
was selected as a tool for data manipulation.

The first step was to go through all the transactions and find the closest set of transactions
which match the case creation date. In the absence of scores, this can partially help to
identify those chains of transactions which, from the FDS point of view, did not have normal


behavior and caused the system to score them gradually and eventually get to a point where
they hit the threshold set by the bank. Manual inspection of data revealed the existence of
some inconsistencies in the data. To be able to use the data for the analysis, these
inconsistencies had to be removed from the datasets. After the preprocessing of databases,
the final legitimate database consisted of 13,426 non-fraud transactions and the final
fraudulent database consisted of 6,666 transactions (4,969 fraud and 1,698 non-fraud).

After the initial set up of the databases, the integrity of the data had to be investigated. The
data had fields with unknown values, spaces or zero values. All unknown values were
replaced by a question mark "?". Letters such as A, D, R, P, K, S, Y, N were checked to
be in one format, that is, in capital letters. Meanwhile, all non-fraud transactions in both
databases were labeled with the letter 'N'.
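The integrity checks described above were carried out in Excel; purely as an illustration, the same clean-up could be sketched in Python as follows. The field names (`decision`, `pos`), the treatment of blank fields as "unknown", and the use of 'Y' for fraud (taken from the class labels in the See5 output) are assumptions made here, not details confirmed by the data description. Zero values are left untouched in this sketch because the text does not say how they were handled.

```python
UNKNOWN = "?"

def clean_record(record, code_fields=("decision", "pos")):
    """Return a copy of one transaction with blanks marked and codes capitalized."""
    cleaned = {}
    for field, value in record.items():
        text = "" if value is None else str(value).strip()
        if text == "":
            text = UNKNOWN            # unknown values become a question mark
        elif field in code_fields:
            text = text.upper()       # letter codes are kept in capital letters
        cleaned[field] = text
    return cleaned

def label_record(cleaned, fraud_mark):
    """Attach the class: 'Y' if the bank marked the row with '*', else 'N'."""
    labeled = dict(cleaned)
    labeled["class"] = "Y" if fraud_mark == "*" else "N"
    return labeled
```

For example, `label_record(clean_record({"decision": " a ", "pos": ""}), "")` yields a record with decision `A`, POS `?`, and class `N`.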

5.3 Learning Requirements

Learning means behaving better as the result of experience. The task of a learning system
is to extract the maximum amount of information from the data samples and, based on this
information, to estimate the accuracy of its future classifications and predictions. While
conceptually simple, extracting information from a large database requires careful
organization and the specification of the goals to be met by the learning system. The
simple requirement of the classification methods is that the data be presented in the form of
samples composed of patterns of observations with the correct classification. Then the
learning procedure, which is an iterative process, will be applied [WEIS91].


5.3.1 Features and Classes

In the problem of predicting whether a flagged transaction is fraud or non-fraud, there are
two classes: fraud and non-fraud. The task is to predict which is the correct class based on
the observations of a set of transactions. By employing a decision tree learning algorithm,
the aim is to learn a definition for the concept, Transaction (fraud/non-fraud), where the
definition is expressed as a decision tree. In setting this up as a learning problem, the
properties or features that are available to describe the examples are presented in Table 5-1.
A small sample data set for credit card transactions obtained from the FI is available in
Appendix A.



Table 5-1 Credit card observations

Feature          Description
Account No.      Cardholder's account number
Date/Time        Date and time of transaction
Dollar           Dollar amount of transaction
SIC              Merchant category code
Country          Merchant country code
Decision         Authorized (A), Declined (D), Referral (R), Pick up (P)
POS              Card swiped (S) / keyed (K)
Case creation    The day / time case created by FDS
Case action      The day / time fraud analyst started to investigate the case


5.4 Concept Learning and Search Space

The problem of finding general functions from specific training examples is central to
learning. Concept learning is acquiring the definition of a general category given a sample
of positive and negative training examples of the category. Concept learning can be
viewed as the task of searching through a large space of hypotheses, implicitly defined by
the hypothesis representation (e.g., decision trees), to find the hypothesis that best fits the
training examples [MITC97].

In general, a well-defined learning problem requires a well-specified task, source of training
experience, and performance metric [MITC97]. Applying these criteria to this research
application results in the following descriptions:

• Task T: Transaction is either fraud or non-fraud.

• Training experience: A database of legitimate and fraudulent transactions with
  their labels.

• Performance measure: Percentage of cases classified correctly by the
  classifiers.

To complete the design of the learning system, the following factors should be chosen:

• The exact type of knowledge to be learned (i.e., classifying the transactions as
  fraud/non-fraud).

• A representation for this target knowledge (i.e., decision trees).

• A learning mechanism (i.e., a learning algorithm).


5.4.2 Software Selection

If learning is viewed as a search problem, then it is natural that learning algorithms will
examine different strategies for searching the hypothesis space. The algorithms that are
capable of efficiently searching very large hypothesis spaces to find the hypotheses that best
fit the training data are of great interest. In the field of ML, a variety of programs have
been developed. Three of these software packages are C4.5 [QUIN93], CART [BREI84], and
RIPPER [COHE95]. C4.5 and CART are decision tree learning based software, whereas
RIPPER is a rule-based learning system. RIPPER was not chosen because it is a Unix-based
system and there was no access to Unix systems. The PC version of CART is
available but it was costlier than C4.5. Moreover, CART can only produce decision trees,
whereas C4.5 is able to transform the generated trees into a set of rules. C4.5 is a Unix-based
system. The PC version of this software is also available, called See5.

5.4.2.1 Trees into Rules

There has always been an argument in favor of rule-based representations over tree-structured
representations, on the grounds of readability and user-friendliness. When the
domain is complex, decision trees can become very "bushy" and difficult to understand,
whereas rules tend to be modular and easier to understand. On the other hand, decision
tree construction programs are usually very fast. A compromise is to use a decision tree
algorithm to build an initial tree and then derive rules from the tree, thus transforming the
trees into a set of rules [QUIN87]. This functionality is implemented in See5.


Moreover, a rule set generated from a tree usually has fewer rules than the tree has leaves.
A simple example adapted from [CUNN97] shows the reason for this compactness.
Consider the proposition (A=1 and B=1) or (C=1 and D=1). If each of the four attributes
A, B, C, and D has two possible values (e.g., 1 and 0), the proposition represented by the
rule set is as follows:

Rule 1: A=1 and B=1 → +
Rule 2: C=1 and D=1 → +
Rule 3: otherwise → -

If feature A is arbitrarily selected as the partitioning criterion for the root node, the most
compact single decision tree representation for this rule set is shown in Figure 5-1.
Obviously this decision tree is less understandable than the above rule set. This readability
problem corresponds to the number of possible paths through the tree. Just Rules 1 and 2
generate three paths in the tree shown in Figure 5-1.

In general, many functions with small propositional or rule representations have
corresponding decision trees that are large, redundant, and inefficient [CUNN97]. For very
large data sets, however, generating rules can require considerably more computer time than
generating the trees.
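The compactness argument can be checked with a small sketch. The three rules are written directly as a function, and one possible tree rooted at A is written as nested tuples. The tuple encoding and the particular ordering of tests below A (B, then C, then D) are assumptions made for illustration and are not necessarily the exact tree drawn in Figure 5-1.

```python
from itertools import product

def rule_set(a, b, c, d):
    """Rules 1-3 for the proposition (A=1 and B=1) or (C=1 and D=1)."""
    if a == 1 and b == 1:      # Rule 1
        return "+"
    if c == 1 and d == 1:      # Rule 2
        return "+"
    return "-"                 # Rule 3: otherwise

# A leaf is "+" or "-"; an internal node is (attribute, subtree_if_0, subtree_if_1).
# The C/D test has to be repeated under both A=0 and A=1,B=0.
CD_SUBTREE = ("C", "-", ("D", "-", "+"))
TREE = ("A", CD_SUBTREE, ("B", CD_SUBTREE, "+"))

def classify(tree, case):
    """Follow one root-to-leaf path for a case given as a dict of attribute values."""
    while not isinstance(tree, str):
        attribute, if_zero, if_one = tree
        tree = if_one if case[attribute] == 1 else if_zero
    return tree

def leaves(tree):
    """List the class labels at all leaves, one per root-to-leaf path."""
    if isinstance(tree, str):
        return [tree]
    _, if_zero, if_one = tree
    return leaves(if_zero) + leaves(if_one)

# Check that the tree and the rule set agree on all 16 attribute combinations.
agree = all(
    classify(TREE, dict(zip("ABCD", values))) == rule_set(*values)
    for values in product((0, 1), repeat=4)
)
```

In this encoding the tree needs seven leaves, three of them positive, to express what the rule set states in three rules, because the test on C and D is duplicated in two branches.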



Figure 5-1 Decision tree representation for (A=1 and B=1) or (C=1 and D=1)



5.5 See5

See5 is a decision tree learning software package which was designed and developed by
Ross Quinlan, a scholar, pioneer, and researcher in the field of machine learning for many
years. Quinlan is the director of the Rulequest Research Institute, located in Australia, and
See5 could be purchased from him through the Internet. See5 is a learning system that
extracts informative patterns from the data. It analyzes the data to produce decision trees
and/or rule sets that relate a case's class to the value of its features. The following sections
introduce the See5 options that were employed in the analysis for the current application.

5.5.1 See5 Construction Options

5.5.1.1 Decision trees

As the default option, See5 constructs a decision tree. A set of training and testing data
was selected and the program was run with the See5 default option. The result is shown in
Figure 5-2. Although the program gives all the details of the individual decision trees, due to
their large size, only the output summary of the learning sets is shown in this Figure. The
first section shows the evaluation results of the decision tree, first on the training set from
which the tree was constructed, and then on the test set. The size of the tree shows the
number of leaves, and the column headed errors represents the number and percentage of the
cases misclassified by the tree. The tree, with 193 leaves, misclassifies 1,816 of the 13,405
cases, thus having an error rate of 13.5%. Performance on these cases is further analyzed in
a confusion matrix that pinpoints the kinds of errors made. In this example, the decision



tree misclassifies 370 (3.6%) of the legitimate cases as fraudulent and 1,446 (42.5%)
fraudulent cases as legitimate.

For the test set, the tree, with 193 leaves, misclassifies 929 of the 6,465 cases, thus having an
error rate of 14.4%. The confusion matrix for the test set again shows the detailed
breakdown of correct and incorrect classifications. The decision tree misclassifies 545
(10.7%) of the legitimate cases as fraudulent and 384 (27.2%) fraudulent cases as legitimate.

One might ask why the training algorithm makes any errors in the training phase while it is
classifying the cases where the outcome is known. One should keep in mind that the
essence of learning is to move beyond the training samples. Thus, the learning algorithm
does not memorize the cases it has seen; rather, its focus is on extracting rules and
patterns of behavior from the data to be able to generalize and extrapolate them to future
cases.

Figure 5-2 Output summary of the learning set (Decision tree option)

Evaluation on training data (13405 cases):

	   (a)   (b)    <-classified as
	  ----  ----
	  9679   370    (a): class N
	  1446  1910    (b): class Y

Evaluation on test data (6465 cases):

	   (a)   (b)    <-classified as
	  ----  ----
	  4511   545    (a): class N
	   384  1025    (b): class Y

5.5.1.2 Rulesets

Decision trees can sometimes be very difficult to understand. An important feature of See5
is its ability to convert trees into collections of rules called rule sets. The same learning set
was run with the Rulesets option of See5. Here again, the program gives all the details of the
individual decision trees and rule sets, but for the sake of brevity only the output summary of
the learning sets is shown in Figure 5-3.

As can be observed, the decision tree with 193 leaves is reduced to 81 rules, but on the
training data the rules have a slightly higher error rate than the tree (by 0.2%). The rulesets
option, with 81 rules, misclassifies 1,840 of the 13,405 cases, thus having an error rate of
13.7%. Performance on these cases is further analyzed in a confusion matrix that shows the
types of errors made. In this example, the rule set misclassifies 291 (2.8%) of the legitimate
cases as fraudulent and 1,549 (46.2%) fraudulent cases as legitimate.

For the test set, the rule sets misclassify 877 of the 6,465 given cases, showing an error rate
of 13.6%. The confusion matrix for the test cases again shows the detailed breakdown of



correct and incorrect classifications. The rule sets misclassify 481 (9.5%) of the legitimate
cases as fraudulent and 396 (28.1%) fraudulent cases as legitimate.

Figure 5-3 Output summary of the learning set (Rulesets option)

Evaluation on training data (13405 cases):

	   (a)   (b)    <-classified as
	  ----  ----
	  9758   291    (a): class N
	  1549  1807    (b): class Y

Evaluation on test data (6465 cases):

	   (a)   (b)    <-classified as
	  ----  ----
	  4575   481    (a): class N
	   396  1013    (b): class Y


5.5.1.3 Boosting

Boosting was introduced in Section 4.4.4. All the steps required for the construction of
different classifiers in this procedure are embedded and implemented in the learning
algorithm by the software developer, and normally there is no documentation on the details
of these procedures due to proprietary issues.

The Boost option with 10 trials was selected and the program was run on the same learning set.
The summary of the results is shown in Figure 5-4. As the first step, a single decision tree
or rule set is constructed as before from the training data. This classifier will usually make
mistakes on some cases in the dataset (Trial 0 in Figure 5-4). When the second classifier is
constructed, the algorithm pays more attention to the misclassified cases to try to get them
right. This makes the second classifier different from the first one (Trial 1 in Figure 5-4).
The second classifier will also make errors on some cases, and these become the focus of
attention during the construction of the third classifier (Trial 2). This process continues
for a pre-determined number of iterations. The Boost option with x trials allows See5 to
construct up to x classifiers in this manner (suggested default is 10). Naturally, constructing
multiple classifiers requires extra computational time and resources, but the effort might be
worth the cost. Different ML sources and trials over numerous datasets, large and small,
have shown that on average the 10-classifier boosting is the most appropriate choice
[QUIN99].

It should be noted that Boosting trials greater than 10 were also examined in the experiments
performed; however, it never exceeded 10 trials before the algorithm terminated. One


example is illustrated in Figure 5-4. It is interesting to note that although the number of
trials for the boost option was set to be 10, the algorithm terminated after 7 trials. This
shows that the software can determine when there is no improvement possible on the
accuracy reached.

The classifier performance is summarized for each trial on a separate line, while the line
labeled boost shows the overall results of all the classifiers [QUIN99]. The tree constructed by
Trial 0 is identical to the one produced without the Boost option (see Figure 5-2). Some
of the subsequent trees produced when the algorithm was paying more attention to certain
cases have quite high overall error rates. When the seven trees were combined by the
functions implemented in the algorithm, the final predictions have an error rate of 11.4% on
the training examples.
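Since the boosting procedure inside See5 is proprietary and undocumented, the following is not a reconstruction of it. It is a minimal sketch of the general idea using the well-known AdaBoost scheme with one-level decision trees ("stumps"), in which misclassified cases receive more weight in the next trial and the trials are combined by a weighted vote. All names, the +1/-1 class encoding, and the early-stopping conditions are assumptions made for this illustration.

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] <= t else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best

def stump_predict(stump, row):
    _, f, t, sign = stump
    return sign if row[f] <= t else -sign

def boost(X, y, trials=10):
    """Build up to `trials` stumps; labels in y are +1 (fraud) or -1 (non-fraud)."""
    n = len(X)
    w = [1.0 / n] * n                      # Trial 0 treats all cases equally
    ensemble = []
    for _ in range(trials):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:                     # no better than chance: stop early
            break
        if err == 0:                       # a perfect stump needs no further trials
            ensemble = [(1.0, stump)]
            break
        alpha = 0.5 * math.log((1 - err) / err)   # voting weight of this trial
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified cases, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all trials."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

On a small one-feature data set that no single stump can separate, a single trial leaves some cases misclassified, while a few boosted trials voting together can classify every training case correctly.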

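Figure 5-4 Output summary of the learning set (Boost option)

Evaluation on training data (13405 cases):

	Trial: 0, 1, 2, 3, 4, 5, 6, boost

	   (a)   (b)    <-classified as
	  ----  ----
	  9951    98    (a): class N
	  1429  1927    (b): class Y

Evaluation on test data (6465 cases):

	Trial: 0, 1, 2, 3, 4, 5, 6, boost

	   (a)   (b)    <-classified as
	  ----  ----
	  4671   385    (a): class N
	   503   906    (b): class Y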

5.5.1.4 Cross-Validation Trials

The cross-validation technique was introduced in Section 4.5.5. The evaluation results of
the cross-validation option (using 10 folds or splits) for the same example are shown in Figure
5-5.

Figure 5-5 Output summary of the learning set

	   (a)   (b)    <-classified as
	  ----  ----
	  9559   490    (a): class N
	  1648  1708    (b): class Y

Every time a cross-validation is run, a different random partition of the training cases is used.
The error rate of the decision trees produced from the 13,405 cases in the training set is
estimated at an average of 16.1%.


5.6 Design of Experiments

For setting up the experiments several steps were taken.

5.6.1 Data set Design

Databases were randomly split into two main groups by the approximate proportions of 2/3
and 1/3 [FAWC97] [MICH94] for training and testing sets, respectively. The training
set is used to design the classifier, and the testing set is used to evaluate the accuracy of the
classifier derived. While sufficient test samples are the key to accurate error estimation,
adequate training cases in the design of a classifier are also of great importance. Therefore,
the non-fraud database with 13,426 transactions was split into two databases of 8,963 and
4,463 transactions. The fraud database split resulted in 4,442 transactions for training and
2,222 cases for testing. It is important to have a rather large test set for system validation.
From 2,222 cases, 2,000 transactions were arbitrarily used for testing the classifier and the
rest (222 cases) were put aside as a case set to evaluate the prediction of the classifier on new
cases.
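A random split in the approximate proportions of 2/3 and 1/3 can be sketched as below. The function name and the fixed seed are assumptions for illustration; since the split in this study was only approximately two-thirds, the sketch is not expected to reproduce the exact counts of 8,963 and 4,463.

```python
import random

def split_two_thirds(cases, seed=0):
    """Shuffle the cases and return (training, testing) in roughly 2/3 : 1/3 proportions."""
    rng = random.Random(seed)          # fixed seed so the split can be repeated
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 2 / 3)
    return shuffled[:cut], shuffled[cut:]

# Example with as many placeholder cases as the non-fraud database has transactions.
training, testing = split_two_thirds(range(13426))
```

Every case ends up in exactly one of the two sets, so the testing set stays unseen during classifier construction.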

5.6.2 The Notion of Class Distribution Design

In domains such as credit card fraud, the natural class distribution is between 10:90 and
20:80. This means that between 10 and 20 percent of the flagged cases are fraud (minority
instances) and the rest are legitimate (majority instances). In other words, the number of
fraudulent transactions is much smaller than the legitimate ones.



One of the factors that contribute to the success of a learning process is the class distribution
in the training set. Using the same algorithm, different training class distributions can
produce classifiers of different quality. Very often using the natural class distribution might
not yield the most effective classifier. In other words, using the natural class distribution
for training might cause the learning algorithm to treat the minority class instances as noise
or simply produce classifiers that always predict the majority class instances [CHAN98a].
Related works in fraudulent cellular phone calls or credit card fraud that have class
distributions between 10:90 and 20:80 show that class distribution for training is very
important and can dramatically improve the performance of the classifiers. Extensive
experiments have shown that training data with a 50% fraud distribution produced the best
classifiers [CHAN98b] [STOL97] [FAWC97].

If the assumption is that the distribution of examples influences the performance of the
resulting classifiers, then how can one characterize the effects of the training class
distribution on the performance of the classifiers and select a class distribution that can
produce the most predictive classifiers? Based on the aforementioned results, the best
approach was to create data subsets with different class distributions, then apply the learning
algorithm to these subsets and evaluate the effect of class distributions on training by
evaluating the performance of the resulting classifiers on the test sets and future cases.
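One simple way to create a training subset with a chosen class distribution is sketched below. It keeps every fraud case and randomly undersamples the non-fraud cases until fraud makes up the requested fraction. Undersampling the majority class is an assumption made here for illustration; the text does not state how the subsets in this study were actually drawn, and the function name is hypothetical.

```python
import random

def subset_with_fraud_rate(fraud, non_fraud, fraud_fraction, seed=0):
    """Return (fraud, sampled_non_fraud) so that fraud is about fraud_fraction of the total."""
    rng = random.Random(seed)
    # Number of non-fraud cases needed for the requested fraud : non-fraud ratio.
    wanted = round(len(fraud) * (1 - fraud_fraction) / fraud_fraction)
    if wanted > len(non_fraud):
        raise ValueError("not enough non-fraud cases for the requested distribution")
    return list(fraud), rng.sample(list(non_fraud), wanted)
```

With 100 fraud cases, for instance, a 50:50 distribution keeps 100 non-fraud cases and a 25:75 distribution keeps 300.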

5.6.3 Distribution Design for Training and Testing Sets

For setting up the experiments with different class distributions, the first step was to identify
the class distribution of the selected databases. By taking into consideration that the fraud



database had the combination of fraud and non-fraud transactions, a simple calculation
revealed that the class distributions of the training and testing databases were 25:75 and
21.9:78.1, respectively. In order to train on data that has an artificially higher fraud rate and
observe the results of the constructed classifiers, the following distributions and partitions,
shown in Table 5-2, were formed for the training phases.

Table 5-2 Design distribution for training and testing sets

Training Class    Training cases                               Testing cases
Distribution      Non-fraud          Pure fraud                Non-fraud          Pure fraud
                  Transactions       Transactions              Transactions       Transactions

5.6.4 Experiments

As Table 5-2 illustrates, sets of training examples with different class distributions were
designed and used as the input of See5 to generate the classifiers and observe the effect of
class distribution on training, testing and prediction of new cases. The choice of features
can also affect the performance of the learning systems. Therefore, different combinations
were considered in the classifier construction procedure to discover the most effective
combinations. The feature combinations examined are: (1) all features, (2) all features
except for the card type, and (3) all features except for the POS and card type. Altogether
fifty four experiments were performed. The results of these experiments are presented in
Chapter 6.

5.7 Interpretation of TN/FP/FN/TP/Error Rates

An error is a misclassification; that is, the classifier was presented with a case and it
classified the case incorrectly. Empirical error rate can be defined as the ratio of the
number of errors to the number of cases examined [WEIS91].

$$\text{error rate} = \frac{\text{number of errors}}{\text{number of cases}} \qquad \text{(5-1)}$$

If all errors were of equal importance, a single error rate, calculated as in equation 5-1,
would summarize the overall performance of a classifier [WEIS91]. However, for some
applications, distinctions among different types of errors are essential because different
errors have different associated costs. For instance, in credit card fraud detection the
error committed in assessing a fraudulent transaction as legitimate (FN) is usually
considered as far more serious than the opposite type of error, namely, assessing a legitimate
transaction as fraudulent (FP).

If distinguishing among error types is important, then a confusion matrix [WEIS91] can be
used to show the distribution of these errors. Table 5-3 is an example of such a matrix for
a two-class (Non-fraud / Fraud) application. The confusion matrix lists the correct
classification against the predicted classification for each class. The number of correct
predictions for each class falls along the diagonal of the matrix. All other numbers
represent the number of errors for a particular type of misclassification error. For instance,


in Table 5-3, class non-fraud is correctly predicted 9,679 times and is erroneously predicted
as fraud 370 times.

Table 5-3 Confusion matrix for two classes

                      Predicted Class
True Class            N (Non-fraud)    Y (Fraud)
N (Non-fraud)         9,679            370
Y (Fraud)             1,446            1,910

To better understand the effect of the four metrics, their rates of occurrence can be calculated
as follows:

$$\text{True Negative rate} = \frac{TN}{FP + TN}$$

$$\text{False Positive rate} = \frac{FP}{FP + TN}$$

$$\text{False Negative rate} = \frac{FN}{TP + FN}$$

$$\text{True Positive rate} = \frac{TP}{TP + FN}$$
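As a worked illustration, the four rates and the overall error rate of equation 5-1 can be computed directly from the cells of a confusion matrix. The sketch below uses the training-data matrix reported in Figure 5-2; the function name and the dictionary keys are chosen here for illustration only.

```python
def rates(tn, fp, fn, tp):
    """Compute the overall error rate and the four per-class rates from a 2x2 confusion matrix."""
    return {
        "error_rate": (fp + fn) / (tn + fp + fn + tp),   # equation 5-1
        "tn_rate": tn / (fp + tn),
        "fp_rate": fp / (fp + tn),
        "fn_rate": fn / (tp + fn),
        "tp_rate": tp / (tp + fn),
    }

# Training-data confusion matrix of the decision tree (Figure 5-2).
training = rates(tn=9679, fp=370, fn=1446, tp=1910)
```

Because the TN and FP rates share the denominator FP + TN, and the FN and TP rates share TP + FN, each pair sums to one; reporting one rate of each pair is therefore sufficient.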



Table 5-4 Two-class classification performance

Cases                          Prediction Negative (legitimate)    Prediction Positive (fraudulent)
Class Negative (legitimate)    True Negatives (TN) (Normal)        False Positives (FP) (False alarm)
Class Positive (fraudulent)    False Negatives (FN) (Miss)         True Positives (TP) (Hit)

5.8 Summary

This Chapter provides a detailed explanation of data acquisition and the steps required for
processing the data to make it suitable for the analysis. It introduces the learning software
used and its capabilities. It further elaborates on the data set and class distribution designs
for training and testing sets, and the variations of experiments performed.

CHAPTER 6 Results and Discussion

This section presents the experimental results of processing different sets of data with
various training class distributions. Unfortunately, there was no information available on
the costs associated with fraud office investigations; therefore, potential savings of the
methodology developed in this study could not be estimated.

6.1 Structuring the Results

Fifty four experiments were performed to study the effects of class distributions on training
and variations of different features on the evaluation of the classifiers constructed. The
experiments were conducted using See5 construction options of: (1) decision trees, (2)
rulesets, (3) boosting, and (4) ten-fold cross validation (CV).

For each option, the program was run to explore the effects of various class distributions and
features on training and testing sets. Although the cross validation technique is typically
used for intermediate sample sizes (of order 2000) [QUINBJ, a decision was


made to examine this option in the experiments conducted, to have an extra evaluation on the
training and testing sets as well.

The experiments produced 54 sets of results. Thirty six of these results are the classifiers
and the other 18 results present the evaluation of ten-fold CV trials on training and testing
sets. A selection of output summaries of See5 for several variations of features and class
distribution is presented in Appendix B.

Tables C-1 to C-9 of Appendix C present the evaluation results of the 36 classifiers on
training and testing data sets. Tables C-1 to C-3 are related to the training class distribution
of 25:75 while exploring the effects of different features in the classifier construction.
Tables C-4 to C-6 and C-7 to C-9 are related to the training class distributions of 33:67 and
50:50, respectively. Each Table includes the construction option, the size of the generated
trees and/or rules, along with the number of errors and their percentage for both training
and testing sets.

6.2 Performance Analysis

To do the analysis, the classifier that attained the lowest error rate among the 36 classifiers
presented in Tables C-1 to C-9 was selected. As these Tables show, the boosted decision
trees (BDT) classifier trained on the 25:75 class distribution attained the lowest error rate of
11.4%. This classifier was selected as the first choice. To compare the performance of
this classifier against another one, the decision trees (DT) classifier trained on the 25:75 class
distribution, which attained an error rate of 13.5%, was considered as the second choice.


As the fraud rate increases in the training sets, the error rate also increases, leading to the
conjecture that there is no need for further analysis on the class distributions with higher
fraud rates and that the above discussed classifiers are the most effective classifiers for further
analysis. However, based on the work of other researchers [STOL97] [CHAN98a]
[FAWC97] who have employed various training class distributions in their analysis for fraud
applications, a decision was made to study the above discussed BDT and DT classifiers not
only for the 25:75 class distribution but also for the class distributions of 33:67 and 50:50.

As discussed, where different types of errors have different costs, as in credit card fraud
detection, the elements of the confusion matrix (TN, FP, FN, and TP) are the essential
metrics of the system's performance. These metrics were therefore considered the true
indicators for the performance evaluation of the selected classifiers.

To compare these metrics for the selected BDT and DT classifiers, a new set of Tables was
formed. Tables 6-1 to 6-6 and 6-7 to 6-12 present the evaluation of the selected BDT
and DT classifiers, respectively, on the training and testing sets. Each Table gives the
error rate, TN, FN, TP, FP and their associated rates for each class distribution and set of
features. The TN, TP, FN, and FP rates were calculated from the confusion matrix of each
classifier, some of which are reported in Appendix B.
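The rates follow directly from the four confusion matrix counts. The sketch below uses the standard definitions, with fraud as the positive class; `confusion_rates` is a hypothetical helper name, and the example counts are those read from the 25:75 row of Table 6-1.

```python
def confusion_rates(tn, fp, fn, tp):
    """Derive error rate and per-class rates from confusion matrix counts."""
    legit = tn + fp   # all truly legitimate cases
    fraud = fn + tp   # all truly fraudulent cases
    return {
        "error_rate": (fp + fn) / (legit + fraud),
        "tn_rate": tn / legit,
        "fp_rate": fp / legit,
        "fn_rate": fn / fraud,
        "tp_rate": tp / fraud,
    }

# BDT trained on 25:75, all features, evaluated on the training data.
rates = confusion_rates(tn=9951, fp=98, fn=1429, tp=1927)
```

Note that the TN and FP rates sum to 1, as do the FN and TP rates, so each Table effectively reports two independent rates plus the overall error rate.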

To visualize the performance of these classifiers, two plots (Figures 6-1 and 6-2) were
prepared. Figure 6-1 was plotted from Tables 6-1 to 6-3 and 6-7 to 6-9 and illustrates the
performance of the BDT and DT classifiers on the training data. Figure 6-2 was plotted from
Tables 6-4 to 6-6 and 6-10 to 6-12 and illustrates the performance of the BDT and DT
classifiers on the testing data. In these plots, the x-axis represents the percentage of fraud
in the training set, whereas the y-axis represents the FN rate of the classifiers.



Boosted Decision Tree Evaluation on Training Data

Table 6-1 Evaluation on training data (All features are considered)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.114       9951  98   1429  1927  0.99     0.575    0.425    0.01
33 : 67             0.152       6455  248  1276  2080  0.964    0.619    0.38     0.036
50 : 50             0.185       2938  425  821   2535  0.874    0.755    0.245    0.126

Table 6-2 Evaluation on training data (Card type is disregarded)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.127       9881  168  1538  1818  0.983    0.542    0.458    0.017
33 : 67             0.167       6530  173  1503  1853  0.975    0.553    0.447    0.025
50 : 50             0.197       3025  338  987   2369  0.9      0.706    0.294    0.1

Table 6-3 Evaluation on training data (POS & card type are disregarded)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.145       9941  108  1831  1525  0.97     0.454    0.545    0.03
33 : 67             0.172       6430  273  1455  1901  0.96     0.567    0.433    0.04
50 : 50             0.22        2932  431  1046  2310  0.872    0.689    0.311    0.128


Boosted Decision Tree Evaluation on Testing Data

Table 6-4 Evaluation on testing data (All features are considered)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.137       4671  385   503  906   0.924    0.643    0.357    0.076
33 : 67             0.16        4481  575   457  952   0.886    0.676    0.324    0.114
50 : 50             0.232       3806  1250  248  1161  0.753    0.824    0.176    0.247

Table 6-5 Evaluation on testing data (Card type is disregarded)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.129       4696  360   471  938   0.928    0.666    0.334    0.0712
33 : 67             0.133       4640  416   442  967   0.918    0.686    0.314    0.082
50 : 50             0.245       3750  1306  281  1128  0.742    0.80     0.20     0.258

Table 6-6 Evaluation on testing data (POS & card type are disregarded)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.12        4800  256   517  892   0.949    0.633    0.367    0.051
33 : 67             0.138       4582  474   419  990   0.907    0.702    0.298    0.093
50 : 50             0.259       3638  1418  297  1112  0.72     0.79     0.21     0.28


Decision Tree Evaluation on Training Data

Table 6-7 Evaluation on training data (All features are considered)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.135       9679  370  1446  1910  0.964    0.569    0.431    0.036
33 : 67             0.173       6402  301  1438  1918  0.956    0.572    0.428    0.044
50 : 50             0.206       2972  391  996   2360  0.884    0.703    0.297    0.116

Table 6-8 Evaluation on training data (Card type is disregarded)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.149       9903  146  1845  1511  0.958    0.451    0.549    0.042
33 : 67             0.185       6150  553  1312  2044  0.918    0.61     0.39     0.082
50 : 50             0.224       2923  440  1067  2289  0.87     0.682    0.318    0.130

Table 6-9 Evaluation on training data (POS & card type are disregarded)

class distribution  error rate  TN    FP   FN    TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.153       9874  175  1870  1486  0.96     0.443    0.557    0.04
33 : 67             0.192       6181  522  1411  1945  0.922    0.58     0.42     0.078
50 : 50             0.235       2836  527  1053  2303  0.844    0.686    0.314    0.156


Decision Tree Evaluation on Testing Data

Table 6-10 Evaluation on testing data (All features are considered)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.119       4828  228   544  865   0.955    0.614    0.386    0.045
33 : 67             0.165       4384  672   397  1012  0.867    0.719    0.281    0.133
50 : 50             0.25        3729  1327  292  1117  0.738    0.793    0.207    0.262

Table 6-11 Evaluation on testing data (Card type is disregarded)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
25 : 75             0.118       4816  240   522  887   0.953    0.63     0.37     0.047
33 : 67             0.155       4430  626   378  1031  0.876    0.732    0.268    0.124
50 : 50             0.275       3572  1484  293  1116  0.71     0.8      0.2      0.29

Table 6-12 Evaluation on testing data (POS & card type are disregarded)

class distribution  error rate  TN    FP    FN   TP    TN rate  TP rate  FN rate  FP rate
[table values not recoverable]


6.2.1 Classifier Performance

To choose the most effective classifier and the appropriate class distribution, the
performance of the classifiers should be analyzed. Figures 6-1 and 6-2 are used as the
basis for the analysis. These Figures and the associated Tables demonstrate that the boosted
decision trees (BDT) classifier trained on the 25:75 fraud/non-fraud distribution attained
TN rates of 99% and 92.4% and FN rates of 42.5% and 35.7% on the training and testing data,
respectively. The comparative decision tree (DT) classifier attained TN rates of 96.4% and
89.3% and FN rates of 43.1% and 27.2%.

These Figures also indicate that the FN rate decreases as the minority cases increase in the
training data and is lowest at the 50:50 class distribution. The BDT classifier trained on
the 50:50 class distribution attained TN rates of 87.4% and 75.3% and FN rates of 24.5% and
17.6% on the training and testing data, respectively. This classifier attained the lowest
FN rate (i.e., 24.5% and 17.6%) among all the classifiers.

The desired classifier is the one that identifies as many legitimate transactions (TNs) as
possible while not misclassifying fraudulent transactions as legitimate (FNs); otherwise
significant losses will occur. Based on this goal, the BDT classifier trained on the 50:50
distribution, with a 17.6% FN rate on the testing set, appears to be the most predictive
classifier among all the classifiers constructed in this study.
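The selection rule described above (lowest FN rate, with the TN rate as a secondary concern) can be sketched as follows. The rates are the testing-data figures quoted in this section; `Result`, `pick_classifier`, and the optional `min_tn_rate` floor are hypothetical additions for illustration, not part of the thesis procedure.

```python
from typing import NamedTuple

class Result(NamedTuple):
    name: str
    tn_rate: float
    fn_rate: float

def pick_classifier(results, min_tn_rate=0.0):
    """Return the result with the lowest FN rate among those whose
    TN rate is at least min_tn_rate, or None if none qualifies."""
    eligible = [r for r in results if r.tn_rate >= min_tn_rate]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.fn_rate)

# Testing-data rates quoted in the text of this section.
test_results = [
    Result("BDT 25:75", 0.924, 0.357),
    Result("BDT 50:50", 0.753, 0.176),
    Result("DT 25:75", 0.893, 0.272),
]

best = pick_classifier(test_results)
```

With no TN floor the rule picks the BDT classifier trained on 50:50, matching the choice made here; a floor such as `min_tn_rate=0.9` would instead favour the 25:75 classifier, which makes the trade-off between the two rates explicit.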

Figure 6-1 Evaluation of BDT and DT classifiers on the training data
[x-axis: percentage of fraud in the training data; y-axis: FN rate]

Figure 6-2 Evaluation of BDT and DT classifiers on the testing data
[x-axis: percentage of fraud in the training data; y-axis: FN rate]


6.3 Prediction of New Cases

Once the most effective classifier was found, its quality could be further assessed by
examining its prediction accuracy on new cases. New cases are those that were not used in
the training or testing procedure; they are referred to as the case set. A set of
250 transactions from legitimate accounts and another set of 222 transactions from
fraudulent accounts were used for this evaluation.

6.3.1 Prediction of Boosted Decision Trees

For prediction on new cases, the BDT classifier was used. When the classifier is used, an
interactive window asks for the values of the data attributes associated with the example.
All the values are entered manually. The features requested, and the order in which
they are requested, depend on the classifier itself. For instance, the classifier may ask for
the value of 'dollar amount' or 'merchant country' as the first attribute and then ask
for a second attribute, which can be 'card type' or any other feature. After all the
necessary attribute values have been entered, the most probable class is shown with a
probability value. This value is a number in the range of 0 to 1 associated with the
prediction of the Fraud (Y) / Non-fraud class. Two examples of this prediction are shown
in Figures 6-3 and 6-4.
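How a boosted ensemble can report a class together with a value between 0 and 1 may be illustrated with a simple weighted vote. This is only a generic sketch under that assumption; it is not a description of how See5 computes its probability value, and `boosted_predict` and the vote weights are hypothetical.

```python
from collections import defaultdict

def boosted_predict(tree_votes):
    """Combine per-tree votes into one class and a confidence in [0, 1].

    tree_votes: list of (predicted_class, weight) pairs, one per tree.
    The confidence is the winning class's share of the total weight.
    """
    totals = defaultdict(float)
    for predicted_class, weight in tree_votes:
        totals[predicted_class] += weight
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())

# Three hypothetical trees voting on one transaction.
predicted, confidence = boosted_predict([("Y", 0.9), ("N", 0.6), ("Y", 0.5)])
```

Here two of the three trees vote for Fraud (Y), and their combined weight gives that class a confidence of about 0.7.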

To predict the class of each transaction, the required values were entered interactively.
The predicted class and the probability value for each prediction are shown in Tables
D-1 and D-2 of Appendix D. These Tables contain the transaction features, their values,
the predicted class, the probability value associated with this class prediction, and the
correct class for each transaction.

Figure 6-3 Prediction of BDT classifier trained on 50:50 distribution on a legitimate case

Figure 6-4 Prediction of BDT classifier trained on 50:50 distribution on a fraudulent case


6.3.1.1 Class distribution of 25:75

Table 6-13 summarizes the performance of the BDT classifier, trained on the 25:75
class distribution, in predicting the class of new cases. Recall that fraudulent accounts
have a mixture of fraud/non-fraud cases; hence the last row of this Table also includes
legitimate transactions. As this Table demonstrates, this classifier misclassified three
legitimate transactions out of 249 legitimate cases, resulting in a TN rate of 98.8%.
However, its performance on fraudulent accounts degraded dramatically: it classified
49.7% of the fraudulent cases as legitimate (100 out of 201 cases).

Table 6-13 The summary results of prediction on new cases by BDT

                Correct Class             Predicted Class
Account Type    Legitimate  Fraudulent    Legitimate  Fraudulent    Misclassified  Error rate %
Legitimate      249         -             246         3             3              1.2
Fraudulent      21          201           19          101           2 + 100        49.7

6.3.1.2 Class Distribution of 50:50

To make a comparison, the performance of the BDT classifier trained on the 50:50 class
distribution was also examined on the prediction of new cases. The same procedure
discussed in Section 6.3.1 was followed for this evaluation. All the attribute values were
entered manually, and the results are shown in Tables D-3 and D-4 of Appendix D. These
Tables have exactly the same fields as Tables D-1 and D-2. Table 6-14 summarizes the
performance of this classifier on the prediction of new cases. As this Table shows, this
classifier misclassified 20 legitimate transactions out of 249 cases, attaining a TN rate of
92%. The performance of this classifier on fraudulent cases was much better than that of the
comparative BDT with the class distribution of 25:75. This classifier correctly classified
147 fraudulent transactions out of 201 fraudulent cases, attaining an FN rate of 26.8%.

Table 6-14 The summary result of prediction on new cases by BDT classifier

                Correct Class             Predicted Class
Account Type    Legitimate  Fraudulent    Legitimate  Fraudulent    Misclassified  Error rate %
Legitimate      249         -             229         20            20             8.0
Fraudulent      21          201           11          147           10 + 54        26.8


6.4 Concluding Remarks

Tables 6-13 and 6-14 summarize the performance of the BDT classifiers trained on the 25:75
and 50:50 class distributions, respectively. As these results demonstrate, the classifier
trained on the 25:75 class distribution has an FN rate of 49.7% on the classification of
fraudulent transactions (missing half of the fraudulent cases), whereas the comparative
classifier trained on the 50:50 distribution has an FN rate of 26.8%. This comparison
reaffirms that the BDT classifier trained on the 50:50 distribution is the higher-performance
classifier for the prediction of new cases.
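The percentages quoted for the new cases can be checked from the reported counts. The short sketch below does only that arithmetic; `percent` is a hypothetical helper name. The reported figures of 49.7% and 26.8% correspond to the computed values cut to one decimal place rather than rounded.

```python
def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

# New fraudulent cases: 201 in total.
missed_25_75 = percent(100, 201)        # fraud classified as legitimate, 25:75
missed_50_50 = percent(201 - 147, 201)  # 147 of 201 caught, 50:50

# New legitimate cases: 249 in total.
kept_25_75 = 100.0 - percent(3, 249)    # 3 misclassified, 25:75
kept_50_50 = 100.0 - percent(20, 249)   # 20 misclassified, 50:50
```

The 50:50 classifier gives up roughly seven percentage points of TN rate in exchange for missing about 23 percentage points fewer fraud cases.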

One question that might arise is what happens if class distributions of 60:40 (60% fraud
cases in the training set) or 75:25 (75% fraud cases in the training set) are considered.
As stated before, the fraud dataset was rather small; therefore, it was not possible
to form these distributions and explore their effects on the performance of the system.
However, other researchers [CHAN98b] have examined these distributions in their fraud
detection analysis. Their results show that more than 50% fraud cases in the training set
degraded the performance of the classifier, and they concluded that the 50:50 class
distribution is the suggested distribution for the construction of higher-performance
classifiers.


CHAPTER 7 Conclusions & Recommendations

7.1 Conclusions

Fifty-four experiments were conducted to determine the most predictive classifier. These
experiments were based on several variations and combinations of features and
training class distributions. The classifiers constructed from the different sets of
experiments performed differently on the training and testing data, confirming that
significant attention has to be paid to the class distribution design of the training sets.
The performance metrics considered for this analysis were the True Negative (TN) and False
Negative (FN) rates. The BDT classifier trained on the 50:50 class distribution attained
TN rates of 87.4% and 75.3% and FN rates of 24.5% and 17.6% on the training and testing data,
respectively. Based on this performance, this classifier was considered the most predictive
classifier in this study, having the lowest FN rate among all the classifiers constructed
in this analysis.


This analysis reaffirms the importance of the training class distribution in the design of
effective classifiers. This study shows that increasing the share of minority instances in
the training data produces classifiers with lower FN rates. It also shows that
increasing the share of majority instances in the training data produces classifiers that
are adept at classifying the majority of transactions as legitimate; as a result, these
classifiers classify a large number of fraudulent cases as legitimate, leading to a very
high FN rate.

The performance of the BDT classifiers on the prediction of new cases was also examined.
This analysis showed that the classifier trained on the 25:75 distribution of fraud/legitimate
transactions attained a TN rate of 98.8% in the prediction of legitimate cases. However,
its performance degraded on the identification of fraudulent cases: it identified half of
the fraudulent transactions as legitimate, attaining an FN rate of 49.7%. This degradation
makes the system unusable because missed fraud cases are very costly. The classifier trained
on the 50:50 distribution had a lower TN rate (92% against 98.8%) on the prediction of
legitimate transactions; however, its FN rate on the prediction of fraud cases was much
lower (26.8% against 49.7%) than that of the comparative BDT classifier. This analysis
reaffirms that the classifier trained on the 50:50 class distribution is more predictive
for the evaluation of new cases.

Another important factor that may have had a serious impact on the performance of the
classifiers was the limitations of the data sets. The most important limitations were the
rather small fraud database and the lack of FDS scores associated with the flagged
transactions. These scores are an indication of some patterns of behavior in the datasets
and contain valuable information. The experiments conducted on the variations of features
revealed that the classifiers trained on all features performed much better than those
trained while disregarding some features such as POS and card type. Based on these
empirical results, one would expect that the FDS scores, if provided, would contribute
important information and thus lead to better performance.

No information was available on the cost of investigation associated with each case
created by the FDS; therefore, the savings from the use of the trained system could not be
estimated. Based on the observed results on the prediction of new cases, one could expect
that this approach may reduce the volume of personal investigations, leading to potentially
significant savings for the FI.
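If investigation costs were available, the saving could be estimated with a simple model such as the one below. Everything in it is an assumption made for illustration: the baseline is that every case flagged by the FDS is investigated and every fraud is then caught, the post-processor sends only its predicted-fraud cases for investigation, and the counts and cost figures in the example are purely hypothetical (the thesis reports no such values).

```python
def net_saving(tn, fp, fn, tp, cost_per_investigation, avg_fraud_loss):
    """Estimated saving of filtering FDS cases through a classifier.

    Baseline: all tn + fp + fn + tp flagged cases are investigated.
    With the classifier: only fp + tp cases are investigated, and each
    of the fn missed fraud cases costs avg_fraud_loss.
    """
    investigations_avoided = tn + fn
    return (investigations_avoided * cost_per_investigation
            - fn * avg_fraud_loss)

# Illustrative counts and hypothetical costs only.
saving = net_saving(tn=229, fp=20, fn=54, tp=147,
                    cost_per_investigation=10.0, avg_fraud_loss=30.0)
```

The model makes the trade-off explicit: the saving is positive only while the investigations avoided are worth more than the fraud losses from missed cases, so a high average fraud loss quickly favours classifiers with a low FN rate.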

Currently, due to the high volume of false positives flagged by the FDS, the FI has set a
rather high threshold for this system. Therefore, there are cases that are fraudulent but are
being missed (FN) by the FDS. By instituting a post-processor system such as the ML,
the FI has the option of lowering the threshold and allowing the FDS to flag more cases for
investigation.
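The post-processor arrangement described above can be sketched as a two-stage filter. The field name `fds_score`, the function `cases_to_investigate`, and the classifier callback `ml_is_fraud` are all hypothetical; the FDS scores were not available in this study, so this only illustrates the idea.

```python
def cases_to_investigate(transactions, fds_threshold, ml_is_fraud):
    """Stage 1: the FDS flags transactions scoring at or above the threshold.
    Stage 2: the ML classifier keeps only the flagged cases it predicts
    to be fraudulent; those go to the fraud analysts."""
    flagged = [t for t in transactions if t["fds_score"] >= fds_threshold]
    return [t for t in flagged if ml_is_fraud(t)]

# Toy data with hypothetical scores and a stand-in classifier.
transactions = [
    {"id": 1, "fds_score": 0.95, "keyed": True},
    {"id": 2, "fds_score": 0.70, "keyed": True},
    {"id": 3, "fds_score": 0.65, "keyed": False},
    {"id": 4, "fds_score": 0.20, "keyed": True},
]
looks_fraudulent = lambda t: t["keyed"]

high_threshold = cases_to_investigate(transactions, 0.9, looks_fraudulent)
low_threshold = cases_to_investigate(transactions, 0.6, looks_fraudulent)
```

Lowering the threshold lets the FDS flag more cases, while the second stage removes those the classifier considers legitimate before they reach an analyst.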

Another important point is the prevention of unnecessary disturbance of the customers

which may lead to customer dissatisfaction.

In summary, pattern recognition for legitimate/fraud occurrences is inherently complex, and
since the behavior patterns of legitimate cardholders and fraudsters evolve over time, this
study is a basis for further research. Overall, this study demonstrates that the approach
employed in this research has very good potential for distinguishing the legitimate
transactions from the fraudulent ones.

7.2 Recommendations for Future Research

The potential of the trained system for distinguishing the legitimate transactions from the
fraudulent ones flagged by the FDS is promising, but its predictive accuracy needs to be
enhanced. Some of the most important recommendations for future research, which can explore
possible enhancements of the prototype and its potential deployment in credit card fraud
detection, are listed below:

• Learning systems do the best they can with what they are given. It is quite possible that
revising or adding new features may lead to much better performance for the same
learning method [WEIS91]. Sufficient and representative data are the foundation of all
learning systems, and this study showed that using all features produced classifiers with
better performance. Therefore, for further improvement of the trained system, data
requirements must be fulfilled. In this respect, two major requirements for further
analysis are FDS scores and a larger fraud dataset.

• Form datasets with higher minority instances (i.e., 60:40, 70:30, etc. of fraud/non-
fraud cases) in order to explore the effect of class distribution on the performance of the
classifiers and, based on this evaluation, choose the most predictive classifier for use.

• There are different learning techniques that can be applied to the same sample data. For
a given application, some learning systems may be better than others. In general, there
is no guarantee that any of these methods will work or that any single method is
necessarily the best. In this study one prominent learning software package (See5) was
utilized. Two other well-known packages, namely CART [BREI84] and RIPPER [COHE95], should
also be examined. These packages have been applied to real-world problems such as
credit card fraud detection and have shown impressive results. By employing
different techniques, the performance can be measured and the algorithm that yields
the best performance can be selected.

• As discussed before, the fraud environment is dynamic; therefore, the system being
designed must be adaptive to the changing fraud environment.


ABM - Automated Banking Machine

Attributes - See features

Authorization - To make a purchase by credit card, the cardholder's FI must
authorize the transaction from the central computer.

Authorization Log - The FI system keeps a record of all the authorizations that pass through
its mainframe in a database called the "authorization log" for future reference.

Bias - A preference for one hypothesis over another. Since in most learning situations there
are a variety of possible consistent hypotheses, all learning algorithms have some sort of
bias.

Cardholder File - All the information related to the cardholder is kept in this file for
accounting purposes.

Card Identification Device (CID) - Special security feature included in the magnetic
stripe of American Express to counteract the counterfeiting process.

Card Issuers (CIs) - Institutions that issue credit cards.

Card Verification Value (CVV) - Special security feature included in the magnetic stripe
of VISA to counteract the counterfeiting process.


Card Verification Code (CVC) - Special security feature included in the magnetic stripe
of MasterCard to counteract the counterfeiting process.

CBA - Canadian Bankers Association.

Classification - To assign a specific class to a case.

Classifier - A decision-making system that assigns a class to cases based on the pattern
instances it has learned is called a classifier. The simplest way of representing a classifier
is as a black box, which produces a decision for every admissible pattern of data that is
presented to it. It accepts a pattern of data as input and produces a decision as output.

Concept - A classification rule that partitions a domain into two parts: those instances that
satisfy it and those that do not.

Concept Learning - Inferring a Boolean-valued function from training examples of its
input.

Confusion Matrix - A matrix which pinpoints the kinds of errors made in the analysis.
This matrix shows the detailed breakdown of correctly and incorrectly classified cases.

Credit Limit - The restricted maximum amount assigned by the FIs to each card issued to a
cardholder. Any credit in excess of this limit requires the issuer's authorization.

Decision Trees - A simple structure for inductive learning. Given an instance of the
problem, specified by a set of features and their values, a decision tree returns a "yes" or
"no" decision about the instance. Therefore, decision trees are Boolean classifiers. Each
branching node in the tree represents a test on some aspect of the instance.


Delinquent - In cases where the cardholder's payment is less than the minimum amount, the
credit rating of the cardholder is affected and the cardholder is considered delinquent.

Error Rate - The most common measure for evaluating the performance of classifiers is the
error rate (1 - accuracy). The error rate can be defined as the ratio of the number of errors
to the number of cases examined. This ratio measures the percentage of incorrectly classified
instances and has the implicit assumption that each error is equally important.

False Negative - When the system misses a fraudulent transaction.

False Positive - When the system flags a legitimate transaction as fraudulent.

Features - The sets of potential observations relevant to a particular problem are referred to
as features. Features are also known by other names such as 'attributes' and 'variables'.

Financial Institutions (FIs) - Banks, credit unions, trust companies, major retailers, etc.

Floor Limit - Some merchants have an assigned floor limit; purchases below that
amount can be authorized by the merchant and need not be authorized through the
cardholder's FI system. The limit set depends on the kind of business, the store location,
the type of merchandise or service, and other factors. Any value in excess of the floor limit
requires the authorization of the card issuer.

Fraud Analyst - Human experts employed and trained by the FIs to follow up and
investigate suspicious transactions in order to detect fraud.

Inductive Learning - Inductive learning is a kind of learning in which, given a set of
instances, the system tries to estimate or create an evaluation function. Most inductive
learning is supervised learning, in which examples are provided with classifications. More
formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the
function applied to x. The task of induction is, given a set of examples of f, to find a
hypothesis h that approximates f.

Learning - An approach to improve problem solving through experience. It is "an
increase in knowledge when knowledge is knowledge in principle."

Learning from Examples - In inductive learning, concepts are learned from sets of
labeled examples.

Machine Learning - Class of programs and algorithms that improve through experience.
These programs search over a large space of hypotheses to find the one that best fits the
characteristics of the training data.

Magnetic Stripe - A dark, machine-readable stripe on the back of plastic cards for
storing cardholder information.

Mainframe - Central computer of the FIs in charge of a number of important activities such as
processing the incoming transactions for authorization, record tracking, issuing monthly
statements, and so on.

Merchant File - For accounting purposes, FIs keep merchant records and information in a
file called the "merchant file".

Neural Networks - A class of knowledge-based models in AI.

Negative File - Because keeping a copy of every VISA cardholder record in the FI's
system is not practical, all the card numbers that have been considered fraudulent
internationally are included in a file called the 'negative file', which is updated quite
frequently with the occurrence of new fraud cases.


Noise - Contradictory information in the data, such as two or more examples
with the same descriptions (in terms of the attributes) but different classifications. In
other words, examples might have exactly the same description but be assigned different
classifications. This means that some of the data are incorrect. If this happens, the
decision tree learning algorithm must fail to find a decision tree consistent with all the
examples. This happens when data is labeled incorrectly (e.g., the examples were positive
but were labeled as negative).

Ockham's Razor - "The most likely hypothesis is the simplest one that is consistent with
all observations." Introduced by the 14th century philosopher William of Ockham.

Off-line - When the system is not connected to a computer or data communications network.

On-line Authorization - When authorization of a transaction uses equipment which is
connected to a computer or data communications network and is carried out in real time.

Personal Identification Number (PIN) - The security code assigned to the card to be used
in an Automated Banking Machine (ABM).

Point of Sale (POS) - Location at a merchant where a customer makes a purchase.

Point of Sale (POS) Terminal - A machine placed in a merchant location which is
connected to the FI's on-line authorization system via a modem, designed to authorize,
record and forward data for each transaction.

Posted Transaction File - This file keeps a record of all the current transactions that a
cardholder has made and that have not yet been posted to the statement. This file calculates
the current balance of every account and keeps a record of them, and at the end of the month
this information is posted to the cardholder's statement.


Smart Cards - Smart cards feature a microprocessor memory chip as well as data encoded
on the magnetic stripe.

Stand in Processing - On occasions when the FI's mainframe is non-functional for
authorization, the Tandem does the authorization considering two criteria: (1) the 'negative
file', (2) an assigned floor limit.

Statement - A list of all the cardholder's transactions during one accounting period.

Statement File - This file is used for keeping track of the statement balances and payments.
At the end of the cardholder's cycle the accumulated transactions are sent to this file.
The monthly statement for the cardholder is printed out of this file.

Supervised Learning - Any situation in which both the inputs and outputs of a component
can be observed.

Swipe Machine - Same as the POS machine (see Point of Sale).

Tandem - A non-stop, central computer used by the FIs to process all the incoming
transactions and route them to the proper place for authorization. In the absence of the
mainframe it also does the 'stand in' processing.

Test Set - A set of instances and their classifications used to test the accuracy of a learned
system. The training set is used to create the classifier. The test set is used to validate the
performance of the classifier.

Training Set - A training set is a set of problem instances (described as a set of features
and their values), together with a classification of each instance. Training sets are used in
supervised learning.

Transaction - A cardholder makes a purchase using a credit card.

True Negative - When the transaction is legitimate and normal.

True Positive - When the transaction is fraudulent and the system hits it.

Unsupervised Learning - When there is no information about what the correct
outputs are. Unsupervised learners can learn to predict future percepts based on present
ones, but cannot learn which actions to take without a utility function.

Voice Authorization - Some merchants do not have POS machines and have to call
their FI and ask for authorization.


Nomenclature

A - Transaction authorized

D - Transaction declined

K - Card keyed

S - Card swiped

R - Transaction referred to FI staff

P - Card has to be picked up

N - Transaction is legitimate

Y - Transaction is fraud

FP - False Positive

FN - False Negative

TP - True Positive

TN - True Negative

ML - Machine Learning

NN - Neural Network

CV - Cross Validation

DT - Decision tree

BDT - Boosted decision tree


References

Anonymous; Counterfeit hologram manufacturing in the People's Republic
of China; N P Regional Security and Risk Management; MasterCard
International Inc.; November 1, 1994.

Anonymous; Smart bank cards; The Nilson Report; No. 586; December 1994.

Anonymous; Future of bank cards - Part I (Fraud); The Nilson Report;
No. 568; March 1994.

Anonymous; Credit card thieves deal banks a bad hand; Texas Banking;
Austin; Vol. 87; Issue 3; Page 24; March 1998.

Anonymous; Technology = opportunity for fraud; Texas Banking; Austin;
Vol. 87; Issue 4; Page 40; April 1998.

L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone; Classification
and regression trees; Wadsworth, Belmont, CA; 1984.

P. Chan, S. Stolfo; Toward scalable learning with non-uniform class and cost
distributions: a case study in credit card fraud detection; Proceedings of the fourth
international conference on knowledge discovery and data mining; p164-168;
1998.


P. Chan, S. Stolfo; Learning with non-uniform class and cost distributions: Effects and a distributed multi-classifier approach; Work notes on KDD-98 workshop on distributed data mining, p1-9; 1998; Submitted to the Machine Learning Journal; 1999.

Dan Clark; VISA Authorizations. Vision PLUS - FAS, preliminary Analysis; Bank Card Services - SRD; TD Bank; July 2, 1997.

Dan Clark, Herman Chan; TD Bank; Personal communication; Dec. 17, 1998.

William W. Cohen; Fast effective rule induction; Proceedings of the 12th international conference on machine learning; Lake Tahoe, CA; 115-123; 1995.

Sally Jo Cunningham, Matt Humphrey, Ian H. Witten; Understanding what machine learning produces. Part 1: Representations and their comprehensibility; Department of Computer Science, University of Waikato.

Elford Dean, Raj Thomas, Lorry; Visa security c m t q Personal meetings; January 7 and February 11, 1999.

Paul Demery; An easy fix for fraud?; Credit Card Management; New York; Vol. 11; Issue 2; Page 72-76; May 1998.

Kazuo J. Ezawa, Til Schuermann; Fraud / Uncollectible debt detection using a Bayesian network learning system: A rare binary outcome with mixed data structures; Proceedings of Uncertainty in Artificial Intelligence, UAI 95; Morgan Kaufmann; 1995.

T. Fawcett, F. Provost; Adaptive fraud detection; Data Mining and Knowledge Discovery 1; 291-316; 1997.


Peter Hadfield; Stripe makes credit card fraud tougher; USA Today; Arlington; July 14, 1998.

Donald V. Macdougall, Richard G. Mosley, Garioch J. I. Saunders; Credit card crime in Canada: Investigation - Prosecution; The Canadian Association of Crown Counsel; page 1-56; January 198%

Francois Mativat, Pierre Tremblay; Counterfeiting credit cards; The British Journal of Criminology; London; Spring 1997.

D. Michie, D. J. Spiegelhalter, C. C. Taylor; Machine Learning, Neural and Statistical Classification; Ellis Horwood; 1994.

Tom M. Mitchell; Machine Learning; WCB McGraw-Hill; 1997.

Lea Purcell; Roping in risk; Bank Systems & Technology; New York; Vol. 31; Issue 5; Page 64; May 1994.

J. R. Quinlan; Generating production rules from decision trees; Proceedings of the 10th international joint conference on Artificial Intelligence; Morgan Kaufmann, San Mateo, CA; page 304-307; 1987.

J. R. Quinlan; C4.5: Programs for Machine Learning; Kaufmann, San Mateo, CA; 1993.

Trudy Ring; Fraud Detection and More; Credit Card Management; New York; Vol. 10; Issue 6; Page 128; September 1997.

William Roberds; The impact of fraud on new methods of retail payment; Economic Review; Federal Reserve Bank of Atlanta; Atlanta; Vol. 83; Issue 1; Page 42-52; First Quarter 1998.

Stuart Russell, Peter Norvig; Artificial Intelligence: A Modern Approach; Prentice Hall; 1995.

Paul Seethaler; Credit Card Fraud Detection via the Application of Neural Networks; M.A.Sc. Thesis; Dept. of Industrial Engineering; U of T; 1995.

Isabelle Sender; Detecting and combating fraud; Chain Store Age; New York; Vol. 74; Issue 7; Page 162; July 1998.

Keith Slotter; Plastic Payments: Trends in credit card Fraud; FBI Law Enforcement Bulletin; Washington; Vol. 66; Issue 6; Page 1-7; June 1997.

Russell Smith; Card Games: Plastic fraud and crime; Australian Accountant; Melbourne; Vol. 67; Issue 11; Page 56-58; December 1997.

John Stewart; So, what else can neural nets do?; Credit Card Management; New York; Vol. 7; Issue 6; Page 44; September 1994.

S. Stolfo, W. Fan, W. Lee, A. Prodromidis, and P. Chan; Credit card fraud detection using meta-learning: Issues and initial results; Work notes AAAI-97 workshop on AI approaches to Fraud Detection and Risk Management; 1997.

Warren Taylor; Credit Card Risk Management; McGraw-Hill; 1997.

Sharon Walsh; Cracking down on fraud with credit cards; The Washington Post; December 5, 1995.


Sholom M. Weiss, Casimir A. Kulikowski; Computer Systems that Learn; Morgan Kaufmann; 1991.

Canadian Bankers Association; Fast Facts - Credit Cards; October 1998.

Canadian Bankers Association; Fast Facts - Credit Card Fraud; Oct. 1998. http://www.cba.ca/eng/Statistics/FastFacts/credit_card_fraud.htm

Canadian Bankers Association; Fast Facts - Credit Cards; June 1999. http://www.cba.ca/eng/Statistics/FastFacts/visamc.htm

Canadian Bankers Association; Fast Facts - Credit Card Fraud; June 1999. http://www.cba.ca/eng/Statistics/FastFacts/credit_card_fraud.htm

Canadian Bankers Association; MasterCard and Visa Statistics; June 1999.

Tracy LeMay; Credit Card Fraud Epidemic Worsens; Financial Post, Friday, Page D1; December 18, 1998.


APPENDIX A A Sample of Datasets

A small sample of the data provided by the collaborating FI is shown in this Appendix. The transactions flagged by FDS were collected over a 45 day period (June, July, and part of August 1999) and are related to a limited region of Toronto.

Non-fraud files consisted of 4,919 accounts with 69,182 transactions while the fraud file consisted of 707 accounts with 6,725 transactions (mixture of 1,743 non-fraud and 4,982 fraud). Due to the volume of data (thousands of pages) only a very small sample has been included here.

The empty spaces, NA, and zero represent the unknown values. The transactions labeled in the fraudulent file with an asterisk (*) are the ones that are identified by the bank as fraudulent. Table 5-1 from Chapter 5 has been reproduced here to describe the meaning of each field in the sample datasets.


Table A4 Dataset variables and their definition

Feature          Description
Account          Cardholder's account number
Date             Date and time of transaction
Dollar           Dollar amount of transaction
Country          Merchant country code
SIC              Merchant category code
Decision         Authorized (A), Declined (D), Referral (R), Pick up (P)
POS              Card swiped (S) / keyed (K)
Card             Card type (Classic / Gold)
Case creation    The day / time case created by FDS
Case action      The day / time fraud analyst started to investigate the case
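The field list above maps naturally onto a record type. The following is a minimal sketch of such a record and a row parser, assuming the ten fields arrive as strings in the order shown, that date fields look like the samples in this Appendix (for example "7/21/99 12:09"), and that empty, NA and zero entries mean unknown as stated earlier. The names `Transaction`, `parse_row`, `known` and `stamp` are hypothetical, and the example row in the test is invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

UNKNOWN = {"", "NA", "0"}   # markers the Appendix lists as unknown values
STAMP = "%m/%d/%y %H:%M"    # assumed layout, inferred from samples such as "7/21/99 12:09"


def known(value: str) -> Optional[str]:
    """Return the stripped value, or None when it is an unknown marker."""
    value = value.strip()
    return None if value in UNKNOWN else value


def stamp(value: str) -> Optional[datetime]:
    """Parse a date/time field, or return None when it is unknown."""
    text = known(value)
    return datetime.strptime(text, STAMP) if text else None


@dataclass
class Transaction:
    account: Optional[str]
    date: Optional[datetime]
    dollar: Optional[float]
    country: Optional[str]
    sic: Optional[str]
    decision: Optional[str]      # A, D, R or P
    pos: Optional[str]           # S (swiped) or K (keyed)
    card: Optional[str]          # Classic or Gold
    case_creation: Optional[datetime]
    case_action: Optional[datetime]


def parse_row(fields) -> Transaction:
    """Build a Transaction from ten string fields in table order."""
    account, date, dollar, country, sic, decision, pos, card, created, action = fields
    amount = known(dollar)
    return Transaction(
        known(account), stamp(date), float(amount) if amount else None,
        known(country), known(sic), known(decision), known(pos), known(card),
        stamp(created), stamp(action),
    )
```

Keeping unknown values as `None` rather than zero avoids treating a missing dollar amount as a real zero-value purchase.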

Table A-1 A small sample from legitimate files

Account Date Dollar Country SIC Decision POS Card Case creation Case Action

Card      Case creation    Case action
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic   7/21/99 12:09    7/21/99 12:18
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   7/9/99 9:39      7/9/99 10:39
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   6/29/99 12:54    6/30/99 16:16
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17
Classic   7/21/99 14:10    7/22/99 11:17


Table A-2 A small sample from fraudulent file

Date Dollar Fraud Country SIC POS Card

SIC: 5541 0 4814 5697 5211 5211 5211 0 4814 4814 5611 5533 5541 5691 5971 5611 5541 8220 5968 5541 4844 4814 5541 5541 4814 5541 4814 5541 7011 4814 5944 7372 4814 5969 5655 5261 5699 5661 5541 5541 5994 4814 7832 5331 5499

POS: S K S S S S S K S S S K S S S S S K K S S S S S S S S S S S S S K S S S S S S S S S S S

Card: Classic in every row


APPENDIX B Output Summary of See5

Several samples of the output summary of the classifiers, for learning sets with different training class distributions and variations of different features, are presented in this Appendix.

The summary shows the size of the trees and/or rules, the number and percentage of errors made in the classification, and a confusion matrix that demonstrates the distribution of errors and pinpoints where those errors were made in the training and testing sets.

It should be noted that the See5 output includes the details of individual trees and/or rules. These details, for all the classifiers, were too long to be reported here (hundreds of pages).
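The number and percentage of errors can be recomputed from any confusion matrix in this Appendix. The sketch below assumes the See5 layout used here, where rows are the true classes N and Y and columns are the predicted classes N and Y. The function name `summarize` and the key names are hypothetical; the example counts are the 50:50 training matrix with all features considered.

```python
def summarize(matrix):
    """Summarize a 2x2 confusion matrix.

    matrix[0] is the true class N row, matrix[1] the true class Y row;
    within each row the first count is predicted N, the second predicted Y.
    """
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    errors = fp + fn
    return {
        "cases": total,
        "errors": errors,
        "error_pct": round(100.0 * errors / total, 1),
        # Share of fraudulent transactions that the classifier hit.
        "fraud_caught_pct": round(100.0 * tp / (tp + fn), 1),
        # Share of legitimate transactions wrongly classified as fraud.
        "false_alarm_pct": round(100.0 * fp / (fp + tn), 1),
    }


# 50:50 training matrix, all features considered.
training = [[2972, 391], [996, 2360]]
report = summarize(training)
```

The case count of this matrix equals the 6719 training cases reported for the 50:50 distribution, which is a quick consistency check on a transcribed matrix.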


Class Distribution 50:50 (All features are considered)

   (a)    (b)    <-classified as
  ----   ----
  2972    391    (a): class N
   996   2360    (b): class Y

Size   Errors
 152

   (a)    (b)    <-classified as
  ----   ----
  3798   1258    (a): class N
   279   1130    (b): class Y



   (a)    (b)    <-classified as
  ----   ----
  2903    460    (a): class N
   969   2387    (b): class Y

Evaluation on test data (6465 cases):

   (a)    (b)    <-classified as
  ----   ----
  3740   1316    (a): class N
   265   1144    (b): class Y

Options: Decision tree, Boost using 10 trials

Evaluation on training data (6719 cases):

0 1 2 3 4 5 6 7

boost

   (a)    (b)    <-classified as
  ----   ----
  2938    425    (a): class N
   821   2535    (b): class Y



Evaluation on test data (6465 cases):

boost

   (a)    (b)    <-classified as
  ----   ----
  3806   1250    (a): class N
   240   1161    (b): class Y

Options: Decision tree, Generating rules, Boost using 10 trials

boost

   (a)    (b)    <-classified as
  ----   ----
  3026    337    (a): class N
   962   2394    (b): class Y



Evaluation on test data (6465 cases):

Trial    Decision Tree      Rules
-----    -------------      -----
         Size   Errors      No   Errors

0 1 2 3 4 5 6 7

boost

   (a)    (b)    <-classified as
  ----   ----
  3958   1098    (a): class N
   276   1133    (b): class Y

Options: Decision tree, Cross-validate (10 folds)


   (a)    (b)    <-classified as
  ----   ----
  2855    508    (a): class N
  1135   2221    (b): class Y

Options: Decision tree, Generating rules, Cross-validate (10 folds)

   (a)    (b)    <-classified as
  ----   ----
  2855    508    (a): class N
  1119   2237    (b): class Y



Class Distribution 33:67 (Card type is disregarded)

Decision trees

Evaluation on training data (10059 cases):

   (a)    (b)    <-classified as
  ----   ----
  6150    553    (a): class N
  1312   2044    (b): class Y

   (a)    (b)    <-classified as
  ----   ----
  4384    672    (a): class N
   397   1012    (b): class Y



   (a)    (b)    <-classified as
  ----   ----
  6213    490    (a): class N
  1431   1925    (b): class Y

Evaluation on test data (6465 cases):

   (a)    (b)    <-classified as
  ----   ----
  4461    595    (a): class N
   390   1019    (b): class Y

Options: Decision tree, Boost using 5 trials

Trial    Size   Errors
0        136    1865 (18.5%)
1        215    2382 (23.7%)
2        184    2265 (22.5%)
3        173    2301 (22.9%)
4        171    2410 (24.0%)
boost           1676 (16.7%)   <<

   (a)    (b)    <-classified as
  ----   ----
  6530    173    (a): class N
  1503   1853    (b): class Y




   (a)    (b)    <-classified as
  ----   ----
  4640    416    (a): class N
   442    967    (b): class Y

Options: Decision tree, Generating rules, Boost using 10 trials

Trial    Decision Tree          Rules
         Size   Errors          No   Errors
0        136    1865 (18.5%)    64   1921 (19.1%)
1        194    2318 (23.0%)    82   2249 (22.4%)
2        195    2729 (27.1%)    50   2695 (26.8%)
3        243    2113 (21.0%)    84   1997 (19.9%)
4        157    2548 (25.3%)    63   2344 (23.3%)
boost           1724 (17.1%)         1796 (17.9%)   <<

   (a)    (b)    <-classified as
  ----   ----
  6552    151    (a): class N
  1645   1711    (b): class Y




1        194    1484 (23.0%)    82   1381 (21.4%)
2        195    2063 (31.9%)    50   1818 (28.1%)
3        243    1534 (23.7%)    84   1207 (18.7%)
4        157    1678 (26.0%)    63   1274 (19.7%)
boost           1098 (17.0%)          787 (12.2%)   <<

   (a)    (b)    <-classified as
  ----   ----
  4724    332    (a): class N
   455    954    (b): class Y

Options: Decision tree, Cross-validate (10 folds)

   (a)    (b)    <-classified as
  ----   ----
  6194    509    (a): class N
  1601   1755    (b): class Y



Options: Decision tree, Generating rules, Cross-validate (10 folds)

   (a)    (b)    <-classified as
  ----   ----
  6261    442    (a): class N
  1681   1675    (b): class Y



Class Distribution 25:75 (POS & card type are disregarded)


   (a)    (b)    <-classified as
  ----   ----
  9874    175    (a): class N
  1870   1486    (b): class Y

   (a)    (b)    <-classified as
  ----   ----
  4816    240    (a): class N
   522    887    (b): class Y




   (a)    (b)    <-classified as
  ----   ----
  9908    141    (a): class N
  1975   1381    (b): class Y

   (a)    (b)    <-classified as
  ----   ----
  4855    201    (a): class N
   521    888    (b): class Y

Options: Decision tree, Boost using 5 trials

Evaluation on training data (13405 cases):

0 1 2 3 4

boost

   (a)    (b)    <-classified as
  ----   ----
  9941    108    (a): class N
  1831   1525    (b): class Y



Trial    Size   Errors
0        127     762 (11.8%)
1        325    1688 (26.1%)
2        216    1090 (16.9%)
3        140    1544 (23.9%)
4        178    1195 (18.5%)
boost            773 (12.0%)   <<

   (a)    (b)    <-classified as
  ----   ----
  4800    256    (a): class N
   517    892    (b): class Y

Options: Decision tree, Generating rules, Boost using 3 trials

Trial    Decision Tree          Rules
         Size   Errors          No    Errors
0        127    2045 (15.3%)    66    2116 (15.8%)
1        350    2596 (19.4%)    141   2567 (19.1%)
2        226    2305 (17.2%)    83    2380 (17.8%)
boost           1895 (14.1%)          1955 (14.6%)   <<

   (a)    (b)    <-classified as
  ----   ----
  9907    142    (a): class N
  1813   1543    (b): class Y

Evaluation on test data (6465 cases):



1        350    1577 (24.4%)    141   1484 (23.0%)
2        226    1128 (17.4%)    83    1207 (18.7%)
boost            890 (13.6%)           757 (11.7%)   <<

   (a)    (b)    <-classified as
  ----   ----
  4793    263    (a): class N
   494    915    (b): class Y

Cross-validation

Options: Decision tree, Cross-validate (10 folds)

(a): class N
(b): class Y



Options: Decision tree, Generating rules, Cross-validate (10 folds)

Errors

(b)    <-classified as
 232   (a): class N
1334   (b): class Y


APPENDIX C Summary of Results

The results of the learning algorithm on training and testing datasets are summarized in this Appendix.


Table C-1 Summary results of classifiers (All features are considered)

Training (13,405) cases Testing (6,465) cases

Table C-2 Summary results of classifiers (Card type is disregarded)

Training (13,405) cases Testing (6,465) cases

Classifier Option   size  error  err%  rule
71  Boost & Rules   -     1706   12.7  -

                    Training (13,405) cases                       Testing (6,465) cases
Classifier Option   size  error  err%  rules  error  err%     size  error
Decision Trees      193   1816   13.5  -      -      -        193   929
Trees & Rules       193   1816   13.5  81     1840   13.7     193   929
Boosted Trees       -     1527   11.4  -      -      -        -     888
Boost & Rules       -     1524   11.4  -      1681   12.5     -     1183


Table C-3 Summary results of classifiers (POS & card type are disregarded)

                    Training (13,405) cases                       Testing (6,465) cases
Classifier Option   size  error  err%  rule   error  err%     size  error  err%  rules  error  err%


Distribution of 33:67

Table C-4 Summary results of classifiers (All features are considered)

Classifier Option   Boosted Trees

Training (10,059) cases

Table C-5 Summary results of classifiers (Card type is disregarded)

Testing (6,465) cases

Classifier Option   size
-  136  -  136  -  -  -  -  -

                    Training (10,059) cases                       Testing (6,465) cases
Classifier Option   size  error  err%  rules  error  err%     size  error  err%  rules  error  err%
Decision Trees      166   1739   17.3  -      -      -        166   933    14.4  -      -      -
Trees & Rules       166   1739   17.3  60     1777   17.7     166   933    14.4  60     882    13.6
Boosted Trees       -     1524   15.2  -      -      -        -     1032   16.0  -      -      -
Boost & Rules       -     1508   15.0  -      1724   17.1     -     1165   18.0  -      823    12.7


Table C-6 Summary results of classifiers (POS & card type are disregarded)

                    Training (10,059) cases                       Testing (6,465) cases
Classifier Option   size  error  err%  rules  error  err%     size  error  err%  rules  error  err%
Decision Trees      106   1933   19.2  -      -      -        106   1004   15.5  -      -      -
Trees & Rules       106   1933   19.2  42     2031   20.2     106   1004   15.5  42     899    13.9
Boosted Trees       -     1728   17.2  -      -      -        -     893    13.8  -      -      -
Boost & Rules       -     1624   16.1  -      1895   18.8     -     952    14.7  -      771    11.9


Table C-7 Summary results of classifiers (All features are considered)

Training (6,719) cases

Classifier Option   size  error  err%  rule   error  err%     size  error  err%  rules  error  err%
Boost & Rules   1170   17.4

Table C-8 Summary results of classifiers (Card type is disregarded)

Training (6,719) cases Testing (6,465) cases

                    Training (6,719) cases                        Testing (6,465) cases
Classifier Option   size  error  err%  rules  error  err%     size  error  err%  rules  error  err%
Decision Trees      116   1507   22.4  -      -      -        116   1619   25.0  -      -      -
Trees & Rules       116   1507   22.4  46     1521   22.6     116   1619   25.0  46     1560   24.1
Boosted Trees       -     1325   19.7  -      -      -        -     1587   24.5  -      -      -
Boost & Rules       -     1454   21.6  -      1455   21.7     -     1672   25.9  -      1444   22.3


Table C-9 Summary results of classifiers (POS & card type are disregarded)

                    Training (6,719) cases                        Testing (6,465) cases
Classifier Option   size  error  err%  rules  error  err%     size  error  err%  rules  error  err%
Decision Trees      99    1580   23.5  -      -      -        99    1777   27.5  -      -      -
Trees & Rules       99    1580   23.5  47     1626   24.2     99    1777   27.5  47     1862   28.8
Boosted Trees       -     1477   22.0  -      -      -        -     1675   25.9  -      -      -
Boost & Rules       -     1451   21.6  -      1647   24.5     -     1559   24.1  -      1627   25.2

APPENDIX D Classifier Prediction

The class of new cases predicted by the boosted decision trees (BDT) trained on 25:75 and 50:50 class distributions are shown in this Appendix.

Section D.1 presents the prediction results on the classification of the examined new cases using the BDT trained on 25:75 fraud/legitimate data distribution while all features were considered. Tables D-1 and D-2 demonstrate the class predictions for the legitimate and fraudulent case sets, respectively. These Tables illustrate different fields in the transactions, class predicted by the BDT classifier, a probability value for this prediction, and the correct classes.

Section D.2 presents the prediction results on the classification of the examined new cases using the BDT trained on 50:50 class distribution. Tables D-3 and D-4 present the class predictions for the legitimate and fraudulent case sets, respectively.
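A training set with a chosen fraud/legitimate distribution such as 25:75 or 50:50 can be built by keeping every fraud case and thinning the legitimate ones. The sketch below assumes random undersampling of the legitimate class, which is an assumption made for illustration and is not stated in this Appendix as the procedure actually used. The function name `undersample` and its parameters are hypothetical.

```python
import random


def undersample(legit, fraud, fraud_share, seed=0):
    """Return a training list in which fraud cases make up about fraud_share.

    All fraud cases are kept; legitimate cases are drawn at random without
    replacement (assumed sampling scheme, for illustration only).
    """
    if not 0 < fraud_share < 1:
        raise ValueError("fraud_share must be strictly between 0 and 1")
    # Number of legitimate cases needed so that fraud / total == fraud_share.
    wanted = round(len(fraud) * (1 - fraud_share) / fraud_share)
    wanted = min(wanted, len(legit))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(list(legit), wanted) + list(fraud)
```

With 100 fraud cases, a 25:75 distribution keeps 300 legitimate cases and a 50:50 distribution keeps 100, so the fraud cases are the same in both learning sets and only the legitimate sample changes.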


D.1 Making Prediction using Boosted Decision Trees (Class distribution 25:75)

Table D-1 Prediction of new cases using boosted decision trees classifier (Legitimate Accounts)

Account Dollar Country SIC Decision POS Card Prediction Probability Correct class

Card: Gold, Gold, Gold, then Classic in every remaining row


Table D-2 Prediction of new cases using boosted decision trees classifier (Fraudulent Accounts)

Account Dollar Country SIC Decision POS Card Prediction Probability Correct class

POS: S S S S S ? S S S S K S K K K S K S S S S S S S S S S K S S S K ? K K K K S S S S S S S

Card: Classic in every row


D.2 Making Prediction using Boosted Decision Trees (Class distribution 50:50)

Table D-3 Prediction of new cases using boosted decision trees classifier (Legitimate Accounts)

Account Dollar Country SIC Decision POS Card Prediction Probability Correct class

Card: Gold, Gold, Gold, Gold, Gold, then Classic in every remaining row

Table D-4 Prediction of new cases using boosted decision trees classifier (Fraudulent Accounts)

Account Dollar Country SIC Decision POS Card Prediction Probability Correct class

Account: B10 B10 B10 B10 B10 B10 B11 B11 B11 B11 B12 B12 B12 B12 B12 B13 B13 B13 B13 B14 B14 B14 B14 B15 B15 B15 B15 B15 B15 B15 B16 B16 B17 B17 B17 B17 B17 B18 B18 B18 B18 B18 B18 B18 B18 B18 B18 B18 B18 B18 B18 B19 B19 B19 B20 B20 B20 B20 B21 B21 B21 B22 B22 B23 B23 B23 B23 B23 B24 B24 B24 B24 B24 B24 B25 B25 B25 B25 B26 B26 B26 B26 B26 B26 B26 B26 B26 B27 B27 B28 B28 B28

Card: Classic in every row
