
+Adaptive Fraud Detection

by Tom Fawcett and Foster Provost

Presented by: David Sander

+Outline

• Problem Description
  • Cellular cloning fraud problem
  • Why it is important
  • Current strategies
• Construction of Fraud Detector
  • Framework
  • Rule learning, Monitor construction, Evidence combination
• Experiments and Evaluation
  • Data used in this study
  • Data preprocessing
  • Comparative results
• Conclusion
• Exam Questions

+The Problem

• How to detect suspicious changes in user behavior to identify and prevent cellular fraud
  • Non-legitimate users, aka bandits, gain illicit access to a legitimate user’s, or victim’s, account
• Solution useful in other contexts
  • Identifying and preventing credit card fraud, toll fraud, and computer intrusion

+Cellular Fraud - Cloning

• Cloning Fraud
  • A kind of superimposition fraud (parasite)
  • Fraudulent usage is superimposed upon (added to) the legitimate usage of an account
  • Causes inconvenience to customers and great expense to cellular service providers

+Cellular communications and Cloning Fraud

• Mobile Identification Number (MIN) and Electronic Serial Number (ESN)
  • Identify a specific account
  • Periodically transmitted unencrypted whenever phone is on
• Bandits use MIN and ESN to fake a customer’s account
  • Bandit can make virtually unlimited, untraceable calls at someone else’s expense

+Interest in reducing Cloning Fraud

• Fraud is detrimental in several ways:
  • Fraudulent usage congests cell sites
  • Fraud incurs land-line usage charges
  • Crediting process is costly to carrier and inconvenient to the customer

+Strategies for dealing with cloning fraud

• Pre-call Methods
  • Identify and block fraudulent calls as they are made
  • Validate the phone or its user when a call is placed
• Post-call Methods
  • Identify fraud that has already occurred on an account so that further fraudulent usage can be blocked
  • Periodically analyze call data on each account to determine whether fraud has occurred.

+Pre-call Methods

• Personal Identification Number (PIN)
  • PIN cracking is possible with more sophisticated equipment
• RF Fingerprinting
  • Method of identifying phones by their unique transmission characteristics
• Authentication
  • Reliable and secure private key encryption method
  • Requires special hardware capability
  • An estimated 30 million non-authenticatable phones are in use in the US alone (in 1997)

+Post-call Methods

• Collision Detection
  • Analyze call data for temporally overlapping calls
• Velocity Checking
  • Analyze the locations and times of consecutive calls
• User Profiling
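The collision and velocity checks listed above can be sketched as follows. This is only an illustration under stated assumptions, not a carrier's actual implementation: the `Call` record, its coordinate fields, and the 600 km/h speed limit are all hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Sequence


@dataclass
class Call:
    start: float     # call start, in seconds since an arbitrary epoch (hypothetical field)
    duration: float  # call length in seconds
    lat: float       # origin latitude in degrees (hypothetical field)
    lon: float       # origin longitude in degrees (hypothetical field)


def km_between(a: Call, b: Call) -> float:
    """Great-circle (haversine) distance between two call origins, in km."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def has_collision(calls: Sequence[Call]) -> bool:
    """True if any two calls on the account overlap in time."""
    ordered = sorted(calls, key=lambda c: c.start)
    # If any pair overlaps, some pair adjacent in start order overlaps too.
    return any(prev.start + prev.duration > nxt.start
               for prev, nxt in zip(ordered, ordered[1:]))


def has_velocity_violation(calls: Sequence[Call], max_kmh: float = 600.0) -> bool:
    """True if consecutive calls imply travel faster than max_kmh (assumed limit)."""
    ordered = sorted(calls, key=lambda c: c.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        hours = (nxt.start - (prev.start + prev.duration)) / 3600.0
        if hours > 0 and km_between(prev, nxt) / hours > max_kmh:
            return True
    return False


# A call from Brooklyn followed ten minutes later by a call from Boston.
brooklyn = Call(start=0, duration=60, lat=40.68, lon=-73.94)
boston = Call(start=600, duration=35, lat=42.36, lon=-71.06)
suspicious = has_velocity_violation([brooklyn, boston])
```

Neither check needs a per-customer profile, which is why they miss fraud that does not happen to overlap or jump location.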

+Another Post-call Method (Main focus of this paper)

• User Profiling
  • Analyze calling behavior to detect usage anomalies suggestive of fraud
  • Works well with low-usage customers
  • Good complement to collision and velocity checking because it covers cases the others might miss

+Sample Frauded Account

Date     Time      Day  Duration    Origin         Destination    Fraud
1/01/95  10:05:01  Mon  13 minutes  Brooklyn, NY   Stamford, CT
1/05/95  14:53:27  Fri  5 minutes   Brooklyn, NY   Greenwich, CT
1/08/95  09:42:01  Mon  3 minutes   Bronx, NY      Manhattan, NY
1/08/95  15:01:24  Mon  9 minutes   Brooklyn, NY   Brooklyn, NY
1/09/95  15:06:09  Tue  5 minutes   Manhattan, NY  Stamford, CT
1/09/95  16:28:50  Tue  53 seconds  Brooklyn, NY   Brooklyn, NY
1/10/95  01:45:36  Wed  35 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  01:46:29  Wed  34 seconds  Boston, MA     Yonkers, NY    Bandit
1/10/95  01:50:54  Wed  39 seconds  Boston, MA     Chelsea, MA    Bandit
1/10/95  11:23:28  Wed  24 seconds  Brooklyn, NY   Congers, NY
1/11/95  22:00:28  Thu  37 seconds  Boston, MA     Boston, MA     Bandit
1/11/95  22:04:01  Thu  37 seconds  Boston, MA     Boston, MA     Bandit

+The Need to be Adaptive

• Patterns of fraud are dynamic – bandits constantly change their strategies in response to new detection techniques
• Levels of fraud can change dramatically from month to month
• Costs of missing fraud or dealing with false alarms change with inter-carrier contracts

+Automatic Construction of Profiling Fraud Detectors

+One Approach

• Build a fraud detection system by classifying calls as being fraudulent or legitimate
• However, there are two problems that make simple classification techniques infeasible.

+Problems with simple classification

• Context
  • A call that would be unusual for one customer may be typical for another customer
• Granularity (overfitting?)
  • At the level of the individual call, the variation in calling behavior is large, even for a particular user

+In Summary: Learning The Problem

1) Which phone call features are important?
2) How should profiles be created?
3) When should alarms be raised?

+Proposed Detector Constructor Framework (DC-1)

+DC-1 Processing Account-Day Example

+DC-1 Fraud Detection Stages

• Stage 1: Rule Learning
• Stage 2: Profile Monitoring
• Stage 3: Combining Evidence

+Rule Learning – the 1st stage

• Rule Generation
  • Rules are generated locally based on differences between fraudulent and normal behavior for each account
• Rule Selection
  • Then they are combined in a rule selection step

+Rule Generation

• DC-1 uses the RL program to generate rules with certainty factors above a user-defined threshold
• For each account, RL generates a “local” set of rules describing the fraud on that account.
• Example:
  (Time-of-Day = Night) AND (Location = Bronx) ==> FRAUD
  Certainty Factor = 0.89
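To make the rule example concrete, the sketch below scores one rule against an account's labeled calls. It assumes the certainty factor is simply the fraction of calls matching the rule that are labeled fraudulent; the slides do not define it, and RL's actual estimate may differ. The dictionary keys, the toy calls, and the 0.7 threshold are hypothetical.

```python
from typing import Callable, Dict, List

Call = Dict[str, object]


def certainty_factor(rule: Callable[[Call], bool], calls: List[Call]) -> float:
    """Fraction of calls matching the rule that are labeled fraud (assumed definition)."""
    matched = [c for c in calls if rule(c)]
    if not matched:
        return 0.0
    return sum(1 for c in matched if c["fraud"]) / len(matched)


def night_bronx(call: Call) -> bool:
    """(Time-of-Day = Night) AND (Location = Bronx)"""
    return call["time_of_day"] == "NIGHT" and call["location"] == "Bronx"


# Toy labeled calls for a single account (made-up data).
calls = [
    {"time_of_day": "NIGHT", "location": "Bronx", "fraud": True},
    {"time_of_day": "NIGHT", "location": "Bronx", "fraud": True},
    {"time_of_day": "NIGHT", "location": "Bronx", "fraud": True},
    {"time_of_day": "NIGHT", "location": "Bronx", "fraud": False},
    {"time_of_day": "MORNING", "location": "Brooklyn", "fraud": False},
]

cf = certainty_factor(night_bronx, calls)  # 3 of the 4 matching calls are fraud
keep_rule = cf >= 0.7                      # hypothetical user-defined threshold
```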

+Rule Selection

• Rule generation step typically yields tens of thousands of rules
• If a rule is found in (or covers) many accounts then it is probably worth using
• Selection algorithm identifies a small set of general rules that cover the accounts
• Resulting set of rules is used to construct specific monitors
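One way to picture the selection step is a greedy cover. The slides do not spell out the selection algorithm, so the sketch below is an assumed stand-in rather than DC-1's procedure: it repeatedly picks the rule found in the most accounts that are not yet covered. Rule names and account IDs are made up.

```python
from typing import Dict, List, Set


def select_rules(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedy cover. `coverage` maps a rule name to the accounts it was found in."""
    uncovered: Set[str] = set().union(*coverage.values()) if coverage else set()
    chosen: List[str] = []
    while uncovered:
        # Iterate rule names in sorted order so ties resolve deterministically.
        best = max(sorted(coverage), key=lambda r: len(coverage[r] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


coverage = {
    "night_bronx": {"a1", "a2", "a3"},
    "evening": {"a3", "a4"},
    "to_payphone": {"a2"},
    "night_queens": {"a4"},
}
selected = select_rules(coverage)
```

Here two general rules cover all four accounts, so the two account-specific rules are dropped.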

+Profiling Monitors – the 2nd stage

• Monitors have 2 distinct steps:
  • Profiling step:
    • Monitor is applied to an account’s normal usage to measure the account’s normal activity
    • Statistics are saved with the account.
  • Use step:
    • A monitor processes a single account-day
    • References the normalcy measure from profiling
    • Generates a numeric value describing how abnormal the current account-day is

+Most Common Monitor Templates

• Threshold
• Standard Deviation

+Threshold Monitors

+Standard Deviation Monitors

+Comparing the same standard deviation monitor on two accounts

+Example for Standard Deviation

• Rule
  • (TIME OF DAY = NIGHT) AND (LOCATION = BRONX) ==> FRAUD
• Profiling Step
  • The subscriber called from the Bronx an average of 5 minutes per night with a standard deviation of 2 minutes. At the end of the Profiling step, the monitor would store the values (5,2) with that account.
• Use step
  • If the monitor processed a day containing 3 minutes of airtime from the Bronx at night, the monitor would emit a zero; if the monitor saw 15 minutes, it would emit (15 - 5)/2 = 5. This value denotes that the account is five standard deviations above its average (profiled) usage level.
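The arithmetic in this example can be written as a small pair of functions. This is a sketch, not DC-1's code: the choice of population standard deviation, the handling of a zero deviation, and the toy profiling data are assumptions; only the stored pair (5, 2) and the two outputs come from the slide.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def profile(nightly_minutes: Sequence[float]) -> Tuple[float, float]:
    """Profiling step: summarize normal usage as (mean, standard deviation)."""
    return mean(nightly_minutes), pstdev(nightly_minutes)


def use(stored: Tuple[float, float], minutes_today: float) -> float:
    """Use step: standard deviations above the profiled mean, floored at zero."""
    mu, sigma = stored
    if sigma == 0:
        # Assumed convention: no profiled variation means any excess is unbounded.
        return 0.0 if minutes_today <= mu else float("inf")
    return max(0.0, (minutes_today - mu) / sigma)


# Made-up nightly Bronx usage whose mean is 5 and standard deviation is 2.
stored = profile([3.0, 7.0, 3.0, 7.0])

quiet_day = use(stored, 3.0)   # below average, so the monitor emits zero
busy_day = use(stored, 15.0)   # (15 - 5) / 2
```

Because the output is expressed in each account's own standard deviations, the same monitor reacts differently to the same usage on a heavy user and on a light user.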

+Combining Evidence from the Monitors – the 3rd stage

• Weights the monitor outputs and learns a threshold on the sum to produce high confidence alarms
• DC-1 uses Linear Threshold Unit (LTU)
  • Simple and fast
  • Enables good first-order judgment
• A feature selection process is used to
  • Choose a small set of useful monitors in the final detector
  • Some rules don’t perform well when used in monitors, some overlap
  • Forward selection process chooses set of useful monitors
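A linear threshold unit reduces to a weighted sum compared against a cutoff. The sketch below shows that shape on toy data. It is a simplification under stated assumptions: the weights are fixed by hand (in DC-1 they are learned as well), the threshold is picked to maximize training accuracy, and the monitor outputs and labels are invented.

```python
from typing import List, Sequence


def ltu_score(weights: Sequence[float], outputs: Sequence[float]) -> float:
    """Weighted sum of monitor outputs for one account-day."""
    return sum(w * x for w, x in zip(weights, outputs))


def learn_threshold(weights: Sequence[float],
                    rows: List[Sequence[float]],
                    labels: List[bool]) -> float:
    """Pick the cutoff on the weighted sum that classifies most training days correctly."""
    scores = [ltu_score(weights, r) for r in rows]
    best_t, best_correct = float("inf"), -1
    for t in sorted(set(scores)):
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def alarm(weights: Sequence[float], threshold: float, outputs: Sequence[float]) -> bool:
    """Raise an alarm when the combined evidence reaches the threshold."""
    return ltu_score(weights, outputs) >= threshold


weights = [1.0, 0.5]                                       # hand-set for illustration
rows = [[0.0, 0.0], [1.0, 0.0], [5.0, 1.0], [0.0, 6.0]]    # two monitor outputs per day
labels = [False, False, True, True]                        # True marks a fraud day

threshold = learn_threshold(weights, rows, labels)
flagged = alarm(weights, threshold, [4.0, 2.0])            # score 5.0
```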

+Final Output of DC-1

• A detector that profiles each user’s behavior based on several indicators
• An alarm is raised when there is sufficient evidence of fraudulent activity

+Data used in the study

+Data Information

• Four months of phone call records from the New York City area
• Each call is described by 31 original attributes
• Some derived attributes are added
  • Time-Of-Day (MORNING, AFTERNOON, TWILIGHT, EVENING, NIGHT)
  • To-Payphone
• Calls labeled as fraudulent using block crediting

+Data Cleaning

• Eliminated calls that were credited outside of the range of fraudulent call times
• Days with 1-4 minutes of fraudulent usage were discarded.
  • May have been credited for other reasons, such as wrong number
• Call times were normalized to Greenwich Mean Time for chronological sorting
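Two of these cleaning steps are easy to illustrate. The sketch below is an assumption-laden example rather than the study's preprocessing code: it treats the 1-4 minute bounds as inclusive, uses fixed UTC offsets in place of real time zone rules, and the two timestamps are invented.

```python
from datetime import datetime, timedelta, timezone


def to_gmt(local: datetime) -> datetime:
    """Convert a timezone-aware local timestamp to GMT (UTC)."""
    return local.astimezone(timezone.utc)


def keep_day(fraud_minutes: float) -> bool:
    """Drop days with 1-4 minutes of fraud-labeled usage (bounds assumed inclusive)."""
    return not (1 <= fraud_minutes <= 4)


eastern = timezone(timedelta(hours=-5))  # fixed UTC-5 offset, daylight saving ignored
pacific = timezone(timedelta(hours=-8))  # fixed UTC-8 offset, daylight saving ignored

calls = [
    datetime(1995, 1, 9, 23, 0, 0, tzinfo=pacific),    # 07:00:00 GMT on 1/10
    datetime(1995, 1, 10, 1, 45, 36, tzinfo=eastern),  # 06:45:36 GMT on 1/10
]

# Sorting by local wall-clock time would put the Pacific call first;
# after normalization the Eastern call is correctly the earlier one.
normalized = sorted(to_gmt(c) for c in calls)
```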

+Data Description

• After monitor creation, data is separated into “Account Days”
• Selected for profiling, training and testing:
  • 3600 accounts that have at least 30 fraud-free days of usage before any fraudulent usage
  • Initial 30 days of each account were used for profiling
  • Remaining days were used to generate 96,000 account-days
  • Distinct training and testing accounts: 10,000 account-days for training; 5000 for testing
  • 20% fraud days and 80% non-fraud days

+Experiments and Evaluation

+Output of DC-1 components

• Rule learning: 3630 rules
  • Each covering at least two accounts
• Rule selection: 99 rules
• 2 monitor templates yielding 198 monitors
• Final feature selection: 11 monitors

+The Importance Of Error Cost

• Classification accuracy is not sufficient to evaluate performance
• The costs of misclassification should be factored in
• Estimated Error Costs:
  • False positive (false alarm): $5
  • False negative (letting a fraudulent account-day go undetected): $0.40 per minute of fraudulent air-time
• Factoring in error costs requires a second training pass by the LTU (Linear Threshold Unit)
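The cost model above can be expressed directly. In this sketch the two dollar figures come from the slide; the function name, the argument layout, and the assumption that correct alarms and correct non-alarms cost nothing are illustrative.

```python
from typing import Sequence

FALSE_ALARM_COST = 5.00      # dollars per false positive
MISSED_FRAUD_PER_MIN = 0.40  # dollars per minute of undetected fraudulent air-time


def total_cost(alarms: Sequence[bool],
               is_fraud: Sequence[bool],
               fraud_minutes: Sequence[float]) -> float:
    """Sum the error costs over a set of account-days."""
    cost = 0.0
    for alarm, fraud, minutes in zip(alarms, is_fraud, fraud_minutes):
        if alarm and not fraud:
            cost += FALSE_ALARM_COST                # false alarm
        elif fraud and not alarm:
            cost += MISSED_FRAUD_PER_MIN * minutes  # missed fraud day
    return cost


# Four account-days: one false alarm, one missed 10-minute fraud day,
# one caught fraud day, and one correctly ignored legitimate day.
example = total_cost(
    alarms=[True, False, True, False],
    is_fraud=[False, True, True, False],
    fraud_minutes=[0, 10, 30, 0],
)

# Alarming on every one of 5000 test account-days when 80% are non-fraud.
alarm_on_all = total_cost([True] * 5000, [False] * 4000 + [True] * 1000, [0] * 5000)
```

Under these assumptions, alarming on everything costs 4000 × $5 = $20,000, which agrees with the "Alarm on all" row of the comparison table.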

+Alternative Detection Methods

• Collisions + Velocities
  • Errors almost entirely due to false negatives
• High Usage – detect sudden large jump in account usage
• Best Individual DC-1 Monitor
  • (Time-of-day = Evening) ==> Fraud
• SOTA - State Of The Art
  • Incorporates 13 hand-crafted profiling methods
  • Best detectors identified in a previous study

+DC-1 Vs. Alternatives

Detector                 Accuracy (%)  Cost ($)       Accuracy at Cost
Alarm on all             20            20000          20
Alarm on none            80            18111 +/- 961  80
Collisions + Velocities  82 +/- 0.3    17578 +/- 749  82 +/- 0.4
High Usage               88 +/- 0.7    6938 +/- 470   85 +/- 1.7
Best DC-1 monitor        89 +/- 0.5    7940 +/- 313   85 +/- 0.8
State of the art (SOTA)  90 +/- 0.4    6557 +/- 541   88 +/- 0.9
DC-1 detector            92 +/- 0.5    5403 +/- 507   91 +/- 0.8
SOTA plus DC-1           92 +/- 0.4    5078 +/- 319   91 +/- 0.8

+Shifting Fraud Distributions

• Fraud detection system should adapt to shifting fraud distributions
• To illustrate the above point:
  • One non-adaptive DC-1 detector trained on a fixed distribution (80% non-fraud) and tested against a range of 75-99% non-fraud
  • Another DC-1 was allowed to adapt (re-train its LTU threshold) for each fraud distribution
• Second detector was more cost effective than the first
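The difference between the two detectors can be illustrated with a toy threshold that is either frozen or re-fit when the fraud mix changes. This is a sketch under assumptions, not the experiment itself: the scores, labels, and minutes are invented, the threshold is chosen by exhaustive search over observed scores, and only the $5 and $0.40 per minute costs come from the slides.

```python
from typing import Sequence


def cost_at(t: float, scores: Sequence[float], is_fraud: Sequence[bool],
            fraud_minutes: Sequence[float],
            fp_cost: float = 5.0, fn_per_min: float = 0.40) -> float:
    """Error cost when an alarm is raised for every score >= t."""
    cost = 0.0
    for s, fraud, minutes in zip(scores, is_fraud, fraud_minutes):
        alarm = s >= t
        if alarm and not fraud:
            cost += fp_cost
        elif fraud and not alarm:
            cost += fn_per_min * minutes
    return cost


def retrain_threshold(scores: Sequence[float], is_fraud: Sequence[bool],
                      fraud_minutes: Sequence[float]) -> float:
    """Pick the cheapest cutoff; infinity stands for 'never alarm'."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: cost_at(t, scores, is_fraud, fraud_minutes))


# Original conditions (made-up data): fraud is relatively common.
old = ([1.0, 2.0, 3.0, 2.5, 4.0], [False, False, False, True, True], [0, 0, 0, 30, 50])
# Shifted conditions (made-up data): fraud is rarer and legitimate scores run higher.
new = ([1.0, 2.6, 2.8, 3.0, 3.2, 4.0], [False] * 5 + [True], [0, 0, 0, 0, 0, 50])

fixed_t = retrain_threshold(*old)      # trained once, then frozen
adaptive_t = retrain_threshold(*new)   # re-trained for the new distribution

fixed_cost = cost_at(fixed_t, *new)
adaptive_cost = cost_at(adaptive_t, *new)
```

On this toy data the frozen threshold keeps raising false alarms after the shift, while the re-fit threshold does not.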

+Effects of Changing Fraud Distribution

[Figure: cost (y-axis) versus percentage of non-fraud (x-axis, 75 to 100) for two detectors, Adaptive and 80/20]

+Conclusion

• DC-1 uses a rule learning program to uncover indicators of fraudulent behavior from a large database of customer transactions
• Then the indicators are used to create a set of monitors, which profile legitimate customer behavior and indicate anomalies
• Finally, the outputs of the monitors are used as features in a system that learns to combine evidence to generate high confidence alarms

+Conclusion

• Adaptability to dynamic patterns of fraud can be achieved by generating fraud detection systems automatically from data, using data mining techniques
• DC-1 can adapt to the changing conditions typical of fraud detection environments
• Experiments indicate that DC-1 performs better than other methods for detecting fraud

+Exam Questions

+Question 1

• What are the two major fraud detection categories, how do they differ, and which one does DC-1 fall under?
  • Pre-call Methods
    • Involves validating the phone or its user when a call is placed
  • Post-call Methods – DC-1 falls here
    • Analyzes call data on each account to determine whether cloning fraud has occurred

+Question 2

• Why do fraud detection methods need to be adaptive?
  • Bandits change their behavior: patterns of fraud are dynamic
  • Levels of fraud vary from month to month
  • Cost of missing fraud or handling false alarms changes with inter-carrier contracts

+Question 3

• What are the two steps of profiling monitors, and what are the two main monitor templates?
  • Profiling Step: measure an account’s normal activity and save statistics
  • Use Step: process usage for an account-day to produce a numerical output describing how abnormal activity was on that account-day
  • Threshold and Standard Deviation monitors

+Questions?