
Post on 21-Jun-2020


#1 Agile Predictive Analytics Platform for Today’s Modern Analysts

RapidMiner Wisdom 2018 – New Orleans, LA, USA, October 12th, 2018

Ralf Klinkenberg, Founder & Head of Data Science Research, RapidMiner

rklinkenberg@rapidminer.com

www.RapidMiner.com

Fraud Detection and Prevention: Leveraging Machine Learning to Detect Fraud Patterns, Anomalies, and Unusual Behaviors

© 2017 RapidMiner, GmbH & RapidMiner, Inc.: all rights reserved.

Creating Value from Big Data

1. Fraud – Areas & Types & Relevance

2. Machine Learning for Fraud Detection & Prevention

3. Credit Card Fraud Detection & Prevention

4. Healthcare Fraud Detection & Prevention


Fraud


Fraud: Areas and Types of Fraud

• Credit Card Fraud

• Tax Fraud

– EU: Value Added Tax (VAT) Fraud in Transactions within Networks of Companies

– Income Tax Fraud / Corporate Tax Fraud

• Fraud in Supply Chains, Retail Networks, Purchase Departments, Procurement

• Insurance Fraud:

– Car Insurance (Faked Accidents)

– Fire Insurance

– Healthcare Insurance


Fraud: Healthcare Insurance Fraud

• Example: Medicaid/Medicare in the USA: 1 US State alone: 6 billion US$ budget per year => estimated 10-20% fraud & waste => roughly 1 billion US$ per year lost

• Fraudulent Patients (e.g. Drug Addicts/Dealers/Resellers)

• Fraudulent Doctors

• Fraudulent Pharmacies / Hospitals / Service Providers / Suppliers

• Individuals as well as Networks of Fraudsters


Fraud: Challenges for Fraud Detection

• Large Number of Potential Types and Areas of Fraud

• Intelligent and Constantly Improving Adversaries

• Changing Fraud Patterns and Types

• Large Amounts of Potentially Relevant Data

• Large Variety of Potentially Relevant Data Sources & Types

– Structured and Unstructured Data: Transactions, Time Series Data, Textual Data, Network Data, Entity Relations, etc.

• Limited Resources for Fraud Detection & Prevention

– Which cases to investigate (first / at all)?

– Prioritize & focus to maximize effectiveness & efficiency

Fraud: Known vs. Unknown Types of Fraud

• New instances of known types of fraud should be automatically identified:

=> use Machine Learning to automatically find patterns (in data from the past with known fraud cases)

=> deploy generated models to automatically identify new cases

• But what about new types of fraud?

Machine Learning for Fraud Detection

Predictive Analytics Transforms Insight into ACTION

• Descriptive: OBSERVE (What happened)

• Diagnostic: EXPLAIN (Why did it happen)

• Predictive: ANTICIPATE (What will happen)

• Prescriptive: ACT (Operationalize)

[Diagram axis: Value]


Metrics & Indicators for Fraud Risk

• Domain experts often know metrics that may be indicative of a high risk of fraud => incorporate into entity features

• Examples:

– Entity = Patient:

▪ Total Payments Received,

▪ Number of Prescriptions,

▪ Number of Doctors Visited,

▪ Number of Pills per Month, etc.

– Entity = Prescriber (e.g. Doctor):

▪ Total Payments Received, Number of Patients per Month, Amount per Patient, etc.

– Entity = Service Provider (e.g. Pharmacy, Hospital, etc.):

▪ Total Payments Received, Price per Unit, Price per Treatment of Type X, etc.
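The patient-level metrics listed above could be derived from raw claim records roughly as follows. This is a minimal sketch: the record fields (`patient_id`, `doctor_id`, `amount`, `pills`) and the sample rows are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"patient_id": "P1", "doctor_id": "D1", "amount": 120.0, "pills": 30},
    {"patient_id": "P1", "doctor_id": "D2", "amount": 80.0, "pills": 60},
    {"patient_id": "P1", "doctor_id": "D3", "amount": 95.0, "pills": 60},
    {"patient_id": "P2", "doctor_id": "D1", "amount": 40.0, "pills": 30},
]


def patient_features(records):
    """Aggregate one feature row per patient from raw claim records."""
    acc = defaultdict(
        lambda: {"total": 0.0, "prescriptions": 0, "doctors": set(), "pills": 0}
    )
    for r in records:
        row = acc[r["patient_id"]]
        row["total"] += r["amount"]
        row["prescriptions"] += 1
        row["doctors"].add(r["doctor_id"])
        row["pills"] += r["pills"]
    # Replace the set of doctors by its size so every feature is numeric.
    return {
        pid: {
            "total_payments": row["total"],
            "num_prescriptions": row["prescriptions"],
            "num_doctors_visited": len(row["doctors"]),
            "num_pills": row["pills"],
        }
        for pid, row in acc.items()
    }


features = patient_features(claims)
```

The same aggregation pattern would apply to prescribers and service providers, grouping by a different key.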


Comparison to Peer Groups

• Does a high value of "Total Amounts Prescribed" automatically mean the entity (e.g. doctor) is fraudulent?

• No, but a high total amount prescribed may indicate a high risk of fraud.

• Oncologists often need to prescribe expensive anti-cancer drugs

=> oncologists may have higher "Total Amounts Prescribed" than other types of doctors (specializations)

=> compare a doctor's metric to the average value of his/her peers (and not to the average for all doctors) => ratio.
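The peer-group ratio can be sketched in a few lines. The doctors, specializations, and amounts below are made up for illustration, and the peer average here includes the doctor being evaluated, which is one of several reasonable choices.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical doctors: (id, specialization, total amount prescribed).
doctors = [
    ("D1", "oncology", 900_000.0),
    ("D2", "oncology", 1_100_000.0),
    ("D3", "general", 100_000.0),
    ("D4", "general", 120_000.0),
    ("D5", "general", 380_000.0),
]


def peer_ratios(rows):
    """Return each doctor's metric divided by the mean of their peer group."""
    by_group = defaultdict(list)
    for _, specialization, amount in rows:
        by_group[specialization].append(amount)
    group_mean = {spec: mean(values) for spec, values in by_group.items()}
    return {doc: amount / group_mean[spec] for doc, spec, amount in rows}


ratios = peer_ratios(doctors)
# The doctor with the highest ratio stands out most against their own peers.
most_unusual = max(ratios, key=ratios.get)
```

In this toy data the oncologists have the largest absolute amounts, yet the general practitioner D5 has the highest ratio, which is the point of comparing against peers rather than against all doctors.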


Leverage Fraud Risk Indicators

• Does a high value of "Total Payments Received" automatically mean the entity (e.g. doctor) is fraudulent?

• No, but a high total amount received may indicate a high risk of fraud.

• => Rank entities by value of key metrics => suspects

• => Combine metrics (e.g. weighted sum): Fraud Risk Score

=> Rank entities by value of combined metric => suspects

• No machine learning yet, but an often-used initial solution to rank and prioritize entities for review / audits / investigation

• => more effective & efficient use of resources (auditors)
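A weighted-sum Fraud Risk Score and the resulting ranking might look like this. The metric names, values, and weights are hypothetical; in practice domain experts would choose them.

```python
# Hypothetical peer-group ratios per doctor and hand-picked expert weights.
metrics = {
    "D1": {"payments_ratio": 0.9, "patients_ratio": 1.0, "amount_per_patient_ratio": 0.9},
    "D2": {"payments_ratio": 2.5, "patients_ratio": 1.1, "amount_per_patient_ratio": 2.3},
    "D3": {"payments_ratio": 1.2, "patients_ratio": 3.0, "amount_per_patient_ratio": 0.4},
}
weights = {"payments_ratio": 0.5, "patients_ratio": 0.2, "amount_per_patient_ratio": 0.3}


def risk_score(row, weights):
    """Weighted sum of the entity's metrics."""
    return sum(weights[name] * row[name] for name in weights)


def rank_suspects(metrics, weights):
    """Entities ordered from highest to lowest fraud risk score."""
    scored = [(risk_score(row, weights), entity) for entity, row in metrics.items()]
    return [entity for _, entity in sorted(scored, reverse=True)]


ranking = rank_suspects(metrics, weights)
```

Auditors would then work through `ranking` from the top, which is how the score turns limited review capacity into a prioritized queue.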


Supervised Learning

• Classification: Algorithms to predict classes (Fraud / No Fraud)

• Regression: Algorithms to predict numbers (Fraud Risk Scores or Expected Values)

Unsupervised Learning

• Grouping: Group similar items together (Segmentation, Clustering, Item Sets, Association Rules, Sequence Analysis, Network Analysis)

• Anomaly Detection: Find outliers in your data (unusual behaviors)

Automation, Optimization, Deployment, Feature Extraction & Selection


Machine Learning: Supervised vs. Unsupervised

• Supervised Machine Learning:

– Data from the past with known fraud and non-fraud cases (label);

– Machine Learning of Classification models or Association rules to find fraud patterns from the past and to automatically identify new instances of these fraud types in new data;

– Applicable to known fraud cases, patterns, and types.

• Unsupervised Machine Learning:

– Clustering (Segmentation): Grouping entities into clusters of similar entities (patients, doctors, service providers, etc.);

– Anomaly Detection / Outlier Detection: detect unusual behaviors;

– Both depend on selected attributes, normalization and/or weighting;

– Attribute Weighting can be used to incorporate domain knowledge and/or priorities;

– Allows finding previously unknown types of fraud.
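Why clustering and outlier detection depend on normalization and attribute weighting can be seen from the distance function they rely on. The sketch below assumes z-score normalization and a weighted Euclidean distance; the two attributes and all values are made up.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore_columns(rows):
    """Z-score normalize each column so attributes are on a comparable scale."""
    columns = list(zip(*rows))
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per attribute."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical entities: (total payments received, patients per month).
raw = [[1000.0, 10.0], [1200.0, 12.0], [1100.0, 40.0]]
norm = zscore_columns(raw)

equal = [1.0, 1.0]
# On raw values the payments column dominates purely because of its scale.
raw_nearest_to_first = min((1, 2), key=lambda j: weighted_distance(raw[0], raw[j], equal))
# After normalization both attributes contribute comparably.
norm_nearest_to_first = min((1, 2), key=lambda j: weighted_distance(norm[0], norm[j], equal))
```

Changing `equal` to weights that emphasize one attribute is the hook for encoding domain knowledge or priorities.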


Fraud Detection and Prediction


Fraud Detection with Machine Learning

• Step 1: Finding Known Fraud Patterns by Embedding Domain Expert Knowledge: Fraud Risk Scoring & Ranking of Entities

=> From Random Checks to Systematic Automated Checks & Prioritization

=> Data Mining to Automate Fraud Detection

• Step 2: Identifying Known Fraud Patterns with Machine Learning and Automatically Detecting Them in the Future: Supervised Learning: Automated Classification, Risk Score Regression, Association Rule Generation

• Step 3: Identifying Previously Unknown Fraud Cases or Patterns: Unsupervised Learning: Anomaly Detection, Outlier Detection

• Step 4: Comparison with Expectations: Predict Volumes & Prices and Compare with Actual Medications

• Step 5: Adversarial Machine Learning / Text Analytics / Process Mining

• Step 6: (Semi-)Automated Audits (Auditors Remain in Control)
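Step 4 (comparison with expectations) can be illustrated with a simple regression. This sketch assumes that the expected prescription volume depends linearly on the number of patients; the history, the new observations, and the 50% tolerance are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical history: number of patients vs. prescribed volume.
history_patients = [10, 20, 30, 40]
history_volume = [100, 200, 300, 400]
slope, intercept = fit_line(history_patients, history_volume)

# Hypothetical new observations: doctor -> (patients, actual volume).
actuals = {"D1": (25, 260), "D2": (25, 600)}
TOLERANCE = 0.5  # flag when actual exceeds expectation by more than 50%

flagged = []
for doctor, (patients, actual) in actuals.items():
    expected = slope * patients + intercept
    relative_excess = (actual - expected) / expected
    if relative_excess > TOLERANCE:
        flagged.append(doctor)
```

A real deployment would likely use a richer regression model, but the principle stays the same: predict, compare with the actual value, and review the largest deviations.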

Credit Card Fraud

Meta Data

Amount

Location

Receiver

TimeStamp

CardId

Three Methods to Combine

• Random

• Unsupervised

• (Semi-)Supervised

[Table columns: Card-Number (ID), Probability]

Challenge I

Being good at detecting known patterns

vs

Seeing the new and unknown

Challenge II

Detection Rate

Transforming transactional data (e.g., purchase/date) into a table (RapidMiner Example Set)

Data aggregation and enrichment

=> Creating a profile of the customer

Being good at detecting known patterns

vs

Seeing the new and unknown

Unsupervised

vs

(Semi-) Supervised Learning

Detection rate is critical

Relatively few fraud cases compared to thousands of legit transactions
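Turning raw transactions into one profile row per card could be sketched as below. The field names follow the meta data listed earlier (Amount, Location, Receiver, TimeStamp, CardId); the sample rows and the chosen profile features are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactions using the meta data fields from the slides.
transactions = [
    {"CardId": "C1", "TimeStamp": "2018-10-01T09:15:00", "Amount": 20.0,
     "Location": "Berlin", "Receiver": "Cafe"},
    {"CardId": "C1", "TimeStamp": "2018-10-01T18:40:00", "Amount": 35.0,
     "Location": "Berlin", "Receiver": "Grocer"},
    {"CardId": "C1", "TimeStamp": "2018-10-02T03:05:00", "Amount": 900.0,
     "Location": "Lagos", "Receiver": "Electronics"},
    {"CardId": "C2", "TimeStamp": "2018-10-01T12:00:00", "Amount": 50.0,
     "Location": "Dortmund", "Receiver": "Bookshop"},
]


def build_profiles(rows):
    """Aggregate transactions into one profile (feature row) per card."""
    per_card = defaultdict(list)
    for row in rows:
        per_card[row["CardId"]].append(row)
    profiles = {}
    for card, items in per_card.items():
        amounts = [item["Amount"] for item in items]
        days = {datetime.fromisoformat(item["TimeStamp"]).date() for item in items}
        profiles[card] = {
            "n_transactions": len(items),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
            "n_locations": len({item["Location"] for item in items}),
            "n_receivers": len({item["Receiver"] for item in items}),
            "active_days": len(days),
        }
    return profiles


profiles = build_profiles(transactions)
```

Each profile is one row of the resulting table, which rules, outlier detection, or a supervised model can then consume.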


Now we have a profile, what do we do with it?

Rule based

Daily amount < 500€ p.d.

Local Outlier Factor (LOF)

Distance based algorithm for outlier detection

Source: https://en.wikipedia.org/wiki/Local_outlier_factor

Supervised

Random Forest, SVM, …
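The rule-based option above ("daily amount < 500€") can be written as a direct check on per-day totals. A minimal sketch with hypothetical transactions; only the 500€ limit comes from the slide.

```python
from collections import defaultdict

DAILY_LIMIT_EUR = 500.0  # limit named on the slide

# Hypothetical transactions: (card id, day, amount in EUR).
transactions = [
    ("C1", "2018-10-01", 200.0),
    ("C1", "2018-10-01", 350.0),
    ("C1", "2018-10-02", 100.0),
    ("C2", "2018-10-01", 499.99),
]


def cards_over_daily_limit(rows, limit=DAILY_LIMIT_EUR):
    """Return (card, day) pairs whose daily total violates 'amount < limit'."""
    daily_total = defaultdict(float)
    for card_id, day, amount in rows:
        daily_total[(card_id, day)] += amount
    return sorted(key for key, total in daily_total.items() if total >= limit)


violations = cards_over_daily_limit(transactions)
```

Rules like this are easy to explain and audit, but they only catch what someone thought of in advance, which is why the slides combine them with unsupervised and supervised methods.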


Being good at detecting known patterns

vs

Seeing the new and unknown

What my customer profile should look like

Class balance: Relatively few fraud cases compared to thousands of legit transactions
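The class-balance problem is the reason the detection rate matters more than plain accuracy. The sketch below uses a made-up data set of 10 fraud cases among 1,000 transactions and a trivial "model" that never predicts fraud.

```python
def accuracy_and_detection_rate(y_true, y_pred):
    """Labels are 1 (fraud) or 0 (legit); detection rate is recall on fraud."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    caught = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    frauds = sum(y_true)
    accuracy = correct / len(pairs)
    detection_rate = caught / frauds if frauds else 0.0
    return accuracy, detection_rate


# Hypothetical data: 10 fraud cases among 1,000 transactions.
y_true = [1] * 10 + [0] * 990
always_legit = [0] * 1000  # a "model" that never flags anything

accuracy, detection_rate = accuracy_and_detection_rate(y_true, always_legit)
```

The trivial model is right on almost every transaction and still catches no fraud at all, so evaluation should focus on the fraud class.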


Local Outlier Factor (LOF)

Distance based algorithm for outlier detection

Incorporates the concept of local density

(similar to DBSCAN clustering)

Calculated scores are comparable

Source: https://en.wikipedia.org/wiki/Local_outlier_factor
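For readers who want to see the local-density idea concretely, here is a compact pure-Python sketch of the standard LOF definition (k-distance, reachability distance, local reachability density). It is not RapidMiner's operator, it assumes all points are distinct, and the sample points and `k=2` are arbitrary choices.

```python
from math import dist  # available in Python 3.8+


def lof_scores(points, k=2):
    """Local Outlier Factor per point; assumes distinct points and len(points) > k."""
    n = len(points)
    k_dist, neighbors = [], []
    for i, p in enumerate(points):
        by_distance = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        kd = by_distance[k - 1][0]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # k-distance neighborhood: every point at least as close as the k-th neighbor
        neighbors.append([(d, j) for d, j in by_distance if d <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]


# Four points forming a tight square plus one point far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 6.0)]
scores = lof_scores(points, k=2)
```

Scores close to 1 mean a point is about as dense as its neighbors; the distant point gets a score well above 1.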

Rule Based Systems

A fixed set of rules for classifying events

Classic example: Naïve Bayes for detecting spam mails

HypGraphs and HypTrails

Bayesian methods for comparing hypotheses about sequential data

Can be applied to transition networks


Healthcare Fraud Detection

RapidMiner Demo


Safeguarding Electronic Payments

Anticipating the risk of fraud

Russia’s Largest Electronic Payment Service

The Challenge

• Protecting against fraud and anticipation of risk 7x24

• Large and diverse set of partners (merchants) – over 70,0000

• How to classify and check merchant ecommerce sites for payment system compliance?

RapidMiner Solution

• Analyze, classify and check merchants’ ecommerce sites for compliance

• Utilize text mining with NLP to auto-categorize with high sentiment accuracy

• Mashup the widest data sets - historical data on service usage, transaction history, customer profiles, usage logs, and known cases of fraudulent behavior

• Detect anomalies, misuse and fraud through operationalized classification model

Outcome

• Only 8-10% of merchant sites now screened manually at 80% confidence threshold

• Accurate automated analysis of high risk sites: 92% correctly classified

• Elimination of false positives: no normal sites classified as high risk

• Time and cost to resolve fraud case radically reduced
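One plausible reading of the 80% confidence threshold mentioned in this case study is a triage step: predictions the model is confident about are handled automatically, the rest go to manual screening. The sketch below is an assumption about how such routing could work, not a description of the deployed system; the site names and confidences are made up.

```python
CONFIDENCE_THRESHOLD = 0.80  # threshold named on the slide


def triage(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split (site, label, confidence) predictions into automated and manual queues."""
    automated, manual = [], []
    for site, label, confidence in predictions:
        queue = automated if confidence >= threshold else manual
        queue.append((site, label))
    return automated, manual


# Hypothetical classifier output for four merchant sites.
predictions = [
    ("shop-a.example", "normal", 0.97),
    ("shop-b.example", "high risk", 0.91),
    ("shop-c.example", "normal", 0.62),
    ("shop-d.example", "normal", 0.88),
]
automated, manual = triage(predictions)
manual_share = len(manual) / len(predictions)
```

Raising the threshold sends more sites to manual review; lowering it automates more decisions at the cost of trusting less certain predictions.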


Process Mining & Fraud Detection

• Insurance Claims & Payments Leave Footprints and Audit Trails:

– Contracts

– Claim reports / incidents

– Payments / transactions

– Individuals & organisations involved

– IT system log files

• Use Process Mining to:

– Collect

– Normalize

– Correlate

– Analyze

• RapidMiner RapidProM Extension on the RapidMiner Marketplace

• Financial Audits

– Compliance / regulatory audits

– Operational audits

– Transactional services (M&A)

• Purchase Processes & Procurement

• IT Audits

– IT Service management

– Cyber security

– Systems compliance

– IT forensic services

• Manufacturing

– Identifying assembly bottlenecks


Process Mining – RapidMiner with RapidProM

[Diagram: process flow across three applications]

• Process Task 1 (Appl. A)

• Process Task 2 (Appl. B)

• IF/THEN: Process Task 3a or Process Task 3b (Appl. B)

• Process Task 4 (Appl. C)

• Process Task 5 (Appl. C)

Log File App A:

…

200612 10:30 User0015 Task1 Case0099

260612 23:01 User4801 Task1 Case0223

Log File App B:

…

200612 10:31 User0015 Task2 Case0099

200612 10:35 User0015 Task3b Case0099

…

Log File App C:

…

200612 10:37 System Task4 Case0099

200612 10:38 System Task5 Case0099

=> Log File Normalization and Merge => Process Log / Data Lake

RapidMiner with RapidProM:

• Process Documentation (Bottom up model generation, determination of reference processes)

• Social Collaboration (Social Graphs Analysis)

• Process Harmonization (Compare against to-be processes and show deltas)

• Process Optimization (Runtime Analysis, late runners, waiting times, unexpected stops, congestion)

http://www.rapidprom.org
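The log file normalization and merge step from the diagram can be sketched in plain Python. The log lines are the ones shown on the slide; the assignment of lines to applications and the DDMMYY reading of the date field are assumptions, and this does not use the RapidProM API.

```python
from collections import defaultdict
from datetime import datetime

# Log lines as shown on the slide; which application wrote which line is assumed.
log_app_a = ["200612 10:30 User0015 Task1 Case0099",
             "260612 23:01 User4801 Task1 Case0223"]
log_app_b = ["200612 10:31 User0015 Task2 Case0099",
             "200612 10:35 User0015 Task3b Case0099"]
log_app_c = ["200612 10:37 System Task4 Case0099",
             "200612 10:38 System Task5 Case0099"]


def parse_line(line):
    """Normalize one raw log line into a structured event."""
    date, time, user, task, case = line.split()
    # Assumption: the date field is DDMMYY.
    timestamp = datetime.strptime(f"{date} {time}", "%d%m%y %H:%M")
    return {"timestamp": timestamp, "user": user, "task": task, "case": case}


def merge_logs(*logs):
    """Merge several application logs into one ordered trace per case."""
    events = [parse_line(line) for log in logs for line in log]
    events.sort(key=lambda event: event["timestamp"])
    traces = defaultdict(list)
    for event in events:
        traces[event["case"]].append(event["task"])
    return dict(traces)


traces = merge_logs(log_app_a, log_app_b, log_app_c)
```

The per-case traces are the kind of event log that process mining takes as input, for example to compare observed sequences against a reference process.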


Thanks for your Attention!

Ralf Klinkenberg

rklinkenberg@rapidminer.com

www.RapidMiner.com
