#1 Agile Predictive Analytics Platform for Today’s Modern Analysts
RapidMiner Wisdom 2018 – New Orleans, LA, USA, October 12th, 2018
Ralf Klinkenberg, Founder & Head of Data Science Research, RapidMiner
rklinkenberg@rapidminer.com
www.RapidMiner.com
Fraud Detection and Prevention: Leveraging Machine Learning to Detect Fraud Patterns, Anomalies, and Unusual Behaviors
© 2017 RapidMiner, GmbH & RapidMiner, Inc.: all rights reserved. - 2 -
Creating Value from Big Data
1. Fraud – Areas & Types & Relevance
2. Machine Learning for Fraud Detection & Prevention
3. Credit Card Fraud Detection & Prevention
4. Healthcare Fraud Detection & Prevention
Fraud
Fraud: Areas and Types of Fraud
• Credit Card Fraud
• Tax Fraud
– EU: Value Added Tax (VAT) Fraud in Transactions within Networks of Companies
– Income Tax Fraud / Corporate Tax Fraud
• Fraud in Supply Chains, Retail Networks, Purchase Departments, Procurement
• Insurance Fraud:
– Car Insurance (Faked Accidents)
– Fire Insurance
– Healthcare Insurance
Fraud: Healthcare Insurance Fraud
• Example: Medicaid/Medicare in the USA: 1 US state alone: 6 billion US$ budget per year
=> estimated 10-20% fraud & waste
=> roughly 1 billion US$ per year lost (10-20% of 6 billion US$ is 0.6-1.2 billion US$)
• Fraudulent Patients (e.g. Drug Addicts/Dealers/Resellers)
• Fraudulent Doctors
• Fraudulent Pharmacies / Hospitals / Service Providers / Suppliers
• Individuals as well as Networks of Fraudsters
Fraud: Challenges for Fraud Detection
• Large Number of Potential Types and Areas of Fraud
• Intelligent and Constantly Improving Adversaries
• Changing Fraud Patterns and Types
• Large Amounts of Potentially Relevant Data
• Large Variety of Potentially Relevant Data Sources & Types
– Structured and Unstructured Data: Transactions, Time Series Data, Textual Data, Network Data, Entity Relations, etc.
• Limited Resources for Fraud Detection & Prevention
– Which cases to investigate (first / at all)?
– Prioritize & focus to maximize effectiveness & efficiency
Fraud: Known vs. Unknown Types of Fraud
• New instances of known types of fraud should be automatically identified:
=> use Machine Learning to automatically find patterns (in data from the past with known fraud cases)
=> deploy generated models to automatically identify new cases
• But what about new types of fraud?
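For the known-fraud case, the "find patterns in past data, then deploy the model on new cases" loop can be sketched with the simplest possible classifier, a one-feature decision stump. This is only an illustrative sketch: the data, the feature (`daily_amount_eur`), and the function names are hypothetical, and a real project would use a full learner (for example in RapidMiner) rather than a single threshold.

```python
# Hypothetical labeled history: (daily_amount_eur, is_fraud)
history = [(20.0, False), (35.0, False), (50.0, False), (80.0, False),
           (400.0, True), (650.0, True), (90.0, False), (520.0, True)]


def train_stump(labeled):
    """Learn the amount threshold that misclassifies the fewest past cases."""
    values = sorted({x for x, _ in labeled})
    # Candidate thresholds: midpoints between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        # A case is predicted as fraud when its amount exceeds the threshold.
        return sum((x > t) != y for x, y in labeled)

    return min(candidates, key=errors)


def predict(threshold, amount):
    """Deploy the learned model: flag a new case as suspected fraud."""
    return amount > threshold


threshold = train_stump(history)        # "find patterns" in past data
new_case_flagged = predict(threshold, 700.0)  # "identify new cases"
```

Such a model can only recognize what resembles the labeled history, which is exactly why the question about new types of fraud needs a different approach.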
Machine Learning for Fraud Detection
Predictive Analytics Transforms Insight into ACTION
• Descriptive: OBSERVE (What happened?)
• Diagnostic: EXPLAIN (Why did it happen?)
• Predictive: ANTICIPATE (What will happen?)
• Prescriptive: ACT (Operationalize)
[Figure axis: Value]
Metrics & Indicators for Fraud Risk
• Domain experts often know metrics that may be indicative of a high risk of fraud => incorporate into entity features
• Examples:
– Entity = Patient:
▪ Total Payments Received,
▪ Number of Prescriptions,
▪ Number of Doctors Visited,
▪ Number of Pills per Month, etc.
– Entity = Prescriber (e.g. Doctor):
▪ Total Payments Received, Number of Patients per Month, Amount per Patient, etc.
– Entity = Service Provider (e.g. Pharmacy, Hospital, etc.):
▪ Total Payments Received, Price per Unit, Price per Treatment of Type X, etc.
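As a rough illustration of turning such metrics into entity features, the sketch below aggregates raw prescription records into one feature record per patient. The record layout, the identifiers, and the feature names are assumptions made for this example; they are not taken from the talk or from any RapidMiner operator.

```python
from collections import defaultdict

# Hypothetical prescription records: (patient_id, doctor_id, pills, payment_usd)
records = [
    ("P1", "D1", 30, 120.0),
    ("P1", "D2", 60, 240.0),
    ("P1", "D3", 90, 360.0),
    ("P2", "D1", 30, 100.0),
]


def patient_features(rows):
    """Aggregate raw records into one feature dict per patient."""
    acc = defaultdict(lambda: {"total_payments": 0.0, "n_prescriptions": 0,
                               "doctors": set(), "n_pills": 0})
    for patient, doctor, pills, payment in rows:
        f = acc[patient]
        f["total_payments"] += payment
        f["n_prescriptions"] += 1
        f["doctors"].add(doctor)
        f["n_pills"] += pills
    # Replace the helper set of doctors by its size, which is the feature.
    return {
        p: {"total_payments": f["total_payments"],
            "n_prescriptions": f["n_prescriptions"],
            "n_doctors_visited": len(f["doctors"]),
            "n_pills": f["n_pills"]}
        for p, f in acc.items()
    }


features = patient_features(records)
```

Here `n_pills` is the total over the period covered by the records; a per-month figure would need the records to carry a date.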
Comparison to Peer Groups
• Does a high value of "Total Amounts Prescribed" automatically mean the entity (e.g. doctor) is fraudulent?
• No, but a high total amount prescribed may indicate a high risk of fraud.
• Oncologists often need to prescribe expensive anti-cancer drugs
=> oncologists may have higher "Total Amounts Prescribed" than other types of doctors (specializations)
=> compare a doctor's metric to the average value of his/her peers (and not to the average for all doctors) => ratio.
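The peer-group ratio can be sketched as follows. The doctors, specializations, and amounts are invented for illustration, and the sketch assumes positive amounts and that a doctor's own value is included in the peer average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: doctor -> (specialization, total amount prescribed in US$)
doctors = {
    "D1": ("oncology", 900_000.0),
    "D2": ("oncology", 1_100_000.0),
    "D3": ("general", 100_000.0),
    "D4": ("general", 300_000.0),
}


def peer_ratios(data):
    """Ratio of each doctor's metric to the average of the same specialization."""
    by_group = defaultdict(list)
    for spec, amount in data.values():
        by_group[spec].append(amount)
    group_mean = {spec: mean(vals) for spec, vals in by_group.items()}
    return {doc: amount / group_mean[spec]
            for doc, (spec, amount) in data.items()}


ratios = peer_ratios(doctors)
# D2 prescribes the largest absolute amount, but relative to peers
# the general practitioner D4 stands out more.
most_unusual = max(ratios, key=ratios.get)
```

Comparing against all doctors instead would push every oncologist to the top of the list, which is the effect the peer-group ratio is meant to avoid.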
Leverage Fraud Risk Indicators
• Does a high value of "Total Payments Received" automatically mean the entity (e.g. doctor) is fraudulent?
• No, but a high total amount received may indicate a high risk of fraud.
• => Rank entities by value of key metrics => suspects
• => Combine metrics (e.g. weighted sum) into a Fraud Risk Score
=> Rank entities by value of combined metric => suspects
• No machine learning yet, but an often-used initial solution to rank and prioritize entities for review / audits / investigation
• => more effective & efficient use of resources (auditors)
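A weighted-sum Fraud Risk Score and the resulting ranking might look like the sketch below. The metric names, values, and weights are hypothetical; in practice the weights would come from domain experts, and the metrics should first be brought to comparable scales (for example as peer-group ratios).

```python
# Hypothetical metrics per doctor, already expressed as peer-group ratios.
metrics = {
    "D1": {"payments_ratio": 0.9, "patients_ratio": 1.0, "amount_per_patient_ratio": 0.9},
    "D2": {"payments_ratio": 2.5, "patients_ratio": 1.2, "amount_per_patient_ratio": 2.1},
    "D3": {"payments_ratio": 1.1, "patients_ratio": 3.0, "amount_per_patient_ratio": 0.4},
}
# Hypothetical expert weights for each metric.
weights = {"payments_ratio": 0.5, "patients_ratio": 0.2, "amount_per_patient_ratio": 0.3}


def fraud_risk_score(entity_metrics, weights):
    """Weighted sum of the entity's metrics."""
    return sum(weights[name] * entity_metrics[name] for name in weights)


def rank_entities(metrics, weights):
    """Return (entity, score) pairs, highest risk score first."""
    scores = {e: fraud_risk_score(m, weights) for e, m in metrics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_entities(metrics, weights)
suspects = [entity for entity, _ in ranking]
```

Auditors would then work through `suspects` from the top, so limited review capacity goes to the highest-scoring entities first.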
Supervised Learning
• Classification: algorithms to predict classes (Fraud / No Fraud)
• Regression: algorithms to predict numbers (Fraud Risk Scores or Expected Values)
Unsupervised Learning
• Grouping: group similar items together (Segmentation, Clustering, Item Sets, Association Rules, Sequence Analysis, Network Analysis)
• Anomaly Detection: find outliers in your data (unusual behaviors)
Across both
• Feature Extraction & Selection
• Automation
• Optimization
• Deployment
Machine Learning: Supervised vs. Unsupervised
• Supervised Machine Learning:
– Data from the past with known fraud and non-fraud cases (label);
– Machine Learning of Classification models or Association rules to find fraud patterns from the past and to automatically identify new instances of these fraud types in new data;
– Applicable to known fraud cases, patterns, and types.
• Unsupervised Machine Learning:
– Clustering (Segmentation): Grouping entities into clusters of similar entities (patients, doctors, service providers, etc.);
– Anomaly Detection / Outlier Detection: detect unusual behaviors;
– Both depend on selected attributes, normalization and/or weighting;
– Attribute Weighting can be used to incorporate domain knowledge and/or priorities;
– Makes it possible to find previously unknown types of fraud.
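To illustrate why unsupervised results depend on normalization and attribute weighting, here is a minimal sketch of a distance-based anomaly score: attributes are z-score normalized, weighted, and each entity is scored by its weighted distance from the average entity. The profiles, the weights, and the choice of this particular score are assumptions for illustration; it is not the method used in the talk.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical doctor profiles: (total_payments_usd, patients_per_month)
profiles = {
    "D1": (100_000.0, 40.0),
    "D2": (110_000.0, 42.0),
    "D3": (95_000.0, 38.0),
    "D4": (105_000.0, 120.0),  # unusual number of patients
}


def zscore_columns(rows):
    """Normalize every attribute to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [tuple((v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats))
            for row in rows]


def anomaly_scores(profiles, weights):
    """Weighted Euclidean distance from the centroid (the origin after z-scoring)."""
    names = list(profiles)
    z = zscore_columns([profiles[n] for n in names])
    return {n: sqrt(sum(w * v * v for w, v in zip(weights, row)))
            for n, row in zip(names, z)}


scores = anomaly_scores(profiles, weights=(1.0, 1.0))
most_unusual = max(scores, key=scores.get)
```

Without normalization the payment amounts (in the hundred thousands) would dominate the distance, and with `weights=(1.0, 0.0)` the patient count is ignored entirely, so D4 no longer stands out. That is the sense in which attribute selection and weighting encode domain knowledge.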
Fraud Detection and Prediction
Fraud Detection with Machine Learning
• Step 1: Finding Known Fraud Patterns by Embedding Domain Expert Knowledge: Fraud Risk Scoring & Ranking of Entities
=> From Random Checks to Systematic Automated Checks & Prioritization
=> Data Mining to Automate Fraud Detection
• Step 2: Identifying Known Fraud Patterns with Machine Learning and Automatically Detecting Them in the Future: Supervised Learning: Automated Classification, Risk Score Regression, Association Rule Generation
• Step 3: Identifying Previously Unknown Fraud Cases or Patterns: Unsupervised Learning: Anomaly Detection, Outlier Detection
• Step 4: Comparison with Expectations: Predict Volumes & Prices and Compare with Actual Medications
• Step 5: Adversarial Machine Learning / Text Analytics / Process Mining
• Step 6: (Semi-)Automated Audits (Auditors Remain in Control)
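Step 4 (comparison with expectations) can be sketched as a regression that predicts an expected value, followed by a check of how far the actual value lies above it. The data, the single predictor (number of patients), and the 50% tolerance are hypothetical; a real model would use many more attributes.

```python
from statistics import mean

# Hypothetical history of providers: (n_patients, billed_amount_usd)
history = [(10, 1000.0), (20, 2100.0), (30, 2900.0), (40, 4000.0)]


def fit_line(points):
    """Ordinary least squares fit of billed amount on number of patients."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_deviations(new_cases, slope, intercept, tolerance=0.5):
    """Flag cases whose actual amount exceeds the expected one by more than `tolerance`."""
    flagged = []
    for name, x, actual in new_cases:
        expected = slope * x + intercept
        if actual > expected * (1 + tolerance):
            flagged.append((name, expected, actual))
    return flagged


slope, intercept = fit_line(history)
new_cases = [("S1", 25, 2600.0), ("S2", 25, 6000.0)]  # (provider, n_patients, billed)
flagged = flag_deviations(new_cases, slope, intercept)
```

A flag here is only a reason to look closer, in line with Step 6, where auditors remain in control of the final decision.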
Credit Card Fraud
Meta Data
• Amount
• Location
• Receiver
• TimeStamp
• CardId
Three Methods to Combine
• Random
• Unsupervised
• (Semi-)Supervised
[Figure: Card-Number (ID) and Probability for the Random, Unsupervised, and (Semi-)Supervised methods]
Challenge I
Being good at detecting known patterns vs. seeing the new and unknown
Challenge II
Detection Rate
• Transforming transactional data (e.g., purchase/date) into a table (RapidMiner Example Set)
– Data aggregation and enrichment
=> Creating a profile of the customer
• Being good at detecting known patterns vs. seeing the new and unknown
– Unsupervised vs. (Semi-)Supervised Learning
• Detection rate is critical
– Relatively few fraud cases compared to thousands of legitimate transactions
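The step from raw transactions to one profile row per card can be sketched as below. The field names follow the meta data listed earlier (Amount, Location, Receiver, TimeStamp, CardId); the sample values, the ISO timestamp format, and the chosen profile attributes are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactions using the meta data fields of the talk.
transactions = [
    {"CardId": "C1", "TimeStamp": "2018-10-01T09:15:00", "Amount": 20.0,
     "Location": "Berlin", "Receiver": "Shop A"},
    {"CardId": "C1", "TimeStamp": "2018-10-01T18:40:00", "Amount": 30.0,
     "Location": "Berlin", "Receiver": "Shop B"},
    {"CardId": "C1", "TimeStamp": "2018-10-02T12:00:00", "Amount": 50.0,
     "Location": "Berlin", "Receiver": "Shop A"},
    {"CardId": "C2", "TimeStamp": "2018-10-01T10:00:00", "Amount": 400.0,
     "Location": "Paris", "Receiver": "Shop C"},
    {"CardId": "C2", "TimeStamp": "2018-10-01T10:05:00", "Amount": 300.0,
     "Location": "Tokyo", "Receiver": "Shop D"},
]


def build_profiles(txs):
    """Aggregate transactions into one profile (table row) per card."""
    daily = defaultdict(lambda: defaultdict(float))  # card -> date -> amount
    locations = defaultdict(set)
    counts = defaultdict(int)
    for t in txs:
        day = datetime.fromisoformat(t["TimeStamp"]).date()
        daily[t["CardId"]][day] += t["Amount"]
        locations[t["CardId"]].add(t["Location"])
        counts[t["CardId"]] += 1
    return {
        card: {"n_transactions": counts[card],
               "n_locations": len(locations[card]),
               "max_daily_amount": max(daily[card].values()),
               "mean_daily_amount": sum(daily[card].values()) / len(daily[card])}
        for card in counts
    }


profiles = build_profiles(transactions)
```

Each profile is then one example (row) that rules, outlier detection, or a supervised model can work on.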
Now we have a profile, what do we do with it?
• Rule based
– Daily amount < 500 € per day
– …
• Local Outlier Factor (LOF)
– Distance-based algorithm for outlier detection
– Source: https://en.wikipedia.org/wiki/Local_outlier_factor
• Supervised
– Random Forest, SVM, …
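The rule-based option can be sketched as a small list of named checks applied to each profile. The 500 € daily limit is the example from the slide, read here as "normal behavior stays below 500 € per day"; the second rule, the profile attributes, and all names are hypothetical.

```python
DAILY_LIMIT_EUR = 500.0  # example rule from the slide

# Each rule describes normal behavior; a profile that fails it raises an alert.
rules = [
    ("daily_amount_below_limit", lambda p: p["max_daily_amount"] < DAILY_LIMIT_EUR),
    ("single_location", lambda p: p["n_locations"] <= 1),  # hypothetical extra rule
]


def violated_rules(profile):
    """Names of all rules the profile does not satisfy."""
    return [name for name, is_ok in rules if not is_ok(profile)]


# Hypothetical customer profiles (one per card).
profiles = {
    "C1": {"max_daily_amount": 50.0, "n_locations": 1},
    "C2": {"max_daily_amount": 700.0, "n_locations": 2},
}

alerts = {}
for card, profile in profiles.items():
    violations = violated_rules(profile)
    if violations:
        alerts[card] = violations
```

Rules like these are easy to explain to auditors but only cover patterns someone has already thought of, which is where LOF and supervised models come in.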
• Being good at detecting known patterns vs. seeing the new and unknown
– What my customer profile should look like
• Class balance:
– Relatively few fraud cases compared to thousands of legitimate transactions
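The effect of class imbalance on evaluation can be shown with a small sketch. It assumes that "detection rate" means recall on the fraud class, and it uses invented numbers: 10 fraud cases among 1,000 transactions and two hypothetical models.

```python
def evaluate(y_true, y_pred):
    """Accuracy, detection rate (recall on fraud), and precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical ground truth: 10 fraud cases, 990 legitimate transactions.
y_true = [True] * 10 + [False] * 990

# Model A never raises an alert.
always_legit = [False] * 1000
# Model B finds 8 of the 10 fraud cases and raises 20 false alarms.
model_b = [True] * 8 + [False] * 2 + [True] * 20 + [False] * 970

result_a = evaluate(y_true, always_legit)
result_b = evaluate(y_true, model_b)
```

Model A reaches 99% accuracy while detecting nothing, whereas model B has slightly lower accuracy but detects 8 of 10 fraud cases. With so few fraud cases, accuracy alone says little about how useful a model is.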
• Local Outlier Factor (LOF)
– Distance-based algorithm for outlier detection
– Incorporates the concept of local density (similar to DBSCAN clustering)
– Calculated scores are comparable
– Source: https://en.wikipedia.org/wiki/Local_outlier_factor
• Rule Based Systems
– A fixed set of rules for classifying events
– Classic example: Naïve Bayes for detecting spam mails
• HypGraphs and HypTrails
– Bayesian methods for comparing hypotheses about sequential data
– Can be applied to transition networks
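A compact sketch of the Local Outlier Factor, following the definition on the cited Wikipedia page, is given below. It is a simplified illustration, not RapidMiner's implementation: it takes exactly k neighbours (ties at the k-distance are not included), and it assumes there are no duplicate points, which would cause a division by zero. The sample points are hypothetical.

```python
from math import dist


def lof_scores(points, k):
    """Local Outlier Factor for each point; values well above 1 suggest outliers."""
    n = len(points)
    # k nearest neighbours of each point (excluding itself), sorted by distance.
    neighbours = []
    for i in range(n):
        ds = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbours.append(ds[:k])
    # k-distance: distance to the k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]

    def local_reachability_density(i):
        # Reachability distance of i from neighbour j: max(k-distance(j), d(i, j)).
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical 2-D profiles: four points in a tight cluster and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
outlier_index = scores.index(max(scores))
```

Points inside the cluster get a score of about 1 because their density matches that of their neighbours, while the isolated point gets a much larger score.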
Healthcare Fraud Detection
RapidMiner Demo
Safeguarding Electronic Payments: Anticipating the risk of fraud
Russia’s Largest Electronic Payment Service
The Challenge
• Protecting against fraud and anticipating risk 24/7
• Large and diverse set of partners (merchants) – over 70,0000
• How to classify and check merchant ecommerce sites for payment system compliance?
RapidMiner Solution
• Analyze, classify and check merchants’ ecommerce sites for compliance
• Utilize text mining with NLP to auto-categorize with high sentiment accuracy
• Mash up the widest data sets: historical data on service usage, transaction history, customer profiles, usage logs, and known cases of fraudulent behavior
• Detect anomalies, misuse and fraud through an operationalized classification model
Outcome
• Only 8-10% of merchant sites now screened manually at 80% confidence threshold
• Accurate automated analysis of high-risk sites: 92% correctly classified
• Elimination of false positives: no normal sites classified as high risk
• Time and cost to resolve a fraud case radically reduced
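One plausible reading of "screened manually at 80% confidence threshold" is that the classifier's decision is accepted automatically when its confidence reaches the threshold and is routed to a human otherwise. The sketch below illustrates that reading only; the site names, classes, and confidence values are invented, and the actual deployment is not described in the slides.

```python
CONFIDENCE_THRESHOLD = 0.80  # threshold named on the slide


def triage(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into automatically accepted and manually screened ones."""
    automated, manual = [], []
    for site, label, confidence in predictions:
        target = automated if confidence >= threshold else manual
        target.append((site, label))
    return automated, manual


# Hypothetical classifier output: (merchant_site, predicted_class, confidence)
predictions = [
    ("shop-01.example", "normal", 0.99),
    ("shop-02.example", "normal", 0.97),
    ("shop-03.example", "high risk", 0.93),
    ("shop-04.example", "normal", 0.91),
    ("shop-05.example", "normal", 0.95),
    ("shop-06.example", "normal", 0.88),
    ("shop-07.example", "high risk", 0.85),
    ("shop-08.example", "normal", 0.96),
    ("shop-09.example", "normal", 0.62),  # too uncertain: manual screening
    ("shop-10.example", "normal", 0.90),
]

automated, manual = triage(predictions)
manual_share = len(manual) / len(predictions)
```

Raising the threshold sends more sites to manual screening and lowers the risk of automated mistakes; lowering it does the opposite.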
Process Mining & Fraud Detection
• Insurance Claims & Payments Leave Footprints and Audit Trails:
– Contracts
– Claim reports / incidents
– Payments / transactions
– Individuals & organisations involved
– IT system log files
• Use Process Mining to:
– Collect
– Normalize
– Correlate
– Analyze
• RapidMiner RapidProM Extension on the RapidMiner Marketplace
• Financial Audits
– Compliance / regulatory audits
– Operational audits
– Transactional services (M&A)
• Purchase Processes & Procurement
• IT Audits
– IT Service management
– Cyber security
– Systems compliance
– IT forensic services
• Manufacturing
– Identifying assembly bottlenecks
Process Mining – RapidMiner with RapidProM
• Process: Task 1 => Task 2 => IF/THEN => Task 3a or Task 3b => Task 4 => Task 5, executed across the applications Appl. A, Appl. B, and Appl. C
• Log File App A:
– … 200612 10:30 User0015 Task1 Case0099
– 260612 23:01 User4801 Task1 Case0223
• Log File App B:
– … 200612 10:31 User0015 Task2 Case0099
– 200612 10:35 User0015 Task3b Case0099 …
• Log File App C:
– … 200612 10:37 System Task4 Case0099
– 200612 10:38 System Task5 Case0099
• Log File Normalization and Merge => Process Log Data Lake
• Process Documentation (Bottom-up model generation, determination of reference processes)
• Social Collaboration (Social Graphs Analysis)
• Process Harmonization (Compare against to-be processes and show deltas)
• Process Optimization (Runtime Analysis, late runners, waiting times, unexpected stops, congestion)
• http://www.rapidprom.org
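The "log file normalization and merge" step can be sketched as follows, using the log lines shown on the slide. This is plain Python for illustration and does not use RapidProM. It assumes that each line has the fields date, time, user, task, and case, and that the date is written as DDMMYY; the slides do not state the format.

```python
from collections import defaultdict
from datetime import datetime

# Log lines as shown on the slide, one list per application.
log_a = ["200612 10:30 User0015 Task1 Case0099",
         "260612 23:01 User4801 Task1 Case0223"]
log_b = ["200612 10:31 User0015 Task2 Case0099",
         "200612 10:35 User0015 Task3b Case0099"]
log_c = ["200612 10:37 System Task4 Case0099",
         "200612 10:38 System Task5 Case0099"]


def parse(line, source):
    """Normalize one log line into a common event record."""
    date, time, user, task, case = line.split()
    # Assumption: DDMMYY date format (not stated in the slides).
    timestamp = datetime.strptime(f"{date} {time}", "%d%m%y %H:%M")
    return {"timestamp": timestamp, "user": user, "task": task,
            "case": case, "source": source}


def merge_logs(logs):
    """Merge all application logs and group the ordered tasks by case."""
    events = [parse(line, src) for src, lines in logs.items() for line in lines]
    events.sort(key=lambda e: e["timestamp"])
    traces = defaultdict(list)
    for event in events:
        traces[event["case"]].append(event["task"])
    return dict(traces)


traces = merge_logs({"App A": log_a, "App B": log_b, "App C": log_c})
```

Each trace is the sequence of tasks one case actually went through across all applications, which is the input that process discovery and comparison against to-be processes need.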
Thanks for your Attention!