Fraud Detection in Real-time @ Apache Big Data Con

Posted on 16-Apr-2017

Seshika Fernando

Technical Lead

Catch them in the Act: Fraud Detection in Real-time

Fraud: A Trillion Dollar Problem

Survey results
๏ $3.5–4 Trillion in Global Losses per year (5% of Global GDP)

Payment Fraud Only
๏ Merchants are losing around $250B globally
๏ Cost of Fraud is around 0.68% of Revenue for Retailers (2014)
๏ Steep rise in Fraud in eCommerce (0.85% of Revenue) and mCommerce (1.36% of Revenue) as payments move to newer channels

Why WSO2 Analytics Platform?

Domain Knowledge

Batch Analytics

Interactive Analytics

Real-time Analytics

Predictive Analytics

Fraud Detection Toolkit

Solution: Many Ways

Fraud = Anomaly

We provide many methods of Anomaly Detection in order to capture known and unknown types of fraudulent behavior:
๏ Generic Rules
๏ Fraud Scoring
๏ Advanced Techniques

Capturing anomalous behavior using mathematical modelling

Capturing Domain Expertise

An example from the Payment Fraud domain

Fraudsters…

๏ Use stolen cards

๏ Buy Expensive stuff

๏ In Large Quantities

๏ Very quickly

๏ At odd hours

๏ Ship to many places

๏ Provide weird email addresses

CEP Queries

Generic Rules

Convert all pre-existing knowledge about Fraudulent Behavior within a domain to Generic Rules

๏ Blacklists/Whitelists

๏ Moving Averages

๏ Known Patterns

๏ Outliers
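As a rough illustration of how two of these rule types might look outside a CEP engine, here is a minimal Python sketch of a blacklist/whitelist check and a per-card moving-average rule. The `Transaction` fields, the list contents, the window size and the 3x factor are all hypothetical choices for the example, not values from the talk.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record; field names are illustrative assumptions.
    card_no: str
    ip: str
    amount: float


# Assumed list contents, for illustration only.
BLACKLISTED_IPS = {"203.0.113.7"}
WHITELISTED_CARDS = {"card-trusted"}


def blacklist_rule(txn: Transaction) -> bool:
    """Flag transactions from a blacklisted IP unless the card is whitelisted."""
    return txn.ip in BLACKLISTED_IPS and txn.card_no not in WHITELISTED_CARDS


class MovingAverageRule:
    """Flag an amount that exceeds `factor` times the card's recent moving average."""

    def __init__(self, size: int = 5, factor: float = 3.0):
        self.factor = factor
        # One bounded history of recent amounts per card.
        self.history = defaultdict(lambda: deque(maxlen=size))

    def check(self, txn: Transaction) -> bool:
        recent = self.history[txn.card_no]
        flagged = bool(recent) and txn.amount > self.factor * (sum(recent) / len(recent))
        recent.append(txn.amount)  # learn from every transaction, flagged or not
        return flagged
```

For example, a card whose last amounts were 10, 12 and 11 would have a fourth purchase of 100 flagged, since 100 is more than three times the average of 11.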

Queries for Expensive Purchases

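-- Assumed stream schema; attribute names are illustrative
define stream TransactionStream (cardNo string, txnID string, itemNo string, qty int);
define table PremiumProducts (itemNo string);
-- Flag any purchase of an item listed in PremiumProducts
from TransactionStream[(PremiumProducts.itemNo == itemNo) in PremiumProducts]
select *
insert into FraudStream;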

Queries for Large Quantities

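define table QuantityAverages (itemNo string, avgQty double, stdevQty double);
-- Flag quantities more than 3 standard deviations above the item's average
from TransactionStream[(QuantityAverages.itemNo == itemNo and qty > (QuantityAverages.avgQty + 3 * QuantityAverages.stdevQty)) in QuantityAverages]
select *
insert into FraudStream;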

Queries for Large Quantities (Learning)

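define table QuantityAverages (itemNo string, avgQty double, stdevQty double);
-- Learn the per-item average and standard deviation over the last 8 hours
-- (update only modifies existing rows, so each itemNo is assumed to be pre-loaded)
from TransactionStream#window.time(8 hours)
select itemNo, avg(qty) as avgQty, stdev(qty) as stdevQty
group by itemNo
update QuantityAverages on QuantityAverages.itemNo == itemNo;
-- Flag quantities more than 3 standard deviations above the learned average
from TransactionStream[(QuantityAverages.itemNo == itemNo and qty > (QuantityAverages.avgQty + 3 * QuantityAverages.stdevQty)) in QuantityAverages]
select *
insert into FraudStream;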
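The learning query above keeps per-item statistics over a sliding 8-hour window. The following Python sketch approximates the same idea in plain code: it keeps each item's recent quantities, evicts anything older than the window, and applies the mean + 3 × stdev test before adding the new quantity. Timestamps in seconds, the use of population standard deviation and the check-then-learn ordering are assumptions of this sketch, not details of how the CEP engine behaves.

```python
import math
from collections import defaultdict, deque

WINDOW_SECONDS = 8 * 60 * 60  # mirrors #window.time(8 hours)


class QuantityLearner:
    """Per-item quantity statistics over a sliding time window (illustrative sketch)."""

    def __init__(self, window_seconds: float = WINDOW_SECONDS, k: float = 3.0):
        self.window_seconds = window_seconds
        self.k = k
        self.events = defaultdict(deque)  # itemNo -> deque of (timestamp, qty)

    def _evict(self, item_no: str, now: float) -> None:
        # Drop quantities that have fallen out of the time window.
        window = self.events[item_no]
        while window and window[0][0] <= now - self.window_seconds:
            window.popleft()

    def stats(self, item_no: str, now: float):
        """Return (mean, population stdev) for the item, or None if there is no recent data."""
        self._evict(item_no, now)
        qtys = [qty for _, qty in self.events[item_no]]
        if not qtys:
            return None
        mean = sum(qtys) / len(qtys)
        variance = sum((q - mean) ** 2 for q in qtys) / len(qtys)
        return mean, math.sqrt(variance)

    def observe(self, item_no: str, qty: int, now: float) -> bool:
        """Flag qty if it exceeds mean + k * stdev, then add it to the window."""
        current = self.stats(item_no, now)
        flagged = current is not None and qty > current[0] + self.k * current[1]
        self.events[item_no].append((now, qty))
        return flagged
```

Because a window of identical quantities has a standard deviation of zero, any larger quantity gets flagged; a real rule would probably also require a minimum sample size or a floor on the standard deviation.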

Queries for Transaction Velocity

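-- Alert when a card is used at least 3 more times within 5 minutes of a transaction
from every e1 = TransactionStream -> e2 = TransactionStream[cardNo == e1.cardNo]<3:>
within 5 min
select e1.cardNo, e1.txnID, e2[0].txnID as txn2, e2[1].txnID as txn3, e2[2].txnID as txn4
insert into FraudStream;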
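The velocity pattern can be approximated without a CEP engine by keeping, per card, the transactions seen in the last five minutes. This is a simplified Python sketch: it treats "one transaction followed by at least three more on the same card within 5 minutes" as "four or more transactions on a card within a 5-minute span", and the class and parameter names are hypothetical.

```python
from collections import defaultdict, deque


class VelocityDetector:
    """Flag a card used min_followups + 1 or more times within `within_seconds`."""

    def __init__(self, min_followups: int = 3, within_seconds: float = 300):
        self.min_events = min_followups + 1  # the first transaction plus the follow-ups
        self.within_seconds = within_seconds
        self.recent = defaultdict(deque)  # cardNo -> deque of (timestamp, txnID)

    def observe(self, card_no: str, txn_id: str, ts: float):
        """Record a transaction; return the txnIDs in the burst if the rule fires, else None."""
        window = self.recent[card_no]
        # Forget transactions that are too old relative to this one.
        while window and window[0][0] < ts - self.within_seconds:
            window.popleft()
        window.append((ts, txn_id))
        if len(window) >= self.min_events:
            return [tid for _, tid in window]
        return None
```

Once a card crosses the threshold, every further transaction inside the window also returns an alert; deduplicating alerts is left out to keep the sketch short.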

The False Positive Trap

๏ So what if I buy Expensive stuff? (Rich guy)
๏ And why can’t I buy a lot? (Gift giver)
๏ Very Quickly? (Busy man)
๏ At odd hours? (Night owl)
๏ Ship to many places? (Many girlfriends?)

Blocking genuine customers could be counterproductive and costly

Fraud Scoring

๏ Use combinations of rules

๏ Give weights to each rule

๏ Derive a single number that reflects many fraud indicators

๏ Use a threshold to reject transactions

๏ You just bought a Diamond Ring?

๏ You bought 20 Diamond Rings, in 15 minutes at 3am from a blacklisted IP address?

Fraud Scoring

Score = 0.001 * itemPrice
      + 0.1 * itemQuantity
      + 2.5 * isFreeEmail
      + 5 * riskyCountry
      + 8 * suspiciousIPRange
      + 5 * suspiciousUsername
      + 3 * highTransactionVelocity
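The score is a plain weighted sum. A minimal Python sketch that uses the slide's weights, treats boolean indicators as 0 or 1, and rejects at a hypothetical threshold of 10 (the talk does not give a threshold) could look like this:

```python
# Weights taken from the scoring formula; boolean indicators contribute 0 or 1.
WEIGHTS = {
    "itemPrice": 0.001,
    "itemQuantity": 0.1,
    "isFreeEmail": 2.5,
    "riskyCountry": 5,
    "suspiciousIPRange": 8,
    "suspiciousUsername": 5,
    "highTransactionVelocity": 3,
}

REJECT_THRESHOLD = 10.0  # hypothetical cut-off, not from the talk


def fraud_score(features: dict) -> float:
    """Weighted sum of the features; missing features count as 0."""
    return sum(weight * float(features.get(name, 0)) for name, weight in WEIGHTS.items())


def should_reject(features: dict) -> bool:
    return fraud_score(features) >= REJECT_THRESHOLD


# Hypothetical inputs: a ring priced at 5000, bought once vs. 20 times.
one_ring = {"itemPrice": 5000, "itemQuantity": 1}
twenty_rings = {
    "itemPrice": 5000,
    "itemQuantity": 20,
    "suspiciousIPRange": True,
    "highTransactionVelocity": True,
}
```

With these assumed inputs a single ring scores about 5.1 and passes, while 20 rings from a suspicious IP range at high velocity score about 18 and are rejected, mirroring the diamond-ring contrast on the previous slide.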

Learn from Data

Utilize Machine Learning Techniques to identify ‘unknown’ point anomalies

K-means Clustering
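One common way to use k-means for point anomalies is to cluster historical (assumed mostly genuine) transactions and then score a new transaction by its distance to the nearest cluster centre. The Python sketch below does this with plain Lloyd's iterations; the two features (amount, hour of day), the sample data, the starting centroids and the distance threshold of 50 are all made up for illustration.

```python
import math


def kmeans(points, initial_centroids, iterations=10):
    """Lloyd's k-means: assign each point to its nearest centroid, then recompute means."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def anomaly_score(point, centroids):
    """Distance from a point to its nearest centroid; larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)


# Hypothetical history of (amount, hour-of-day) pairs forming two normal groups.
history = [(20, 10), (22, 11), (19, 12), (200, 14), (210, 15), (190, 13)]
centroids = kmeans(history, initial_centroids=[history[0], history[3]])
THRESHOLD = 50.0  # hypothetical distance cut-off
```

In practice the features would need scaling first; otherwise the amount dominates the distance and the hour of day barely matters.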

Markov Models for Fraud Detection

Use Markov Models to discover fraudulent behavior through rare activity sequences

Markov Models are stochastic models used to model randomly changing systems

Markov Modelling: Process

Events → Classify Events → Update Probability Matrix → Compare Incoming Sequences against the Probability Matrix → Alerts

Markov Model: Classification

Example:

Each transaction is classified along the following three dimensions and expressed as a 3-letter token, e.g., HNN:

๏ Amount spent: Low, Normal and High

๏ Whether the transaction includes a high-priced item: Normal and High

๏ Time elapsed since the last transaction: Large, Normal and Small
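A classifier that produces these tokens only needs a pair of cut-off points per graded dimension. In the Python sketch below the amount bounds (50 and 500) and the time-gap bounds (1 hour and 7 days) are hypothetical; the talk does not say how Low/Normal/High or Large/Normal/Small are defined.

```python
def grade(value: float, low: float, high: float, labels=("L", "N", "H")) -> str:
    """Return labels[0] below `low`, labels[2] above `high`, otherwise labels[1]."""
    if value < low:
        return labels[0]
    if value > high:
        return labels[2]
    return labels[1]


def classify(amount: float, has_high_price_item: bool, seconds_since_last: float) -> str:
    """Encode a transaction as a 3-letter token such as 'HNN' (hypothetical thresholds)."""
    amount_code = grade(amount, 50.0, 500.0)          # Low / Normal / High
    item_code = "H" if has_high_price_item else "N"   # Normal / High
    # Short gaps map to Small ('S'), long gaps to Large ('L').
    gap_code = grade(seconds_since_last, 3600.0, 7 * 24 * 3600.0, labels=("S", "N", "L"))
    return amount_code + item_code + gap_code
```

For instance, a 1000-unit purchase with no high-priced item, one day after the previous transaction, becomes HNN under these assumed thresholds.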

Markov Models: Probability Matrix

๏ Create a State Transition Probability Matrix

         LNL       LNH       LNS       LHL       HHL       HHS       HNS
LNL      0.976788  0.542152  0.20706   0.095459  0.007166  0.569172  0.335481
LNH      0.806876  0.609425  0.188628  0.651126  0.113801  0.630711  0.099825
LNS      0.07419   0.83973   0.951471  0.156532  0.12045   0.201713  0.970792
LHL      0.452885  0.634071  0.328956  0.786087  0.676753  0.063064  0.225353
HHL      0.386206  0.255719  0.451524  0.469597  0.810013  0.444638  0.612242
HHS      0.204606  0.832722  0.043194  0.459342  0.960486  0.796382  0.34544
HNS      0.757737  0.371359  0.326846  0.970243  0.771326  0.015835  0.574333
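The talk does not say how the matrix entries are estimated. A common approach, sketched below in Python, is to count observed transitions between consecutive tokens in each customer's history and divide each count by the total for its starting state, so every row of the estimated matrix sums to 1. The class and method names are hypothetical.

```python
from collections import defaultdict


class TransitionMatrix:
    """First-order Markov transition probabilities estimated from transition counts."""

    def __init__(self):
        # counts[from_state][to_state] = number of observed transitions
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, from_state: str, to_state: str) -> None:
        """Record one observed transition (also usable for incremental updates)."""
        self.counts[from_state][to_state] += 1

    def fit(self, sequences):
        """Count the transitions between consecutive tokens of each sequence."""
        for seq in sequences:
            for current, nxt in zip(seq, seq[1:]):
                self.update(current, nxt)
        return self

    def probability(self, from_state: str, to_state: str) -> float:
        """Estimated P(to_state | from_state); 0.0 for unseen states."""
        row = self.counts.get(from_state)
        if not row:
            return 0.0
        return row.get(to_state, 0) / sum(row.values())
```

Using `.get` on the counters keeps lookups for unseen states from adding empty rows to the matrix.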

Markov Models: Probability Comparison

๏ Compare the probabilities of incoming transaction sequences with thresholds and flag fraud as appropriate

๏ Can use direct probabilities or more complex metrics:
  ๏ Miss Rate Metric
  ๏ Miss Probability Metric
  ๏ Entropy Reduction Metric

๏ Update the Markov probability table with incoming transactions
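As a sketch of the comparison step, the Python code below computes the direct probability of a token sequence and a miss-rate style metric, taken here to be the fraction of transitions whose observed next state is not the most probable next state from the current one. That definition, the toy probabilities and the 0.5 alert threshold are assumptions for illustration.

```python
def sequence_probability(seq, probs):
    """Product of the transition probabilities along the sequence (direct probability)."""
    p = 1.0
    for current, nxt in zip(seq, seq[1:]):
        p *= probs.get(current, {}).get(nxt, 0.0)
    return p


def miss_rate(seq, probs):
    """Assumed definition: fraction of transitions whose next state is not the likeliest one."""
    transitions = list(zip(seq, seq[1:]))
    if not transitions:
        return 0.0
    misses = 0
    for current, nxt in transitions:
        row = probs.get(current, {})
        if not row or nxt != max(row, key=row.get):
            misses += 1
    return misses / len(transitions)


# Toy transition probabilities: probs[current][next].
probs = {
    "LNL": {"LNL": 0.7, "LNS": 0.3},
    "LNS": {"LNL": 0.6, "HHS": 0.4},
}
MISS_RATE_THRESHOLD = 0.5  # hypothetical alert threshold


def is_suspicious(seq):
    return miss_rate(seq, probs) > MISS_RATE_THRESHOLD
```

Direct probabilities shrink quickly as sequences get longer, which is one reason to compare sequences of a fixed window length or to use normalised metrics such as the ones listed above.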

Dig Deeper

Access historical data using expressive querying, easy filtering and useful visualizations to isolate incidents and unearth connections

Use Case: Payment Fraud

[Diagram: Transactions from the Payment System feed Batch, Interactive, Real-time and Predictive Analytics, which drive a Dashboard and Alerts]

Use Case: Anti-Money Laundering

[Diagram: Bank Txns from the Core Banking System feed Batch, Interactive, Real-time and Predictive Analytics, which drive a Dashboard and Alerts]

Use Case: Identity Fraud

[Diagram: Events feed Batch, Interactive, Real-time and Predictive Analytics, which drive a Dashboard and Alerts]

References
๏ WSO2 Whitepaper on Fraud Detection: http://wso2.com/whitepapers/fraud-detection-and-prevention-a-data-analytics-approach/
๏ LexisNexis, True Cost of Fraud 2014: http://www.lexisnexis.com/risk/downloads/assets/true-cost-fraud-2014.pdf
๏ Stop Billions in Fraud Losses with Machine Learning: https://www.forrester.com/Stop+Billions+In+Fraud+Losses+With+Machine+Learning/fulltext/-/E-res120912
๏ Big Data In Fraud Management: Variety Leads To Value And Improved Customer Experience: https://www.forrester.com/Big+Data+In+Fraud+Management+Variety+Leads+To+Value+And+Improved+Customer+Experience/fulltext/-/E-RES103841
๏ Predictions 2015: Identity Management, Fraud Management, And Cybersecurity Converge: https://www.forrester.com/Predictions+2015+Identity+Management+Fraud+Management+And+Cybersecurity+Converge/fulltext/-/E-RES120014
๏ Markov Modelling for Fraud Detection: https://pkghosh.wordpress.com/2013/10/21/real-time-fraud-detection-with-sequence-mining/