
Online Predictive Modeling of Fraud Schemes from Multiple Live Streams
David Talby, CTO, Atigeo; Claudiu Branzan, Principal Lead, Atigeo

Uploaded by spark-summit; posted on 08-Jan-2017


TRANSCRIPT

Page 1

Online Predictive Modeling of Fraud Schemes from Multiple Live Streams

David Talby, CTO, Atigeo
Claudiu Branzan, Principal Lead, Atigeo

Page 2

Page 3

What we’re up against


50+ schemes (and counting)

99.9999% ‘good’ messages

6+ months per case

Needle in a haystack

Hybrid analytics

No training data

Semi-supervised learning

Adversarial learning

Online feedback
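The slide pairs "no training data" with semi-supervised learning but does not say which method was used. One common choice is label propagation over a people graph: a few analyst-confirmed fraud and clean accounts act as seeds and their labels diffuse to neighbours. The sketch below assumes that technique; `build_adjacency`, `propagate_labels` and the 0.5 prior for unlabelled nodes are illustrative, not taken from the talk.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (person_a, person_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def propagate_labels(adj, seeds, iters=50):
    """Clamped label propagation (harmonic averaging) over a people graph.

    seeds maps a handful of analyst-confirmed nodes to 1.0 (fraud) or 0.0
    (clean). Every other node starts at an uninformative 0.5 and is
    repeatedly replaced by the mean score of its neighbours.
    """
    score = {n: seeds.get(n, 0.5) for n in adj}
    for _ in range(iters):
        new_score = {}
        for node, nbrs in adj.items():
            if node in seeds:
                new_score[node] = seeds[node]  # confirmed labels stay fixed
            elif nbrs:
                new_score[node] = sum(score[m] for m in nbrs) / len(nbrs)
            else:
                new_score[node] = score[node]  # isolated node keeps its prior
        score = new_score
    return score
```

On a chain fraud, a, b, clean, the scores settle at roughly 0.67 for `a` and 0.33 for `b`: suspicion fades with graph distance from the confirmed case, which is what lets a handful of labels cover a much larger population.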

Page 4

Why hybrid analytics?


Ignore more rules

Unusual timing of events

Unusual personal network

Teamwork & scale

Think & talk differently

Page 5

(bits of) the toolbox


Rule Inference

Time Series Analysis

Link Analysis

Ensemble Learning

Natural Language
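As one small instance of the "Time Series Analysis" tool (and the "unusual timing of events" signal from the previous slide), the sketch below builds an hour-of-day activity profile per account and flags events that fall in hours the account almost never uses. The function names, the one-hour window and the 5% threshold are hypothetical choices, not the talk's.

```python
from collections import Counter


def hour_profile(history_hours):
    """Share of an account's past events that fell in each hour of the day (0-23)."""
    if not history_hours:
        raise ValueError("need at least one historical event")
    counts = Counter(h % 24 for h in history_hours)
    total = len(history_hours)
    return {h: counts[h] / total for h in range(24)}


def is_unusual_time(hour, profile, window=1, min_share=0.05):
    """Flag an event whose hour (plus or minus `window` hours, wrapping at
    midnight) covers less than `min_share` of the account's past activity."""
    share = sum(profile[(hour + d) % 24] for d in range(-window, window + 1))
    return share < min_share
```

For an account that only transacts between 09:00 and 17:00, an event at 03:00 is flagged while one at 08:00 is not, because 09:00 falls inside the one-hour window. The window keeps slightly early or late activity from raising alerts on its own.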

Page 6

Can we see some code please?


Freely available IPython notebooks

Open source libraries & open data

Jump-start via AWS Marketplace

Page 7

Stream processing


Kafka

Email Stream

Account transactions Stream

Email NLP Features

People graph

Transactions time series
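The stream-processing slide shows two Kafka streams (emails and account transactions) feeding three views: email NLP features, a people graph and per-account transaction time series. The sketch below shows only the routing logic, with a plain list standing in for the Kafka consumers; the topic names, payload fields and `FeatureStores` container are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FeatureStores:
    """In-memory stand-ins for the three downstream views on the slide."""
    email_features: list = field(default_factory=list)
    people_graph: dict = field(default_factory=lambda: defaultdict(set))
    txn_series: dict = field(default_factory=lambda: defaultdict(list))


def route(messages, stores):
    """Dispatch (topic, payload) pairs to the matching feature builders.

    Topic names and payload fields are hypothetical; in a live deployment
    these records would be read from Kafka topics instead of a list.
    """
    for topic, payload in messages:
        if topic == "emails":
            # Email NLP features (a word count as a placeholder feature).
            stores.email_features.append(
                {"id": payload["id"], "n_words": len(payload["body"].split())}
            )
            # People graph: undirected sender <-> recipient edges.
            for rcpt in payload["to"]:
                stores.people_graph[payload["from"]].add(rcpt)
                stores.people_graph[rcpt].add(payload["from"])
        elif topic == "transactions":
            # Transactions time series, keyed by account.
            stores.txn_series[payload["account"]].append(
                (payload["ts"], payload["amount"])
            )
        else:
            raise ValueError(f"unknown topic: {topic}")
    return stores
```

Keeping each view keyed by person or account is what later allows the graph, time-series and NLP features to be joined per user.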

Page 8

Sample email patterns

Page 9

Sample natural language annotators

Understand vocabulary
– Jargon
– Codewords
– Multi-lingual

Understand grammar
– Who are we talking about?
– Past, present or future?
– Compound sentences

Understand context
– Email: Re:, Fwd:, attachments
– SMS & IM have their own grammar
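A toy, rule-based version of three of these annotators (codewords, a crude past/future cue, and email thread context) might look like the sketch below. The codeword lexicon, the cue-word list and the output keys are all hypothetical; production annotators would rely on a real NLP pipeline rather than regular expressions.

```python
import re

# Hypothetical codeword lexicon (lower-case); a real one would be curated per case.
CODEWORDS = {"the package", "green light", "usual place"}

# Leading "Re:", "Fw:" or "Fwd:" markers on a subject line, case-insensitive.
THREAD_PREFIX = re.compile(r"^\s*(re|fwd?)\s*:", re.IGNORECASE)

# Crude future-reference cue words; a parser would be far more reliable.
FUTURE_CUES = re.compile(r"\b(will|shall|going to|tomorrow|next week)\b", re.IGNORECASE)


def annotate(subject, body, codewords=CODEWORDS):
    lowered = body.lower()
    # Whole-phrase codeword matches in the body.
    found = sorted(
        w for w in codewords if re.search(r"\b" + re.escape(w) + r"\b", lowered)
    )
    # Peel off stacked thread prefixes such as "Re: Fwd: ...".
    prefixes, rest = [], subject
    while (m := THREAD_PREFIX.match(rest)):
        prefixes.append(m.group(1).lower())
        rest = rest[m.end():]
    return {
        "codewords": found,
        "thread_prefixes": prefixes,
        "is_reply": "re" in prefixes,
        "is_forward": any(p.startswith("fw") for p in prefixes),
        "refers_to_future": bool(FUTURE_CUES.search(body)),
    }
```

For the subject "Re: Fwd: lunch" and the body "Bring the package to the usual place tomorrow.", this reports both codewords, marks the message as a reply to a forward, and sets the future-reference flag.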

Page 10

Sample email patterns

K-Means failing on “haystacks”

Bregman Bubble Clustering
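K-means must place every point in some cluster, so in a haystack the few dense, suspicious groups get swamped by background messages. Bregman Bubble Clustering instead clusters only the `s` points that sit closest to a centre and leaves the rest unassigned. The sketch below is a simplified hard (BBC-S style) variant using squared Euclidean distance, one member of the Bregman divergence family. The function names and the explicit `init_centers` option are illustrative assumptions, and the code is not the talk's implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance, the Bregman divergence used here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bbc_s(points, k, s, init_centers=None, max_iter=100, seed=0):
    """Simplified hard Bregman Bubble Clustering with squared Euclidean distance.

    Only the s points closest to their nearest centre are clustered; all
    other points are left unassigned as background "hay".
    Returns (centres, labels), where labels maps point index -> cluster id.
    """
    if init_centers is None:
        init_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init_centers]
    state = None
    for _ in range(max_iter):
        scored = []
        for i, p in enumerate(points):
            d, j = min((sq_dist(p, c), j) for j, c in enumerate(centers))
            scored.append((d, i, j))
        scored.sort()
        chosen = scored[:s]  # keep only the s tightest points
        new_state = {(i, j) for _, i, j in chosen}
        if new_state == state:
            break  # selection and assignments are stable
        state = new_state
        # Re-estimate each centre as the mean of its chosen points.
        for j in range(len(centers)):
            members = [points[i] for i, jj in sorted(state) if jj == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = {i: j for i, j in state}
    return centers, labels
```

With two tight groups of 9 points plus a few scattered outliers, `k=2` and `s=18` recover the two groups and leave the outliers unlabelled, whereas k-means would drag its centres toward them. As with k-means, results depend on initialisation.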

Page 11

User analysis iteration

Email NLP Features

User graph

Transactions time series

Graph Features

Time Series Features

NLP Features

Agent Feedback

Train / Test Classifier
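The iteration on this slide joins graph, time-series and NLP features per user and retrains a classifier as agents give feedback. The sketch below assumes an online logistic regression, one simple option that the slide does not name, updated by a single gradient step each time an agent confirms or rejects an alert. `OnlineFraudScorer`, `feature_vector` and the learning-rate values are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineFraudScorer:
    """Logistic regression updated one agent-labelled example at a time
    (stochastic gradient descent on log loss, optional L2 penalty)."""

    def __init__(self, n_features, lr=0.1, l2=0.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.l2 = l2

    def score(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def feedback(self, x, is_fraud):
        """Apply one gradient step after an agent confirms or rejects an alert."""
        err = self.score(x) - (1.0 if is_fraud else 0.0)
        self.w = [wi - self.lr * (err * xi + self.l2 * wi)
                  for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def feature_vector(graph_f, ts_f, nlp_f):
    """Concatenate per-user graph, time-series and NLP features (hypothetical layout)."""
    return list(graph_f) + list(ts_f) + list(nlp_f)
```

A fresh model scores everyone at 0.5, which is the cold-start problem in miniature; each piece of agent feedback shifts the weights immediately, so the model can follow adversaries who change tactics without waiting for a full batch retrain.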

Page 12

Summary: why hunting criminals is cool

Really
• Makes the world a better place

Technically
• Needle in a very large haystack
– Actually needs a petabyte-scale platform
• Multi-modal: no single trick works
– Hybrid analytics
• No labeled data
– Semi-supervised learning
– Cold start problem
• Sparse & high-dimensional
– Graph-based features & change over time
• Adversarial
– Feedback & online learning
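"Graph-based features & change over time" can be made concrete by comparing each person's contacts in consecutive time windows: a sudden burst of new contacts, or a near-total turnover of one's network, is a compact signal drawn from a sparse, high-dimensional graph. The sketch below computes a few such features; the window size, function names and feature names are hypothetical.

```python
from collections import defaultdict


def contacts_by_window(edges, window):
    """Group (timestamp, person_a, person_b) edges into fixed-size time windows."""
    buckets = defaultdict(lambda: defaultdict(set))
    for ts, a, b in edges:
        w = ts // window
        buckets[w][a].add(b)
        buckets[w][b].add(a)
    return buckets


def graph_change_features(buckets, person, w):
    """Compare a person's contacts in window w with those in window w - 1."""
    prev = buckets.get(w - 1, {}).get(person, set())
    cur = buckets.get(w, {}).get(person, set())
    union = cur | prev
    return {
        "degree": len(cur),
        "degree_delta": len(cur) - len(prev),
        "new_contact_ratio": len(cur - prev) / len(cur) if cur else 0.0,
        "jaccard_with_prev": len(cur & prev) / len(union) if union else 0.0,
    }
```

If Alice writes to Bob and Carol in one window and to Bob, Dave and Erin in the next, her degree rises by one, two thirds of her current contacts are new, and the Jaccard overlap with the previous window is 0.25.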

Page 13

THANK YOU!

Get the notebooks: github.com/Atigeo/Atigeo/hunting_criminals_demo

Try it yourself: “xPatterns Connect” on AWS Marketplace

Ask us about it: @davidtalby, @melcutz

Page 14

Appendix

In case the live demo gets cold feet on stage
