using graphs to analyze customer networks and detect fraud ...€¦ · finding sets of nodes such...

66

Upload: others

Post on 24-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 2: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

2

• Use Case

• Why do we need graph databases?

• Our benefits from adopting graphs

• Using graphs for automation

– Basic graph analytics

– Machine Learning and graphs

• Our results

• Summary

Agenda

Page 3: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

3

Who are we?

Page 4: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

4

Who are Paysafe?

• Paysafe provides simple and secure payment solutions to businesses of all sizes around the world

• Global transactional volume of $85bn in 2018

• Real-time Payments

• Two e-wallet services

Neteller

Skrill

Page 5: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

5

• ~ 500 000 payments per day

• Fines for any fraudulent payment

• Allowed ratio of fraud per channel

• Balance between fraud protection and negative customer experience

• Fraudsters conceal their patterns in lots of data

• Inter-service fraud

Use Case – Fast Fraud Analytics

Each payment must be processed in real-time

Page 6: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 7: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Example for a type of fraud: Money Laundering

1. Placement puts the "dirty money" into the legitimate financial system

2. Layering conceals the source of the money through a series of transactions

3. Integration - the now-laundered money is withdrawn from the legitimate account to be used for whatever purposes the criminals have in mind for it

Page 8: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

8

To PROCESS or NOT to PROCESS?

or or

Real-time Fraud Screening

Page 9: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

9

• Fraud prevention industry benchmark by Kount.com from 2018

– 93% of merchants perform manual reviews

– nearly 30% have a manual review rate between 1 - 5% of their orders

– 16% review between 5 – 10% of their orders

– 20% review more than 10% of their orders.

Manual Review Metrics

Page 10: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

10

Reduce manual review time

• Provide powerful visualization tools

• Represent customer relationships (via payment, device fingerprint and etc. )

• Facilitate the analysis of a fraudulent behavior and networks

Reduce manual review count

• Enhance risk engine with Machine Learning (ML)

• Allow real-time features extraction for ML models

Challenges

Page 11: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

11

A: Paysafe payments are a GRAPH!

Breakthrough

Q: A better way to analyze connected data?

Page 12: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

12

Gartner’s Layers of Fraud Management

2019 Gartner Market Guide for Online Fraud Detection

Page 13: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 14: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

14

• The whole graph is loaded in memory.

• Minimal memory footprint and fast read access relying on Compressed sparse row format

• Fast, parallel, graph analytics – over 60 built-in algorithms

• PGQL language – easy to learn, its syntax is SQL-like.

• Asynchronous REST API

Graph Database: Oracle Parallel Graph AnalytiX (PGX)

Page 15: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 16: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Loading data into the graph

Page 17: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

17

• Memory statistics

Graph size in memory ~ 12GB

• Visualizing customer graph, up to seconds, but still depending on the relations

NOTE:

• On-heap memory: only String properties – carefully design graphs using them!

• Off-heap memory: everything else

Use Case Performance Results

Total Count Property Count

Edges ~ 90M ~ 400M

Nodes ~ 7M ~ 24M

Page 18: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 19: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

19

• Visualize customer’s network up to the Nth hop

• Powerful graph analytics

• Combination of RDBMS and graph

• Fast PGQL queries

Our Benefits from the Homogeneous Graph

Page 20: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

20

Before:

Our Benefits from the Homogeneous Graph

After:

Page 21: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Detected customer’s network (one send to many)

Page 22: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Cross-company network

Page 23: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Network of networks

Page 24: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 25: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

25

Improving our payments graph further

• Challenge: Until now, our vertexes are only customers

• We are missing important info for detecting customer networks

– Mobile numbers

– Geolocation

– IP addresses

– Devices

– Passwords(mates)

Page 26: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

26

Question?

What if we want to check whether different customers share a single device for a login and/or a payment ?

Page 27: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

27

Breakthrough

Heterogeneous graph to the rescue!

Page 28: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 29: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 30: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

30

Question?

How to automate fraud detection even further?

Page 31: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

31

• Graph visualization

• Graph analytics

Advantages:

• Fraud specialist can use the graph for investigations.

• It saves a lot of time and make their work easier.

What can be improved:

• We need a more proactive approach.

Current State

Page 32: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

32

Brave new world

Combining Machine Learning with Graphs

Image source: https://thepolicytimes.com/

• Detect patterns in the graph without human intervention.

• Improve existing automation with information from the graph.

Page 33: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

33

• Basic graph analytics

• Combining machine learning and graphs

How can we achieve our automation goals

Page 34: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

• Page rank

• Community detection

• Strongly connected components

• More built-in algorithms available

• Custom-defined algorithms

Graph Analytics

Page 35: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

35

Subset of the graph where every

vertex is reachable from every other

vertex following the directions of the

edges

Strongly Connected Components

Page 36: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 37: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 38: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 39: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

Finding sets of nodes such that each set of nodes is densely

connected internally. Community structures are quite common in

real networks. Social networks include community groups (the origin

of the term, in fact) based on common location, interests,

occupation, etc.

Community Detection

Page 40: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 41: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

PageRank (PR) is an algorithm used by Google Search to rank websites in

their search engine results. The PageRank algorithm outputs a probability

distribution used to represent the likelihood that a person randomly clicking

on links will arrive at a particular page.

Page Rank

Page 42: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 43: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

43

Hot Devices Analytics

Page 44: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

44

• For a 3 day period

‒ Community detection: ~ 7 sec

‒ SCC for the same period: ~ 6 sec

‒ Top 10 Customers Page Rank: ~ 0.8 sec

‒ Hot devices: ~ 15 sec

Use Case Performance Results - Graph Analytics

Page 45: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

45

• Manual work does not scale well

• Traditional way of programming is not feasible

• We need a way to learn rules from the data and change these rules automatically when

the data changes.

• Machine learning to the rescue!

Why Machine Learning for automation?

Page 46: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

46

• Pros

– Continuous improvement as more data is processed and more patterns are discovered

– Adapt to new situations without the need of human intervention

– Scales well

• Challenges

– Hard to explain

– Data quality

– Creating good machine learning model requires a lot of experimenting

Why Machine Learning for automation?

Page 47: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

47

• Node classification

• Given a user in a payment network: Is it a fraudster?

• Graph classification

• Given a network of payments: Is it money laundering?

• Edge prediction

• How likely is user A to send money to user B?

• Pattern similarity search

• Once we identified a fraudulent pattern e.g. (money laundering) find similar

patterns in the whole graph

How can we use the graph for automation?

Page 48: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

48

• What about our existing machine learning models?

• Can we use the graph to improve our machine learning models?

• Example features:

‒ Is there a known fraudster at 2 hop distance from the user?

‒ Is the user a part of a community with known fraudsters?

‒ Importance/centrality of the user?

‒ Etc.

Graphs and machine learning

Page 49: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

49

1. Graph embeddings

2. Leader detection in communities

3. Fastest growing network detection

Our steps towards better automation

Page 50: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

50

Graphs and machine learning

Graphs G

ML input: List of numbersHow to convert node U?

Node U

Page 51: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 52: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

52

• We can embed nodes or whole graphs/sub-graphs

• Algorithms for building node/graph embeddings

‒ DeepWalk

‒ Node2vec

‒ Anonymous Walk

‒ Graph convolutional networks

Graph Embeddings

Page 53: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

53

• Merchant classification

• ~ 90% accuracy

• Fraudsters classification

• ~ 70% accuracy

• ~ 0.68 F1

• Embeddings as an additional feature

• ~ 3% accuracy improvement (from 90% to 93%)

Deep Walk embeddings: our experiments

Page 54: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

54

• Having good node/graph embeddings allows

‒ Node classification

‒ Finding similar segments of users in a graph

‒ Edge prediction

‒ Graph classification

‒ Improvement of existing ML models

NOTE:

• Standard embedding algorithms work well on static graphs.

• It is challenging to create embeddings for new nodes.

Embeddings summary

Page 55: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

55

• Can we detect users who control networks/multiple accounts?

• Step 1: Detect communities:

‒ Strongly, Weakly connected components, Spectral clustering

Leader detection in communities

Page 56: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

56

• Step 2: Compute node importance for all nodes in these communities

• Page rank, Eigenvalue, Degree centrality, etc.

Leader detection in communities

The larger the node the higher its PageRank

Page 57: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 58: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 59: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

59

• Single user receives money from 50 others

Leader detection in communities

2D 3D

Page 60: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

60

• Graphs change over time

Fastest growing networks

Time

Yesterday Today

Page 61: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

61

• Idea: Take the latest state of a community and investigate how it changed over

time.

Latest state of community: Volume of transactions per day:

Fastest growing networks

Page 62: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

62

• Communities whose growth pattern deviates require further investigation

• Example of metrics per day to monitor:

‒ Active users

‒ Transactions

‒ Volume

Fastest growing networks

Page 63: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 64: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are
Page 65: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

65

• A result of applying Fastest Growing Networks algorithm

• Influencer found by Page Rank calculation

Fastest Growing Networks

Page 66: Using Graphs to Analyze Customer Networks and Detect Fraud ...€¦ · Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are

• Representing data with complex connections

• Improving manual workflows

• Making real-time decisions on connected data

• Automating complex tasks – e.g. Fraud detection

– Powerful data analytics

– Machine learning models

• Graphs and ML combination is powerful!

Graphs are good for…