Using graphs to analyze customer networks and detect fraud ...

Post on 24-Jun-2020


• Use Case

• Why do we need graph databases?

• Our benefits from adopting graphs

• Using graphs for automation

– Basic graph analytics

– Machine Learning and graphs

• Our results

• Summary

Agenda

Who are we?

Who are Paysafe?

• Paysafe provides simple and secure payment solutions to businesses of all sizes around the world

• Global transactional volume of $85bn in 2018

• Real-time Payments

• Two e-wallet services

Neteller

Skrill

• ~ 500 000 payments per day

• Fines for any fraudulent payment

• Allowed ratio of fraud per channel

• Balance between fraud protection and negative customer experience

• Fraudsters conceal their patterns in lots of data

• Inter-service fraud

Use Case – Fast Fraud Analytics

Each payment must be processed in real-time

Example for a type of fraud: Money Laundering

1. Placement puts the "dirty money" into the legitimate financial system

2. Layering conceals the source of the money through a series of transactions

3. Integration - the now-laundered money is withdrawn from the legitimate account to be used for whatever purposes the criminals have in mind for it

To PROCESS or NOT to PROCESS?

Real-time Fraud Screening

• Fraud prevention industry benchmark by Kount.com from 2018

– 93% of merchants perform manual reviews

– nearly 30% have a manual review rate of 1–5% of their orders

– 16% review 5–10% of their orders

– 20% review more than 10% of their orders

Manual Review Metrics

Reduce manual review time

• Provide powerful visualization tools

• Represent customer relationships (via payments, device fingerprints, etc.)

• Facilitate the analysis of fraudulent behavior and networks

Reduce manual review count

• Enhance risk engine with Machine Learning (ML)

• Allow real-time feature extraction for ML models

Challenges

Q: A better way to analyze connected data?

A: Paysafe payments are a GRAPH!

Breakthrough

Gartner’s Layers of Fraud Management

2019 Gartner Market Guide for Online Fraud Detection

• The whole graph is loaded in memory.

• Minimal memory footprint and fast read access, thanks to the compressed sparse row (CSR) format

• Fast, parallel graph analytics – over 60 built-in algorithms

• PGQL query language – easy to learn, with SQL-like syntax

• Asynchronous REST API

Graph Database: Oracle Parallel Graph AnalytiX (PGX)
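To illustrate why the compressed sparse row format keeps the memory footprint small and reads fast, here is a minimal sketch in plain Python. It is a generic illustration, not PGX's actual internal layout (the slides do not describe it); the function names `build_csr` and `neighbors` and the toy edge list are assumptions made for this example.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]):
    """Build CSR arrays (offsets, targets) for a directed graph."""
    # Count the out-degree of every source vertex.
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # A prefix sum turns the counts into start positions.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]
    # Fill the target array, using one moving cursor per vertex.
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, v):
    """Out-neighbors of v are one contiguous slice of the target array."""
    return targets[offsets[v]:offsets[v + 1]]


# Toy graph (hypothetical data): 4 vertices, 4 directed edges.
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
offsets, targets = build_csr(4, edges)
```

The whole graph lives in two flat integer arrays, and looking up a vertex's neighbors is a single slice, which is what makes scans cache-friendly.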

Loading data into the graph

• Memory statistics

Graph size in memory ~ 12GB

• Visualizing a customer's graph takes up to a few seconds, depending on the number of relations

NOTE:

• On-heap memory: only String properties – carefully design graphs using them!

• Off-heap memory: everything else

Use Case Performance Results

        Total Count   Property Count
Edges   ~ 90M         ~ 400M
Nodes   ~ 7M          ~ 24M

• Visualize customer’s network up to the Nth hop

• Powerful graph analytics

• Combination of RDBMS and graph

• Fast PGQL queries

Our Benefits from the Homogeneous Graph
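As a rough illustration of "up to the Nth hop", the sketch below collects every customer within a given number of hops using a breadth-first search. The adjacency dictionary, the customer names and the function name `n_hop_neighborhood` are hypothetical; in the setup described in the slides this would more likely be a PGQL query or a built-in PGX algorithm.

```python
from collections import deque


def n_hop_neighborhood(adjacency, start, max_hops):
    """Return {vertex: hop distance} for all vertices within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if distance[vertex] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance


# Hypothetical payment relationships (treated as undirected here).
payments = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
within_two = n_hop_neighborhood(payments, "alice", 2)
```

The returned distances can drive a visualization directly, for example by coloring vertices by hop count.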

Before:

Our Benefits from the Homogeneous Graph

After:

Detected customer’s network (one sends to many)

Cross-company network

Network of networks

Improving our payments graph further

• Challenge: Until now, our vertices have been customers only

• We are missing important info for detecting customer networks

– Mobile numbers

– Geolocation

– IP addresses

– Devices

– Passwords (mates)

Question?

What if we want to check whether different customers share a single device for a login and/or a payment?

Breakthrough

Heterogeneous graph to the rescue!
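A minimal sketch of the shared-device question on a heterogeneous graph, assuming customers and devices are both vertices and that logins and payments are typed edges between them. The edge labels `LOGGED_IN_FROM` and `PAID_FROM`, the identifiers and the function name `shared_devices` are all made up for illustration.

```python
from collections import defaultdict

# (customer, relation, device) edges; every identifier here is hypothetical.
edges = [
    ("cust_1", "LOGGED_IN_FROM", "device_A"),
    ("cust_2", "PAID_FROM", "device_A"),
    ("cust_3", "LOGGED_IN_FROM", "device_B"),
    ("cust_1", "PAID_FROM", "device_A"),
]


def shared_devices(edges, min_customers=2):
    """Return devices used (for login or payment) by at least min_customers customers."""
    customers_by_device = defaultdict(set)
    for customer, _relation, device in edges:
        customers_by_device[device].add(customer)
    return {
        device: sorted(customers)
        for device, customers in customers_by_device.items()
        if len(customers) >= min_customers
    }


shared = shared_devices(edges)
```

With only customer vertices this question cannot be asked at all, which is the point of adding device vertices to the graph.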

Question?

How to automate fraud detection even further?

• Graph visualization

• Graph analytics

Advantages:

• Fraud specialists can use the graph for investigations.

• It saves a lot of time and makes their work easier.

What can be improved:

• We need a more proactive approach.

Current State

Brave new world

Combining Machine Learning with Graphs

Image source: https://thepolicytimes.com/

• Detect patterns in the graph without human intervention.

• Improve existing automation with information from the graph.

• Basic graph analytics

• Combining machine learning and graphs

How can we achieve our automation goals?

• PageRank

• Community detection

• Strongly connected components

• More built-in algorithms available

• Custom-defined algorithms

Graph Analytics

Subset of the graph where every vertex is reachable from every other vertex following the directions of the edges

Strongly Connected Components
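The definition above can be sketched with Kosaraju's algorithm in plain Python. This is one standard way to compute strongly connected components, not necessarily how the PGX built-in works; the function name and the toy payment graph are assumptions.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm. graph maps each vertex to a list of successors."""
    vertices = set(graph) | {w for targets in graph.values() for w in targets}

    # Pass 1: record vertices in order of DFS completion (iterative DFS).
    visited, finish_order = set(), []
    for root in sorted(vertices):
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            vertex, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All successors explored: this vertex is finished.
                finish_order.append(vertex)
                stack.pop()

    # Build the reversed graph.
    reverse = {v: [] for v in vertices}
    for v, targets in graph.items():
        for w in targets:
            reverse[w].append(v)

    # Pass 2: explore the reversed graph in decreasing finish time.
    assigned, components = set(), []
    for root in reversed(finish_order):
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = [], [root]
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for prev in reverse[vertex]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical payments: a -> b -> c -> a form a cycle, d only receives.
payments = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
components = strongly_connected_components(payments)
```

In a payment graph, a component with more than one member means money can circulate back to where it started, which is one reason the measure is interesting for fraud analysis.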

Finding sets of nodes such that each set of nodes is densely connected internally. Community structures are quite common in real networks. Social networks include community groups (the origin of the term, in fact) based on common location, interests, occupation, etc.

Community Detection
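One simple way to find densely connected groups is label propagation: every node repeatedly adopts the label that is most common among its neighbors. The sketch below is an assumption-laden toy version, not the algorithm used in the talk: ties are broken deterministically (largest label) so the example is reproducible, whereas real implementations usually break ties randomly.

```python
from collections import Counter


def label_propagation(adjacency, max_rounds=20):
    """Toy label propagation on an undirected graph given as {node: [neighbors]}."""
    labels = {v: v for v in adjacency}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for vertex in sorted(adjacency):
            counts = Counter(labels[n] for n in adjacency[vertex])
            if not counts:
                continue  # isolated node keeps its own label
            best = max(counts.values())
            # Deterministic tie-break (assumption): pick the largest label.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[vertex]:
                labels[vertex] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for vertex, label in labels.items():
        communities.setdefault(label, []).append(vertex)
    return sorted(sorted(members) for members in communities.values())


# Two triangles joined by a single bridge edge (2 - 3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
communities = label_propagation(graph)
```

On this toy graph the two triangles end up as separate communities, because each node has more neighbors inside its triangle than across the bridge.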

PageRank (PR) is an algorithm used by Google Search to rank websites in their search engine results. The PageRank algorithm outputs a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at a particular page.

PageRank
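The "random clicker" description translates into a short power iteration. The sketch below is a textbook version with the commonly used damping factor 0.85; the function name, the iteration count and the toy payment graph are assumptions, and a production system would use the built-in PGX implementation instead.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank. graph maps each vertex to a list of successors."""
    vertices = sorted(set(graph) | {w for t in graph.values() for w in t})
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in vertices if not graph.get(v))
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        for v in vertices:
            targets = graph.get(v, [])
            for w in targets:
                # Each vertex splits its rank equally among its successors.
                new_rank[w] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank


# Hypothetical payments: three customers pay "hub", which pays "a".
payments = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(payments)
top_customer = max(ranks, key=ranks.get)
```

Applied to payments instead of web links, a high rank marks a customer to whom a lot of money flows, directly or through intermediaries.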

Hot Devices Analytics

• For a 3-day period

‒ Community detection: ~ 7 sec

‒ SCC for the same period: ~ 6 sec

‒ Top 10 customers by PageRank: ~ 0.8 sec

‒ Hot devices: ~ 15 sec

Use Case Performance Results - Graph Analytics

• Manual work does not scale well

• The traditional way of programming is not feasible

• We need a way to learn rules from the data and change these rules automatically when the data changes.

• Machine learning to the rescue!

Why Machine Learning for automation?

• Pros

– Continuous improvement as more data is processed and more patterns are discovered

– Adapt to new situations without the need for human intervention

– Scales well

• Challenges

– Hard to explain

– Data quality

– Creating a good machine learning model requires a lot of experimentation

Why Machine Learning for automation?

• Node classification

– Given a user in a payment network: is it a fraudster?

• Graph classification

– Given a network of payments: is it money laundering?

• Edge prediction

– How likely is user A to send money to user B?

• Pattern similarity search

– Once we have identified a fraudulent pattern (e.g. money laundering), find similar patterns in the whole graph

How can we use the graph for automation?

• What about our existing machine learning models?

• Can we use the graph to improve our machine learning models?

• Example features:

‒ Is there a known fraudster at 2 hop distance from the user?

‒ Is the user a part of a community with known fraudsters?

‒ Importance/centrality of the user?

‒ Etc.

Graphs and machine learning
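The example features listed above could be computed along the following lines and appended to an existing model's feature vector. This is a sketch under stated assumptions: the feature names, the `graph_features` function, the use of plain degree as the centrality measure, and the toy data are all hypothetical.

```python
def graph_features(adjacency, communities, known_fraudsters, user):
    """Derive simple graph-based features for one user (illustrative only)."""
    one_hop = set(adjacency.get(user, ()))
    # Two-hop neighborhood: direct neighbors plus their neighbors.
    two_hop = set(one_hop)
    for neighbor in one_hop:
        two_hop.update(adjacency.get(neighbor, ()))
    two_hop.discard(user)
    # Community containing the user; a singleton if none is listed.
    community = next((set(c) for c in communities if user in c), {user})
    return {
        "fraudster_within_2_hops": bool(two_hop & known_fraudsters),
        "fraudster_in_community": bool((community - {user}) & known_fraudsters),
        "degree": len(one_hop),  # simplest possible centrality measure
    }


# Hypothetical data: a chain u1 - u2 - u3 - u4 and an isolated user u5.
adjacency = {
    "u1": ["u2"],
    "u2": ["u1", "u3"],
    "u3": ["u2", "u4"],
    "u4": ["u3"],
    "u5": [],
}
communities = [["u1", "u2", "u3", "u4"], ["u5"]]
known_fraudsters = {"u3"}

features_u1 = graph_features(adjacency, communities, known_fraudsters, "u1")
features_u5 = graph_features(adjacency, communities, known_fraudsters, "u5")
```

Each value is a plain number or boolean, so the dictionary can be merged into whatever tabular features the existing model already consumes.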

1. Graph embeddings

2. Leader detection in communities

3. Fastest growing network detection

Our steps towards better automation

Graphs and machine learning

Graph G

ML input: List of numbers

How to convert node U?

Node U

• We can embed nodes or whole graphs/sub-graphs

• Algorithms for building node/graph embeddings

‒ DeepWalk

‒ Node2vec

‒ Anonymous Walk

‒ Graph convolutional networks

Graph Embeddings
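To give a feel for how DeepWalk turns a node into "a list of numbers", the sketch below shows only its first stage: sampling truncated random walks that serve as "sentences" of node ids. In DeepWalk these walks are then fed to a skip-gram model (as in word2vec) to learn the vectors; that training step is omitted here. The function name, the parameters and the toy graph are assumptions.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Sample uniform random walks, the input corpus for DeepWalk-style training."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = sorted(adjacency)
        rng.shuffle(nodes)  # visit start nodes in random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: keep the shorter walk
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected payment graph.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(graph, walks_per_node=2, walk_length=5, seed=42)
```

Nodes that often appear close together in these walks end up with similar vectors, which is what makes the embeddings usable as classifier input.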

• Merchant classification

‒ ~ 90% accuracy

• Fraudster classification

‒ ~ 70% accuracy

‒ ~ 0.68 F1

• Embeddings as an additional feature

‒ ~ 3 percentage point accuracy improvement (from 90% to 93%)

DeepWalk embeddings: our experiments

• Having good node/graph embeddings allows

‒ Node classification

‒ Finding similar segments of users in a graph

‒ Edge prediction

‒ Graph classification

‒ Improvement of existing ML models

NOTE:

• Standard embedding algorithms work well on static graphs.

• It is challenging to create embeddings for new nodes.

Embeddings summary

• Can we detect users who control networks/multiple accounts?

• Step 1: Detect communities

‒ Strongly or weakly connected components, spectral clustering

Leader detection in communities

• Step 2: Compute node importance for all nodes in these communities

‒ PageRank, eigenvector centrality, degree centrality, etc.

Leader detection in communities

The larger the node the higher its PageRank
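The two steps can be combined in a few lines. The sketch below uses weakly connected components for step 1 and in-degree (a simple degree centrality) for step 2; both are just the easiest choices among the options the slides list, and the function name `find_leaders` and the sample payments are hypothetical.

```python
from collections import Counter, defaultdict


def find_leaders(payments):
    """payments: list of (sender, receiver). Returns {leader: sorted members}."""
    undirected = defaultdict(set)
    in_degree = Counter()
    for sender, receiver in payments:
        undirected[sender].add(receiver)
        undirected[receiver].add(sender)
        in_degree[receiver] += 1

    leaders, seen = {}, set()
    for start in sorted(undirected):
        if start in seen:
            continue
        # Step 1: collect one weakly connected component with a DFS.
        seen.add(start)
        component, stack = [], [start]
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for neighbor in undirected[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Step 2: leader = member with most incoming payments (ties: smallest name).
        leader = min(component, key=lambda v: (-in_degree[v], v))
        leaders[leader] = sorted(component)
    return leaders


# Hypothetical payments: three accounts all send money to "boss".
payments = [("m1", "boss"), ("m2", "boss"), ("m3", "boss"), ("x", "y")]
leaders = find_leaders(payments)
```

Swapping in-degree for PageRank in step 2 would also credit money that reaches the leader through intermediaries rather than directly.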

• Single user receives money from 50 others

Leader detection in communities

Figure: 2D and 3D views

• Graphs change over time

Fastest growing networks

Time

Yesterday Today

• Idea: Take the latest state of a community and investigate how it changed over time.

Latest state of community

Volume of transactions per day

Fastest growing networks

• Communities whose growth pattern deviates require further investigation

• Example of metrics per day to monitor:

‒ Active users

‒ Transactions

‒ Volume

Fastest growing networks
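The slides do not say how a deviating growth pattern is detected, so the following is only one possible sketch: compare each community's latest daily value with its own history using a z-score. The threshold of 3, the function name `growth_outliers` and the sample series are assumptions; the same check could be run per metric (active users, transactions, volume).

```python
from statistics import mean, stdev


def growth_outliers(daily_values, threshold=3.0):
    """Flag communities whose latest daily value is far above their own history."""
    flagged = {}
    for community, series in daily_values.items():
        history, latest = series[:-1], series[-1]
        if len(history) < 2:
            continue  # not enough history to estimate a spread
        spread = stdev(history)
        if spread == 0:
            continue  # flat history: z-score undefined, skipped in this sketch
        z_score = (latest - mean(history)) / spread
        if z_score > threshold:
            flagged[community] = z_score
    return flagged


# Hypothetical transactions per day for two communities (oldest first).
daily_transactions = {
    "community_1": [10, 12, 11, 13, 12],
    "community_2": [10, 11, 12, 11, 60],
}
flagged = growth_outliers(daily_transactions)
```

Flagged communities would then go to a fraud specialist for review rather than being blocked automatically.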

• A result of applying the Fastest Growing Networks algorithm

• Influencer found by PageRank calculation

Fastest Growing Networks

• Representing data with complex connections

• Improving manual workflows

• Making real-time decisions on connected data

• Automating complex tasks, e.g. fraud detection

– Powerful data analytics

– Machine learning models

• The combination of graphs and ML is powerful!

Graphs are good for…
