Using graphs to analyze customer networks and detect fraud
TRANSCRIPT
2
• Use Case
• Why do we need graph databases?
• Our benefits from adopting graphs
• Using graphs for automation
– Basic graph analytics
– Machine Learning and graphs
• Our results
• Summary
Agenda
3
Who are we?
4
Who are Paysafe?
• Paysafe provides simple and secure payment solutions to businesses of all sizes around the world
• Global transactional volume of $85bn in 2018
• Real-time Payments
• Two e-wallet services
Neteller
Skrill
5
• ~ 500 000 payments per day
• Fines for any fraudulent payment
• Allowed ratio of fraud per channel
• Balance between fraud protection and negative customer experience
• Fraudsters conceal their patterns in lots of data
• Inter-service fraud
Use Case – Fast Fraud Analytics
Each payment must be processed in real-time
Example of a type of fraud: Money Laundering
1. Placement puts the "dirty money" into the legitimate financial system
2. Layering conceals the source of the money through a series of transactions
3. Integration withdraws the now-laundered money from the legitimate account, to be used for whatever purposes the criminals have in mind
8
To PROCESS or NOT to PROCESS?
Real-time Fraud Screening
9
• Fraud prevention industry benchmark by Kount.com from 2018
– 93% of merchants perform manual reviews
– nearly 30% have a manual review rate of 1–5% of their orders
– 16% review 5–10% of their orders
– 20% review more than 10% of their orders
Manual Review Metrics
10
Reduce manual review time
• Provide powerful visualization tools
• Represent customer relationships (via payments, device fingerprints, etc.)
• Facilitate the analysis of fraudulent behavior and networks
Reduce manual review count
• Enhance risk engine with Machine Learning (ML)
• Allow real-time feature extraction for ML models
Challenges
11
Q: A better way to analyze connected data?
A: Paysafe payments are a GRAPH!
Breakthrough
12
Gartner’s Layers of Fraud Management
2019 Gartner Market Guide for Online Fraud Detection
14
• The whole graph is loaded in memory.
• Minimal memory footprint and fast read access, relying on the compressed sparse row (CSR) format
• Fast, parallel, graph analytics – over 60 built-in algorithms
• PGQL language – easy to learn, its syntax is SQL-like.
• Asynchronous REST API
Graph Database: Oracle Parallel Graph AnalytiX (PGX)
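To illustrate why the compressed sparse row format mentioned above is compact and fast to read, here is a minimal sketch in plain Python. It is not PGX's internal implementation; the function names `build_csr` and `neighbours` are hypothetical and nodes are assumed to be numbered 0..N-1.

```python
from typing import List, Tuple


def build_csr(num_nodes: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Build CSR arrays (row_offsets, col_indices) for a directed graph."""
    # Count the out-degree of every source node.
    degree = [0] * num_nodes
    for src, _dst in edges:
        degree[src] += 1

    # row_offsets[i] is the position where node i's neighbours start.
    row_offsets = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        row_offsets[node + 1] = row_offsets[node] + degree[node]

    # Fill col_indices, tracking the next free slot for each source node.
    col_indices = [0] * len(edges)
    next_slot = row_offsets[:-1].copy()
    for src, dst in edges:
        col_indices[next_slot[src]] = dst
        next_slot[src] += 1
    return row_offsets, col_indices


def neighbours(row_offsets: List[int], col_indices: List[int], node: int) -> List[int]:
    """Read a node's outgoing neighbours as one contiguous slice."""
    return col_indices[row_offsets[node]:row_offsets[node + 1]]


# Toy payment graph: 0 -> 1, 0 -> 2, 2 -> 0, 1 -> 2
offsets, columns = build_csr(3, [(0, 1), (0, 2), (2, 0), (1, 2)])
print(offsets, columns, neighbours(offsets, columns, 0))
```

The whole graph lives in two flat integer arrays, so a neighbour lookup is a single slice with no pointer chasing.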
Loading data into the graph
17
• Memory statistics
Graph size in memory ~ 12GB
• Visualizing a customer's graph takes up to a few seconds, depending on the number of relations
NOTE:
• On-heap memory: only String properties – carefully design graphs using them!
• Off-heap memory: everything else
Use Case Performance Results
         Total Count    Property Count
Edges    ~ 90M          ~ 400M
Nodes    ~ 7M           ~ 24M
19
• Visualize a customer’s network up to the Nth hop
• Powerful graph analytics
• Combination of RDBMS and graph
• Fast PGQL queries
Our Benefits from the Homogeneous Graph
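Exploring a customer's network up to the Nth hop amounts to a bounded breadth-first search. The sketch below shows the idea on an in-memory adjacency dictionary; the function name `n_hop_network` and the data layout are assumptions made for illustration, not the PGQL query used in production.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def n_hop_network(adjacency: Dict[Hashable, Iterable[Hashable]],
                  start: Hashable,
                  max_hops: int) -> Dict[Hashable, int]:
    """Return every node reachable within max_hops, with its hop distance."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit on the hop limit.
        if hops[node] == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Toy chain of payments: a -> b -> c -> d
payments = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(n_hop_network(payments, "a", 2))
```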
20
Before:
Our Benefits from the Homogeneous Graph
After:
Detected customer’s network (one send to many)
Cross-company network
Network of networks
25
Improving our payments graph further
• Challenge: Until now, our vertices are only customers
• We are missing important info for detecting customer networks
– Mobile numbers
– Geolocation
– IP addresses
– Devices
– Passwords(mates)
26
Question?
What if we want to check whether different customers share a single device for a login and/or a payment?
27
Breakthrough
Heterogeneous graph to the rescue!
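With devices modelled as vertices of their own, the shared-device question becomes a simple lookup of device vertices connected to more than one customer. A minimal sketch, assuming edges are given as hypothetical `(customer, device, action)` tuples and using an illustrative function name `shared_devices`:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_devices(usage_edges: Iterable[Tuple[str, str, str]],
                   min_customers: int = 2) -> Dict[str, Set[str]]:
    """Map each device to its customers, keeping only shared devices."""
    customers_by_device = defaultdict(set)
    for customer, device, _action in usage_edges:
        # Logins and payments both count as "the customer used the device".
        customers_by_device[device].add(customer)
    return {
        device: customers
        for device, customers in customers_by_device.items()
        if len(customers) >= min_customers
    }


edges = [
    ("alice", "device-1", "login"),
    ("bob", "device-1", "payment"),
    ("carol", "device-2", "login"),
    ("alice", "device-1", "payment"),
]
print(shared_devices(edges))
```

Raising `min_customers` turns the same lookup into a search for devices used by unusually many customers.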
30
Question?
How to automate fraud detection even further?
31
• Graph visualization
• Graph analytics
Advantages:
• Fraud specialists can use the graph for investigations.
• It saves a lot of time and makes their work easier.
What can be improved:
• We need a more proactive approach.
Current State
32
Brave new world
Combining Machine Learning with Graphs
Image source: https://thepolicytimes.com/
• Detect patterns in the graph without human intervention.
• Improve existing automation with information from the graph.
33
• Basic graph analytics
• Combining machine learning and graphs
How can we achieve our automation goals?
• PageRank
• Community detection
• Strongly connected components
• More built-in algorithms available
• Custom-defined algorithms
Graph Analytics
35
A subset of the graph in which every vertex is reachable from every other
vertex by following the directions of the edges
Strongly Connected Components
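The definition above can be made concrete with Kosaraju's two-pass algorithm. This is a minimal sketch on an adjacency dictionary, not the built-in PGX implementation; the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Set


def strongly_connected_components(adjacency: Dict[Hashable, List[Hashable]]) -> List[Set[Hashable]]:
    """Kosaraju's algorithm: DFS for finish order, then DFS on the reversed graph."""
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)

    # Pass 1: iterative DFS, recording nodes when they are finished.
    visited, finish_order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(adjacency.get(root, ())))]
        while stack:
            node, remaining = stack[-1]
            for neighbour in remaining:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append((neighbour, iter(adjacency.get(neighbour, ()))))
                    break
            else:
                # All neighbours explored: the node is finished.
                finish_order.append(node)
                stack.pop()

    # Reverse every edge.
    reversed_adj = {node: [] for node in nodes}
    for src, targets in adjacency.items():
        for dst in targets:
            reversed_adj[dst].append(src)

    # Pass 2: each DFS tree on the reversed graph is one component.
    assigned, components = set(), []
    for root in reversed(finish_order):
        if root in assigned:
            continue
        component, stack = set(), [root]
        assigned.add(root)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in reversed_adj[node]:
                if neighbour not in assigned:
                    assigned.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# a -> b -> c -> a forms a cycle; d only receives money.
print(strongly_connected_components({"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}))
```

In a payment graph, a component with more than one member means money can travel in a circle between those accounts.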
Finding sets of nodes such that each set of nodes is densely
connected internally. Community structures are quite common in
real networks. Social networks include community groups (the origin
of the term, in fact) based on common location, interests,
occupation, etc.
Community Detection
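One simple way to find densely connected groups is label propagation: every node repeatedly adopts the most common label among its neighbours. The sketch below is an illustration only, since the slides do not say which community detection algorithm was used. The tie-breaking rule (largest label wins) and the fixed update order are assumptions chosen to make the run deterministic.

```python
from collections import Counter
from typing import Dict, List


def label_propagation(adjacency: Dict[int, List[int]], max_rounds: int = 20) -> Dict[int, int]:
    """Assign a community label to every node of an undirected graph."""
    # Every node starts in its own community.
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            if not adjacency[node]:
                continue
            counts = Counter(labels[neighbour] for neighbour in adjacency[node])
            best = max(counts.values())
            # Deterministic tie-break: take the largest of the most common labels.
            new_label = max(label for label, count in counts.items() if count == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(graph))
```

On this toy graph the two triangles end up with different labels, because each node has more neighbours inside its triangle than across the bridge.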
PageRank (PR) is an algorithm used by Google Search to rank websites in
their search engine results. The PageRank algorithm outputs a probability
distribution used to represent the likelihood that a person randomly clicking
on links will arrive at a particular page.
Page Rank
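The "random clicker" description above corresponds to a power iteration over the graph. A minimal sketch, assuming a damping factor of 0.85 and a fixed number of iterations (both are conventional choices, not values from the slides); the function name is hypothetical.

```python
from typing import Dict, Hashable, List


def pagerank(adjacency: Dict[Hashable, List[Hashable]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[Hashable, float]:
    """Power-iteration PageRank on a directed graph."""
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not adjacency.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src, targets in adjacency.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# "c" receives payments from both "a" and "b", and pays "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))
```

In a payment graph the edges are money transfers instead of links, so a high rank points to accounts that collect money from many (or from important) senders.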
43
Hot Devices Analytics
44
• For a 3-day period
‒ Community detection: ~ 7 sec
‒ SCC for the same period: ~ 6 sec
‒ Top 10 customers by PageRank: ~ 0.8 sec
‒ Hot devices: ~ 15 sec
Use Case Performance Results - Graph Analytics
45
• Manual work does not scale well
• The traditional way of programming is not feasible
• We need a way to learn rules from the data and change these rules automatically when the data changes.
• Machine learning to the rescue!
Why Machine Learning for automation?
46
• Pros
– Continuous improvement as more data is processed and more patterns are discovered
– Adapts to new situations without the need for human intervention
– Scales well
• Challenges
– Hard to explain
– Data quality
– Creating a good machine learning model requires a lot of experimentation
Why Machine Learning for automation?
47
• Node classification
• Given a user in a payment network: Is it a fraudster?
• Graph classification
• Given a network of payments: Is it money laundering?
• Edge prediction
• How likely is user A to send money to user B?
• Pattern similarity search
• Once we have identified a fraudulent pattern (e.g. money laundering), find similar
patterns in the whole graph
How can we use the graph for automation?
48
• What about our existing machine learning models?
• Can we use the graph to improve our machine learning models?
• Example features:
‒ Is there a known fraudster at 2 hop distance from the user?
‒ Is the user a part of a community with known fraudsters?
‒ Importance/centrality of the user?
‒ Etc.
Graphs and machine learning
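The example features listed above can be computed directly from the graph and appended to an existing feature vector. The sketch below covers two of them; the function name, the feature names and the "within 2 hops" interpretation (rather than exactly 2 hops) are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def graph_features(adjacency: Dict[Hashable, List[Hashable]],
                   user: Hashable,
                   known_fraudsters: Set[Hashable],
                   max_hops: int = 2) -> Dict[str, float]:
    """Compute simple graph-based features for one user."""
    # Bounded breadth-first search around the user.
    distance = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    fraud_nearby = any(node in known_fraudsters for node in distance if node != user)
    # Degree centrality: distinct neighbours divided by the other nodes in the graph.
    others = max(len(adjacency) - 1, 1)
    degree_centrality = len(set(adjacency.get(user, ()))) / others
    return {
        "fraudster_within_hops": float(fraud_nearby),
        "degree_centrality": degree_centrality,
    }


# Undirected toy network: u - v - w - x
network = {"u": ["v"], "v": ["u", "w"], "w": ["v", "x"], "x": ["w"]}
print(graph_features(network, "u", known_fraudsters={"w"}))
```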
49
1. Graph embeddings
2. Leader detection in communities
3. Fastest growing network detection
Our steps towards better automation
50
Graphs and machine learning
Graph G
Node U
ML input: List of numbers
How to convert node U?
52
• We can embed nodes or whole graphs/sub-graphs
• Algorithms for building node/graph embeddings
‒ DeepWalk
‒ Node2vec
‒ Anonymous Walk
‒ Graph convolutional networks
Graph Embeddings
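DeepWalk builds node embeddings in two stages: it samples short random walks from the graph and then trains a skip-gram model on those walks as if they were sentences. The sketch below covers only the first stage, using the standard library; the function name and parameter values are illustrative assumptions, and the skip-gram training step is left out.

```python
import random
from typing import Dict, Hashable, List


def random_walks(adjacency: Dict[Hashable, List[Hashable]],
                 walks_per_node: int = 2,
                 walk_length: int = 5,
                 seed: int = 0) -> List[List[Hashable]]:
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        starts = list(adjacency)
        rng.shuffle(starts)
        for start in starts:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency.get(walk[-1], [])
                if not neighbours:
                    # Dead end: stop this walk early.
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
for walk in random_walks(graph):
    print(walk)
```

Nodes that appear in similar walk contexts end up with similar vectors once the walks are fed to a skip-gram model.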
53
• Merchant classification
• ~ 90% accuracy
• Fraudsters classification
• ~ 70% accuracy
• ~ 0.68 F1
• Embeddings as an additional feature
• ~ 3% accuracy improvement (from 90% to 93%)
DeepWalk embeddings: our experiments
54
• Having good node/graph embeddings allows
‒ Node classification
‒ Finding similar segments of users in a graph
‒ Edge prediction
‒ Graph classification
‒ Improvement of existing ML models
NOTE:
• Standard embedding algorithms work well on static graphs.
• It is challenging to create embeddings for new nodes.
Embeddings summary
55
• Can we detect users who control networks/multiple accounts?
• Step 1: Detect communities:
‒ Strongly or weakly connected components, spectral clustering
Leader detection in communities
56
• Step 2: Compute node importance for all nodes in these communities
• PageRank, eigenvector centrality, degree centrality, etc.
Leader detection in communities
The larger the node the higher its PageRank
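The two steps of leader detection can be sketched end to end. For brevity this example uses weakly connected components (via union-find) as the communities and the number of distinct senders as the importance score; both are simplifying assumptions, and PageRank or another centrality measure could be substituted in step 2. The function name `find_leaders` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def find_leaders(payments: List[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Return the most-paid account of every weakly connected component."""
    parent: Dict[Hashable, Hashable] = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Step 1: communities = weakly connected components.
    senders = defaultdict(set)
    for sender, receiver in payments:
        parent[find(sender)] = find(receiver)
        senders[receiver].add(sender)

    # Step 2: importance = number of distinct senders paying the account.
    leaders: Dict[Hashable, Hashable] = {}
    for node in list(parent):
        root = find(node)
        best = leaders.get(root)
        if best is None or len(senders[node]) > len(senders[best]):
            leaders[root] = node
    return sorted(leaders.values())


payments = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("x", "y")]
print(find_leaders(payments))
```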
59
• Single user receives money from 50 others
Leader detection in communities
2D 3D
60
• Graphs change over time
Fastest growing networks
Time
Yesterday Today
61
• Idea: Take the latest state of a community and investigate how it changed over
time.
Latest state of community: Volume of transactions per day:
Fastest growing networks
62
• Communities whose growth pattern deviates require further investigation
• Example of metrics per day to monitor:
‒ Active users
‒ Transactions
‒ Volume
Fastest growing networks
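One way to spot a community whose growth pattern deviates is to compare its latest daily metric with its own recent baseline. The sketch below does this for a single metric (for example, transactions per day); the function names and the threshold of 3.0 are illustrative assumptions, not the values used in the talk.

```python
from typing import Dict, List


def growth_ratio(daily_counts: List[float]) -> float:
    """Latest day's value relative to the mean of the earlier days."""
    *history, latest = daily_counts
    baseline = sum(history) / len(history)
    if baseline == 0:
        return float("inf") if latest > 0 else 0.0
    return latest / baseline


def fastest_growing(communities: Dict[str, List[float]], threshold: float = 3.0) -> List[str]:
    """Return communities whose latest day exceeds threshold x baseline, fastest first."""
    ratios = {
        name: growth_ratio(counts)
        for name, counts in communities.items()
        if len(counts) >= 2  # need at least one day of history
    }
    flagged = {name: ratio for name, ratio in ratios.items() if ratio >= threshold}
    return sorted(flagged, key=flagged.get, reverse=True)


transactions_per_day = {
    "community-1": [10, 12, 11, 50],
    "community-2": [20, 22, 21, 23],
    "community-3": [1, 1, 1, 9],
}
print(fastest_growing(transactions_per_day))
```

The same comparison can be repeated for active users and volume, and a community flagged on several metrics is a stronger candidate for investigation.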
65
• A result of applying the Fastest Growing Networks algorithm
• Influencer found by PageRank calculation
Fastest Growing Networks
• Representing data with complex connections
• Improving manual workflows
• Making real-time decisions on connected data
• Automating complex tasks – e.g. Fraud detection
– Powerful data analytics
– Machine learning models
• Graphs and ML combination is powerful!
Graphs are good for…