credit fraud prevention on hwx stack
Credit Fraud Prevention on a Connected Data Platform
Kirk Haslbeck, Sr. Solution Engineer HWX
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Building a Model
Show of hands, how many have built a “Model”?
What are some limitations?
– Conditional based logic: if/else binary decisions
If you need a lot of data to build a good model, what tools can you use?
– Data volumes can eliminate the possibility of desktop tools
Sampling?
– Well… we had better get an even distribution of true and false positives in each sample, but wait, that requires data munging, which brings us back to what tools we can use.
Security Concerns?
– Extracting data from its secure resting place and pushing it into other environments, oftentimes unsecured files or desktops where Matlab or R can be installed.
Collaboration
– Push processing to the data using modern distributed tooling.
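The sampling concern above can be illustrated with a small sketch. It assumes each record is a dict carrying a boolean label (the `fraud` key in the usage note is hypothetical, not from the slides) and draws the same number of records from each label so that the rare class is not swamped. It is an in-memory illustration only; at real data volumes the same idea would be pushed to the cluster.

```python
import random


def balanced_sample(records, label_key, per_class, seed=0):
    """Draw up to `per_class` records for each distinct label value."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible

    # Group the records by their label (for example fraud / not fraud).
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)

    # Sample without replacement from each group, then mix the groups.
    sample = []
    for group in by_label.values():
        sample.extend(rng.sample(group, min(per_class, len(group))))
    rng.shuffle(sample)
    return sample
```

For example, `balanced_sample(transactions, "fraud", per_class=10)` would return at most 10 fraudulent and 10 legitimate records, however skewed the input is.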
“All models are wrong, some are useful”
George E. P. Box
The most limiting factor is the data. With modern systems we are now able to capture more data and hopefully produce better insights.
Credit Card Fraud
Requirement: Detect fraudulent transactions.
Goal: Save the card company money and build trust amongst card users. Cut down on fraudulent crime.
Functional Requirement: Detect fraud in under 2 seconds at point of sale. Learn, adapt and make smarter decisions over time.
Design
– Distance: How far can one travel over a period of time before it is fraudulent?
– Category: How can we detect a purchase that a customer wouldn’t likely make?
– Frequency: How can we detect purchasing patterns that do not resemble the card holder?
Ideas?
– White board some conditional logic, egregiousness vs binary
– Back test the data
– Build a model per card holder?
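As one way to whiteboard the Distance question, the sketch below computes the speed a card holder would need to get from the previous purchase to the current one and flags the transaction when that speed is implausible. The record layout (`lat`, `lon`, `time`) and the 900 km/h threshold are assumptions made for illustration; they are not taken from the demo.

```python
import math

EARTH_RADIUS_KM = 6371.0
# Hypothetical threshold: roughly the cruising speed of a commercial jet.
MAX_PLAUSIBLE_SPEED_KMH = 900.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev, curr):
    """Speed needed to travel from the previous purchase to the current one."""
    distance = haversine_km(prev["lat"], prev["lon"], curr["lat"], curr["lon"])
    hours = (curr["time"] - prev["time"]).total_seconds() / 3600.0
    if hours <= 0:
        # Same instant or out-of-order events: any real distance is impossible.
        return math.inf if distance > 0 else 0.0
    return distance / hours


def distance_is_suspicious(prev, curr, max_speed_kmh=MAX_PLAUSIBLE_SPEED_KMH):
    """True when the card would have had to move faster than the threshold."""
    return implied_speed_kmh(prev, curr) > max_speed_kmh
```

Returning the implied speed rather than only a yes/no answer keeps room for the "egregiousness vs binary" idea: a downstream score can weigh how far over the threshold a transaction is.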
Rules, Statistics, Machine Learning
Rule Based Logic
– Great for checking conditions that can prove to be 100% accurate. Easy to build and no reason to over engineer.
– Example: Spending Limit. Card holder limit = $2,000
• If (currentPurchaseAmount + balance > 2,000) then deny transaction
Statistics
– Mean, median, mode, variance, deviation
– Anomaly detection. Outliers. (e.g. women’s retail example)
Machine Learning
– Supervised
– Unsupervised
– Trainable
– Adapt over time
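The spending-limit rule from the Rule Based Logic example can be written directly as a function. This is a minimal sketch: the function name and the use of `Decimal` for currency are choices made here, and only the $2,000 limit and the comparison come from the slide.

```python
from decimal import Decimal

CARD_LIMIT = Decimal("2000")  # card holder limit from the slide example


def approve_purchase(balance, purchase_amount, limit=CARD_LIMIT):
    """Deny (return False) when balance plus purchase would exceed the limit."""
    # Convert through str so floats such as 8.5 become exact decimal amounts.
    total = Decimal(str(balance)) + Decimal(str(purchase_amount))
    return total <= limit
```

A rule like this is deterministic, which is why it needs no training data: the same inputs always produce the same decision.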
Discovery
Gathered all Credit Card Transactions
– Problem is they didn’t make sense
– No identifiable patterns, no log normal curves
– Gas $45, Chipotle $8.50, Steak dinner $88, Amazon shoes $55
Classification
Outlier Detection: identify abnormal patterns
Example: identify anomalies
Features:
- Time frequency
- Category
- Amount
- Distance
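A simple statistical take on the Category and Amount features is to profile a card holder's past spending per category and score a new amount by how many standard deviations it sits from that category's mean. The sketch below is an illustration under stated assumptions: the z-score technique, the threshold of 3, and the `(category, amount)` input shape are chosen here and are not confirmed as what the demo uses.

```python
from collections import defaultdict
from statistics import mean, stdev

Z_THRESHOLD = 3.0  # hypothetical cut-off; tune it by back testing


def build_profile(history):
    """Map each category to the (mean, standard deviation) of its past amounts."""
    amounts = defaultdict(list)
    for category, amount in history:
        amounts[category].append(amount)
    return {
        category: (mean(values), stdev(values))
        for category, values in amounts.items()
        if len(values) >= 2  # stdev needs at least two observations
    }


def amount_z_score(profile, category, amount):
    """Standard deviations between the amount and the category mean, or None."""
    if category not in profile:
        return None  # no usable history for this category
    mu, sigma = profile[category]
    if sigma == 0:
        return 0.0 if amount == mu else float("inf")
    return abs(amount - mu) / sigma


def is_outlier(profile, category, amount, threshold=Z_THRESHOLD):
    """True when the amount is far from what this card holder usually spends."""
    z = amount_z_score(profile, category, amount)
    return z is not None and z > threshold
```

Profiling per category is what makes amounts such as gas and a steak dinner comparable: each is judged against its own history rather than against all transactions at once. A category with no history returns `None`, which leaves the decision to the other features.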
Demo
Show me the Code!
Next Steps
Limitations of current model
– In an airport, ready to fly out
– Changes to behavior, like just got a new girlfriend
– What else?
Dependent on the quality of the analyst’s feedback
Tech Overview
– Slider, NiFi, Kafka, Storm, Zeppelin, Spark, HBase
The Future of Data: Modern Data Application
[Diagram: DATA IN MOTION and DATA AT REST, with labels INTERNET OF ANYTHING, STORAGE, and GROUP 1 to GROUP 4]
Hortonworks’ unique approach to data-in-motion and data-at-rest powers Actionable Intelligence
Actionable Intelligence from Connected Data Platforms
[Diagram labels: DATA AT REST; DATA IN MOTION; ACTIONABLE INTELLIGENCE; MODERN DATA APPLICATIONS; Hortonworks DataFlow; Hortonworks Data Platform]
– Capturing perishable insights from data in motion
– Ensuring rich, historical insights on data at rest
– Necessary for modern data applications
Journey to Fraud Detection
[Diagram labels: Innovate; Renovate; Improved Experience / Reduced Cost; Immediate Customer Feedback; Years of Customer Transaction Data; Fraud Detection; Complete Customer Profile; Real time ingest of transactions; Purchase Behavior Insight]
Proactively identify potential fraudulent transactions to protect the customer and improve customer experience
• Proactively monitor every credit card transaction using machine learning to catch potential fraud
• Customer Service Analyst reviews flagged transactions in real time via a next generation application running on the connected platform
• HDF controls real time flow of data in and out of the connected platform to the various source and destination points
Fraud Detection Demo Architecture
[Diagram labels: Data Acquisition; Data Routing; Queuing; Simple/Complex Real-time Processing; Real Time Decisions; Elastic Compute; Machine Learning; Online Data; Interactive Query; Visualization; DATA IN MOTION]
Fraud Detection Demo Architecture
Distributed Storage: HDFS
Many Workloads: YARN
Real-time Serving (HBase)
Spark (Machine Learning)
UI and HTTP PubSub (Jetty and Tomcat)
Real-Time Data Movement (Apache NiFi)
Data Science (Zeppelin)
Resource Allocation (Slider)
Interactive Query (Hive on Tez)
Configuration Management (Ambari)
Authorization (Ranger)
Real Time Processing (Storm)
Inbound Messaging (Kafka)
Governance (Atlas)
Machine Learning: Enterprise Data Science at Scale
Use flow of data and the computing power of the connected platform to enable autonomous machine learning
• Real time data flows combined with massive parallel computing allow AI to continuously improve
• Enables AI to make decisions in the “Grey Areas”
Build and train AI on full volume data, not a sample
• Time, effort, accuracy, scale
• Visualize data as it is being manipulated
Deploy the AI model without re-implementing
• Spark models can be plugged into a modern connected platform.
Credit Fraud Analyst Inbox
Hortonworks Data Flow
Hortonworks Data Flow
Hortonworks Data Flow