Proprietary
Technology is developing
exponentially and organizations
are still stuck in linear thinking!
Daily Doses Of Fake
NEWS!!!
Did anything change? No, we are still trying to make timely decisions which impact the organization as a whole!
The devaluation of information and the impact over time!
What is Artificial Intelligence?
The theory and development of computer systems able to perform
tasks normally requiring human intelligence!
But actually AI is the sum of its parts!
AI can transform how banks perform AML, KYC and compliance
▪ AML & KYC
o Mine huge volumes of data for risk-relevant facts
o Simplify the process of identifying higher-risk clients
▪ Repetitive tasks
o Save valuable time and focus resources on higher client-value tasks
▪ NLP & ML
o Leapfrog automation across large parts of the client life cycle management
o Intelligent document scanning to enhance the KYC process for new client onboarding
How do you put this in practice?
Collect Your Data: Extract, Transform, Load
▪ Data Types:
o Structured (databases)
o Semi-structured (JSON, CSV etc.)
o Unstructured (server logs, audit logs, email messages, audio call recordings)
▪ Final destinations for consolidation:
o Data Warehouse
o Elasticsearch
o Data Lake: Hadoop HDFS, Amazon S3
▪ Select your source data, filter, transform, enrich and save it into the final destination
▪ Collection and ETL Tools and Platforms:
o SnapLogic, MuleSoft, Tibco
o Apache Spark for batch and stream processing
o Beats, Logstash, Fluentd, Apache Flume as data collectors
o Kafka, Amazon Kinesis for real time data stream processing
o Amazon Kinesis Firehose for collection and storage into the Cloud (Amazon S3, Elasticsearch, Redshift,
Splunk)
Collect Your Data: Extract, Transform, Load – cont.
Examples:
▪ SnapLogic periodically polls the source database, picks up the changed records, transforms them and stores them in a Data Warehouse (Redshift, Snowflake, SQL Server Analysis Services etc.)
▪ The Debezium Kafka Connector continuously reads database transaction logs (Change Data Capture, i.e. CDC) and sends events to a configured Kafka Topic for further processing, alerting or storage.
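The polling pattern from the first example can be sketched in Python as follows. This is a minimal illustration and not how SnapLogic works internally: the `customers` table, its `updated_at` column, the `dim_customer` target table, the `to_warehouse_row` transform and the in-memory SQLite databases are all assumptions made for the example.

```python
import sqlite3

def to_warehouse_row(row):
    """Transform step (hypothetical): normalize the name, keep the change time."""
    customer_id, name, updated_at = row
    return (customer_id, name.strip().upper(), updated_at)

def poll_changes(source, warehouse, last_seen):
    """Extract rows changed after last_seen, transform and load them,
    then return the new watermark for the next poll."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    warehouse.executemany(
        "INSERT OR REPLACE INTO dim_customer (id, name, updated_at) VALUES (?, ?, ?)",
        [to_warehouse_row(r) for r in rows],
    )
    warehouse.commit()
    return rows[-1][2] if rows else last_seen

# In-memory stand-ins for the source database and the Data Warehouse.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, " alice ", "2020-01-01T10:00:00"), (2, "bob", "2020-01-02T09:30:00")],
)
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")

watermark = poll_changes(source, warehouse, "2020-01-01T00:00:00")
```

Calling `poll_changes` again with the returned watermark picks up only the records changed since the previous run. Log-based CDC avoids the polling query altogether by reading the transaction log.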
Security Risks and Compliance
Be aware:
• Where your data is stored across the enterprise
• How that data has been classified: does it contain Personally Identifiable Information (PII) etc.?
• Who has access to data and systems
• When that data has been created or changed
• Are there unusual system access patterns, suspicious IP addresses, anomalies and outliers?
Example 1: Continuously monitor your file servers, detect file changes and evaluate the file content with each
change. If the document has been automatically classified as sensitive and its location is not appropriate, the system
shall raise an alert.
Open Source Solution Architecture
• The Auditbeat File Integrity Module (Elastic Stack) monitors the files
• File metadata is streamed into Logstash
• The Logstash destination evaluates and classifies the file as Sensitive/Not Sensitive content using a previously trained Machine Learning model (fastText), spaCy NLP, Rule Engines and/or Regular Expressions
Cloud Solution
Amazon Macie (https://aws.amazon.com/macie/) can automatically detect and classify documents stored in S3 buckets by using Machine Learning. This security service automatically detects PII (optimized for the English language only at the moment).
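The regular-expression part of the classification in Example 1 can be sketched as below. This is a rough illustration under stated assumptions: the three patterns, the `ALLOWED_PREFIXES` policy and the function names are hypothetical, and a production setup would combine such rules with a trained model as listed above.

```python
import re

# Illustrative patterns only (assumptions); real rules need tuning per data set.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical policy: sensitive files may only live under these directories.
ALLOWED_PREFIXES = ("/secure/",)

def classify(text):
    """Return the sorted names of the PII patterns found in the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))

def should_alert(path, text):
    """Alert when a file with sensitive content sits outside the allowed locations."""
    return bool(classify(text)) and not path.startswith(ALLOWED_PREFIXES)
```

For example, `should_alert("/public/share/clients.txt", "SSN 123-45-6789")` flags the file, while the same content under `/secure/` does not raise an alert.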
Security Risks and Compliance – cont.
Example 2: Continuously monitor your Web Server traffic logs for unusual access patterns. The system shall continuously learn the traffic patterns and raise an alert if an access event (user id, resource, remote IP, time) looks unusual.
• Elastic Filebeat sends Web server log lines to Logstash
• Logstash filters the financial transaction logs
• The Logstash destination evaluates the pairs <username, ip-address> and classifies them as Usual/Unusual. The evaluation runs against a Deep Learning model created with the Amazon SageMaker IP Insights algorithm
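To make the Usual/Unusual decision concrete, here is a deliberately simple frequency baseline in Python. It is a stand-in for illustration only and not the IP Insights algorithm, which learns embeddings with a neural network. The `AccessBaseline` class and its `min_history` threshold are assumptions.

```python
from collections import defaultdict

class AccessBaseline:
    """Remembers which IP addresses each user has been seen with."""

    def __init__(self, min_history=3):
        self.min_history = min_history  # events needed before we judge a user
        self.seen = defaultdict(lambda: defaultdict(int))

    def observe(self, username, ip_address):
        """Learn from an access event that is considered legitimate."""
        self.seen[username][ip_address] += 1

    def is_unusual(self, username, ip_address):
        """Flag a pair when the user has enough history but never used this IP."""
        history = self.seen.get(username, {})
        if sum(history.values()) < self.min_history:
            return False  # not enough data to decide
        return ip_address not in history

baseline = AccessBaseline()
for _ in range(5):
    baseline.observe("alice", "10.0.0.12")
```

A learned model generalizes better than this lookup, for instance by treating a new address from a familiar network differently from one in a region the user has never connected from.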
Predict Your Customer Behavior
Combine your Customer’s:
• Basic info: age, education, marital status, job … and
• Interaction outcomes for that customer: previous campaigns, number of calls, call channels, products used …
Train a Machine Learning algorithm on these features to gain insight into the customer’s behavior: predict deposits, customer churn, recommend products etc.
Examples:
• Train an ML model (Logistic Regression, XGBoost etc.) that will predict whether the customer will place a deposit
• Collaborative Filtering for product recommendations:
o The bank is promoting several products to its Customers. Each Customer has recently engaged with zero, one or more of those products (Customer_i, Product_j);
o Based on the existing Customer/Product combinations, build an ML model (Alternating Least Squares, Factorization Machines) that is able to predict the likelihood of Customer_i using Product_j.
Product 1 Product 2 Product 3 Product 4
Customer 1 x x x?
Customer 2 x x? x? x
Customer 3 x? x x
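The idea of filling in the unknown cells of the Customer/Product matrix can be sketched with item-based collaborative filtering in plain Python. Note that this is a simpler neighbourhood technique swapped in for illustration, not Alternating Least Squares or Factorization Machines. The engagement data and function names are made up.

```python
from math import sqrt

# Illustrative engagements (assumption): customer -> products already used.
engagements = {
    "customer_1": {"product_1", "product_2"},
    "customer_2": {"product_1", "product_4"},
    "customer_3": {"product_2", "product_3"},
    "customer_4": {"product_1", "product_2", "product_3"},
}

def product_users(engagements):
    """Invert the mapping: product -> set of customers who used it."""
    users = {}
    for customer, products in engagements.items():
        for product in products:
            users.setdefault(product, set()).add(customer)
    return users

def similarity(users, a, b):
    """Cosine similarity between two products over binary customer vectors."""
    return len(users[a] & users[b]) / sqrt(len(users[a]) * len(users[b]))

def recommend(engagements, customer):
    """Rank products the customer has not used by similarity to the ones they have."""
    users = product_users(engagements)
    owned = engagements[customer]
    scores = {
        candidate: sum(similarity(users, candidate, p) for p in owned)
        for candidate in users
        if candidate not in owned
    }
    return sorted(scores, key=lambda p: (-scores[p], p))

ranking = recommend(engagements, "customer_1")
```

Here `ranking` puts `product_3` ahead of `product_4`, because the customers who used `product_3` overlap more with the customers who share `customer_1`'s products.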
Fraud Transaction Detection
Financial transactions can be described with several features related to the Customer’s:
• Basic info: age, education, job, join date, zip code…. and
• Transaction related features: transaction amount, transaction timestamp, currency, customer present etc.
This information is usually stored as hierarchical parent-child one-to-many structures.
The Machine Learning process can benefit significantly from additional per-transaction features created by aggregating historical transaction information, such as:
• Average transaction amount for a particular card
• Average time between subsequent transactions for a particular card
• Average transaction hour for a particular card etc.
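The three aggregates listed above can be computed by hand as in the following sketch. The sample transactions and the `card_features` helper are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Illustrative transactions (assumption): (card id, amount, timestamp).
transactions = [
    ("card_A", 20.0, datetime(2020, 3, 1, 9, 0)),
    ("card_A", 40.0, datetime(2020, 3, 1, 11, 0)),
    ("card_A", 60.0, datetime(2020, 3, 1, 16, 0)),
    ("card_B", 500.0, datetime(2020, 3, 2, 2, 0)),
]

def card_features(transactions):
    """Aggregate each card's history into per-card features."""
    by_card = defaultdict(list)
    for card, amount, ts in transactions:
        by_card[card].append((ts, amount))
    features = {}
    for card, history in by_card.items():
        history.sort()  # chronological order
        times = [ts for ts, _ in history]
        gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
        features[card] = {
            "mean_amount": mean(amount for _, amount in history),
            "mean_seconds_between": mean(gaps) if gaps else None,
            "mean_hour": mean(ts.hour for ts in times),
        }
    return features

features = card_features(transactions)
```

Each new transaction can then be joined with its card's aggregates before training or scoring. The plain mean of the hour ignores that hours wrap around midnight, which a production feature would need to handle.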
Fraud Transaction Detection – cont.
The augmented transaction feature set can be generated with Automatic Feature Engineering based on Deep
Feature Synthesis:
• Deep Feature Synthesis: Towards Automating Data Science Endeavors:
https://dai.lids.mit.edu/wp-content/uploads/2017/10/DSAA_DSM_2015.pdf
• Solving the false positives problem in fraud prediction using automated feature engineering:
https://dai.lids.mit.edu/wp-content/uploads/2018/07/bbva_ecml.pdf
Manually generating additional transaction features is a tedious process!
There are hundreds of possible features to be extracted!
Existing Transaction Features: columns a, b, c, d for rows Tx_1, Tx_2, Tx_3
Auto Generated Features: Mean(a), Std(a), Mean(b), Std(b), Mode(c), Hour(d), Wday(d), …
Fraud Transaction Detection – cont.
Example:
• Use Featuretools (https://github.com/FeatureLabs/featuretools) to generate a feature-rich ML training set based on the customer’s historical purchase information. Besides the existing features like customer age, education, transaction amount and transaction timestamp, each credit card transaction is enhanced with additional features like:
o card.MEAN(transaction.amount)
o card.STD(transaction.amount)
o card.AVG_TIME_BETWEEN(transaction.timestamp) etc.
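import featuretools as ft

# es is the EntitySet holding the customer, card and transaction tables
# (featuretools >= 1.0 renames target_entity to target_dataframe_name)
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="transaction",
    agg_primitives=["mean", "std", "avg_time_between"],
    trans_primitives=["day", "is_weekend"],
)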
• Now we have a rich transaction feature set that takes into account many factors that matter in the classification process. Build a classification ML model (XGBoost, Random Forest) that will predict whether a transaction is fraudulent.
• Stream new transaction events in real time (Kafka, AWS Kinesis), evaluate them against the Machine Learning model and raise an alert if a transaction is classified as fraudulent.
Fraud Rings Detection
• A fraud ring is a group of two or more people who share common legitimate contact information like Phone, Address, SSN etc.
• These people create bank accounts using synthetic identities
• They behave normally at the beginning
• Normal behavior leads to unsecured credit card lines, personal loans etc.
• After a while, the ring members coordinate their activities and max out all of their credit lines
• The ring disappears after that
To detect possible Fraud Rings early, we have to analyze the relationships between the involved entities: accounts, contact information, transactions…
Traditional Relational Databases are NOT a good fit for this kind of analysis, which must detect complex relationships or patterns efficiently and quickly.
Instead…
Graph Databases like Neo4j can help us find these relationships, visualize graphs and detect the fraud rings along with the possible financial impact!
Note: Anti-Money Laundering investigation demo available at:
https://www.youtube.com/watch?v=J7BNKV2Lqy0
Fraud Rings Detection – cont.
Neo4j case study for Fraud Ring Detection:
https://github.com/neo4j-contrib/gists/blob/master/other/BankFraudDetection.adoc
MATCH (accountHolder:AccountHolder)-[]->(contactInformation)
WITH contactInformation, count(accountHolder) AS RingSize
MATCH (contactInformation)<-[]-(accountHolder),
(accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
WITH collect(DISTINCT accountHolder.UniqueId) AS AccountHolders,
[ . . . ]
RETURN AccountHolders AS FraudRing,
labels(contactInformation) AS ContactType,
RingSize,
round(FinancialRisk) as FinancialRisk
ORDER BY FinancialRisk DESC
Fraud Ring                                Contact Type      Financial Risk
["MattSmith","JaneAppleseed","JohnDoe"]   ["Address"]       34387.0
["MattSmith","JaneAppleseed"]             ["SSN"]           29387.0
["JaneAppleseed","JohnDoe"]               ["PhoneNumber"]   18046.0
Cypher Query for Link Entity Analysis
Fraud Rings Detection – cont.
The previous analysis defines a Fraud Ring based on a single shared piece of contact information. But Matt shares something with Jane, Jane with John etc., implying transitive relationships among them.
Neo4j can help us find the complete ring as a connected component in the graph by using the Weakly Connected Components (Union/Find) algorithm!
CALL algo.unionFind.stream(
'MATCH (n) WHERE n:AccountHolder OR n:SSN OR n:PhoneNumber OR n:Address RETURN id(n) AS id',
'MATCH (n)-[r:HAS_ADDRESS|HAS_SSN|HAS_PHONENUMBER]->(m) RETURN id(n) AS source, id(m) AS target',
{graph: "cypher"})
YIELD nodeId, setId
WITH algo.asNode(nodeId) AS n, setId AS ringId
MATCH (n:AccountHolder)
WITH n as accountHolder, ringId
WITH ringId,collect(DISTINCT accountHolder) AS ring,count(accountHolder) as ringSize
WHERE ringSize>1
UNWIND ring AS accountHolder
MATCH(accountHolder)-[r:HAS_CREDITCARD|HAS_UNSECUREDLOAN]->(unsecuredAccount)
[….]
Fraud Ring                                  Financial Risk
["AliceJohnson", "BobTrudy"]                300000.0
["JohnDoe", "JaneAppleseed", "MattSmith"]   34387.0
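For readers unfamiliar with Union/Find, the grouping of account holders into connected components can be sketched in plain Python. This only illustrates the idea and is not Neo4j's implementation. The `links` list and the contact identifiers such as `ssn:1` are made up.

```python
def find(parent, node):
    """Return the representative of node's set, shortening the path on the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def fraud_rings(links):
    """Group account holders that are connected through any shared contact node."""
    parent = {}
    for holder, contact in links:
        parent.setdefault(holder, holder)
        parent.setdefault(contact, contact)
        root_a, root_b = find(parent, holder), find(parent, contact)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two sets
    rings = {}
    for holder in {h for h, _ in links}:
        rings.setdefault(find(parent, holder), set()).add(holder)
    return sorted(sorted(ring) for ring in rings.values() if len(ring) > 1)

# Illustrative (account holder, contact) edges; the contact values are made up.
links = [
    ("MattSmith", "ssn:1"), ("JaneAppleseed", "ssn:1"),
    ("JaneAppleseed", "phone:1"), ("JohnDoe", "phone:1"),
    ("AliceJohnson", "address:2"), ("BobTrudy", "address:2"),
    ("CarolLone", "address:3"),
]
rings = fraud_rings(links)
```

Matt and John end up in the same ring although they share nothing directly, because both are linked to Jane. An account holder with no shared contact information is not reported.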
Real time ATM Fraud Detection
• Streaming systems like Kafka or Amazon Kinesis can help us identify suspicious events based on their temporal
(when) and/or geospatial (where) attributes
• Streaming systems implement continuous-query, SQL-like functionality
Kafka ksqlDB Example (https://www.confluent.io/blog/atm-fraud-detection-apache-kafka-ksql/):
• ATM transaction related information is streamed into Kafka Topic
• Kafka ksqlDB analyses the transaction events (KSQL i.e. streaming SQL) in real time and identifies transactions from the same account that occur within a relatively short time interval but at ATM locations separated by a large geographical distance
• Alert if there are such suspicious transactions.
SELECT
[…]
(T2.ROWTIME - T1.ROWTIME) AS MILLISECONDS_DIFFERENCE,
GEO_DISTANCE(T1.location->lat, T1.location->lon,
T2.location->lat, T2.location->lon, 'KM') AS
DISTANCE_BETWEEN_TXN_KM,
[…]
FROM
[…]
WITHIN (0 MINUTES, 10 MINUTES)
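The logic of the streaming query, pairing transactions on the same account that are close in time but far apart in space, can be sketched offline in Python. This is an illustration only: the event fields, the 100 km threshold and the function names are assumptions, and `geo_distance_km` is a haversine approximation rather than ksqlDB's `GEO_DISTANCE` itself. The 10-minute window mirrors the `WITHIN` clause.

```python
from math import asin, cos, radians, sin, sqrt

def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (haversine, Earth radius 6371 km)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def suspicious_pairs(events, window_s=600, min_km=100.0):
    """Pair transactions on the same account that are close in time but far apart."""
    alerts = []
    ordered = sorted(events, key=lambda e: e["ts"])
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["ts"] - first["ts"] > window_s:
                break  # all later events fall outside the time window
            if second["account"] == first["account"]:
                km = geo_distance_km(first["lat"], first["lon"], second["lat"], second["lon"])
                if km >= min_km:
                    alerts.append((first["txn"], second["txn"], round(km)))
    return alerts

# Illustrative events (assumption): ts in seconds; Amsterdam, Paris, Rotterdam.
events = [
    {"txn": "t1", "account": "acc1", "ts": 0, "lat": 52.37, "lon": 4.90},
    {"txn": "t2", "account": "acc1", "ts": 300, "lat": 48.86, "lon": 2.35},
    {"txn": "t3", "account": "acc2", "ts": 320, "lat": 51.92, "lon": 4.48},
    {"txn": "t4", "account": "acc1", "ts": 4000, "lat": 52.37, "lon": 4.90},
]
alerts = suspicious_pairs(events)
```

Only the pair `t1`/`t2` is reported: the same account is used in Amsterdam and Paris within five minutes. A streaming engine evaluates the same condition continuously as events arrive instead of over a stored list.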
What’s Next?
Deep Learning based products that deal with text, images, voice, language translation etc. have been created in the past several years. These products can be combined into interesting architectures that bring value to the business.
“Exotic” Example: Detect Customer’s sentiment from the Call Center audio records
• Amazon Transcribe (Dutch supported): Convert the Call Center’s audio records into text
• Amazon Translate: Translate Dutch text into English (Dutch not supported in Amazon Comprehend yet)
• Amazon Comprehend: Analyze the sentiment of the text
• If there were many negative sentiments recently, take an action and manage the Customer Churn Risk!
Summary
• Define your Business Goals and Values
• Define your Security Compliance Requirements and Risks
• Be aware of your data scattered across the Enterprise
• Create Data Catalogs
• Monitor the access to your data
• Normalize and transform your data for further BI or ML tasks
• Store your data for analysis into alternative database engines like Graph Databases. Relational databases are not in
the center of the universe anymore!
• Stream your data, perform real-time analysis and take immediate actions, don’t wait!
• Find the hidden patterns in your data using Machine Learning and make informed decisions!
• Combine the AI and ML technologies and products available on the market into brand new functionalities that will help your business automate many tasks!
Data Management, Mining and Analysis is not a project.
IT IS A PROCESS!
Marjan Sterjev
Amir Sabirovic
amir.sabirovic@interworks.,mk
InterWorks B.V.