Big Data Application Architectures – Fraud Detection Nishant Thacker Technical Product Manager – Big Data Microsoft

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017


TRANSCRIPT

Page 1: Big Data Application Architectures - Fraud Detection

Big Data Application Architectures – Fraud Detection

Nishant Thacker

Technical Product Manager – Big Data

Microsoft

Page 2: Big Data Application Architectures - Fraud Detection

Agenda

• Define the problem
• Establish the expected outcome
• Dive into each pillar
• Determine a solution
• Understand the applicability

Page 3: Big Data Application Architectures - Fraud Detection

Financial institutions risk loss of charter and a host of other penalties through noncompliance with federal money laundering legislation.

Detect fraudulent activity, theft, and money laundering

Prescribe what to sell, when, where, and to whom

Reduce risk while complying with legal requirements

Prevent customers from leaving

Page 4: Big Data Application Architectures - Fraud Detection

Big Data Evolution

Legacy Systems

Current Systems

Big Data

Advanced Analytics

Timely Info Accurate Thoughtful

Opportunity $

Page 5: Big Data Application Architectures - Fraud Detection

Areas of Opportunity for Financial Analytics

Audiences: Marketing, Operations, Bankers, CEOs

Personalization of offers & banking experience
• Next Best Action
• Recommended Interventions
• Lifestyle Yield Management
• Seasonal Personal Impact

Fraud Detection
• Theft Profiling
• Fraudulent Transaction Identification
• Remote Shutdown
• Site Monitoring

Customer Churn Prevention
• Recommended Interventions
• Risky Customer Profiling
• Call Center Monitoring
• Churn Scoring

Risk Reduction & Compliance
• Payment System Errors
• Money Laundering Prevention
• Compliance
• Data Entry Intervention

Page 6: Big Data Application Architectures - Fraud Detection

Expected Outcome

• Rejected Transactions
• Real Time Alerts
• Real Time Dashboards
• Automated Learning and Improvement (Batch and Real Time)
• Audit Trails and Analytics

$

Page 7: Big Data Application Architectures - Fraud Detection

Big Data Challenges

• Volume: It's big.
• Veracity: It's unverified.
• Variety: It's different.
• Velocity: It's fast.

Page 8: Big Data Application Architectures - Fraud Detection

Architectural Considerations

• Storage State
  • Cached
  • Distributed Cache
  • Distributed Storage
• Profile Storage
  • HDFS
  • HBase
• Ingestion Framework
  • Kafka
  • Sqoop
  • Event Hubs
  • IoT Hubs
• Stream Processing
  • Storm
  • Spark
  • Flink
  • Azure Stream Analytics
• Analytics
  • Batch
  • Interactive
• Machine Learning
  • Standalone
  • Scale out

Page 9: Big Data Application Architectures - Fraud Detection

Fraud Detection Reference Architecture

[Diagram. Legend: data path, optional solution component, main solution component.]

Zones: Device Connectivity; Data Processing, Analytics and Management; Presentation & Business Connectivity

Data sources: apps data from devices, trades and/or transactions, news and other alerts

Components: Thin Client, Gateway (shown twice), Stream Processors, User Profile Information, User Recent Activity Store, Data Lake, Analytics & Machine Learning, Provisioning API (Pull), App Backend, Solution UX, Business Integration Connectors and Gateway(s)

Also shown: personal mobile devices, business systems

Page 10: Big Data Application Architectures - Fraud Detection

Reference Architecture with Azure Services

[Diagram: the reference architecture from Page 9, with the same zones, data sources and components, annotated with Azure services.]

Page 11: Big Data Application Architectures - Fraud Detection

DemoWoodgrove Financial

Page 12: Big Data Application Architectures - Fraud Detection

User Profile and Metadata Stores

[Diagram: the reference architecture from Page 9, here also showing a Metadata Store and User Recent Activity Information, with the ingestion Gateway labeled (Kafka, IoT Hub, Event Hubs).]

Page 13: Big Data Application Architectures - Fraud Detection

Device Identity, Registry and State Stores

Metadata Store
• Authority for all registered sources
• Stores identity information and authentication secrets

User Profile Information
• Indexed list of all Users and their demographics (secure, governed, audit controlled)
• Contains discovery and reference data related to Users
• Can define a schema model or use a vertical industry standard schema for metadata
• Can contain structured metadata and links to externally stored operational data

User Recent Activity
• Contains operational data related to the Users' most recent activities:
  • "Last known values" for each User
  • Aggregated or computed values
  • Stream of device data events containing geolocation and time-based tagging
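The three kinds of data held in the User Recent Activity store can be illustrated with a minimal in-memory sketch. This is not the deck's implementation. The class names, the fields (`lat`, `lon`, `amount`) and the 100-event cap are all assumptions chosen for illustration, and a real deployment would back this with a distributed store such as HBase.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class ActivityEvent:
    """One device event, tagged with time and geolocation (hypothetical schema)."""
    user_id: str
    timestamp: float  # seconds since epoch
    lat: float
    lon: float
    amount: float


@dataclass
class UserActivity:
    """Per-user state: last known value, running aggregates, recent events."""
    last_event: Optional[ActivityEvent] = None
    event_count: int = 0
    total_amount: float = 0.0
    # Bounded stream of the most recent events (cap of 100 is an assumption).
    recent: Deque[ActivityEvent] = field(default_factory=lambda: deque(maxlen=100))


class RecentActivityStore:
    def __init__(self) -> None:
        self._users: Dict[str, UserActivity] = {}

    def record(self, event: ActivityEvent) -> None:
        """Update last known value, aggregates and the event stream."""
        state = self._users.setdefault(event.user_id, UserActivity())
        state.last_event = event
        state.event_count += 1
        state.total_amount += event.amount
        state.recent.append(event)

    def last_known(self, user_id: str) -> Optional[ActivityEvent]:
        state = self._users.get(user_id)
        return state.last_event if state else None

    def average_amount(self, user_id: str) -> float:
        """Computed value derived from the running aggregates."""
        state = self._users.get(user_id)
        if not state or state.event_count == 0:
            return 0.0
        return state.total_amount / state.event_count
```

A stream processor could call `record` for every incoming event and read `last_known` or `average_amount` when scoring the next transaction for the same user.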

Page 14: Big Data Application Architectures - Fraud Detection

Stream Processors

[Diagram: the reference architecture from Page 9, with the stores labeled Identity and Registry Stores and Device State Store.]

Page 15: Big Data Application Architectures - Fraud Detection

Stream Processing: Data Flow

After ingress through the Gateway (Ingestion), the flow of data through the system is facilitated by data pumps and analytics tasks.

Data flow can be driven by:
• Apache Storm on Azure HDInsight
• Apache Spark on Azure HDInsight
• Azure Stream Analytics
• Custom Event Processors

Each can perform tasks in flight:
• Data aggregation
• Data enrichment
• Complex event processing

… and can output data to:
• Azure Data Lake
• Azure Blobs/Tables
• HDInsight / HBase
• Azure SQL DB
• Time Series Databases
• Event Hub
• Service Bus Queues
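The three in-flight tasks named above can be sketched as chained Python generators. This is a single-process illustration under stated assumptions, not Storm, Spark or Stream Analytics code. The event fields (`user`, `amount`, `country`), the `HOME_COUNTRY` reference table and the detection rule are hypothetical.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical reference data used for enrichment.
HOME_COUNTRY = {"u1": "US", "u2": "GB"}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Data enrichment: attach profile data to each raw event."""
    for event in events:
        home = HOME_COUNTRY.get(event["user"])
        abroad = home is not None and home != event["country"]
        yield {**event, "home_country": home, "abroad": abroad}


def aggregate(events: Iterable[dict], totals: Dict[str, float]) -> Iterator[dict]:
    """Data aggregation: keep a running total per user and pass events on."""
    for event in events:
        user = event["user"]
        totals[user] = totals.get(user, 0.0) + event["amount"]
        yield {**event, "running_total": totals[user]}


def detect(events: Iterable[dict], limit: float) -> Iterator[dict]:
    """Simplified complex event rule: spending abroad AND running total over limit."""
    for event in events:
        if event["abroad"] and event["running_total"] > limit:
            yield event


raw = [
    {"user": "u1", "amount": 300.0, "country": "US"},
    {"user": "u1", "amount": 400.0, "country": "FR"},
    {"user": "u1", "amount": 500.0, "country": "FR"},
]
totals: Dict[str, float] = {}
alerts = list(detect(aggregate(enrich(raw), totals), limit=1000.0))
```

Only the third event ends up in `alerts`, since it is the first one made abroad after the running total passes the limit. In a real pipeline each stage would run as a distributed operator, and the alerts would go to one of the output sinks listed above.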

Page 16: Big Data Application Architectures - Fraud Detection

Stream Processor Examples

[Diagram. Legend: data path, optional solution component, main solution component.]

Data sources: apps data from devices, trades and/or transactions, news and other alerts

Processors: Device Metadata Processor, Device State Processor, Notification Processor, Raw Telemetry Processor, Rules Processor, Stream Transformation Processor, Secondary Stream Processor

Stores and endpoints: Queue, Device Registry Store, Device State Store, Data Lake, Event Hub, App Backend

Also shown: Thin Client, Gateway

Page 17: Big Data Application Architectures - Fraud Detection

App Backend

[Diagram: the reference architecture from Page 9, with the stores labeled Identity and Registry Stores, Device State Store and Storage, and with a Cloud Gateway shown.]

Page 18: Big Data Application Architectures - Fraud Detection

High-Scale Compute Models

Scale-appropriate compute models

• Actor Frameworks / Service Fabric Reliable Actors: distributed compute fabric hosting device actors.
• Service Fabric Reliable Collections: highly available, with replicated and local state management.
• Azure Batch: job scheduling and compute management for highly parallelizable compute workloads.

Simple programming logic in vastly scalable compute nodes

Page 19: Big Data Application Architectures - Fraud Detection

Data Analytics

[Diagram: the reference architecture from Page 9, with the stores labeled Identity and Registry Stores, Device State Store and Data Lake, and with a Cloud Gateway shown.]

Page 20: Big Data Application Architectures - Fraud Detection

Data Analytics

Event Hub

NRT Events

Stream Processing (ASA, Storm or Spark)

Alerts

Batch Events

Fetching & Updating

Reference Data

Interceptor (Rules)

Spark

Hive/Pig

U-SQL

Azure Data Lake Store Azure Data Lake Analytics

SQL DB

ML

Reports and Dashboards

Real Time Scoring

Training ML Models

Relational Data

Page 21: Big Data Application Architectures - Fraud Detection

Data Analytics

• Real-Time Analysis: Aggregation/Reduction, Temporal Queries, State Correlation, Threshold Detection, Alerting
• Data-At-Rest Analysis: Time-Series, Map/Reduce, Correlation
• Machine Learning: Pattern Detection, Behavior Prediction, Plausibility Analysis, Anomaly and Fraud Detection

Power BI

HDInsight

Stream Analytics

Data Factory

Machine Learning
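As one illustration of the real-time analysis items on this slide (temporal query, aggregation, threshold detection, alerting), here is a minimal sliding-window sketch in plain Python. The class name, window length and threshold are assumptions. A managed service such as Stream Analytics would express the same idea as a windowed query rather than hand-written code.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowDetector:
    """Sums amounts over the last `window_seconds` and flags threshold breaches."""

    def __init__(self, window_seconds: float, threshold: float) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events: Deque[Tuple[float, float]] = deque()  # (timestamp, amount)
        self._total = 0.0

    def observe(self, timestamp: float, amount: float) -> bool:
        """Add one event (timestamps must be non-decreasing); True means alert."""
        self._events.append((timestamp, amount))
        self._total += amount
        # Temporal query: evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_amount = self._events.popleft()
            self._total -= old_amount
        # Threshold detection on the aggregated value.
        return self._total > self.threshold
```

One detector instance per user or per card keeps the state small. The running total makes each `observe` call cheap, because nothing is re-summed when an event arrives.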

Page 22: Big Data Application Architectures - Fraud Detection

Presentation and Business Connectivity

[Diagram: the reference architecture from Page 9, with the stores labeled Identity and Registry Stores, Device State Store and Data Lake, and with a Cloud Gateway shown.]

Page 23: Big Data Application Architectures - Fraud Detection

WebHDFS

YARN

U-SQL

Analytics Service

HDInsight (managed Hadoop clusters)

Analytics

Store

Azure Data Lake

Page 24: Big Data Application Architectures - Fraud Detection

Cortana Intelligence Suite

Data → Intelligence → Action

• Data Sources: Apps, Sensors and devices, Data
• Information Management: Event Hubs, Data Catalog, Data Factory
• Big Data Stores: SQL Data Warehouse, Data Lake Store
• Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
• Intelligence: Cortana, Bot Framework, Cognitive Services
• Dashboards & Visualizations: Power BI
• Action: People, Automated Systems, Apps (Web, Mobile, Bots)

Page 25: Big Data Application Architectures - Fraud Detection

Reference Architecture with Azure Services

[Diagram: the reference architecture from Page 9, with the same zones, data sources and components, annotated with Azure services.]

Page 26: Big Data Application Architectures - Fraud Detection

Money Laundering Prevention

Fraud Detection

Placement, Layering, Integration

Process

Know your Customer

Transaction Monitoring

Pattern Detection

Machine Learning

Decision Tree

Classification

Cluster Analysis
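Transaction monitoring for the placement stage can be sketched as a simple rule that precedes any machine learning model. The example below flags "structuring", where deposits are split so that each stays just under a reporting threshold. Every constant here is a hypothetical value for illustration, not regulatory guidance, and the function name is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical values for illustration only.
REPORTING_THRESHOLD = 10_000.0
NEAR_MARGIN = 0.10  # "just below" means within 10% of the threshold
MIN_DEPOSITS = 3    # how many near-threshold deposits trigger a review
WINDOW_DAYS = 7     # ...when they fall within this many days


def flag_structuring(deposits: Iterable[Tuple[str, int, float]]) -> Set[str]:
    """deposits: (account, day_number, amount). Returns accounts to review."""
    lower = REPORTING_THRESHOLD * (1 - NEAR_MARGIN)
    near_days: Dict[str, List[int]] = defaultdict(list)
    for account, day, amount in deposits:
        if lower <= amount < REPORTING_THRESHOLD:
            near_days[account].append(day)

    flagged: Set[str] = set()
    for account, days in near_days.items():
        days.sort()
        # Look for MIN_DEPOSITS near-threshold deposits inside one window.
        for i in range(len(days) - MIN_DEPOSITS + 1):
            if days[i + MIN_DEPOSITS - 1] - days[i] < WINDOW_DAYS:
                flagged.add(account)
                break
    return flagged
```

Accounts flagged by a rule like this could then feed the pattern detection step, for example as labeled inputs to a decision tree or as candidates for cluster analysis.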

Page 27: Big Data Application Architectures - Fraud Detection

Anti-Money Laundering

[Diagram: cloud architecture. Column headings: Visualization, Machine Learning & Analytics, Financial Data, Information Management.]

Labels: Financial Data; Information Services; Big Data Storage for Multiple Sources; HDInsight; Azure Data Lake; Azure Data Warehouse; SQL Azure; SQL; Azure Machine Learning; Streaming Analytics; Real-time fraud detection feedback; Azure Services; Power BI; Power BI / Azure Website; Fund monitoring dashboard

Page 28: Big Data Application Architectures - Fraud Detection

Data Science Modeling

Multi-Class Logistic Regression
• Similar to linear regression
• Weights independent variables
• Useful with a categorical dependent variable
• Offers coefficients to inform management decision-making
• Very useful with internal analytical teams to interpret data
• Useful for diagnosing gaps in data and customer outreach
• Helps drive understanding of demand drivers

Multi-Class Decision Forest or Jungle
• Uses decision trees & votes
• Forest
  • Compares results between various outcomes
  • Votes upon outcomes
  • Evaluates based upon a series of logical questions or "forest"
• Jungle
  • Useful when a forest produces too many logical branches

Multi-Class Neural Network
• Produces a series of weighted edges and nodes
• Trained on input data
• Useful for complex tasks, like speech recognition, when allowed to train in depth
• Very good with complex interactions
• Enables retailers to better identify behaviour patterns & certain shopping activities
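To make the point about coefficients concrete, here is a minimal sketch of how a trained multi-class logistic regression scores one transaction. The class labels, feature names, weights and biases are all hypothetical and were not fitted to any data. In practice they would come from a training run, for example in Azure Machine Learning.

```python
import math
from typing import Dict, List

# Hypothetical coefficients: one weight vector and one bias per class.
# Feature order: [amount_zscore, is_foreign, night_time]
WEIGHTS: Dict[str, List[float]] = {
    "legitimate": [-0.5, -1.0, -0.3],
    "suspicious": [0.4, 0.8, 0.5],
    "fraud": [1.2, 1.5, 0.9],
}
BIAS: Dict[str, float] = {"legitimate": 2.0, "suspicious": 0.0, "fraud": -1.5}


def predict_proba(features: List[float]) -> Dict[str, float]:
    """Softmax over per-class linear scores."""
    scores = {
        label: BIAS[label] + sum(w * x for w, x in zip(weights, features))
        for label, weights in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


probs = predict_proba([3.0, 1.0, 1.0])
predicted = max(probs, key=probs.get)
```

Each weight can be read directly. With these made-up numbers, a positive `is_foreign` weight for the `fraud` class means a foreign transaction raises the fraud score, which is the kind of interpretability the slide attributes to this model family.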

Page 29: Big Data Application Architectures - Fraud Detection

Reference Architecture & Azure Services

[Diagram: the reference architecture from Page 9, with the same zones, data sources and components, annotated with Azure services.]

Page 30: Big Data Application Architectures - Fraud Detection

Q&A

[email protected]

@nishantthacker


Page 31: Big Data Application Architectures - Fraud Detection

© 2016 Microsoft Corporation. All rights reserved.