Learn Realtime AI with Apache Flink · 2019-10-30

Gautam Gupta

Technology leader at Intuit, focused on AI / ML / cloud

Works on data at scale · LINKEDIN/GUPTAGAUTAM/

AI vs Traditional Programming

• Traditional Programming

Y = f(x): you write f(x)

E.g., tax calculation

• AI Modelling

Y = f(x): the model finds f(x) from data

E.g., price prediction
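To make the contrast concrete, here is a minimal plain-Python sketch (not from the talk). The tax brackets and the house-size/price data are hypothetical; the "found" f(x) is an ordinary least-squares line, one of the simplest ways a model can learn f(x) from examples.

```python
# Traditional programming: a human writes f(x) explicitly.
# The 10% / 20% brackets below are hypothetical, for illustration only.
def tax(income):
    if income <= 10_000:
        return income * 0.10
    return 1_000 + (income - 10_000) * 0.20


# AI modelling: f(x) is *found* from example (x, y) pairs.
# Here the "model" is ordinary least squares for y = a*x + b.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


# Hypothetical training data: house size (m^2) -> price (thousands).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
a, b = fit_line(sizes, prices)
print(f"learned f(x) = {a:.2f}*x + {b:.2f}")
print("predicted price for 90 m^2:", a * 90 + b)
```

The rules of `tax` never change unless someone edits the code; `fit_line` produces a different f(x) whenever the training data changes.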

What is Realtime AI?

• AI models that learn in real time

• AI models that consume users' real-time behavior

• AI models that produce results for real-time use
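The first bullet, models that learn in real time, is often called online learning: the model is updated one event at a time instead of being retrained in batch. A minimal sketch, assuming a one-feature linear model trained by stochastic gradient descent; the class name, learning rate, and simulated stream are illustrative choices, not part of the talk.

```python
class OnlineLinearModel:
    """A tiny online learner: y ~ w*x + b, updated one event at a time via SGD."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        # One gradient step on the squared error of this single event.
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error


model = OnlineLinearModel()
# Simulated event stream whose true relation (unknown to the model) is y = 2x + 1.
for _ in range(500):
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        prediction = model.predict(x)  # serve a prediction for this event...
        model.update(x, 2 * x + 1)     # ...then learn from the observed outcome
print(round(model.w, 3), round(model.b, 3))
```

Each event both uses and improves the model, so no separate batch retraining step is needed for it to track the data.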

Realtime AI Use case: Commute

• Uber / Lyft: ride pricing

• Accidents happen in real time: how do we pass this data to the directions module for real-time routing?

• Weather conditions are unpredictable

• Road maintenance

• Crowdsourcing app: Waze
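One way to picture the routing question: treat the road network as a weighted graph and re-run a shortest-path search whenever an accident event updates a road's travel time. The sketch below uses Dijkstra's algorithm on a hypothetical four-node network; `apply_incident` is a hypothetical handler standing in for whatever consumes the accident stream.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: travel_minutes}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None, float("inf")
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def apply_incident(graph, u, v, delay_minutes):
    """Hypothetical incident handler: raise the travel time of road u -> v."""
    graph[u][v] += delay_minutes


# Hypothetical road network (travel times in minutes).
roads = {
    "A": {"B": 5, "C": 10},
    "B": {"D": 5},
    "C": {"D": 3},
    "D": {},
}
print(shortest_path(roads, "A", "D"))  # route before the accident
apply_incident(roads, "B", "D", 20)    # accident reported on road B -> D
print(shortest_path(roads, "A", "D"))  # route recomputed after the event
```

In a streaming setup the incident would arrive as an event, and only routes that use the affected road would need recomputing; this sketch simply recomputes one route.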

Realtime AI Use case: Banking

• Mobile Check Deposits

• Fraud Prevention

• Credit Decision

Realtime AI Use case: Online Shopping

• Amazon: Go / Search

• Alibaba: Singles Day 11/11

• Recommendations

• Fraud Protection

What is Real-time?

"predictably fast enough for use by processes being serviced" [Marsh and Greenwood]

"ability of the system to guarantee a response after a (domain defined) fixed time has elapsed" [Laffey et al., 1988]

"[a system] designed to operate with a well-defined measure of reactivity" [Georgeff, 1988]

Four aspects of Realtime

• Speed

• Responsiveness

• Timeliness

• Graceful adaptation
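Timeliness and graceful adaptation can be illustrated together: give the model a fixed latency budget and, if it misses the budget, answer with a precomputed fallback instead of blocking the caller. A minimal sketch in Python; the budget values, the fallback, the pool size, and both toy models are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared worker pool for model calls (size chosen arbitrarily for this sketch).
_pool = ThreadPoolExecutor(max_workers=4)


def predict_within_budget(model, features, budget_s, fallback):
    """Return (result, source): the model's answer if it arrives within
    budget_s seconds, otherwise the precomputed fallback."""
    future = _pool.submit(model, features)
    try:
        return future.result(timeout=budget_s), "model"
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running keeps running
        return fallback, "fallback"


def fast_model(features):
    return sum(features)


def slow_model(features):
    time.sleep(0.5)  # simulates an overloaded model server
    return sum(features)


print(predict_within_budget(fast_model, [1, 2, 3], budget_s=0.2, fallback=0))
print(predict_within_budget(slow_model, [1, 2, 3], budget_s=0.05, fallback=0))
```

The caller always gets an answer within roughly the budget; the fallback (for example a cached or rule-based score) trades accuracy for a bounded response time.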

Benefits of Realtime AI

Quicker Response

Increased consumer satisfaction

Handle a variety of use-cases

Better performance

Apache Flink

What is Apache Flink?

Stream data processing framework

Distributed processing engine

Stateful computations

Works with unbounded and bounded data streams
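A rough analogy in plain Python (not the Flink API): the same stateful operator can consume a bounded stream (a finite list) or an unbounded one (an infinite generator). The per-key dictionaries stand in for Flink's keyed state; unlike Flink state, they are neither distributed nor fault tolerant. The event source and field names are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def running_count_per_key(events):
    """Stateful operator: for each (key, amount) event, emit the key's running
    count and total. The two dicts play the role of per-key state."""
    count = defaultdict(int)
    total = defaultdict(float)
    for key, amount in events:
        count[key] += 1
        total[key] += amount
        yield key, count[key], total[key]


# Bounded stream: a finite list of events.
bounded = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
for out in running_count_per_key(bounded):
    print(out)


# Unbounded stream: an infinite generator of hypothetical card swipes.
def card_swipes():
    users = ["alice", "bob", "carol"]
    while True:
        yield random.choice(users), round(random.uniform(1, 100), 2)


# The operator never "finishes" on an unbounded input; here we take 5 results.
for out in itertools.islice(running_count_per_key(card_swipes()), 5):
    print(out)
```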

What makes Apache Flink special?

• Stateful

• In-Memory Performance

• Exactly-once state consistency

Flink @ Scale

• applications processing multiple trillions of events per day,

• applications maintaining multiple terabytes of state, and

• applications running on thousands of cores.

What are the alternatives to Flink?

1. Apache Spark

2. Apache Beam

3. Apache Storm

4. Custom stream processing framework

Apache Spark vs. Apache Flink

• Spark is primarily a batch processing framework.

• Streams are computed as micro-batches

• Implemented in Scala

• No efficient memory manager

• Flink is a stream processing framework.

• Event-at-a-time compute with windows and checkpoints

• Implemented in Java

• Automatic memory manager
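The micro-batching versus event-at-a-time difference can be seen with a little arithmetic. The idealized sketch below assumes that a micro-batch result appears exactly when its batch interval closes (ignoring the batch's own processing time) and that a streaming operator spends a fixed, hypothetical per-event cost; real latencies depend on configuration and load.

```python
import math


def microbatch_emit_times(arrivals, interval):
    """Micro-batching: an event's result is emitted when the batch it falls
    into closes, i.e. at the next multiple of `interval`."""
    return [(math.floor(t / interval) + 1) * interval for t in arrivals]


def streaming_emit_times(arrivals, per_event_cost):
    """Event-at-a-time: each result is emitted as soon as that event is processed."""
    return [t + per_event_cost for t in arrivals]


arrivals = [0.1, 0.4, 1.2]  # hypothetical event arrival times, in seconds
mb = microbatch_emit_times(arrivals, interval=1.0)
st = streaming_emit_times(arrivals, per_event_cost=0.01)
for t, a, b in zip(arrivals, mb, st):
    print(f"event@{t:.1f}s  micro-batch latency {a - t:.2f}s  streaming latency {b - t:.2f}s")
```

Under these assumptions micro-batch latency is bounded by the batch interval, while event-at-a-time latency is bounded by per-event processing cost.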

Apache Flink: Layered APIs

• High-level analytics API (SQL / Table API)

• Stream & batch data processing API (DataStream / DataSet API)

• Process functions (ProcessFunction)

Evolution of Streaming

REAL TIME ANALYTICS

REAL TIME DASHBOARD

REAL TIME ETL

RECOMMENDER SYSTEMS

FRAUD DETECTION

INTRUSION PREVENTION

ANOMALY DETECTION

Apache Flink: Data Streams

• Streams: Bounded or Unbounded

• State: Stateful

• Time: Event time vs processing time
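Event time matters when events arrive out of order. The sketch below counts events in tumbling event-time windows and uses a simple watermark (largest event time seen minus a fixed out-of-orderness bound) to decide when a window is complete; events whose window has already fired are set aside as late. It is a simplified, single-threaded illustration of the idea, not Flink's implementation; `window_size` and `max_out_of_orderness` are hypothetical parameters.

```python
from collections import defaultdict


def tumbling_event_time_counts(events, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window.

    events: (event_time, payload) pairs in *arrival* order, possibly out of order.
    Window [start, start + window_size) fires once watermark >= its end.
    """
    open_windows = defaultdict(int)
    emitted = []  # (window_start, count) in firing order
    late = []
    watermark = float("-inf")
    for ts, payload in events:
        start = ts // window_size * window_size
        if start + window_size <= watermark:
            late.append((ts, payload))  # its window already fired
            continue
        open_windows[start] += 1
        # Watermark: "no events older than this are expected any more".
        watermark = max(watermark, ts - max_out_of_orderness)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                emitted.append((s, open_windows.pop(s)))
    # End of a bounded input acts as a final watermark: fire whatever is left.
    for s in sorted(open_windows):
        emitted.append((s, open_windows.pop(s)))
    return emitted, late


# Hypothetical events (event_time, id) listed in arrival order; 8 and 3 arrive late.
events = [(1, "a"), (4, "b"), (12, "c"), (8, "d"), (17, "e"), (3, "f"), (26, "g")]
print(tumbling_event_time_counts(events, window_size=10, max_out_of_orderness=5))
```

Event `d` (time 8) arrives after event `c` (time 12) but is still counted in window [0, 10) because the watermark has not yet passed 10; event `f` (time 3) arrives after that window fired and is reported as late. With processing-time windows, both would simply land in whichever window was open when they arrived.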

Apache Flink: 24X7 Operations

Consistent Checkpoints

Efficient Checkpoints

End-to-end exactly-once

Integration with Cluster Managers

High Availability Setup
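The core idea behind consistent checkpoints can be shown in a few lines: snapshot the operator state together with the position in a replayable source, and after a failure roll both back to the last snapshot before replaying. This single-process sketch is a simplification; Flink takes distributed snapshots using checkpoint barriers, and end-to-end exactly-once additionally needs transactional or idempotent sinks, which are not modeled here. `CountingOperator` and `run_with_checkpoints` are hypothetical names.

```python
class CountingOperator:
    """Hypothetical keyed counter whose state can be snapshotted and restored."""

    def __init__(self):
        self.counts = {}

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

    def snapshot(self):
        return dict(self.counts)

    def restore(self, snapshot):
        self.counts = dict(snapshot)


def run_with_checkpoints(log, checkpoint_every, fail_at=None):
    """Process a replayable log; optionally simulate one crash at offset `fail_at`."""
    op = CountingOperator()
    checkpoint = (0, op.snapshot())  # (source offset, operator state), stored together
    offset = 0
    crashed = False
    while offset < len(log):
        if offset == fail_at and not crashed:
            crashed = True
            # Simulated crash: in-memory state is lost. Restore the last checkpoint
            # and rewind the source to the offset recorded with it.
            offset, state = checkpoint
            op = CountingOperator()
            op.restore(state)
            continue
        op.process(log[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, op.snapshot())
    return op.counts


events = ["a", "b", "a", "c", "a", "b", "c", "a"]
print(run_with_checkpoints(events, checkpoint_every=3))
print(run_with_checkpoints(events, checkpoint_every=3, fail_at=5))
```

Both runs end with the same counts: the events replayed after the crash are counted exactly once because the state they update was rolled back to the same point as the source.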

Realtime AI Architecture

[Architecture diagram. Components shown: applications and end customers; Kafka (Amazon Managed Streaming for Kafka); Amazon EC2; AWS Lambda; the AI model; Amazon DynamoDB; Amazon S3; Amazon Athena; data scientists.]

Transactional vs Event Driven Application

Flink Pre-requisites

Java 8; Scala API (optional)

Highly available Apache ZooKeeper

Distributed storage

Flink : Key Differentiators

High throughput

Exactly-once guarantee

Out-of-order stream processing

Backpressure

Stateful stream processing
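Backpressure means a slow downstream operator throttles its upstream instead of letting buffers grow without bound. A minimal Python analogy, assuming a single producer and a single consumer connected by a bounded queue: `put` blocks while the queue is full, so the producer is slowed to the consumer's pace. Flink propagates backpressure through bounded network buffers between tasks; this sketch does not model that mechanism, and `run_pipeline` and its parameters are hypothetical.

```python
import queue
import threading
import time


def run_pipeline(n_events, buffer_size, consumer_delay_s):
    buffer = queue.Queue(maxsize=buffer_size)  # bounded buffer between "operators"
    processed = []
    max_depth = 0

    def consumer():
        while True:
            item = buffer.get()
            if item is None:  # end-of-stream marker
                break
            time.sleep(consumer_delay_s)  # slow downstream operator
            processed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    for i in range(n_events):
        buffer.put(i)  # blocks while the buffer is full: this is the backpressure
        max_depth = max(max_depth, buffer.qsize())
    buffer.put(None)
    worker.join()
    return processed, max_depth


processed, depth = run_pipeline(n_events=20, buffer_size=3, consumer_delay_s=0.005)
print(len(processed), "events processed; max buffer depth", depth)
```

No events are dropped and memory stays bounded; the cost is that the producer waits, which in a real pipeline eventually slows reading from the source.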

Why move to Realtime streaming?

DATA STREAMS: SENSOR DATA, APPS DATA

BUSINESS: FASTER RESPONSE TO INSIGHT

CUSTOMER: QUICKER FEEDBACK

Reactive vs. Proactive Approach

• Reactive: Post Transaction

• Proactive: Pre-decision approach

Reactive vs. Proactive Approach

• Advertising: mass branding to one-to-one (1x1) targeting

• Retail: static branding to one-to-one (1x1) personalization

• Manufacturing: break then fix to repair then fix

• Health care: mass remedy to designer medicine

• Financial services: traditional investing to automated algorithms

Realtime AI Use cases

Fraud detection

Anomaly detection

Rule-based alerting

Business process monitoring
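Fraud detection and rule-based alerting can be as simple as a per-card rule evaluated on each transaction as it arrives. The sketch below flags a large transaction that follows a small one on the same card within a short time window; the thresholds, the rule itself, and the `Txn` record are hypothetical, chosen only to illustrate keyed state plus a time condition. Because the check runs per event, it could also be applied before a transaction is approved, which matches the proactive approach described earlier.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    card: str
    amount: float
    ts: float  # event time, in seconds


# Hypothetical thresholds, for illustration only.
SMALL, LARGE, WINDOW_S = 1.00, 500.00, 60.0


def detect_small_then_large(txns):
    """Flag a large transaction that follows a small one on the same card
    within WINDOW_S seconds. Per-card state: time of the last small transaction."""
    last_small = {}
    alerts = []
    for t in txns:
        prev = last_small.get(t.card)
        if t.amount > LARGE and prev is not None and t.ts - prev <= WINDOW_S:
            alerts.append(t)
        if t.amount < SMALL:
            last_small[t.card] = t.ts
        else:
            last_small.pop(t.card, None)  # any non-small transaction resets the pattern
    return alerts


stream = [
    Txn("c1", 0.50, 0), Txn("c1", 900.00, 30),   # small then large within 60 s
    Txn("c2", 0.75, 0), Txn("c2", 700.00, 120),  # too far apart
    Txn("c3", 20.00, 0), Txn("c3", 800.00, 10),  # no small transaction first
]
for alert in detect_small_then_large(stream):
    print("ALERT:", alert)
```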

AI Feedback Loop

USE LEARNINGS TO IMPROVE THE AI MODEL

REAL TIME FEEDBACK ANALYTICS FOR OFFLINE MODEL IMPROVEMENT
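A small piece of this feedback loop can be sketched as a monitor that compares each prediction with the outcome observed later and tracks accuracy over a sliding window; when accuracy drops below a threshold, it signals that the model should be reviewed or retrained offline. The window size, threshold, and class name are hypothetical choices for illustration.

```python
from collections import deque


class FeedbackMonitor:
    """Tracks accuracy over the last `window` predictions whose true outcome
    has arrived, and signals when the model should be reviewed or retrained."""

    def __init__(self, window=100, min_accuracy=0.9):
        self.outcomes = deque(maxlen=window)  # True if prediction was correct
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_retraining(self):
        # Only decide once the window is full, to avoid reacting to a handful of events.
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.accuracy() < self.min_accuracy)


monitor = FeedbackMonitor(window=10, min_accuracy=0.8)
for _ in range(10):
    monitor.record(predicted=1, actual=1)
print(monitor.accuracy(), monitor.needs_retraining())
for _ in range(3):
    monitor.record(predicted=1, actual=0)  # model starts getting outcomes wrong
print(monitor.accuracy(), monitor.needs_retraining())
```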

Realtime AI Challenges

Unexpected Events

Bandwidth planning

Monitoring

Model evaluation

Anomaly handling

Results Validation

Predictability

Questions