learn realtime ai with apache flink · 2019-10-30 · handle a variety of use-cases better...
TRANSCRIPT
Learn Realtime AI with Apache Flink
Gautam Gupta
TECHNOLOGY LEADER AI / ML / CLOUD FOCUSED INTUIT
WORK ON DATA @SCALE LINKEDIN/GUPTAGAUTAM/
AI vs Traditional Programming
• Traditional Programming
Y = f(x): the programmer writes f(x)
E.g. Tax calculation
• AI Modelling
Y = f(x): the model finds f(x) from data
E.g. Price prediction
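The contrast above can be sketched in a few lines. This is only an illustration: the flat 20% tax rule, the training data, and the names `tax`, `fit_line`, and `predict_price` are all made up for the example. The first function is written by hand; the second is found from example pairs with ordinary least squares.

```python
# Traditional programming: the rule f(x) is written by hand.
def tax(income):
    """Hypothetical flat 20% tax rule, chosen only for illustration."""
    return 0.20 * income


# AI modelling: f(x) is found from example (x, y) pairs.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data (assumption): floor area -> price.
areas = [10.0, 20.0, 30.0, 40.0]
prices = [40.0, 70.0, 100.0, 130.0]
slope, intercept = fit_line(areas, prices)


def predict_price(area):
    """Apply the fitted f(x) to a new input."""
    return slope * area + intercept
```

Nobody typed the pricing rule into `predict_price`; it was recovered from the data, which is the sense of "find f(x)" on the slide.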
What is Realtime AI?
• AI Models that learn in Realtime
• AI Models that consume the real-time behavior of users
• AI Models that produce results for real-time usage
Realtime AI Use case: Commute
• Uber / Lyft: Ride pricing
• Accidents happen in real time; that data has to reach the directions modules for real-time routing
• Weather conditions are unpredictable
• Road maintenance
• Crowdsource App: Waze
Realtime AI Use case: Email
• Spam reduction
• Smart email categorization: Google Priority Inbox
Realtime AI Use case: Banking
• Mobile Check Deposits
• Fraud Prevention
• Credit Decision
Realtime AI Use case: Online Shopping
• Amazon: Go / Search
• Alibaba: Singles Day 11/11
• Recommendations
• Fraud Protection
What is Real-time?
"predictably fast enough for use by processes being serviced" [Marsh and Greenwood]
"ability of the system to guarantee a response after a (domain defined) fixed time has elapsed" [Laffey et al., 1988]
"[a system] designed to operate with a well-defined measure of reactivity" [Georgeff, 1988]
Four aspects of Realtime
• Speed
• Responsiveness
• Timeliness
• Graceful Adaptation
Benefits of Realtime AI
Quicker Response
Increased consumer satisfaction
Handle a variety of use-cases
Better performance
Apache Flink
What is Apache Flink?
Stream data processing framework
Distributed processing engine
Stateful computations
Works with unbounded and bounded data streams
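A minimal sketch of what "stateful computation" over a stream means, in plain Python rather than Flink's API. The function name `running_totals` and the sample events are assumptions made for illustration. The operator keeps one running total per key and emits an updated result for every incoming event, so it works equally on a finite list or a never-ending generator.

```python
from collections import defaultdict


def running_totals(events):
    """Yield (key, running_total) for every (key, amount) event.

    `state` is the per-key state the operator carries between events.
    """
    state = defaultdict(float)
    for key, amount in events:
        state[key] += amount
        yield key, state[key]


# Hypothetical bounded input; an unbounded generator would work the same way.
events = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]
results = list(running_totals(events))
```

In a distributed engine the state would be partitioned by key across machines and made fault tolerant; the sketch shows only the single-process idea.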
What makes Apache Flink special?
• Stateful
• In-Memory Performance
• Exactly-once state consistency
Flink Benefits
• Flink has been designed to:
• Run in all common cluster environments
• Perform computations at in-memory speed
• Operate at any scale
Flink @ Scale
• applications processing multiple trillions of events per day,
• applications maintaining multiple terabytes of state, and
• applications running on thousands of cores.
What are the alternatives to Flink?
1. Apache Spark
2. Apache Beam
3. Apache Storm
4. Custom stream processing framework
Apache Spark vs. Apache Flink
• Spark is a batch processing framework at its core.
• Streaming is handled as micro-batches
• Implemented in Scala
• No efficient memory manager
• Flink is a Stream processing framework.
• Window and checkpoint based compute
• Implemented in Java and Scala
• Automatic memory manager
Apache Flink: Layered APIs
• High Level Analytics API
• Stream & Batch data processing API
• Process Functions
Evolution of Streaming
REAL TIME ANALYTICS
REAL TIME DASHBOARD
REAL TIME ETL
RECOMMENDER SYSTEMS
FRAUD DETECTION
INTRUSION PREVENTION
ANOMALY DETECTION
Apache Flink: Data Streams
• Streams: Bounded or Unbounded
• State: Stateful
• Time: Event time vs processing time
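The difference between event time and processing time is easiest to see with out-of-order data. The following is a plain-Python sketch, not Flink's API: the function name, the window size of 10, and the out-of-orderness bound of 3 are assumptions for illustration. Events are grouped into tumbling windows by the timestamp they carry (event time), and a watermark that trails the largest timestamp seen decides when a window can be closed.

```python
def tumbling_event_time_sums(events, window_size, max_out_of_orderness):
    """Sum values per event-time window; return (emitted, dropped).

    events: iterable of (event_time, value) in ARRIVAL order.
    A window [start, start + window_size) is emitted once the watermark
    (largest event time seen minus max_out_of_orderness) reaches its end.
    """
    open_windows = {}
    emitted = []
    dropped = []
    watermark = float("-inf")
    for event_time, value in events:
        start = (event_time // window_size) * window_size
        if start + window_size <= watermark:
            # The window this event belongs to was already closed: too late.
            dropped.append((event_time, value))
            continue
        open_windows[start] = open_windows.get(start, 0.0) + value
        watermark = max(watermark, event_time - max_out_of_orderness)
        closable = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in closable:
            emitted.append((s, open_windows.pop(s)))
    return emitted, dropped


# Arrival order differs from event-time order: 8 arrives after 12, 3 after 15.
arrivals = [(1, 1.0), (4, 1.0), (12, 1.0), (8, 1.0), (15, 1.0), (3, 1.0), (25, 1.0)]
emitted, dropped = tumbling_event_time_sums(arrivals, window_size=10, max_out_of_orderness=3)
```

The event with timestamp 8 arrives late but is still counted in window 0, because the watermark had not yet passed 10. The event with timestamp 3 arrives after window 0 was closed and is dropped. The window starting at 20 stays open because nothing has yet pushed the watermark past 30.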
Apache Flink: 24X7 Operations
Consistent Checkpoints
Efficient Checkpoints
End to end Exactly once
Integration with Cluster Managers
High Availability Setup
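The idea behind consistent checkpoints and exactly-once state can be sketched in a single process. This is a simplified illustration, not Flink's mechanism, which is distributed and considerably more involved; the class `CountingJob` and its methods are hypothetical names. The key point is that the read position and the operator state are saved together, so that after a failure both roll back to the same moment and replayed records are not counted twice.

```python
import copy


class CountingJob:
    """Counts records from a replayable source, with checkpoint and restore."""

    def __init__(self, records):
        self.records = records
        self.offset = 0          # position in the source
        self.counts = {}         # operator state
        self._checkpoint = (0, {})

    def checkpoint(self):
        # Save the source position and the state as one consistent snapshot.
        self._checkpoint = (self.offset, copy.deepcopy(self.counts))

    def restore(self):
        # Roll back BOTH position and state to the last snapshot.
        saved_offset, saved_counts = self._checkpoint
        self.offset = saved_offset
        self.counts = copy.deepcopy(saved_counts)

    def process(self, n):
        """Process up to n records starting at the current offset."""
        end = min(self.offset + n, len(self.records))
        while self.offset < end:
            word = self.records[self.offset]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.offset += 1


records = ["a", "b", "a", "c", "a", "b"]
job = CountingJob(records)
job.process(2)
job.checkpoint()            # snapshot taken after two records
job.process(3)              # progress that will be lost
job.restore()               # simulated failure and recovery
job.process(len(records))   # replay from the checkpointed offset
```

After recovery the counts match a run with no failure at all. Restoring the state without the offset, or the offset without the state, would lose or double-count records.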
Realtime AI Architecture
[Architecture diagram. Components shown: Applications, End Customers, Data Scientists, Kafka, Amazon Managed Streaming for Kafka, AI Model, AWS Lambda, Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon Athena]
Transactional vs Event Driven Application
Flink Pre-requisites
Java 8
Scala API (Optional)
Highly Available Apache Zookeeper
Distributed storage
Flink : Key Differentiators
High Throughput
Exactly Once guarantee
Out of order stream processing
Backpressure
Stateful stream processing
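Backpressure, one of the differentiators listed above, can be illustrated with a bounded buffer between a fast producer and a slow consumer. This is a generic Python sketch, not how Flink implements it; the function `run_pipeline`, the capacity of 4, and the 1 ms delay are assumptions for illustration. When the buffer is full, the producer's `put` call blocks, so the slow stage throttles the fast one instead of letting data pile up without limit.

```python
import queue
import threading
import time


def run_pipeline(items, capacity):
    """Send items through a bounded buffer to a slow consumer.

    Returns the processed items and the largest buffer size observed.
    """
    buffer = queue.Queue(maxsize=capacity)
    processed = []
    sentinel = object()  # marks the end of the stream

    def slow_consumer():
        while True:
            item = buffer.get()
            if item is sentinel:
                break
            time.sleep(0.001)  # simulate a slow downstream operator
            processed.append(item)

    worker = threading.Thread(target=slow_consumer)
    worker.start()
    max_buffered = 0
    for item in items:
        # Blocks while the buffer is full: this is the backpressure.
        buffer.put(item)
        max_buffered = max(max_buffered, buffer.qsize())
    buffer.put(sentinel)
    worker.join()
    return processed, max_buffered


processed, max_buffered = run_pipeline(list(range(50)), capacity=4)
```

Nothing is dropped and order is preserved; the producer simply runs at the consumer's pace once the buffer fills.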
Why move to Realtime streaming?
DATA STREAMS: SENSOR DATA, APPS DATA
BUSINESS: FASTER RESPONSE TO INSIGHT
CUSTOMER: QUICKER FEEDBACK
Reactive vs. Proactive Approach
• Reactive: Post Transaction
• Proactive: Pre-decision approach
Reactive vs. Proactive Approach
• Advertising: mass branding to 1:1 targeting
• Retail: static branding to 1:1 personalization
• Manufacturing: break then fix to repair then fix
• Health care: mass remedy to designer medicine
• Financial Services: traditional investing to automated algorithms
Realtime AI Use cases
Fraud detection
Anomaly Detection
Rule-based Alerting
Business process monitoring
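As one possible shape for streaming anomaly detection, here is a small sketch that flags a value when it lies far from the running mean of everything seen so far. It uses Welford's online algorithm for mean and variance; the class name, the threshold of 3 standard deviations, the warm-up of 5 values, and the sample amounts are assumptions for illustration, not a production fraud model.

```python
import math


class StreamingAnomalyDetector:
    """Flags values more than `threshold` standard deviations from the running mean."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup   # number of values to see before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations from the mean

    def observe(self, x):
        """Return True if x looks anomalous, then fold x into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical transaction amounts with one obvious outlier.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0, 21.0]
detector = StreamingAnomalyDetector()
flags = [detector.observe(a) for a in amounts]
```

Only the 500.0 transaction is flagged. For simplicity the sketch folds flagged values into the statistics as well; a real system might exclude them or use a decaying window. A rule-based alert is the same loop with the statistical test replaced by a fixed condition.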
AI Feedback Loop
USE LEARNINGS TO IMPROVE THE AI MODEL
REAL TIME FEEDBACK ANALYTICS FOR OFFLINE MODEL IMPROVEMENT
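One way to picture the feedback loop is an incremental model update: each piece of feedback nudges the model's parameter toward what was actually observed. The sketch below is an assumption about how such a loop could look, not the speaker's implementation; the one-weight linear model, the learning rate of 0.1, and the feedback pairs are made up for the example.

```python
def online_update(weight, x, y, learning_rate=0.1):
    """One stochastic-gradient step for the model y_hat = weight * x."""
    prediction = weight * x
    error = y - prediction
    return weight + learning_rate * error * x


# Hypothetical feedback stream: (input, observed outcome), where outcome = 2 * input.
feedback_stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)] * 20

weight = 0.0  # the model starts out knowing nothing
for x, y in feedback_stream:
    weight = online_update(weight, x, y)
```

After the stream has been consumed, `weight` has converged to roughly 2.0. The same feedback records could instead be written to storage and used for periodic offline retraining, which is the second option named on the slide.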
Realtime AI Challenges
Unexpected Events
Bandwidth planning
Monitoring
Model evaluation
Anomaly handling
Results Validation
Predictability
Questions