Till Rohrmann @stsffap
Apache Flink® Meets Apache Mesos® and DC/OS
MapReduce is crunching Data
We need to turn faster!
FMACK Stack (Flink, Mesos, Akka, Cassandra, Kafka)
EVENTS: ubiquitous data streams from connected devices (sensors, devices, clients)
INGEST: Apache Kafka, ingesting millions of events per second
STORE: Apache Cassandra, a distributed & highly scalable database
ANALYZE: Apache Flink, processing data in real time and in batch
ACT: Akka, to visualize data & build data-driven apps
All running on Mesos / DC/OS in the datacenter
Naive Approach
Typical datacenter: siloed, over-provisioned servers, low utilization
Industry average: 12-15% utilization
Workloads: MySQL, microservice, Cassandra, Flink, Kafka
© 2017 Mesosphere, Inc. All Rights Reserved.
Apache Mesos
Mesos: automated schedulers, workload multiplexing onto the same machines
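To make the utilization figures concrete, the following Python sketch compares a siloed layout with a multiplexed one. The workload demands, the 12-core machine size, and the two-machines-per-silo over-provisioning are hypothetical numbers chosen for illustration. Mesos itself does not pack tasks: it offers resources and each framework's scheduler decides where work goes, so first-fit decreasing here is only a stand-in for such a scheduler.

```python
from typing import Dict, List

# Hypothetical CPU demand (cores) per workload; illustrative numbers only.
DEMAND: Dict[str, float] = {
    "MySQL": 2.0,
    "microservice": 1.0,
    "Cassandra": 4.0,
    "Flink": 6.0,
    "Kafka": 3.0,
}
CORES_PER_MACHINE = 12.0        # assumed machine size
SILO_MACHINES_PER_WORKLOAD = 2  # assumed over-provisioning in the siloed layout


def utilization(total_demand: float, machines: int) -> float:
    """Fraction of provisioned cores that are actually used."""
    return total_demand / (machines * CORES_PER_MACHINE)


def first_fit_decreasing(demand: Dict[str, float]) -> List[List[str]]:
    """Place the largest workloads first, each on the first machine with room."""
    machines: List[List[str]] = []
    free: List[float] = []
    for name, cores in sorted(demand.items(), key=lambda kv: kv[1], reverse=True):
        for i, capacity in enumerate(free):
            if cores <= capacity:
                machines[i].append(name)
                free[i] -= cores
                break
        else:  # no existing machine has room: start a new one
            machines.append([name])
            free.append(CORES_PER_MACHINE - cores)
    return machines


total = sum(DEMAND.values())
siloed = SILO_MACHINES_PER_WORKLOAD * len(DEMAND)
packed = first_fit_decreasing(DEMAND)
print(f"siloed: {siloed} machines, {utilization(total, siloed):.0%} utilization")
print(f"packed: {len(packed)} machines, {utilization(total, len(packed)):.0%} utilization")
```

Under these assumptions the siloed layout needs 10 machines at about 13% utilization, while the same 16 cores of demand fit onto 2 shared machines at about 67%.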
data Artisans: original creators of Apache Flink®
Providers of the dA Platform, a supported
Flink distribution
Apache Flink in a Nutshell
Event-driven applications (event sourcing, CQRS)
Stream processing / analytics (data streams, windows, …)
Batch processing (data sets)
Foundation: stateful, event-driven, event-time-aware processing
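"Event-time-aware" means results are computed from the timestamps carried in the events rather than from arrival order. Below is a minimal Python sketch of an event-time tumbling-window count. It is not Flink's API; the watermark rule (largest timestamp seen minus a fixed delay) and the policy of dropping late events are simplifying assumptions, loosely modeled on bounded out-of-orderness watermarks.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

Event = Tuple[int, str]        # (event timestamp in ms, key)
Result = Tuple[int, str, int]  # (window start, key, count)


def tumbling_counts(events: Iterable[Event], size_ms: int, max_delay_ms: int) -> List[Result]:
    """Count events per key in tumbling event-time windows of size_ms.

    Assumed watermark rule: largest timestamp seen minus max_delay_ms. A window
    [start, start + size_ms) fires once the watermark reaches its end; events
    that belong to a window the watermark has already passed are dropped.
    """
    open_windows: DefaultDict[Tuple[int, str], int] = defaultdict(int)
    fired: List[Result] = []
    watermark = float("-inf")
    for ts, key in events:
        start = ts - ts % size_ms
        if start + size_ms <= watermark:
            continue  # late event: its window is already complete
        open_windows[(start, key)] += 1
        watermark = max(watermark, ts - max_delay_ms)
        for w_start, w_key in sorted(open_windows):
            if w_start + size_ms <= watermark:
                fired.append((w_start, w_key, open_windows.pop((w_start, w_key))))
    # End of input acts like a final watermark at +infinity: flush what is left.
    for w_start, w_key in sorted(open_windows):
        fired.append((w_start, w_key, open_windows[(w_start, w_key)]))
    return fired


# Events arrive out of timestamp order (e.g. 1400 after 1700).
events = [(1000, "a"), (1500, "b"), (1700, "a"), (1400, "a"),
          (900, "a"), (2100, "a"), (3500, "b"), (1200, "a")]
print(tumbling_counts(events, size_ms=1000, max_delay_ms=500))
```

In this run the event at 1400 arrives out of order but still counts toward the [1000, 2000) window, while the events at 900 and 1200 arrive after the watermark has passed their windows and are dropped.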
Programming Model
Figure: a dataflow in which sources feed a chain of transformations (computations, each holding its own state) that write to sinks
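The source, stateful transformation, sink shape can be mimicked with plain Python generators. This is a hypothetical sketch, not Flink's DataStream API: in Flink the per-word counter would be keyed state that the runtime partitions and checkpoints, whereas here it is an ordinary dict inside one function.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def source(lines: Iterable[str]) -> Iterator[str]:
    """Source: emits records, here from an in-memory list."""
    yield from lines


def tokenize(records: Iterable[str]) -> Iterator[str]:
    """Stateless transformation: one record in, zero or more words out."""
    for record in records:
        yield from record.lower().split()


def running_count(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stateful transformation: keeps a counter per word as its local state."""
    state: Dict[str, int] = {}
    for word in words:
        state[word] = state.get(word, 0) + 1
        yield word, state[word]


def sink(results: Iterable[Tuple[str, int]], out: List[Tuple[str, int]]) -> None:
    """Sink: writes every result somewhere, here to a list."""
    out.extend(results)


out: List[Tuple[str, int]] = []
sink(running_count(tokenize(source(["To be", "or not to be"]))), out)
print(out)
```

Because each stage pulls from the previous one, records flow through the whole chain one at a time, which is closer to streaming than to materializing each stage's output first.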
What is Flink Good For?
Detecting fraud in real time
As fraudsters get better, models must be updated without downtime
Figure: credit card transactions flow into a live 24/7 service that applies evolving fraud models built by data scientists and emits notifications and alerts
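Updating a model without downtime amounts to treating model updates as a second input that the running operator consumes alongside the transactions. The Python sketch below assumes a deliberately trivial "model" (a single amount threshold) and hypothetical Transaction and ModelUpdate types; in Flink this shape is commonly built with connected streams or the broadcast state pattern.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Transaction:
    card: str
    amount: float


@dataclass
class ModelUpdate:
    max_amount: float  # hypothetical "model": flag any amount above this


def detect_fraud(stream: Iterable[Union[Transaction, ModelUpdate]],
                 initial_max: float) -> Iterator[str]:
    """Scores transactions with the current model; updates replace it in place."""
    max_amount = initial_max  # operator state: the model currently in use
    for event in stream:
        if isinstance(event, ModelUpdate):
            max_amount = event.max_amount  # new model takes effect, no restart
        elif event.amount > max_amount:
            yield f"ALERT card={event.card} amount={event.amount}"


stream = [Transaction("c1", 50.0), Transaction("c2", 900.0),
          ModelUpdate(max_amount=100.0),
          Transaction("c1", 150.0), Transaction("c3", 80.0)]
print(list(detect_fraud(stream, initial_max=1000.0)))
```

The 900.0 transaction passes under the initial model; after the update arrives mid-stream, the 150.0 transaction is flagged without the operator ever stopping.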
@ Uber
▪ AthenaX (https://eng.uber.com/athenax/)
▪ Streaming analytics platform
▪ SQL as abstraction layer
Figure: streams from Hadoop, Kafka, etc., combined with SQL, thresholds, and actions, produce analytics, alerts, and derived streams
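To illustrate what "SQL, thresholds, actions" can express, the sketch below computes in Python what a tumbling-window GROUP BY with a HAVING threshold would return. The query in the comment, its table and column names, the one-minute window, and the threshold are all hypothetical; this is not AthenaX's interface.

```python
from collections import Counter
from itertools import groupby
from typing import Iterable, List, Tuple

# Roughly what a streaming SQL query like this (hypothetical names) returns:
#   SELECT TUMBLE_START(ts, INTERVAL '1' MINUTE), city, COUNT(*)
#   FROM requests
#   GROUP BY TUMBLE(ts, INTERVAL '1' MINUTE), city
#   HAVING COUNT(*) > 2


def threshold_alerts(events: Iterable[Tuple[int, str]], window_s: int,
                     threshold: int) -> List[Tuple[int, str, int]]:
    """events: (timestamp in seconds, key), assumed sorted by timestamp."""
    alerts: List[Tuple[int, str, int]] = []
    for start, group in groupby(events, key=lambda e: e[0] - e[0] % window_s):
        counts = Counter(key for _, key in group)
        for key, n in sorted(counts.items()):
            if n > threshold:
                alerts.append((start, key, n))  # an alert / derived-stream record
    return alerts


events = [(0, "sf"), (10, "sf"), (20, "sf"), (30, "ny"),
          (65, "sf"), (70, "ny"), (80, "ny"), (90, "ny")]
print(threshold_alerts(events, window_s=60, threshold=2))
```

Grouping consecutive events by window start only works because the input is assumed sorted; a real streaming engine would handle out-of-order input with watermarks instead.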
@ Alibaba
▪ Blink, based on Flink
▪ A core system in Alibaba Search
  • Machine learning, search, recommendations
  • A/B testing of search algorithms
  • Online feature updates to boost conversion rate
@ Drivetribe
Complete social network implemented using event sourcing and CQRS (Command Query Responsibility Segregation): https://data-artisans.com/blog/drivetribe-cqrs-apache-flink
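In event sourcing, the write side records every accepted change as an immutable event, and CQRS lets read-optimized views be derived from that log separately. The hypothetical follow/unfollow example below is a minimal Python sketch of that split; the projection function is the kind of step a stream processor such as Flink could run continuously over the event log.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

Event = Tuple[str, str, str]  # (kind, user, target); event kinds are hypothetical


class CommandHandler:
    """Write side: validates commands and appends accepted ones as events."""

    def __init__(self) -> None:
        self.log: List[Event] = []  # append-only event log
        self._following: DefaultDict[str, Set[str]] = defaultdict(set)

    def follow(self, user: str, target: str) -> bool:
        if user == target or target in self._following[user]:
            return False  # rejected: nothing is recorded
        self._following[user].add(target)
        self.log.append(("followed", user, target))
        return True

    def unfollow(self, user: str, target: str) -> bool:
        if target not in self._following[user]:
            return False
        self._following[user].remove(target)
        self.log.append(("unfollowed", user, target))
        return True


def follower_counts(log: List[Event]) -> Dict[str, int]:
    """Read side: a query-optimized projection rebuilt by replaying the log."""
    counts: DefaultDict[str, int] = defaultdict(int)
    for kind, _user, target in log:
        counts[target] += 1 if kind == "followed" else -1
    return dict(counts)


h = CommandHandler()
h.follow("ann", "bob")
h.follow("cat", "bob")
h.follow("ann", "bob")  # duplicate: rejected
h.unfollow("cat", "bob")
h.follow("bob", "ann")
print(follower_counts(h.log))
```

Since the projection is a pure function of the log, new read models can be added later by replaying the same events.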
Apache Flink & Apache Mesos
Why Apache Mesos?
▪ Mesos offers the full functionality needed to implement fault-tolerant and elastic distributed applications
▪ 30% of survey respondents were running Flink on Mesos (prior to proper Mesos support, September 2016)
Flink’s Mesos Integration
Components: the Mesos Master and the Apache Flink framework, which consists of the Mesos App Master (hosting the Flink MesosResourceManager and the JobManager) and TaskManagers, each running in its own Mesos task
Steps: allocate resources, launch Mesos tasks, register, execute job
Resource Manager Components
Connection Monitor
▪ Monitors the connection to Mesos
Launch Coordinator
▪ Resource offer processing and task scheduling
▪ Gathers offers and matches them to tasks using Fenzo
Task Monitor
▪ Monitors Mesos tasks
▪ Triggers reconciliation
▪ Makes sure tasks are properly killed
Reconciliation Coordinator
▪ Reconciles the view of tasks between the ResourceManager and the Mesos Master
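Fenzo's own API is not reproduced here. As a simplified stand-in for what the Launch Coordinator does, the Python sketch below greedily matches pending TaskManager requests to resource offers on CPU and memory, and reports which offers go unused (and would be declined) and which tasks must wait for further offers. The Offer and TaskRequest types are hypothetical; Fenzo additionally supports placement constraints and fitness functions such as bin packing.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Offer:
    offer_id: str
    host: str
    cpus: float
    mem_mb: int


@dataclass
class TaskRequest:
    task_id: str  # e.g. one TaskManager to launch
    cpus: float
    mem_mb: int


def match_offers(offers: List[Offer], pending: List[TaskRequest]
                 ) -> Tuple[Dict[str, List[str]], List[str], List[TaskRequest]]:
    """Greedy first-fit matching of pending tasks to resource offers.

    Returns (task IDs to launch per offer, offer IDs to decline, tasks left pending).
    """
    remaining = {o.offer_id: (o.cpus, o.mem_mb) for o in offers}
    launches: Dict[str, List[str]] = {o.offer_id: [] for o in offers}
    unmatched: List[TaskRequest] = []
    for task in pending:
        for o in offers:
            cpus, mem = remaining[o.offer_id]
            if task.cpus <= cpus and task.mem_mb <= mem:
                remaining[o.offer_id] = (cpus - task.cpus, mem - task.mem_mb)
                launches[o.offer_id].append(task.task_id)
                break
        else:  # no offer can hold this task; wait for further offers
            unmatched.append(task)
    declined = [oid for oid, tasks in launches.items() if not tasks]
    used = {oid: tasks for oid, tasks in launches.items() if tasks}
    return used, declined, unmatched


offers = [Offer("o1", "host-1", 2.0, 4096), Offer("o2", "host-2", 4.0, 8192),
          Offer("o3", "host-3", 1.0, 1024)]
pending = [TaskRequest(f"tm-{i}", 2.0, 2048) for i in range(1, 5)]
print(match_offers(offers, pending))
```

Here two TaskManagers land on the larger offer, the 1-CPU offer is too small and would be declined, and tm-4 stays pending until Mesos sends more resources.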
Component Interplay
Figure: inside the ResourceManager, the Connection Monitor, Launch Coordinator, Task Monitor, and Reconciliation Coordinator interact with the Mesos Master and the Mesos tasks. Interactions shown: resource offers, launch tasks, start TaskManagers, monitor tasks, status messages, trigger reconciliation, reconcile tasks, recover tasks, kill task.
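Reconciliation boils down to comparing two views of the same set of tasks. The Python sketch below assumes a simplified rule set: tasks that both sides consider alive are recovered, tasks Mesos reports as alive but the ResourceManager does not know about are killed, and expected tasks that Mesos does not report as alive are relaunched. The state names are real Mesos task states, but the decision rules are an illustration, not Flink's exact logic.

```python
from typing import Dict, Set, Tuple

# Mesos task states treated as "alive" in this sketch.
ALIVE = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING"}


def reconcile(expected: Set[str], reported: Dict[str, str]
              ) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare the ResourceManager's view with what the Mesos Master reports.

    expected: task IDs the ResourceManager believes it has launched.
    reported: task ID -> latest state reported by the Mesos Master.
    Returns (tasks to recover, tasks to kill, tasks to relaunch).
    """
    alive = {task for task, state in reported.items() if state in ALIVE}
    recover = expected & alive   # both views agree: adopt the running task
    kill = alive - expected      # unknown to the ResourceManager: clean up
    relaunch = expected - alive  # expected but lost, finished, or never seen
    return recover, kill, relaunch


print(reconcile({"tm-1", "tm-2", "tm-3"},
                {"tm-1": "TASK_RUNNING", "tm-2": "TASK_LOST", "tm-4": "TASK_RUNNING"}))
```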
DC/OS
Datacenter Operating System (DC/OS)
Workloads: Big Data + Analytics Engines and Microservices (in containers), covering streaming, batch, machine learning, analytics, functions & logic, search, time series, SQL / NoSQL databases, and modern app components
Distributed Systems Kernel (Mesos)
Any Infrastructure (Physical, Virtual, Cloud)
Demo Time
▪ Financial data is produced by a generator
▪ The data is written to Kafka topics
▪ Flink consumes the Kafka topics
▪ The Flink pipeline operates on the Kafka data
▪ Results are written back into Kafka
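The demo's data flow can be approximated in memory, with Python lists standing in for the Kafka topics. Everything specific here is hypothetical: the tick schema, the symbols, the price range, and the choice of a per-symbol moving average as the Flink job, since the demo's actual computation is not described.

```python
import random
from collections import defaultdict, deque
from typing import DefaultDict, Deque, List, Tuple

Tick = Tuple[str, float]  # (symbol, price); hypothetical schema


def generate(n: int, seed: int = 42) -> List[Tick]:
    """Generator: random price ticks for two made-up symbols."""
    rng = random.Random(seed)
    return [(rng.choice(["AAA", "BBB"]), round(rng.uniform(90.0, 110.0), 2))
            for _ in range(n)]


def moving_average(input_topic: List[Tick], output_topic: List[Tick], window: int) -> None:
    """Stand-in for the Flink job: per-symbol average over the last `window` ticks."""
    recent: DefaultDict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
    for symbol, price in input_topic:
        prices = recent[symbol]
        prices.append(price)
        output_topic.append((symbol, sum(prices) / len(prices)))


input_topic: List[Tick] = generate(10)  # stands in for the input Kafka topic
output_topic: List[Tick] = []           # stands in for the result topic
moving_average(input_topic, output_topic, window=3)
for record in output_topic:
    print(record)
```

The bounded deque per symbol plays the role of keyed state: each symbol keeps only its last few prices, independent of the others.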
Conclusion
TL;DL
▪ Apache Flink is a modern stream processor for real-time processing and event-driven applications
▪ Apache Flink runs on Mesos, using Fenzo to match resource offers to tasks
▪ DC/OS offers an easy-to-use Flink package
Thank you! @stsffap @ApacheFlink @dataArtisans
We are hiring!
data-artisans.com/careers