Apache Mesos
Delivering mixed batch & real-time data infrastructure
2015-11-17, Galway, Ireland
Naoise Dunne, Insight Centre for Data Analytics
Michael Hausenblas, Mesosphere Inc.
http://www.meetup.com/Galway-Data-Meetup/events/226672887/
Types of Workloads
● batch (e.g. MapReduce)
● streaming
● PaaS
https://github.com/mesosphere/time-series-demo
Mesos Intro
• A top-level ASF project
• A cluster resource negotiator
• Scalable to 10,000s of nodes but also useful for a handful of nodes
• Fault-tolerant, battle-tested
• An SDK for distributed apps
• Native Docker support
Apache Mesos
What is a Data Center Scheduler?
● Schedulers run your distributed apps
● An operating system kernel for the cloud
● Schedulers coordinate execution of work on the cluster
A Quick History of Schedulers
Quick history of distributed schedulers
Timeline, 2003 to 2015:
2003 Google filesystem
2003 Slurm
2004 MapReduce paper
2004 Google Borg
2005 Hadoop started
2008 Hadoop released
2010 Spark paper
2010 Nexus (Mesos)
2011 Hadoop 1.0
2011 Mesos paper
2013 YARN
2013 Mesos released
2014 Kubernetes
2014 Google Omega paper
Hadoop - Original Open Source Scheduler
Monolithic scheduler: original open source datacenter scheduler
● Jobs are batched and executed
● Designed only to run MapReduce jobs
● No concurrency between apps
● Evolving into YARN
Diagram: Hadoop resource management across two Linux servers (each node labeled "mesos slave" on the slide), running two Linked Data M/R jobs and a Linked Data app.
Mesos - a Great Leap Forward
2-level scheduler: more flexible
● Can schedule many kinds of applications
● Frameworks (such as Spark) are delegated the per-application scheduling
● Mesos is responsible for resource distribution between applications and for enforcing overall fairness
● Very modular, due to 2-level scheduling: frameworks manage apps as they like
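The split between the two levels can be illustrated with a toy model. This is only a sketch, not the Mesos API: the Offer and Framework classes, the offer_round function, the node names and the task sizes are all hypothetical, and the first level here simply offers to frameworks in a fixed order, whereas Mesos applies a fairness policy.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Free resources on one agent (hypothetical toy type, not the Mesos API)."""
    agent: str
    cpus: float
    mem: int


@dataclass
class Framework:
    """Toy framework: owns the per-application scheduling decision."""
    name: str
    task_cpus: float
    task_mem: int
    pending: int
    launched: list = field(default_factory=list)

    def on_offer(self, offer):
        # Second level: the framework decides how much of the offer to use.
        cpus, mem = offer.cpus, offer.mem
        while self.pending and cpus >= self.task_cpus and mem >= self.task_mem:
            cpus -= self.task_cpus
            mem -= self.task_mem
            self.pending -= 1
            self.launched.append((offer.agent, self.task_cpus, self.task_mem))
        # Whatever is left over goes back to the resource manager.
        return Offer(offer.agent, cpus, mem)


def offer_round(agents, frameworks):
    # First level: the resource manager offers each agent's free resources
    # to the frameworks in turn (fixed order here, an assumption of this sketch).
    for agent, (cpus, mem) in agents.items():
        offer = Offer(agent, cpus, mem)
        for fw in frameworks:
            offer = fw.on_offer(offer)
        agents[agent] = (offer.cpus, offer.mem)


agents = {"node1": (4.0, 8192), "node2": (4.0, 8192)}
spark = Framework("spark", task_cpus=2.0, task_mem=2048, pending=3)
marathon = Framework("marathon", task_cpus=1.0, task_mem=1024, pending=2)
offer_round(agents, [spark, marathon])
print(spark.launched)
print(marathon.launched)
```

The resource manager never looks inside a framework's queue; it only hands out offers and takes back the remainder, which is what keeps the design modular.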
Diagram: Mesos resource management across two Linux servers (one mesos slave each); the Chronos, Spark and Marathon frameworks schedule a Hadoop M/R job, a Linked Data job and a Linked Data app.
How Mesos Works
Mesos Architecture
http://mesos.berkeley.edu/mesos_tech_report.pdf
Mesos Resources
● resource == anything a task/executor consumes in order to do its work
● standard resources: cpu, mem, disk, ports
● DRF (Dominant Resource Fairness)
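DRF decides whose turn it is by comparing each user's dominant share, that is, the largest fraction of any single resource that user currently holds. The following is a minimal sketch of that idea for whole tasks, not the allocator Mesos ships; the function name drf_allocate and the cluster and demand figures are illustrative assumptions.

```python
from fractions import Fraction


def drf_allocate(total, demands):
    """Hand out whole tasks, always serving the user with the lowest dominant share.

    total:   {"cpu": 9, "mem": 18}            cluster capacity
    demands: {"A": {"cpu": 1, "mem": 4}, ...} per-task demand of each user
    Returns the number of tasks launched per user.
    """
    used = {r: 0 for r in total}
    tasks = {u: 0 for u in demands}
    active = set(demands)

    def dominant_share(user):
        # Largest fraction of any one resource held by this user.
        return max(Fraction(tasks[user] * demands[user][r], total[r]) for r in total)

    while active:
        # sorted() makes tie-breaking deterministic (alphabetical).
        user = min(sorted(active), key=dominant_share)
        need = demands[user]
        if all(used[r] + need[r] <= total[r] for r in total):
            for r in total:
                used[r] += need[r]
            tasks[user] += 1
        else:
            # This user's next task no longer fits; stop considering them.
            active.discard(user)
    return tasks


result = drf_allocate(
    {"cpu": 9, "mem": 18},
    {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}},
)
print(result)  # {'A': 3, 'B': 2}
```

User A is memory-bound and user B is CPU-bound, so both end up with the same dominant share of 2/3 (12 of 18 GB for A, 6 of 9 CPUs for B).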
© 2015 Mesosphere, Inc. 14
Benefits of using a Scheduler
● Efficiency - best use of computing resources
● Agility - change your application mix with no turnaround
● Scalability - grow to the current demand of your app
● Modularity - 2-level schedulers have plugin frameworks that allow quick repurposing of core and no reliance on one vendor (more later)
Mesos Ecosystem
Mesos Ecosystem
Diagram: the layers of a Mesos cluster, top to bottom, with the slide's annotations
● Applications (datastores and jobs): HDT, Neo4J, GraphX, Granatum, RevealedGraph. Applications work with frameworks to get the resources they need.
● Frameworks: Spark and Chronos (scheduler for short jobs), Marathon (scheduler for long-running jobs). Frameworks negotiate with Mesos to run their jobs.
● Mesos resource management: cpu, mem and disk resources are managed by Mesos.
● Linux servers, each running a Mesos client and Docker. Docker manages isolation on the Linux servers.
● Monitoring: OS monitor, Mesos monitor
Mesos Ecosystem
Diagram: the same stack (Linux servers each running a Mesos client and Docker, Mesos resource management, and the Spark, Chronos and Marathon frameworks), plus the supporting services around it
● HDFS: we need HDFS for large storage on Spark jobs; Marathon can now use HDFS to store large dependencies
● Zookeeper: Mesos and the frameworks need Zookeeper
● Docker Registry: you will need a Docker registry for Marathon
● DCOS (Universe/Multiverse): to run Mesos you will need DCOS or glue
● Mesos DNS: needed for service discovery
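With Mesos DNS, a running task is typically reachable under a name of the form task.framework.domain, with a matching SRV record that carries the port. The helper below is a sketch that only builds those names; the function name mesos_dns_names and the task name "web" are hypothetical, and the defaults assume the common "mesos" domain and a task launched through Marathon.

```python
def mesos_dns_names(task, framework="marathon", domain="mesos", protocol="tcp"):
    """Build the A-record and SRV-record names for a task (naming convention only)."""
    host = f"{task}.{framework}.{domain}"
    srv = f"_{task}._{protocol}.{framework}.{domain}"
    return host, srv


host, srv = mesos_dns_names("web")
print(host)  # web.marathon.mesos
print(srv)   # _web._tcp.marathon.mesos
```

Inside a cluster whose nodes use Mesos DNS as a resolver, the first name could be passed to an ordinary lookup such as socket.gethostbyname; the SRV name is the one to query when the port is assigned dynamically.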
Datacenter schedulers: Why?
Schedulers help you focus on your own work and not the infrastructure.
“It's great to be able to focus on what it is you want to be doing rather than worrying about how do you get what it is you need in order to be able to get stuff done”
- John Wilkes (Google)
Mesos Best Practices
Mesos Best Practices
● Discovery
● Orchestration
● Composition
Discovery
Orchestration
Composition
● Marathon: apps and groups
● Kubernetes: pods and services
● Reusability, affinity and loose coupling
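As an illustration of composing apps into a group, the sketch below builds a Marathon group definition as plain data and serializes it to JSON. The helper marathon_app, the group "/shop" and the image names are hypothetical; the field names (id, cpus, mem, instances, container, dependencies, apps) are the ones Marathon app and group definitions use, but check them against the Marathon version you run.

```python
import json


def marathon_app(app_id, image, cpus=0.1, mem=64, instances=1, dependencies=()):
    """Return a Marathon-style app definition for a Docker container."""
    app = {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,
        "instances": instances,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }
    if dependencies:
        # Apps listed here should be started before this one.
        app["dependencies"] = list(dependencies)
    return app


# A group bundles loosely coupled apps that are deployed and scaled together.
group = {
    "id": "/shop",
    "apps": [
        marathon_app("/shop/db", "example/db", cpus=0.5, mem=512),
        marathon_app("/shop/web", "example/web", instances=2,
                     dependencies=["/shop/db"]),
    ],
}

payload = json.dumps(group, indent=2)
print(payload)
```

The resulting JSON is the kind of document that would be submitted to Marathon's groups endpoint; each app stays reusable on its own, and the dependency expresses ordering without hard-wiring the two together.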
Monitoring
Enter DCOS
Local OS vs. Distributed OS
http://bitly.com/os-vs-dcos
DCOS, A Distributed Operating System
• kernel (Apache Mesos, written in C++) scales to 10,000s of nodes
• fault-tolerant in all components, rolling upgrades throughout
• containers are first-class citizens (LXC, Docker)
• local OS per node (+container enabled)
• scheduling (long-lived, batch)
• service discovery, monitoring, logging, debugging
DCOS High Level Overview
Any Service or Container
Any Infrastructure
Mesosphere DCOS
Your favorite services, container formats, and those yet to come
Build apps once on DCOS, and run them anywhere
Runs distributed apps anywhere as simply as running apps on your laptop
DCOS Benefits
• Run stateless services such as Web servers, app servers (via Marathon) and stateful services like Spark, Kafka, HDFS, Cassandra, ArangoDB etc. together on one cluster
• Dynamic partitioning of your cluster, depending on your needs (business requirements)
• Increased utilization (10% → 80% and more)
DCOS Architecture
https://docs.mesosphere.com/getting-started/dcosarchitecture/
It’s demo time …
https://github.com/mesosphere/time-series-demo
See Also …
http://shop.oreilly.com/product/9781939902184.do
http://shop.oreilly.com/product/0636920039952.do
https://manning.com/books/mesos-in-action
http://p24e.io
Q & A