Mesos Study Report 03 v1.2
DESCRIPTION
Study report of Apache Mesos
TRANSCRIPT
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Background
• Rapid innovation in cluster computing frameworks
Problem
• Rapid innovation in cluster computing frameworks
• No single framework optimal for all applications
• Want to run multiple frameworks in a single cluster
» …to maximize utilization
» …to share data between frameworks
Where We Want to Go
Solution
• Mesos is a common resource sharing layer over which diverse frameworks can run
Mesos Goals
• High utilization of resources
• Support diverse frameworks (current & future)
• Scalability to 10,000s of nodes
• Reliability in face of failures
Mesos
• Fine-grained sharing
» Improved utilization, responsiveness, and data locality
• Resource offers
» Offer available resources to frameworks; let them pick which resources to use and which tasks to launch
» Keeps Mesos simple and lets it support future frameworks
Mesos Architecture
Mesos architecture diagram, showing two running frameworks
Resource Offers
• Mesos decides how many resources to offer each framework, based on an organizational policy such as fair sharing, while frameworks decide which resources to accept and which tasks to run on them
• A framework can reject resources that do not satisfy its constraints in order to wait for ones that do
• Mesos thus delegates control over scheduling, pushing decisions about task placement and execution down to the frameworks
Resource Offers
• Mesos consists of a master process that manages slave daemons running on each cluster node, and frameworks that run tasks on these slaves.
• Each resource offer is a list of free resources on multiple slaves.
• Each framework running on Mesos consists of two components:
» a scheduler that registers with the master to be offered resources
» an executor process that is launched on slave nodes to run the framework’s tasks
• When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to launch on them
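The scheduler/executor split described above can be sketched as a simple offer/accept loop. This is a minimal illustration, not the real Mesos API: all class and function names here (`Offer`, `GreedyScheduler`, `master_offer_round`) are hypothetical, and the real protocol is asynchronous RPC between master, slaves, and schedulers.

```python
# Hypothetical sketch of the Mesos offer/accept cycle. The master builds one
# offer per slave from that slave's free resources; the framework's scheduler
# decides which offers to use and which tasks to launch on them.

class Offer:
    def __init__(self, slave_id, cpus, mem_gb):
        self.slave_id, self.cpus, self.mem_gb = slave_id, cpus, mem_gb

class GreedyScheduler:
    """A toy framework scheduler: accept any offer with at least 1 CPU."""
    def resource_offers(self, offers):
        tasks = []
        for o in offers:
            if o.cpus >= 1:
                # Launch one 1-CPU, 1-GB task on the offered slave.
                tasks.append({"slave": o.slave_id, "cpus": 1, "mem_gb": 1})
            # Offers with no usable CPU are implicitly declined.
        return tasks

def master_offer_round(free, scheduler):
    """Master side: offer each slave's free resources, then deduct whatever
    the framework accepted."""
    offers = [Offer(s, r["cpus"], r["mem_gb"]) for s, r in free.items()]
    tasks = scheduler.resource_offers(offers)
    for t in tasks:
        free[t["slave"]]["cpus"] -= t["cpus"]
        free[t["slave"]]["mem_gb"] -= t["mem_gb"]
    return tasks

free = {"slave1": {"cpus": 4, "mem_gb": 8}, "slave2": {"cpus": 0, "mem_gb": 2}}
tasks = master_offer_round(free, GreedyScheduler())
```

Note the division of labor: the master never inspects task semantics, and the framework never sees resources it was not offered, which is what keeps Mesos simple.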
Resource Offers
Resource offer example
Optimization: Filters
• Let frameworks short-circuit rejection by providing a predicate on resources to be offered
» E.g., “nodes from list L” or “nodes with > 8 GB RAM”
» Could generalize to other hints as well
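A filter is just a predicate the master evaluates before making an offer, so rejected resources never travel to the framework and back. The sketch below is illustrative (`make_filter` and `offers_for` are invented names), but it shows both example filters from the slide.

```python
# Hypothetical filter sketch: a framework installs a predicate, and the
# master skips slaves the predicate rejects instead of offering them and
# waiting for a decline.

def make_filter(min_mem_gb=0, allowed_nodes=None):
    def accept(slave_id, resources):
        if resources["mem_gb"] < min_mem_gb:       # "nodes with > 8 GB RAM"
            return False
        if allowed_nodes is not None and slave_id not in allowed_nodes:
            return False                           # "nodes from list L"
        return True
    return accept

def offers_for(framework_filter, free):
    """Master side: only offer slaves that pass the framework's filter."""
    return [s for s, r in free.items() if framework_filter(s, r)]

free = {"a": {"mem_gb": 16}, "b": {"mem_gb": 4}, "c": {"mem_gb": 32}}
matching = offers_for(make_filter(min_mem_gb=8), free)
```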
Analysis
• Resource offers work well when:
» Frameworks can scale up and down elastically
» Task durations are homogeneous
» Frameworks have many preferred nodes
• These conditions hold in current data analytics frameworks (MapReduce, Dryad, …)
» Work divided into short tasks to facilitate load balancing and fault recovery
» Data replicated across multiple nodes
Resource Allocation
• Mesos delegates allocation decisions to a pluggable allocation module, so that organizations can tailor allocation to their needs.
• Have implemented two allocation modules:
» one that performs fair sharing based on a generalization of max-min fairness for multiple resources (DRF)
» one that implements strict priorities
• Task revocation
» if a cluster becomes filled by long tasks, e.g., due to a buggy job or a greedy framework, the allocation module can also revoke (kill) tasks
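The core of the DRF module is easy to state: a framework's dominant share is its largest fractional share across resource types, and resources are offered next to the framework with the lowest dominant share. The sketch below assumes fixed cluster totals and invented helper names; it shows the selection rule only, not the full allocation module.

```python
# Minimal sketch of the DRF selection rule (names and totals hypothetical).
TOTAL = {"cpus": 100, "mem_gb": 400}  # assumed cluster capacity

def dominant_share(allocated):
    """A framework's dominant share: its largest per-resource share."""
    return max(allocated[r] / TOTAL[r] for r in TOTAL)

def next_framework(allocations):
    """DRF offers resources next to the framework with the smallest
    dominant share (max-min fairness on dominant shares)."""
    return min(allocations, key=lambda f: dominant_share(allocations[f]))

allocations = {
    "A": {"cpus": 30, "mem_gb": 20},   # dominant share 0.30 (CPU-heavy)
    "B": {"cpus": 10, "mem_gb": 100},  # dominant share 0.25 (memory-heavy)
}
chosen = next_framework(allocations)   # B, despite holding more memory
```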
Fault Tolerance
• Master failover using ZooKeeper
• Mesos master has only soft state: the list of active slaves, active frameworks, and running tasks
» a new master can completely reconstruct its internal state from information held by the slaves and the framework schedulers
• When the active master fails, the slaves and schedulers connect to the next elected master and repopulate its state.
• Aside from handling master failures, Mesos reports node failures and executor crashes to frameworks’ schedulers.
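The soft-state design can be illustrated as a pure reconstruction step: nothing is read from disk or a replicated log; the new master simply aggregates what re-registering slaves and schedulers report. This is a toy sketch with invented names, not the actual failover protocol.

```python
# Illustrative soft-state reconstruction after master failover: the new
# master's entire state (slaves, frameworks, running tasks) is rebuilt
# from what the slaves and framework schedulers re-report.

def rebuild_master_state(slave_reports, framework_registrations):
    state = {"slaves": set(), "frameworks": set(), "tasks": []}
    for slave_id, running_tasks in slave_reports.items():
        state["slaves"].add(slave_id)          # slave re-registered
        state["tasks"].extend(running_tasks)   # tasks reported by the slave
    state["frameworks"].update(framework_registrations)
    return state

slave_reports = {
    "slave1": [("fw1", "task-1"), ("fw2", "task-7")],
    "slave2": [("fw1", "task-2")],
}
state = rebuild_master_state(slave_reports, ["fw1", "fw2"])
```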
Isolation
• Mesos provides performance isolation between framework executors running on the same slave by leveraging existing OS isolation mechanisms
• Resources are currently isolated using OS container technologies, specifically Linux Containers and Solaris Projects
• These technologies can limit the CPU, memory, network bandwidth, and (in new Linux kernels) I/O usage of a process tree
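To make the limits concrete, the sketch below maps a task's resources onto two real Linux cgroup (v1) knobs: `cpu.shares` (relative CPU weight) and `memory.limit_in_bytes` (hard memory cap). Computing the values in a dict is purely illustrative; an actual isolation module would write them to the container's cgroup via the OS.

```python
# Illustrative mapping from offered resources to cgroup-style limits.
# cpu.shares and memory.limit_in_bytes are real cgroup v1 control files;
# everything else here (the function, the dict) is a hypothetical sketch.

def executor_limits(cpus, mem_gb):
    return {
        "cpu.shares": int(cpus * 1024),         # 1024 shares per CPU
        "memory.limit_in_bytes": mem_gb << 30,  # hard memory cap in bytes
    }

limits = executor_limits(cpus=2, mem_gb=4)
```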
Data Locality with Resource Offers
• Ran 16 instances of Hadoop on a shared HDFS cluster
• Used delay scheduling in Hadoop to get locality (wait a short time to acquire data-local nodes)
Scalability
• Mesos only performs inter-framework scheduling (e.g., fair sharing), which is easier than intra-framework scheduling
• Result:
» Scaled to 50,000 emulated slaves, 200 frameworks, 100K tasks (30 s task length)
Conclusion
• Mesos shares clusters efficiently among diverse frameworks thanks to two design elements:
» Fine‐grained sharing at the level of tasks
» Resource offers, a scalable mechanism for application-controlled scheduling
• Enables co‐existence of current frameworks and development of new specialized ones
• In use at Twitter, UC Berkeley, Conviva, and UCSF