Scalable On-Demand Hadoop Clusters with Docker and Mesos

Andrew Nelson, Nutanix (@vmwnelson), http://virtual-hiking.blogspot.com
Chris Mutchler, VMware (@chrismutchler), http://virtualelephant.com

Post on 28-Jul-2015

Agenda

New Approach for Hadoop Ops Infrastructure Resource Considerations Docker as the new “Unit of Work” Future Work

Last Year’s State of the Art

Self-service and multi-tenant Hadoop
Elastic and decoupled infrastructure
Extensible blueprinting

New Goals

Operationalize multiple frameworks
Decoupled service architecture
Flexible and developer-friendly form factor

Apache Mesos Introduction

Started at Berkeley
Graduated to a top-level Apache project in 2013
Commercial entity is Mesosphere
https://github.com/apache/mesos/

Mesos Architecture

Source: http://mesos.apache.org/assets/img/documentation/architecture3.jpg

Mesos as a Multi-Tenant Resource Pool

Source: https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md

Tools to Build and Scale

Serengeti, VMware: https://github.com/vmware-serengeti
BOSH, Pivotal: https://github.com/cloudfoundry/bosh
Cloudify, GigaSpaces: https://github.com/CloudifySource/cloudify
Cloudbreak, SequenceIQ: https://github.com/sequenceiq/cloudbreak

Advantages for Ops

Mesos as a Resource Pool
Multiple concurrent frameworks
Decouple frameworks from resource pools

Compute Partitions on Mesos

[Figure: compute partitions on Mesos, contrasting a Shared layout with a Siloed layout. Labels in the diagram: Hadoop, Storm, Spark, Kafka, Cassandra, Marathon.]
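The contrast between the shared and siloed layouts can be illustrated with a toy calculation. This is a minimal sketch, not Mesos's allocator: the CPU counts, the silo sizes, and the first-come-first-served rule are all made-up assumptions, chosen only to show why one shared pool can absorb a burst that fixed silos cannot.

```python
# Toy comparison of siloed vs. shared compute partitions.
# All demands and sizes below are hypothetical illustration values.

def placed_siloed(demand, silo_size):
    """CPUs served when each framework owns a fixed silo of its own."""
    return {name: min(cpus, silo_size[name]) for name, cpus in demand.items()}

def placed_shared(demand, pool_size):
    """CPUs served when all frameworks draw from one pool.

    Uses a simple first-come-first-served rule (an assumption for this
    sketch; it is not how Mesos decides who gets resources).
    """
    remaining = pool_size
    served = {}
    for name, cpus in demand.items():
        served[name] = min(cpus, remaining)
        remaining -= served[name]
    return served

# Hadoop is bursting past its silo while the other frameworks sit mostly idle.
demand = {"hadoop": 40, "spark": 10, "storm": 5, "kafka": 5}
silos = {"hadoop": 25, "spark": 25, "storm": 25, "kafka": 25}

siloed = placed_siloed(demand, silos)
shared = placed_shared(demand, sum(silos.values()))

print("siloed CPUs in use:", sum(siloed.values()))
print("shared CPUs in use:", sum(shared.values()))
```

With these numbers the siloed layout caps Hadoop at its 25-CPU partition, while the shared pool of the same total size serves the full demand.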

HDFS as a Service

[Figure: HDFS offered as a shared service. Labels in the diagram: Namenode, Namenode Standby, Secondary Namenode, HDFS, MapReduce, Spark, Hive, Storm.]

Networking Services

Service Discovery
  Handled per framework
  Port range resource managed by Mesos slave
  For example, Marathon uses HAProxy for request routing
Per-container network monitoring
Egress rate-limiting
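To make the port-resource point concrete, here is a minimal sketch of a Marathon app definition for a Docker container, built as a Python dictionary. The app id, image name, and sizes are hypothetical, and the field layout is an assumption based on the Marathon app format in use around the time of this talk; newer Marathon releases describe networking differently.

```python
import json

def marathon_app(app_id, image, container_port, instances=1, cpus=0.5, mem=512):
    """Build a Marathon app definition for a Docker container in bridge mode."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 leaves the host port choice to Marathon, which
                # takes it from the port range offered by the Mesos slave.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
    }

# Hypothetical app id and image name, for illustration only.
app = marathon_app("/demo/web", "example/web:1.0", 8080, instances=3)
payload = json.dumps(app, indent=2)
print(payload)
```

A payload like this would be submitted to Marathon's /v2/apps endpoint. Because the host ports are assigned dynamically, a routing layer such as HAProxy is what lets clients reach the instances at a stable address.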

Scheduling Options

Mesos scheduling
  Capacity Scheduler
  Fair Scheduler
Tenant scheduling examples
  Hadoop on Mesos
  Myriad (YARN) on Mesos

Dev Workflow

Code Repo / Registry
  Pull / Push / Commit / Run
Automated Builds
  Version tagging
Marathon
  CI / CD
  Dependencies
  Rolling restarts

Registry Services

Pluggable storage
Webhooks
Image control
Security
Logging

[Figure: hierarchy diagram with labels Registry, Repository (x2), Image (x3).]

Advantages for Developers

Interchangeable verbs for code<->containers
Choice of framework to use as their PaaS
Adopt microservices approach to app pipeline

Recommendations for Success

Start small, scale fast
Use most appropriate framework for the job
Think ahead, decouple
Plan for rolling restart capacity up front
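The last recommendation lends itself to a quick back-of-the-envelope calculation. The sketch below assumes a rolling restart that starts a fraction of new instances alongside the old ones before stopping them; the function name, the surge fraction, and the instance sizes are hypothetical, not values from the talk.

```python
import math

def rolling_restart_headroom(instances, cpus_per_instance, mem_mb_per_instance, surge_fraction):
    """Spare capacity needed while old and new instances briefly coexist.

    surge_fraction is the share of the app's instances that may be started
    before the matching old instances are stopped (an assumption of this sketch).
    """
    extra = math.ceil(instances * surge_fraction)
    return {
        "extra_instances": extra,
        "cpus": extra * cpus_per_instance,
        "mem_mb": extra * mem_mb_per_instance,
    }

# Hypothetical app: 20 instances, 2 CPUs and 4096 MB each, restarted 25% at a time.
headroom = rolling_restart_headroom(20, 2, 4096, 0.25)
print(headroom)
```

For this made-up app the pool needs room for 5 extra instances, that is 10 CPUs and 20480 MB, on top of steady-state usage. Without that headroom reserved, the restart stalls waiting for resources.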

Gap Analysis

Be prepared to “look under the hood”
Variable maturity and resiliency of the layers
Networking
Security

Where Are We Going Next

Scale and learn
Container-focused OS
Software-defined networking services
Discover key performance and availability metrics

Wrapping up

Mesos allows for choice of framework
Devs utilize Docker with familiar workflow
Portable, flexible, and scalable architecture