jimdowlingsics cloudday 13 hops paas

Upload: parashara

Post on 17-Feb-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    1/36

    HOPS: Hadoop Open Platform-as-a-Service

    Alberto Lorente, Hamid Afzali, Salman Niazi, MahmoudIsmail, Kamal Hakimazadeh, Hooman Piero, Jim Dowling

    [email protected]

    Scale Research Laboratory

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    2/36

    What is Hadoop?

    Open-source software for storing enormous data

    sets across distributed commodity servers and thenrunning distributed analysis applications in parallelover the data.

    Hadoop 1.x

    HDFS(distributed storage)

    MapReduce(resource mgmt, job scheduler,

    data processing)

    Hadoop 2.x

    MapReduce(data processing)

    HDFS(distributed storage)

    YARN(resource mgmt, job scheduler)

    Others(spark, mpi, giraph, etc)

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    3/36

    Why Hadoop on Virtualized Infrastructures*?

    Scheduling and resource utilization

    - Elastic scalability on demand

    Datacenter efficiency

    - Share your computing resources

    Ease-of-Deployment

    - Lower barrier to adoption

    http://wiki.apache.org/hadoop/Virtual%20Hadoop*

    http://wiki.apache.org/hadoop/Virtual%20Hadoophttp://wiki.apache.org/hadoop/Virtual%20Hadoophttp://wiki.apache.org/hadoop/Virtual%20Hadoop
  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    4/36

    Hadoop on the Cloud

    Cloud Computing traditionally separates storage and

    computation.

    OpenStack

    Nova (Compute)

    Glance (VM Images)

    Swift (Object Storage)

    Amazon Web Services

    EC2 Elastic Block Storage

    S3

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    5/36

    Hadoop is designed around Data Locality

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    6/36

    Can Hadoop be deployed in virtual infrastructures?

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    7/36

    Yes, But

    Cloud hardware

    configurations shouldsupport data locality

    Hadoops original topologyawareness breaks

    Placement of >1 VMcontaining block replicas forthe same file on the samephysical host increasescorrelated failures

    Need VMWaresNodeGroup aware topology

    HADOOP-8468

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    8/36

    Data Locality complicates Hadoop Lifecycle

    Shouldnt upgrade Hadoop clusters by replacing VMs

    - Assuming were not using instances with external storagevolumes, we need to upgrade software on existing instances

    Create VM

    from pre-built

    image

    Install/upgrade

    software

    Configure and

    restart services

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    9/36

    HOPS and HDFS

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    10/36

    HDFS in Hadoop 1.0

    DN DN DN DN

    NNClientr ead / home/ j d/ myDna. bam

    SpoF

    mgmt &r epor t s

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    11/36

    High Availability in HDFS 2.0

    DN DN DN DN

    NNActive

    NNStandby

    JN JN JN

    Shared NN

    log stored in

    quorum of

    journal nodes

    ZK ZK ZK

    FailoverController

    Active

    FailoverController

    Standby

    Heartbeats

    NN

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    12/36

    HDFS in HOPS

    DN DN DN DN

    NN NN NN

    Master

    NDB

    Up to 48 nodes

    *http://mikaelronstrom.blogspot.se/2012/05/mysql-cluster-72-achieves-43bn-reads.html

    72 million 100-btyetransactional reads/sec*

    http://mikaelronstrom.blogspot.se/2012/05/mysql-cluster-72-achieves-43bn-reads.htmlhttp://mikaelronstrom.blogspot.se/2012/05/mysql-cluster-72-achieves-43bn-reads.html
  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    13/36

    HOPS HDFS Features

    Increased read and write throughput

    Supports TBs of metadata (vs GBs in vanilla HDFS)

    Easy to customize metadata

    - Block-level indexing

    NDB provides online backups, geographicreplication, online add-node

    Major challenge for users:

    Installing, configuring, and managing HOPS

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    14/36

    *Configured stacks include apps, dependencies, and firewalls.

    HOPS has typically four different stacks*

    ResourceMgr

    NN

    ssh, agent,

    chef, collectd

    NodeMgr

    DN

    ssh, agent,

    chef, collectd

    MYSQLD

    MGMD

    ssh, agent,

    chef, collectd

    NDBD

    ssh, agent,

    chef, collectd

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    15/36

    Plus a Dashboard Stack

    Web Application

    Glassfish

    ssh, chef, collectd-server

    REST APIs

    JClouds

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    16/36

    How do we deploy our PaaS?

    NDBD NDBD MGMD

    DashB NN NN

    DN DN DN

    Data Center

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    17/36

    You could bootstrap the Dashboard, old style..

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    18/36

    Cloudera Manager Cloud Express Wizard*

    Go t o EC2 i n AWS web consol e and sel ect I nst ances

    Use t he def aul t N. Vi r gi ni a ( us- east - 1) r egi on.

    Cl i ck on Launch I nst ance

    On t he next page, pi ck t he Ubunt u Server 12. 04 LTS 64- bi t i mage.

    sel ect Cr eat e a new Key Pai r .

    cl i ck Cr eate and Downl oad your key pai r .

    save t hi s f i l e or you won t be abl e t o SSH i nt o t he i nst ance we r eabout t o l aunch.

    $ wget ht t p: / / ar chi ve. cl ouder a. com/ cm4/ i nst al l er / l at est / cl ouder a-manager - i nst al l er . bi n

    $ chmod +x cl ouder a- manager - i nst al l er . bi n$ sudo . / cl ouder a- manager - i nst al l er . bi n

    *http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager/

    Abr i dged EC2- speci f i c i nst al l at i on i nst r uct i ons*

    h f

    http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager/http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager/
  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    19/36

    Oh yes, I forgot its an Open PaaS

    By Open PaaS, we mean that we should be able to

    deploy our PaaS on different platforms with no codechanges:

    - Public Clouds: AWS, Rackspace, etc

    - Private Clouds: OpenStack, Docker, etc

    - Bare-Metal

    No vendor lock-in.

    d l f f l

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    20/36

    Direct deployment of PaaS on AWS from a Portal

    NDBD NDBD MGMD

    NN

    DN DN DN

    Public Cloud (AWS)

    WebsiteDashB NN

    h i hb d h f bli il bl

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    21/36

    Launch via Dashboard when few public IPs available

    NDBD NDBD MGMD

    NN

    DN DN DN

    Private Cloud (OpenStack)

    WebsiteDashB NN

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    22/36

    How do we install the software on the VMs?

    Th D hb d i t i ll t VMI

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    23/36

    The Dashboard is typically a custom VMI

    AWS

    - Provides APIs to automate the creation andloading of custom VM images (VMIs)*

    OpenStack

    - Glance is OpenStacks HTTP-based image server(slow due to serialized access)**

    DashB

    *http://techblog.netflix.com/2013/03/ami-creation-with-aminator.html

    **http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance

    http://techblog.netflix.com/2013/03/ami-creation-with-aminator.htmlhttp://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performancehttp://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performancehttp://techblog.netflix.com/2013/03/ami-creation-with-aminator.html
  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    24/36

    Other nodes are configured using Chef

    Chef to Install software on Nodes

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    25/36

    Chef to Install software on Nodes

    chef chef chef

    Dashboard

    RecipesRecipesRecipesChef Recipes are infrastructure

    as code: idempotent & composable

    ResourceMgr

    NN

    ssh, agent,chef, collectd

    MYSQLD

    MGMD

    ssh, agent,chef, collectd

    NodeMgr

    DN

    ssh, agent,chef, collectd

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    26/36

    But Chef and VMIs are not enough....

    HOPS Open PaaS

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    27/36

    HOPS Open PaaS

    Amazon Webservices OpenStack

    Chef

    Bare Metal

    JClouds ssh

    HOPS PaaS API (YAML)

    VMIs BitTorrentReduce Install TimesCreate VMs

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    28/36

    Define the cluster in single file

    PaaS Orchestration Language

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    29/36

    PaaS Orchestration Language

    - - -

    name: HopsTest

    envi r onment : devr eci pes: [ ssh, chef Cl i ent , col l ectd]

    aut hor i zePor t s: [ 8080, 8181, 3306, 4343, 3321]

    provider:

    name: aws- ec2

    i nst anceType: m1. l arge

    l ogi n: ubunt u

    i mage: eu- west - 1/ ami - 35667941

    r egi on: eu- west - 1

    zones: [ eu- west - 1a]

    Define the cloud you will deploy on, and default node types

    global scope (all nodes)

    PaaS Orchestration Language

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    30/36

    PaaS Orchestration Language

    nodes:

    - ndbd

    number : 2- mgmd

    number : 1r eci pes: [ mysql d]

    - namenodenumber : 2r eposi t or y: ht t p: / / myr epo. gi t

    - datanoder eci pes: [ nodemgr ]number : 3

    - dashboar dr eci pes: [ r esour cemgr ]number : 1

    # chef At t r i but es: {. . }# aut hor i zePor t s: [ . . ]

    # rol es : [ . . ]

    # i mage: . . .

    # bi t t or r ent : t r ue

    Should be the name of a chef recipe

    Explicitly add chef recipes to this node

    other parameters per node-group

    Use a custom recipe for the NameNode

    Language Design Decisions

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    31/36

    Language Design Decisions

    Declarative language

    Chef-friendly default behaviors

    Configure nodes by adding recipes or roles

    - Users can fork our default recipes and customize them.

    - Per node-group parameters supported.

    Language implementation can be changed to use a

    different configuration mgmt system, if needed.

    Jclouds/Chef in the Wild

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    32/36

    Jclouds/Chef in the Wild

    Node provisioning can fail for a variety of reasons.

    Stragglers will appear as clusters grow in size.

    Low default number of parallel operations can be

    issued to AWS, OpenStack, etc.

    Benchmark VMs to identify poorly performing ones.

    Provisioning Scheduler

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    33/36

    Provisioning Scheduler

    Respawn & provision failed and slow VMs.

    Scheduler executes Chef recipes as a series of phases

    - Chef recipes are decomposed into the following phases:init, bittorrent, install, configure

    BitTorrent phase Useful for larger software packages and private clouds.

    Standard package repos used in the event of failure in the install phase.

    Related Work

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    34/36

    Related Work

    Virtualization and Hadoop

    - Project Serengeti (VMWare)- Project Savanna (Hortonworks & OpenStack)

    - Elastic MapReduce (Amazon Web Services)

    Administration of Hadoop Clusters- Cloudera Manager with Puppet

    - Hortonworks Ambari with Puppet

    Open PaaS

    - Cloudify provide a general OpenPaaS middlewarebased on a groovy DSL, with additional supportfor chef.

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    35/36

    Tack fr mig!

  • 7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas

    36/36

    VMS LAUNCHING

    VM LAUNCHING!

    [XKCD Comic 303]