jimdowlingsics cloudday 13 hops paas
TRANSCRIPT
-
7/23/2019 Jimdowlingsics Cloudday 13 Hops Paas
1/36
HOPS: Hadoop Open Platform-as-a-Service
Alberto Lorente, Hamid Afzali, Salman Niazi, Mahmoud Ismail, Kamal Hakimazadeh, Hooman Piero, Jim Dowling
Scale Research Laboratory
-
What is Hadoop?
Open-source software for storing enormous data sets across distributed commodity servers and then running distributed analysis applications in parallel over the data.
Hadoop 1.x
- HDFS (distributed storage)
- MapReduce (resource mgmt, job scheduler, data processing)
Hadoop 2.x
- HDFS (distributed storage)
- YARN (resource mgmt, job scheduler)
- MapReduce (data processing)
- Others (Spark, MPI, Giraph, etc.)
-
Why Hadoop on Virtualized Infrastructures*?
Scheduling and resource utilization
- Elastic scalability on demand
Datacenter efficiency
- Share your computing resources
Ease-of-Deployment
- Lower barrier to adoption
*http://wiki.apache.org/hadoop/Virtual%20Hadoop
-
Hadoop on the Cloud
Cloud Computing traditionally separates storage and
computation.
OpenStack
- Nova (Compute)
- Glance (VM Images)
- Swift (Object Storage)
Amazon Web Services
- EC2
- Elastic Block Storage
- S3
-
Hadoop is designed around Data Locality
-
Can Hadoop be deployed in virtual infrastructures?
-
Yes, but...
Cloud hardware configurations should support data locality
Hadoop's original topology awareness breaks
Placement of >1 VM containing block replicas for the same file on the same physical host increases correlated failures
Need VMware's NodeGroup-aware topology (HADOOP-8468)
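The correlated-failure concern on this slide can be illustrated with a small sketch. It is not the HADOOP-8468 implementation, only a simplified model: it assumes we know which physical host each VM runs on (the `vm_to_host` mapping and all VM/host names are hypothetical) and picks replica targets so that no two replicas share a host.

```python
import random

def place_replicas(vm_to_host, replicas=3, rng=None):
    """Pick `replicas` VMs such that no two of them share a physical host.

    vm_to_host: dict mapping a VM name to the physical host it runs on
    (a hypothetical stand-in for node-group topology information).
    """
    rng = rng or random.Random()
    by_host = {}
    for vm, host in vm_to_host.items():
        by_host.setdefault(host, []).append(vm)
    if len(by_host) < replicas:
        raise ValueError("not enough physical hosts for the requested replication")
    # Choose distinct hosts first, then one VM on each chosen host.
    hosts = rng.sample(sorted(by_host), replicas)
    return [rng.choice(by_host[h]) for h in hosts]

# Hypothetical cluster: five VMs spread over three physical hosts.
VMS = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB", "vm4": "hostC", "vm5": "hostC"}
CHOSEN = place_replicas(VMS, replicas=3, rng=random.Random(0))
```

A topology-unaware policy could pick vm1 and vm2 for the same block, so losing hostA would remove two replicas at once; choosing hosts before VMs rules that out.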
-
Data Locality complicates Hadoop Lifecycle
Shouldn't upgrade Hadoop clusters by replacing VMs
- Assuming we're not using instances with external storage volumes, we need to upgrade software on existing instances
Create VM from pre-built image -> Install/upgrade software -> Configure and restart services
-
HOPS and HDFS
-
HDFS in Hadoop 1.0
[Diagram: Client, one NN, four DNs. Labels: "read /home/jd/myDna.bam", "SPoF", "mgmt & reports"]
High Availability in HDFS 2.0
[Diagram: four DNs; NN Active and NN Standby; three JNs; three ZK nodes; FailoverController Active and FailoverController Standby. Labels: "Shared NN log stored in quorum of journal nodes", "Heartbeats"]
-
HDFS in HOPS
[Diagram: four DNs, three NNs, NDB. Labels: "Master", "Up to 48 nodes", "72 million 100-byte transactional reads/sec*"]
*http://mikaelronstrom.blogspot.se/2012/05/mysql-cluster-72-achieves-43bn-reads.html
-
HOPS HDFS Features
Increased read and write throughput
Supports TBs of metadata (vs GBs in vanilla HDFS)
Easy to customize metadata
- Block-level indexing
NDB provides online backups, geographic replication, online add-node
Major challenge for users:
Installing, configuring, and managing HOPS
-
HOPS typically has four different stacks*
- ResourceMgr, NN: ssh, agent, chef, collectd
- NodeMgr, DN: ssh, agent, chef, collectd
- MYSQLD, MGMD: ssh, agent, chef, collectd
- NDBD: ssh, agent, chef, collectd
*Configured stacks include apps, dependencies, and firewalls.
-
Plus a Dashboard Stack
Web Application
Glassfish
ssh, chef, collectd-server
REST APIs
JClouds
-
How do we deploy our PaaS?
[Diagram: Data Center containing NDBD, NDBD, MGMD, DashB, NN, NN, DN, DN, DN]
-
You could bootstrap the Dashboard, old style...
-
Cloudera Manager Cloud Express Wizard*
Abridged EC2-specific installation instructions*
Go to EC2 in AWS web console and select Instances
Use the default N. Virginia (us-east-1) region.
Click on Launch Instance
On the next page, pick the Ubuntu Server 12.04 LTS 64-bit image.
Select Create a new Key Pair.
Click Create and Download your key pair.
Save this file or you won't be able to SSH into the instance we're about to launch.
$ wget http://archive.cloudera.com/cm4/installer/latest/cloudera-manager-installer.bin
$ chmod +x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin
*http://blog.cloudera.com/blog/2013/03/how-to-create-a-cdh-cluster-on-amazon-ec2-via-cloudera-manager/
-
Oh yes, I forgot: it's an Open PaaS
By Open PaaS, we mean that we should be able to deploy our PaaS on different platforms with no code changes:
- Public Clouds: AWS, Rackspace, etc
- Private Clouds: OpenStack, Docker, etc
- Bare-Metal
No vendor lock-in.
-
Direct deployment of PaaS on AWS from a Portal
[Diagram: Public Cloud (AWS). Labels: Website, DashB, NN, NN, NDBD, NDBD, MGMD, DN, DN, DN]
-
Launch via Dashboard when few public IPs available
[Diagram: Private Cloud (OpenStack). Labels: Website, DashB, NN, NN, NDBD, NDBD, MGMD, DN, DN, DN]
-
How do we install the software on the VMs?
-
The Dashboard is typically a custom VMI
AWS
- Provides APIs to automate the creation and loading of custom VM images (VMIs)*
OpenStack
- Glance is OpenStack's HTTP-based image server (slow due to serialized access)**
[Diagram: DashB]
*http://techblog.netflix.com/2013/03/ami-creation-with-aminator.html
**http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance
-
Other nodes are configured using Chef
-
Chef to Install software on Nodes
Chef recipes are infrastructure as code: idempotent & composable
[Diagram: the Dashboard sends Recipes to chef on each stack]
- ResourceMgr, NN: ssh, agent, chef, collectd
- MYSQLD, MGMD: ssh, agent, chef, collectd
- NodeMgr, DN: ssh, agent, chef, collectd
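To make "idempotent & composable" concrete, here is a minimal Python sketch of the idea. It is not Chef's Ruby DSL and does not call any Chef API; the function names, the `state` dictionary, and the package name "hadoop-namenode" are all hypothetical and chosen only for illustration.

```python
def ensure_package(state, name):
    """Declare that `name` is installed; return True only if something changed."""
    if name in state["packages"]:
        return False          # already converged, nothing to do
    state["packages"].add(name)
    return True

def recipe(*packages):
    """Build a recipe: a function that converges `state` and counts its changes."""
    def run(state):
        return sum(ensure_package(state, p) for p in packages)
    return run

def compose(*recipes):
    """Combine recipes into one; the result is itself a recipe."""
    def run(state):
        return sum(r(state) for r in recipes)
    return run

base = recipe("ssh", "collectd")
namenode = compose(base, recipe("hadoop-namenode"))  # hypothetical package name

STATE = {"packages": set()}
FIRST = namenode(STATE)    # first run installs everything
SECOND = namenode(STATE)   # second run finds nothing left to change
```

Running the same recipe twice leaves the node in the same state, which is what lets a scheduler safely re-run recipes on a node after a partial failure.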
-
But Chef and VMIs are not enough....
-
HOPS Open PaaS
[Diagram labels: HOPS PaaS API (YAML); Chef; JClouds; ssh; Amazon Web Services; OpenStack; Bare Metal; Create VMs; Reduce Install Times: VMIs, BitTorrent]
-
Define the cluster in a single file
-
PaaS Orchestration Language
---
name: HopsTest
environment: dev
# global scope (all nodes)
recipes: [ssh, chefClient, collectd]
authorizePorts: [8080, 8181, 3306, 4343, 3321]
# Define the cloud you will deploy on, and default node types
provider:
  name: aws-ec2
  instanceType: m1.large
  login: ubuntu
  image: eu-west-1/ami-35667941
  region: eu-west-1
  zones: [eu-west-1a]
-
PaaS Orchestration Language
nodes:
  - ndbd                            # should be the name of a chef recipe
    number: 2
  - mgmd
    number: 1
    recipes: [mysqld]               # explicitly add chef recipes to this node
  - namenode
    number: 2
    repository: http://myrepo.git   # use a custom recipe for the NameNode
  - datanode
    recipes: [nodemgr]
    number: 3
  - dashboard
    recipes: [resourcemgr]
    number: 1
    # other parameters per node-group:
    # chefAttributes: {..}
    # authorizePorts: [..]
    # roles: [..]
    # image: ...
    # bittorrent: true
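One way to read the node-group definition on this slide is as a plan that expands each group into individual nodes, each with a list of recipes to run. The sketch below is an assumption about how such an expansion could work, not the HOPS implementation: the `NodeGroup` class, the `run_lists` function, and the `name-index` node naming are hypothetical, and the definition is written as Python data rather than parsed from YAML.

```python
from dataclasses import dataclass, field

@dataclass
class NodeGroup:
    name: str                 # per the slide, also the name of a Chef recipe
    number: int               # how many nodes of this kind to create
    recipes: list = field(default_factory=list)  # extra recipes for this group

def run_lists(global_recipes, groups):
    """Return {node_id: [recipes]}: global recipes, then the group's own recipe,
    then any recipes added explicitly to the group."""
    plan = {}
    for g in groups:
        for i in range(g.number):
            plan[f"{g.name}-{i}"] = list(global_recipes) + [g.name] + list(g.recipes)
    return plan

GLOBAL = ["ssh", "chefClient", "collectd"]
GROUPS = [
    NodeGroup("ndbd", 2),
    NodeGroup("mgmd", 1, ["mysqld"]),
    NodeGroup("namenode", 2),
    NodeGroup("datanode", 3, ["nodemgr"]),
    NodeGroup("dashboard", 1, ["resourcemgr"]),
]
PLAN = run_lists(GLOBAL, GROUPS)
```

With the values from the slides this yields nine nodes; for example, the single mgmd node gets the three global recipes followed by mgmd and mysqld.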
-
Language Design Decisions
Declarative language
Chef-friendly default behaviors
Configure nodes by adding recipes or roles
- Users can fork our default recipes and customize them.
- Per node-group parameters supported.
Language implementation can be changed to use a
different configuration mgmt system, if needed.
-
Jclouds/Chef in the Wild
Node provisioning can fail for a variety of reasons.
Stragglers will appear as clusters grow in size.
By default, only a small number of parallel operations can be issued to AWS, OpenStack, etc.
Benchmark VMs to identify poorly performing ones.
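A simple way to act on "benchmark VMs to identify poorly performing ones" is to compare each VM's benchmark time against the median of the cluster. The sketch below is only an illustration under assumptions: the threshold factor of 2.0, the function name, and the sample timings are hypothetical, not taken from the talk.

```python
from statistics import median

def find_stragglers(durations, factor=2.0):
    """Return the ids of VMs whose benchmark time exceeds factor * median.

    durations: dict mapping a VM id to its benchmark or provisioning time
    in seconds (hypothetical measurements).
    """
    if not durations:
        return []
    cutoff = factor * median(durations.values())
    return sorted(vm for vm, t in durations.items() if t > cutoff)

# Hypothetical timings: three normal VMs and one slow one.
TIMINGS = {"vm1": 60, "vm2": 65, "vm3": 58, "vm4": 240}
SLOW = find_stragglers(TIMINGS)
```

Using the median rather than the mean keeps one very slow VM from raising the cutoff enough to hide itself.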
-
Provisioning Scheduler
Respawn & provision failed and slow VMs.
Scheduler executes Chef recipes as a series of phases
- Chef recipes are decomposed into the following phases: init, bittorrent, install, configure
BitTorrent phase: useful for larger software packages and private clouds.
Standard package repos are used in the event of failure in the install phase.
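The phased execution described on this slide can be sketched as a loop over the four phases with a fallback for the install phase. This is a simplified model under assumptions, not the HOPS scheduler: the `run_phase` callback, the source labels "default" and "standard-repo", and the return convention are hypothetical.

```python
PHASES = ["init", "bittorrent", "install", "configure"]

def provision(node, run_phase):
    """Run the phases in order on `node`.

    run_phase(node, phase, source) -> bool is a hypothetical callback that
    executes one phase and reports success. Returns None on success, or the
    name of the phase that failed so the caller can respawn the VM.
    """
    for phase in PHASES:
        if run_phase(node, phase, "default"):
            continue
        # Fallback from the slide: use standard package repos if install fails.
        if phase == "install" and run_phase(node, phase, "standard-repo"):
            continue
        return phase
    return None

def flaky_install(node, phase, source):
    # Hypothetical runner: install fails unless the standard repo is used.
    return not (phase == "install" and source != "standard-repo")

RESULT = provision("vm1", flaky_install)
```

A scheduler built this way can respawn and re-provision any node for which `provision` returns a phase name, which matches the "respawn & provision failed and slow VMs" point above.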
-
Related Work
Virtualization and Hadoop
- Project Serengeti (VMware)
- Project Savanna (Hortonworks & OpenStack)
- Elastic MapReduce (Amazon Web Services)
Administration of Hadoop Clusters
- Cloudera Manager with Puppet
- Hortonworks Ambari with Puppet
Open PaaS
- Cloudify provides a general Open PaaS middleware based on a Groovy DSL, with additional support for Chef.
-
Thank you!
-
VMS LAUNCHING
VM LAUNCHING!
[XKCD Comic 303]