savanna - elastic hadoop on openstack

Savanna - Hadoop on OpenStack Mirantis, 2013 Sergey Lukjanov Savanna Technical Lead

Upload: sergey-lukjanov

Post on 26-Jan-2015

Category:

Technology



DESCRIPTION

Slide deck for the talk at local meetup.

TRANSCRIPT

Page 1: Savanna - Elastic Hadoop on OpenStack

Savanna - Hadoop on OpenStack

Mirantis, 2013
Sergey Lukjanov
Savanna Technical Lead

Page 2: Savanna - Elastic Hadoop on OpenStack

Agenda

● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization

Page 4: Savanna - Elastic Hadoop on OpenStack

Savanna - Elastic Hadoop on OpenStack

● Open source, native OpenStack component
● Supports different Hadoop distributions
● Covers both the bare cluster provisioning use case and "analytics as a service"
● Managed through a REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates for Hadoop configurations

Page 5: Savanna - Elastic Hadoop on OpenStack

Savanna - Elastic Hadoop on OpenStack

● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - [email protected]
● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com

Page 6: Savanna - Elastic Hadoop on OpenStack

Savanna - Participants

● Contributors:
○ large core team from Mirantis
○ teams from Red Hat and Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers

Page 8: Savanna - Elastic Hadoop on OpenStack

Savanna Use Cases

● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - an API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads

Page 9: Savanna - Elastic Hadoop on OpenStack

Administrators Use Case

● Central point of control over infrastructure
● Enables self-service capabilities, including the choice of Hadoop distribution to be used
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks

Page 10: Savanna - Elastic Hadoop on OpenStack

Dev and QA Use Cases

● Fast on-demand provisioning of environments
● Increased agility and speed of innovation
● Controlled access to data from production

Page 11: Savanna - Elastic Hadoop on OpenStack

Analytics Use Cases

● Simplified task execution - the complexity of provisioning and managing a cluster is hidden under the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads: ad-hoc queries that require significant resources only for a short period of time
● Utilization of free IaaS capacity for Hadoop tasks

Page 13: Savanna - Elastic Hadoop on OpenStack

Roadmap for Hadoop in Cloud

● Phase 1: Basic cluster provisioning of Apache Hadoop
● Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Page 14: Savanna - Elastic Hadoop on OpenStack

Phase 1 - Basic Cluster Operation

● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard

Page 15: Savanna - Elastic Hadoop on OpenStack

Roadmap for Hadoop in Cloud

● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
● Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Page 16: Savanna - Elastic Hadoop on OpenStack

Phase 2 - Advanced Configuration

● Hadoop cluster configuration support:
○ Solutions for the HDFS data reliability issue
○ Configurable DN storage location
○ Configurable topology of DN, NN, TT, JT
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor deployment/management tooling
● Basic monitoring support

Page 17: Savanna - Elastic Hadoop on OpenStack

Roadmap for Hadoop in Cloud

● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
● Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
● Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Page 18: Savanna - Elastic Hadoop on OpenStack

Phase 3 - Analytics as a Service

● API to execute Map/Reduce jobs without exposing details of underlying infrastructure (similar to AWS EMR)

● User-friendly UI for ad-hoc analytics queries based on Hive or Pig

Page 19: Savanna - Elastic Hadoop on OpenStack

Roadmap for Hadoop in Cloud

● Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
● Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
● Phase 3 [Planned - October 15]: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Page 20: Savanna - Elastic Hadoop on OpenStack

Further Roadmap

● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support

Page 22: Savanna - Elastic Hadoop on OpenStack

Architecture Overview

[Diagram. Labels: Savanna Python Client; REST API; Cluster Configuration Manager; Horizon; Savanna Pages; Keystone; Auth; DAL; Nova; Glance; Swift; Provisioning Plugin; Instance Interop Helper; Image Registry; four Hadoop VMs.]

Page 23: Savanna - Elastic Hadoop on OpenStack

Hadoop vs. Virtualization

● HDFS Reliability
● Data Persistence
● I/O Performance
● etc.

Page 27: Savanna - Elastic Hadoop on OpenStack

HDFS Reliability: the issue

[Diagram. Labels: Compute (x2); DN (x6); Data Block.]

Page 30: Savanna - Elastic Hadoop on OpenStack

HDFS Reliability: single DN per host

[Diagram. Labels: Compute (x3); DN (x3); TT | DN; Cluster A; Cluster B.]

Page 31: Savanna - Elastic Hadoop on OpenStack

HDFS Reliability: HADOOP-8468
Hypervisor awareness for the HDFS scheduler

[Diagram. Labels: Compute (x3); DN (x6); HDFS; Data Block.]
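
The idea on this slide can be illustrated with a small sketch. It is not the HADOOP-8468 implementation; it only shows, under simplified assumptions, what "hypervisor awareness" means for replica placement: no two replicas of a block land on DataNode VMs that share a physical compute host. The function name `place_replicas`, the host names, and the input mapping are hypothetical.

```python
def place_replicas(datanodes, replication=3):
    """Pick DataNodes for one block, at most one replica per physical host.

    `datanodes` maps a DataNode name to the compute host running its VM
    (a hypothetical input format, used for illustration only).
    Returns fewer than `replication` names if there are not enough hosts.
    """
    chosen = []
    used_hosts = set()
    # Sort for a deterministic result; a real scheduler would also weigh
    # load, free space and network distance.
    for dn, host in sorted(datanodes.items()):
        if host in used_hosts:
            continue  # another replica already lives on this hypervisor
        chosen.append(dn)
        used_hosts.add(host)
        if len(chosen) == replication:
            break
    return chosen


# Six DataNode VMs spread over three compute hosts, two per host.
datanodes = {
    "dn1": "compute-1", "dn2": "compute-1",
    "dn3": "compute-2", "dn4": "compute-2",
    "dn5": "compute-3", "dn6": "compute-3",
}
replicas = place_replicas(datanodes)
hosts = {datanodes[dn] for dn in replicas}
print(replicas, hosts)
```

With this placement, losing a single compute host removes at most one replica of the block, which is the property a placement policy unaware of the hypervisor cannot guarantee.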

Page 32: Savanna - Elastic Hadoop on OpenStack

HDFS Reliability: HADOOP-8545
Enables Swift for Hadoop

[Diagram. Labels: Swift; HDFS; Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N; initial input; final output.]

Page 33: Savanna - Elastic Hadoop on OpenStack

Configurable topology of DN, NN, TT, JT

● Master node(s): JT | NN, or JT + NN
● Worker nodes: TT, TT | DN, DN

[Diagram. Node counts shown: 10, 6, 8.]
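
A configurable topology implies some check that a requested layout is a workable cluster. The sketch below is one possible check, assuming a Hadoop 1.x style cluster with exactly one NameNode and one JobTracker and at least one DataNode and TaskTracker. The function `validate_topology`, its input format, and the node counts in the examples are hypothetical and not taken from Savanna.

```python
KNOWN_PROCESSES = ("NN", "JT", "DN", "TT")


def validate_topology(node_groups):
    """Return a list of problems for a requested topology.

    `node_groups` is a list of (processes, count) pairs, for example
    ({"JT", "NN"}, 1) for one master VM running both master processes.
    This input format is an assumption made for the sketch.
    """
    totals = dict.fromkeys(KNOWN_PROCESSES, 0)
    for processes, count in node_groups:
        unknown = set(processes) - set(KNOWN_PROCESSES)
        if unknown:
            raise ValueError("unknown processes: %s" % sorted(unknown))
        if count < 1:
            raise ValueError("node count must be positive")
        for name in set(processes):
            totals[name] += count

    problems = []
    for master in ("NN", "JT"):
        if totals[master] != 1:
            problems.append(
                "expected exactly one %s, got %d" % (master, totals[master]))
    for worker in ("DN", "TT"):
        if totals[worker] < 1:
            problems.append("expected at least one %s" % worker)
    return problems


# Combined master, combined workers.
print(validate_topology([({"JT", "NN"}, 1), ({"TT", "DN"}, 4)]))
# Separate masters, compute-only and storage-only workers.
print(validate_topology(
    [({"JT"}, 1), ({"NN"}, 1), ({"TT"}, 2), ({"DN"}, 3)]))
```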

Page 34: Savanna - Elastic Hadoop on OpenStack

HDFS Placement Options

● Ephemeral drive
/var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
● Block storage volume
Cinder Volume -> /mnt/volume
● Bare hard drive support
/dev/sdb -> /mnt/sdb
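
As a rough illustration of how these placement options could feed into Hadoop configuration, the sketch below turns a list of selected options into a value for the Hadoop 1.x `dfs.data.dir` property, which takes a comma-separated list of directories. The mount points come from the slide; the option keys, the `hdfs/data` subdirectory, and the function name `datanode_dirs` are assumptions made for the example.

```python
import posixpath

# Mount points as listed on the slide; the dictionary keys are hypothetical.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def datanode_dirs(placements, subdir="hdfs/data"):
    """Build a comma-separated directory list for dfs.data.dir."""
    dirs = []
    for placement in placements:
        if placement not in MOUNT_POINTS:
            raise ValueError("unknown placement option: %r" % placement)
        path = posixpath.join(MOUNT_POINTS[placement], subdir)
        if path not in dirs:  # ignore options that were selected twice
            dirs.append(path)
    return ",".join(dirs)


print(datanode_dirs(["ephemeral", "volume"]))
```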

Page 35: Savanna - Elastic Hadoop on OpenStack

Q&A

Page 36: Savanna - Elastic Hadoop on OpenStack

We are hiring!

Page 37: Savanna - Elastic Hadoop on OpenStack

Phase 1 deployment mechanism

[Diagram: Savanna and four Hadoop VMs. Labels: Provision VMs with pre-installed Hadoop; Configure Hadoop Cluster.]

Page 38: Savanna - Elastic Hadoop on OpenStack

Tool usage scenarios

● Scenario I: Tool - Manage Hadoop Cluster (diagram shows four Hadoop VMs)
● Scenario II: Tool - Provision & Manage Hadoop Cluster (diagram shows four VMs)

Page 39: Savanna - Elastic Hadoop on OpenStack

Extensible Provisioning

Plugin
● get extra configs
● validate input
● launch/terminate cluster
● add/remove nodes

Savanna
Instance Interop
● launch/terminate VMs
● get VM status
● ssh/scp to VM
Image registry
● register image in Savanna
● add/remove tags
● get image by tag
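
The plugin operations listed on this slide can be read as an interface. The sketch below expresses them as a Python abstract base class; it is an illustration only, and the class name, method names, and signatures are hypothetical rather than Savanna's actual plugin API.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical interface mirroring the plugin operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific configuration options."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the requested cluster is not acceptable."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Provision and start the cluster."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Stop the cluster and release its resources."""

    @abstractmethod
    def add_nodes(self, cluster, count):
        """Grow the cluster by `count` nodes."""

    @abstractmethod
    def remove_nodes(self, cluster, count):
        """Shrink the cluster by `count` nodes."""


class RecordingPlugin(ProvisioningPlugin):
    """Toy plugin that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def get_extra_configs(self):
        return {}

    def validate(self, cluster):
        if cluster.get("node_count", 0) < 1:
            raise ValueError("cluster needs at least one node")

    def launch_cluster(self, cluster):
        self.calls.append(("launch", cluster["name"]))

    def terminate_cluster(self, cluster):
        self.calls.append(("terminate", cluster["name"]))

    def add_nodes(self, cluster, count):
        self.calls.append(("add_nodes", cluster["name"], count))

    def remove_nodes(self, cluster, count):
        self.calls.append(("remove_nodes", cluster["name"], count))


plugin = RecordingPlugin()
cluster = {"name": "demo", "node_count": 3}
plugin.validate(cluster)
plugin.launch_cluster(cluster)
plugin.add_nodes(cluster, 2)
print(plugin.calls)
```

Keeping the distribution-specific logic behind such an interface is one way to support several Hadoop distributions from a single provisioning service.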

Page 40: Savanna - Elastic Hadoop on OpenStack

Provisioning Interaction

[Sequence diagram between User, Savanna and Plugin. Labels: get extra parameters; get extra parameters for the plugin; validate cluster parameters; launch cluster; add/remove nodes.]

Page 41: Savanna - Elastic Hadoop on OpenStack

Provisioning: Launching a Cluster

[Diagram. Labels: Plugin; Image Registry (get image by tag); Instance Interop Helper (launch VMs; pass commands via ssh, scp); install and configure Hadoop; four Hadoop VMs.]
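
The launch flow in this diagram can be sketched with in-memory stand-ins. Nothing below talks to OpenStack or to real VMs: `ImageRegistry`, `FakeInstanceHelper`, `launch_cluster`, the image id, the VM names, and the `configure-hadoop` command are all hypothetical placeholders that only mirror the order of steps (get image by tag, launch VMs, pass commands to each VM).

```python
class ImageRegistry:
    """In-memory stand-in for an image registry with tags."""

    def __init__(self):
        self._tags = {}

    def register(self, image_id, tags=()):
        self._tags[image_id] = set(tags)

    def add_tag(self, image_id, tag):
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id, tag):
        self._tags[image_id].discard(tag)

    def get_by_tag(self, tag):
        matches = sorted(i for i, tags in self._tags.items() if tag in tags)
        if not matches:
            raise LookupError("no image tagged %r" % tag)
        return matches[0]


class FakeInstanceHelper:
    """Records what a real helper would do via the compute API and ssh."""

    def __init__(self):
        self.log = []

    def launch_vms(self, image_id, count):
        self.log.append(("launch", image_id, count))
        return ["hadoop-vm-%d" % i for i in range(1, count + 1)]

    def run_command(self, vm, command):
        self.log.append(("ssh", vm, command))


def launch_cluster(registry, helper, tag, count):
    """Get image by tag, launch VMs, then configure Hadoop on each VM."""
    image_id = registry.get_by_tag(tag)
    vms = helper.launch_vms(image_id, count)
    for vm in vms:
        helper.run_command(vm, "configure-hadoop")  # placeholder command
    return vms


registry = ImageRegistry()
registry.register("img-1", tags=["hadoop"])
helper = FakeInstanceHelper()
vms = launch_cluster(registry, helper, "hadoop", 4)
print(vms)
```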
