Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack

Savanna: Hadoop on OpenStack. Ilya Elterman (Mirantis), Matthew Farrellee (Red Hat), Sergey Lukjanov (Mirantis)

Uploaded by sergey-lukjanov, posted on 26-Jan-2015


TRANSCRIPT

Page 1: Hong Kong OpenStack Summit: Savanna - Hadoop on OpenStack

Savanna: Hadoop on OpenStack

Ilya Elterman (Mirantis), Matthew Farrellee (Red Hat), Sergey Lukjanov (Mirantis)

Page 2

Agenda

● Savanna Overview
● Current state
  ○ EDP overview
  ○ Other features
● Roadmap
● Live Demo

Page 3

Agenda

● Savanna Overview
● Current state
  ○ EDP overview
  ○ Other features
● Roadmap
● Live Demo

Page 4

OpenStack Data Processing - Savanna

Mission: To provide the OpenStack community with an open, cutting-edge, performant and scalable data processing stack and associated management interfaces.

● Provision and operate Hadoop clusters
● Schedule and operate Hadoop jobs

Page 5

Hadoop - Big Data Platform

Page 6

Popularity: Hadoop vs. OpenStack (Google Trends chart; source: http://www.google.com/trends/explore?q=hadoop+openstack#q=openstack%2C%20hadoop&cmpt=q)

Page 7

Use Cases

● Self-service provisioning of Hadoop clusters
● Utilization of unused compute capacity for bursty workloads
● Running Hadoop workloads in a few clicks without expertise in Hadoop ops

Page 8

Architecture Overview

[Architecture diagram. Clients: Savanna Python Client, Horizon (Savanna Pages), REST API. Savanna components: Auth (Keystone), Cluster Configuration Manager, Vendors Plugins, Data Access Layer, Resources Orchestration Manager, Job Manager, Data Sources, Job Sources. OpenStack services: Heat, Nova, Glance, Cinder, Neutron, Swift, Trove DB. Hadoop clusters run as Hadoop VMs.]
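To make the client side of the architecture concrete, here is a minimal sketch of the kind of HTTP request the Savanna Python Client or Horizon might send to the REST API to provision a cluster from a template. It only builds the request with the Python standard library and does not send it. The endpoint (host, port 8386, `v1.1` prefix), the tenant ID, and the body field names (`plugin_name`, `hadoop_version`, `cluster_template_id`, `default_image_id`) are assumptions for illustration, not a verified copy of the Savanna API.

```python
import json
import urllib.request

# Assumed, illustrative endpoint: host, port, API version and tenant are placeholders.
SAVANNA_URL = "http://savanna.example.com:8386/v1.1"
TENANT_ID = "demo-tenant"


def build_cluster_create_request(token, name, template_id, image_id):
    """Build (but do not send) a POST asking Savanna to provision a cluster.

    All field names in the body are assumptions made for this sketch.
    """
    body = {
        "name": name,
        "plugin_name": "vanilla",        # assumed ID of the Vanilla Apache Hadoop plugin
        "hadoop_version": "1.2.1",       # assumed Hadoop version string
        "cluster_template_id": template_id,
        "default_image_id": image_id,    # Glance image with Hadoop preinstalled
    }
    return urllib.request.Request(
        url=f"{SAVANNA_URL}/{TENANT_ID}/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,       # Keystone token identifying the caller
        },
        method="POST",
    )


req = build_cluster_create_request("TOKEN", "demo-cluster", "tmpl-1", "img-1")
print(req.get_method(), req.full_url)
# prints: POST http://savanna.example.com:8386/v1.1/demo-tenant/clusters
```

Passing `req` to `urllib.request.urlopen` would actually send it; keeping construction separate from sending makes the payload easy to inspect.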

Page 9

Savanna Status

● Officially incubated OpenStack project
● v0.3 released 17 Oct 2013
● Supported Hadoop distros:
  ○ Vanilla Apache Hadoop (reference implementation)
  ○ Hortonworks Data Platform 1.3.x
  ○ Intel Distribution (under review)
  ○ Cloudera Distribution (in blueprint)
● Included in OpenStack distros:
  ○ RDO - http://openstack.redhat.com
  ○ Mirantis OpenStack - http://software.mirantis.com

Page 10

Cluster Provisioning Performance

Page 11

Page 12

Agenda

● Savanna Overview
● Current state
  ○ EDP overview
  ○ Other features
● Roadmap
● Live Demo

Page 13

EDP Overview

● End users have data and questions
  ○ The data lives in a data repository
  ○ The questions are embodied in code
● Savanna Elastic Data Processing (EDP) brings the Hadoop ecosystem to the end user
  ○ Hides all cluster management behind the scenes

Page 14

EDP

“Customers launch millions of Amazon EMR clusters every year.” (http://aws.amazon.com/elasticmapreduce/)

Page 15

EDP

● The variety and depth of value-add offerings on top of clouds are growing
● These offerings are rarely open and rarely allow for choice
● Examples: Google Cloud, Azure, AWS

Page 16

EDP

Savanna and EDP can both match and exceed the use cases provided by most public clouds.

Page 17

EDP in Savanna v0.3

● UI, integrated into Horizon, for ad-hoc analytics queries based on Hive or Pig
● API to execute MapReduce jobs without exposing details of the underlying infrastructure
● Pluggable data sources: Swift
● Supported job types: Jar, Pig, Hive
● Integration with Oozie for workflow management
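As a rough illustration of the "API to execute MapReduce jobs" bullet, the sketch below builds a request that would run a previously stored job (for example a Jar, Pig, or Hive job) on an existing cluster, with input and output taken from registered data sources such as Swift containers. The `jobs/{job_id}/execute` path, the body fields (`cluster_id`, `input_id`, `output_id`, `job_configs`), and every ID are assumptions for this sketch; check the Savanna EDP API reference before relying on them.

```python
import json
import urllib.request

# Assumed, illustrative endpoint and tenant; replace with real values.
SAVANNA_URL = "http://savanna.example.com:8386/v1.1"
TENANT_ID = "demo-tenant"


def build_job_execute_request(token, job_id, cluster_id, input_id, output_id,
                              configs=None):
    """Build (but do not send) a request that runs a stored EDP job.

    The path and body layout are assumptions made for this sketch.
    """
    body = {
        "cluster_id": cluster_id,
        "input_id": input_id,        # data source holding the job input (e.g. Swift)
        "output_id": output_id,      # data source that receives the results
        "job_configs": {"configs": dict(configs or {})},  # Hadoop config overrides
    }
    return urllib.request.Request(
        url=f"{SAVANNA_URL}/{TENANT_ID}/jobs/{job_id}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = build_job_execute_request(
    "TOKEN", "job-42", "cluster-1", "swift-input", "swift-output",
    configs={"mapred.reduce.tasks": "2"},
)
print(req.full_url)
```

Note that the caller names only a job, a cluster, and data sources; no VM, host, or HDFS node appears in the request, which is the point of hiding the underlying infrastructure.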

Page 18

Agenda

● Savanna Overview
● Current state
  ○ EDP overview
  ○ Other features
● Roadmap
● Live Demo

Page 19

Cluster Ops in Savanna 0.3

● REST API
● Configuration templates
● Manual cluster scaling
● Data node anti-affinity and location control
● Full support for data locality: rack and 4-level awareness for HDFS and Swift
● Swift integration
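Rack and 4-level awareness rest on the idea Hadoop uses for rack awareness: every node has a location path in a tree, and the "distance" between two nodes is the number of edges from each one up to their closest common ancestor. Readers prefer the replica with the smallest distance. The minimal sketch below assumes the four levels are data center, rack, physical host, and VM, and the location strings are hypothetical; it illustrates the distance rule, not Savanna's or Hadoop's actual code.

```python
def topology_distance(a: str, b: str) -> int:
    """Number of tree edges between two leaf locations such as "/dc1/rack1/host1/vm1".

    Counts the hops from each node up to their closest common ancestor.
    """
    pa = [part for part in a.split("/") if part]
    pb = [part for part in b.split("/") if part]
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest_replica(reader: str, replicas: list[str]) -> str:
    """Pick the replica location nearest to the reader."""
    return min(replicas, key=lambda r: topology_distance(reader, r))


# Hypothetical layout: /datacenter/rack/physical-host/vm
reader = "/dc1/rack1/host1/vm1"
replicas = ["/dc1/rack2/host7/vm3", "/dc1/rack1/host2/vm5", "/dc1/rack1/host1/vm2"]
print(topology_distance(reader, "/dc1/rack1/host1/vm2"))  # 2: same physical host
print(topology_distance(reader, "/dc1/rack1/host2/vm5"))  # 4: same rack
print(topology_distance(reader, "/dc1/rack2/host7/vm3"))  # 6: same data center
print(closest_replica(reader, replicas))                   # /dc1/rack1/host1/vm2
```

The extra physical-host level matters for VMs: two VMs on the same hypervisor are "close" for reads, but placing every replica under one host would defeat fault tolerance, which is where anti-affinity comes in.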

Page 20

OpenStack Integration in Savanna 0.3

● OpenStack Dashboard plugin
● Support for both Neutron and Nova Network
● Keystone trusts for asynchronous operations
● Python client
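Keystone trusts let a user (the trustor) delegate some of their roles on a project to another user (the trustee), so a service can keep acting on the user's behalf during long-running, asynchronous work such as cluster provisioning. The sketch below only builds a Keystone v3 `OS-TRUST` trust-creation request; the Keystone URL, user and project IDs, and role name are placeholders, and how Savanna itself creates and consumes trusts is not shown here.

```python
import json
import urllib.request

# Placeholder Keystone v3 endpoint.
KEYSTONE_URL = "http://keystone.example.com:5000/v3"


def build_trust_request(token, trustor_user_id, trustee_user_id, project_id, roles):
    """Build (but do not send) a Keystone v3 request that creates a trust."""
    body = {
        "trust": {
            "trustor_user_id": trustor_user_id,  # the end user delegating access
            "trustee_user_id": trustee_user_id,  # e.g. a service user (placeholder)
            "project_id": project_id,
            "impersonation": True,               # trustee acts as the trustor
            "roles": [{"name": name} for name in roles],
        }
    }
    return urllib.request.Request(
        url=f"{KEYSTONE_URL}/OS-TRUST/trusts",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = build_trust_request("USER_TOKEN", "alice-id", "savanna-svc-id", "proj-1", ["member"])
print(req.full_url)
```

The request is authenticated with the trustor's own token, since only the user who owns the roles can delegate them.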

Page 21

Agenda

● Savanna Overview
● Current state
  ○ EDP overview
  ○ Other features
● Roadmap
● Live Demo

Page 22

Live Demo

Page 23

Icehouse Roadmap

● Integration with the OpenStack ecosystem
  ○ Heat
  ○ Tempest
  ○ Devstack
  ○ Ceilometer
  ○ Ironic
● EDP enhancements
● Code hardening
● Polished API v2
● Performance testing

Page 24

Design Summit Sessions

Friday, November 8
● 1:30pm Network and installation topologies
● 2:20pm Heat integration and scalability
● 3:10pm Further OpenStack integration
● 4:10pm Savanna in Icehouse

http://goo.gl/2iEv8u

Page 25

Q&A