the state of the art for openstack data processing (hadoop on openstack) - atlanta

The State of OpenStack Data Processing: Sahara, Now and in Juno
Sergey Lukjanov (Mirantis), Matthew Farrellee (Red Hat), John Speidel (Hortonworks)

Upload: spinningmatt

Post on 26-Jan-2015


DESCRIPTION

Update on Sahara as of the OpenStack Icehouse release

TRANSCRIPT

Page 1: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

The State of OpenStack Data Processing: Sahara, Now and in Juno

Sergey Lukjanov (Mirantis), Matthew Farrellee (Red Hat), John Speidel (Hortonworks)

Page 2: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans

Page 3: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans

Page 4: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

OpenStack Data Processing: Sahara

Mission: To provide a scalable data processing stack and associated management interfaces.

• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs

Page 5: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Hadoop - Big Data Platform

© http://hortonworks.com/hadoop/yarn/

Page 6: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Trends

http://www.google.com/trends/

Page 7: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Use cases

• Self-service provisioning of Hadoop clusters
• Utilization of unused compute capacity for bursty workloads
• Dev -> Stage -> Prod lifecycle
• Run Hadoop workloads in a few clicks without expertise in Hadoop ops

Page 8: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Architecture overview

[Architecture diagram. Labeled components: Horizon, Savanna Pages, Savanna Python Client, REST API, Auth, Keystone, Data Access Layer, Cluster Configuration Manager, Resources Orchestration Manager, Job Manager, Vendors Plugins, Data Sources, Job Sources, Hadoop VMs, Swift, Heat, Nova, Glance, Cinder, Neutron, Trove DB]

Page 9: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Sahara status

• Official integrated OpenStack project
• Supported Hadoop distros:
  • Vanilla Apache Hadoop
  • Hortonworks Data Platform
  • Intel Distribution
  • Cloudera Distribution in blueprint
• Included in OpenStack distros:
  • RDO - openstack.redhat.com
  • Mirantis OpenStack - software.mirantis.com

Page 10: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Contributors

Page 11: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans

Page 12: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

142 bugs fixed

Page 13: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

57 blueprints

Page 14: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

32 people

Page 15: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

Standard process

Page 16: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

Dozens more in the client!

Page 17: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

Tempest helps us manage our API

Page 18: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

Sahara easily deployed with DevStack

Page 19: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

Hadoop 2 available via all plugins

© http://hortonworks.com/hadoop/yarn/

Page 20: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Icehouse release

• HBase (and Sqoop) available via HDP plugin
• Spark images w/ diskimage-builder (full plugin in review)
• Heat for provisioning
• i18n translation started
• Neutron namespaces w/ rootwrap
• Guest agent implementation started

Page 21: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Elastic Data Processing update

Elastic Data Processing (EDP) is Sahara’s take on data processing workflow management.

Goal: let end users (those w/ high-value questions to answer) get answers about data without having to know a single thing about cluster management.

“Customers launch millions of Amazon EMR clusters every year.” http://aws.amazon.com/elasticmapreduce/

Page 22: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Elastic Data Processing update

Available with the Hortonworks Data Platform plugin

Page 23: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Elastic Data Processing update

Support for external HDFS

Page 24: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Elastic Data Processing update

MapReduce.Streaming and Java actions

Page 25: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Elastic Data Processing update

Job relaunch, with new data and parameters
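A minimal sketch of what "relaunch with new data and parameters" could look like as a data model. The `JobRun` class, its fields, and the names used below are hypothetical illustrations, not Sahara types or API calls: the point is only that a relaunch reuses the earlier job definition and cluster while swapping the input, output, and parameters.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record of one EDP job launch (not a Sahara type)."""
    job_template: str      # reusable job definition
    cluster: str           # cluster the job runs on
    input_source: str      # data source that provides job input
    output_source: str     # data source that receives job output
    params: Dict[str, str] = field(default_factory=dict)


def relaunch(previous: JobRun, **changes) -> JobRun:
    """Copy a previous run, overriding only the fields given in `changes`."""
    return replace(previous, **changes)


# Illustrative names only; none of these come from the talk.
first = JobRun(
    job_template="wordcount",
    cluster="demo-cluster",
    input_source="logs-january",
    output_source="counts-january",
    params={"reducers": "2"},
)
second = relaunch(
    first,
    input_source="logs-february",
    output_source="counts-february",
    params={"reducers": "4"},
)
print(second)
```

The first run is left untouched, so the same definition can be relaunched any number of times against different data.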

Page 26: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Command line interface overview

If you can do it with the Dashboard, you can do it from the command-line

Blueprint: python-savannaclient-cli

Page 27: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Command line interface overview

Image management

$ sahara
...
Positional arguments:
  <subcommand>
    image-add-tag      Add a tag to an image.
    image-list         Print a list of available images.
    image-register     Register an image from the Image index.
    image-remove-tag   Remove a tag from an image.
    image-show         Show details of an image.
    image-unregister   Unregister an image.
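To make the semantics of the six image subcommands concrete, here is a toy in-memory model. It is an assumption-laden sketch, not the Sahara implementation: the `ImageRegistry` class, its method names, and the example image ids and tags are all hypothetical, and each method simply mirrors one subcommand from the help text.

```python
from typing import Dict, List, Optional, Set


class ImageRegistry:
    """Toy model of the image subcommands (hypothetical, in-memory only)."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = {}  # image id -> set of tags

    def register(self, image_id: str) -> None:      # image-register
        self._tags.setdefault(image_id, set())

    def unregister(self, image_id: str) -> None:    # image-unregister
        self._tags.pop(image_id, None)

    def add_tag(self, image_id: str, tag: str) -> None:     # image-add-tag
        self._tags[image_id].add(tag)

    def remove_tag(self, image_id: str, tag: str) -> None:  # image-remove-tag
        self._tags[image_id].discard(tag)

    def show(self, image_id: str) -> Dict[str, object]:     # image-show
        return {"id": image_id, "tags": sorted(self._tags[image_id])}

    def list_images(self, tag: Optional[str] = None) -> List[str]:  # image-list
        """Return registered image ids, optionally only those with `tag`."""
        return sorted(
            image_id
            for image_id, tags in self._tags.items()
            if tag is None or tag in tags
        )


# Illustrative ids and tags only.
registry = ImageRegistry()
registry.register("img-1")
registry.register("img-2")
registry.add_tag("img-1", "hdp")
registry.add_tag("img-1", "2.0.6")
registry.add_tag("img-2", "vanilla")
print(registry.list_images("hdp"))
registry.remove_tag("img-1", "2.0.6")
print(registry.show("img-1"))
```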

Page 28: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Command line interface overview

Node group, cluster and job templates

$ sahara
    node-group-template-create   Create a node group...
    node-group-template-delete   Delete a node group...
    node-group-template-list     Print a list of available...
    node-group-template-show     Show details of a node...
    cluster-template-create      Create a cluster template.
    cluster-template-delete      Delete a cluster template.
    cluster-template-list        Print a list of available...
    cluster-template-show        Show details of a cluster...
    job-template-create          Create a job template.
    job-template-delete          Delete a job template.
    job-template-list            Print a list of job...
    job-template-show            Show details of a job...
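The relationship between node group templates and cluster templates can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary layout, the template names, and the instance naming scheme are hypothetical and do not reproduce Sahara's actual template schema. A node group template names the processes that run on one kind of node, and a cluster template says how many nodes of each kind to create.

```python
from typing import Dict, List

# Hypothetical node group templates: group name -> processes on each node.
NODE_GROUP_TEMPLATES: Dict[str, List[str]] = {
    "master": ["NameNode", "Resource Manager", "History Server"],
    "worker": ["DataNode"],
}

# Hypothetical cluster template: how many nodes of each group to create.
CLUSTER_TEMPLATE = {
    "name": "demo",
    "node_groups": {"master": 1, "worker": 3},
}


def expand(cluster_template: dict,
           node_group_templates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Expand a cluster template into one entry per instance."""
    instances: Dict[str, List[str]] = {}
    for group, count in cluster_template["node_groups"].items():
        processes = node_group_templates[group]
        for index in range(1, count + 1):
            # Instance naming is an assumption made for this sketch.
            name = f"{cluster_template['name']}-{group}-{index}"
            instances[name] = list(processes)
    return instances


instances = expand(CLUSTER_TEMPLATE, NODE_GROUP_TEMPLATES)
for name, processes in instances.items():
    print(name, processes)
```

Keeping the two template levels separate means the same node group definition can be reused across clusters of different sizes.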

Page 29: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Command line interface overview

Data sources and job binaries

$ sahara
...
  <subcommand>
    data-source-create   Create a data source that provides job input or receives job output.
    data-source-delete   Delete a data source.
    data-source-list     Print a list of available data...
    data-source-show     Show details of a data source.
    job-binary-create    Record a job binary.
    job-binary-delete    Delete a job binary.
    job-binary-list      Print a list of job binaries.
    job-binary-show      Show details of a job binary.

Page 30: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Command line interface overview

Clusters and jobs

$ sahara
...
  <subcommand>
    cluster-create   Create a cluster.
    cluster-delete   Delete a cluster.
    cluster-list     Print a list of available clusters.
    cluster-show     Show details of a cluster.
    job-create
    job-delete       Delete a job.
    job-list         Print a list of jobs.
    job-show         Show details of a job.
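The subcommand names shown on the CLI slides follow a consistent resource-verb pattern. The sketch below groups them by resource to make that visible. The subcommand names are copied from the slides; the `VERBS` tuple and the `group_by_resource` helper are illustrative and are not part of the client.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Subcommand names as listed on the CLI slides.
SUBCOMMANDS = [
    "image-add-tag", "image-list", "image-register",
    "image-remove-tag", "image-show", "image-unregister",
    "node-group-template-create", "node-group-template-delete",
    "node-group-template-list", "node-group-template-show",
    "cluster-template-create", "cluster-template-delete",
    "cluster-template-list", "cluster-template-show",
    "job-template-create", "job-template-delete",
    "job-template-list", "job-template-show",
    "data-source-create", "data-source-delete",
    "data-source-list", "data-source-show",
    "job-binary-create", "job-binary-delete",
    "job-binary-list", "job-binary-show",
    "cluster-create", "cluster-delete", "cluster-list", "cluster-show",
    "job-create", "job-delete", "job-list", "job-show",
]

# Verbs observed in the names above.
VERBS = ("create", "delete", "list", "show",
         "register", "unregister", "add-tag", "remove-tag")


def group_by_resource(subcommands: Iterable[str]) -> Dict[str, List[str]]:
    """Split each '<resource>-<verb>' name and collect verbs per resource."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in subcommands:
        for verb in VERBS:
            suffix = "-" + verb
            if name.endswith(suffix):
                grouped[name[: -len(suffix)]].append(verb)
                break
    return dict(grouped)


grouped = group_by_resource(SUBCOMMANDS)
for resource, verbs in grouped.items():
    print(f"{resource}: {', '.join(verbs)}")
```

Every resource except images exposes the same create/delete/list/show set, which is what makes the command line predictable once one resource is familiar.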

Page 31: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans

Page 32: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

HDP Plugin Overview

• Full support for all Sahara functionality
  • Nova and Neutron network
  • Cluster Scaling
  • Scale Up
  • Swift Integration
  • Cinder Support
  • Data Locality
  • EDP
• Apache Ambari REST APIs used for cluster provisioning
• Monitoring/Management of clusters via Ambari
• Full support for multiple HDP stacks
• HDP pre-installed or generic VM images

Page 33: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

HDP Plugin Stack Support

HDP 1.3.2 (Available)
● NameNode
● Secondary NameNode
● DataNode
● HDFS
● ZooKeeper
● Ambari Server/Agent
● HCatalog
● Sqoop
● Job Tracker
● Task Tracker
● MapReduce
● Hive
● MySQL
● Pig
● WebHCat Server
● Oozie
● Ganglia
● Nagios
● HBase

HDP 2.0.6 (Available)
● History Server
● MapReduce 2 / YARN
● Resource Manager
● YARN Client

HDP 2.1 (Coming Soon!)
● Storm
● Falcon

HDP 2.1+ (Roadmap)
● SOLR
● Cascading

Page 34: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

HDP Disk Images

• Disk Image Builder offers consistent approach for image creation
• HDP Plugin provides images and scripts for (CentOS, RHEL):
  • Plain
  • 1.3.2
  • 2.0.6
  • 2.1 (coming soon)
• Pre-Packaged images (1.3.2, 2.0.6) provide images with HDP packages pre-installed for accelerated provisioning, reduced network traffic
• Image Build Scripts allow images to be customized
  • Security
  • Custom Packages
  • O/S Settings

Page 35: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Ambari Blueprints

• Two primary goals of Ambari Blueprints
  • Ability to export a complete description of a running cluster
  • Provide API based cluster installations based on a self-contained cluster description
• Blueprints contain cluster topology and configuration information
• Enables interesting use cases between physical and virtual, including OpenStack/Sahara
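As a rough illustration of "topology and configuration in one self-contained description", the sketch below builds a blueprint-shaped document and a matching cluster request as Python dictionaries. The overall layout (a `Blueprints` header, `host_groups` with `components`, and `configurations`) follows the general shape of Ambari blueprints, but the exact schema varies between Ambari versions, so treat every key as an assumption to check against the Ambari documentation. The blueprint name, host names, and the `unmapped_host_groups` helper are hypothetical.

```python
import json
from typing import List

# Blueprint-shaped document: topology (host groups) plus configuration.
blueprint = {
    "Blueprints": {
        "blueprint_name": "demo-blueprint",   # hypothetical name
        "stack_name": "HDP",
        "stack_version": "2.0.6",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "HISTORYSERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
    "configurations": [{"core-site": {"fs.trash.interval": "360"}}],
}

# Cluster request: maps concrete hosts (hypothetical FQDNs) onto host groups.
cluster_request = {
    "blueprint": "demo-blueprint",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master-1.example.com"}]},
        {"name": "worker", "hosts": [
            {"fqdn": "worker-1.example.com"},
            {"fqdn": "worker-2.example.com"},
            {"fqdn": "worker-3.example.com"},
        ]},
    ],
}


def unmapped_host_groups(blueprint: dict, cluster_request: dict) -> List[str]:
    """Return blueprint host groups that the request assigns no hosts to."""
    defined = {group["name"] for group in blueprint["host_groups"]}
    mapped = {group["name"] for group in cluster_request["host_groups"]}
    return sorted(defined - mapped)


print(json.dumps(blueprint, indent=2))
print("unmapped host groups:", unmapped_host_groups(blueprint, cluster_request))
```

Because the description names roles rather than machines, the same blueprint can be applied to physical hosts or to VMs provisioned by Sahara.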

Page 36: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

• Sahara overview
• Icehouse release
• HDP plugin updates
• Juno plans

Page 37: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Juno roadmap

• Further integration with OpenStack ecosystem:
• Distributed architecture
• Guest agents
• EDP enhancements
• Merge dashboard to Horizon

To be discussed and confirmed at Design Summit

Page 38: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Design Summit Sessions

7 Sessions: Thursday 1:30 - Friday 10:30

http://goo.gl/lQXtUS

Page 39: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Agenda

Q&A

Page 40: The state of the art for OpenStack Data Processing (Hadoop on OpenStack) - Atlanta

Cluster and EDP workflows

[Workflow diagram. Steps are labeled by how often they occur: Rarely, Infrequently, Occasionally, Commonly, Occasionally, Frequently]