dev ops for big data cluster management tools

by Ran Silberman DevOps for Big Data Cluster management tools 20.4.2015 Hosted by: FullStack Developers Israel

Upload: ran-silberman

Post on 15-Jul-2015





TRANSCRIPT

by Ran Silberman

DevOps for Big Data: Cluster management tools

20.4.2015

Hosted by:

FullStack Developers Israel

Ran Silberman,

Big Data Architect

...and amateur birder

● Explain Cluster Management tools by example

● Demo Cloudera Management

● Pros and Cons

Agenda

Birds of Brazil Wiki application

● Input photos and locations

● Batch: Display statistics by bird, location & photographer.

● Real-time: Count how many birds of each species were seen in the last minute

Application requirements
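
The real-time requirement above (per-species counts over the last minute) can be sketched as a sliding window. This is a minimal single-process illustration in plain Python, not the Spark streaming job the deck refers to later. The class name LastMinuteCounter, the numeric timestamps and the assumption that sightings arrive in timestamp order are all hypothetical.

```python
from collections import Counter, deque


class LastMinuteCounter:
    """Counts sightings per species over a sliding time window."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, species), oldest first
        self._counts = Counter()  # species -> sightings still inside the window

    def add(self, timestamp, species):
        # Assumption: sightings are added in non-decreasing timestamp order.
        self._events.append((timestamp, species))
        self._counts[species] += 1

    def counts(self, now):
        # Evict every sighting that is a full window (or more) older than `now`.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, species = self._events.popleft()
            self._counts[species] -= 1
            if self._counts[species] == 0:
                del self._counts[species]
        return dict(self._counts)


counter = LastMinuteCounter()
counter.add(0, "toucan")
counter.add(30, "toucan")
counter.add(50, "macaw")
recent = counter.counts(now=70)  # the sighting at t=0 has left the window
```

Keeping the events in a queue means each sighting is added once and evicted once, so the cost per query stays proportional to the number of expired events rather than to the whole history.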

● Volume growth

● Velocity of Streaming and Batch

● Same env from DEV to PROD

● Data from PROD to test on DEV

● Manage Deployment of many applications on many nodes

Big Data lifecycle considerations

● HDFS for storing the data

● Hive for batch processing

● Solr/Elasticsearch for search

● Spark for streaming

● ...Home-grown applications

Choosing the Infrastructures

Many Infrastructures

How can we manage all those infrastructures?

● All platforms & infrastructures are installed by the tool

● Monitoring, Audits & logs are built-in

● Easy installation and upgrade

● Save scripting work

What is new for the DevOps pipeline?

● Manage cluster with GUI or API

● Hadoop installation and setup

● System monitoring & alerts

● Built-in systems: Zookeeper, Spark, Hive, Impala and more

● Ability to add parcels

CM features
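
To make "manage cluster with GUI or API" concrete, here is a hedged sketch of how a script might prepare a call to the Cloudera Manager REST API to list clusters. The host name, the credentials and the API version "v10" are placeholders, not values from the talk; port 7180 is the usual default for the CM web server, and the highest API version a given server supports can be read from its /api/version endpoint. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_clusters_request(host, user, password, port=7180, api_version="v10"):
    """Build (but do not send) a GET request that lists the clusters known to CM."""
    url = f"http://{host}:{port}/api/{api_version}/clusters"
    # The CM API accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_clusters_request("cm.example.com", "admin", "admin")
print(request.full_url)

# Sending it would look like this (requires a reachable CM server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```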

● Monolithic packages

● Relocatable

● sudo-less installs

● Rolling upgrade

Parcels
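
The idea behind a rolling upgrade can be sketched as follows: hosts are moved to the new version a few at a time, so the rest of the cluster keeps serving. This is a toy simulation and not how Cloudera Manager implements it; the function name, the host names, the version strings and the is_healthy callback are all hypothetical.

```python
def rolling_upgrade(nodes, new_version, batch_size=1, is_healthy=lambda host: True):
    """Upgrade `nodes` (hostname -> version) in place, one batch at a time.

    Returns the list of batches that were upgraded. Stops with an error as soon
    as a batch fails its health check, leaving the remaining hosts untouched.
    """
    pending = [host for host, version in nodes.items() if version != new_version]
    upgraded = []
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for host in batch:
            # A real tool would stop the roles, switch the parcel and restart here.
            nodes[host] = new_version
        if not all(is_healthy(host) for host in batch):
            raise RuntimeError(f"halting rollout, unhealthy batch: {batch}")
        upgraded.append(batch)
    return upgraded


cluster = {"node1": "5.3.0", "node2": "5.3.0", "node3": "5.4.0"}
batches = rolling_upgrade(cluster, "5.4.0")
```

With batch_size=1 at most one host is out of service at any moment, which is the property that lets the cluster stay available during the upgrade.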

Custom Service Descriptors

● CSD is a descriptor for a service used by CM

● Defines how to install, start and stop a service, and the logic used by CM

CSD

Demo

● Archive data in Hadoop

● Growing data affects DWH performance & capabilities

● Creating realistic testing data

● Dev and Prod env. may differ in cluster size (dev may be 1 node)

More DevOps considerations
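
One place where the difference in cluster size between Dev and Prod shows up is HDFS replication: the default dfs.replication of 3 cannot be satisfied on a single-node dev cluster. A minimal sketch of deriving that setting from the cluster size, so the same deployment logic serves both environments, could look like this. The function name and the node counts are hypothetical.

```python
DEFAULT_REPLICATION = 3  # HDFS default for dfs.replication


def hdfs_overrides(datanode_count, desired_replication=DEFAULT_REPLICATION):
    """Return HDFS settings adjusted to the number of DataNodes in the cluster."""
    if datanode_count < 1:
        raise ValueError("a cluster needs at least one DataNode")
    # Replication cannot usefully exceed the number of DataNodes.
    return {"dfs.replication": min(desired_replication, datanode_count)}


dev_settings = hdfs_overrides(datanode_count=1)    # single-node dev cluster
prod_settings = hdfs_overrides(datanode_count=20)  # hypothetical prod size
```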

Tools Comparison

                  CM                     Ambari
Licence           Paid Ent edition       Free Apache Open Source
Technology        Cloudera               puppet, ganglia, nagios
Dependency        CDH                    HDP
Manage cluster    Parcels                Yum
REST API          +                      +
Extra Features    Rolling Upgrade, 3rd-parties Mngt, Extendable by REST API

CM features

                              Express    Enterprise
Subscription                  Free       Annual
Deployment & Configuration    +          +
Management                    +          +
Monitoring                    +          +
Diagnostic                    +          +
Extra Features                           Reports, Rollbacks, Rolling Upgrade, AD Kerberos, Kerberos wizard, Backup & DR

● Fast Deploy

● Easy management by GUI

● Built in monitoring and alerts

● Simple upgrades

● Same management and deploy in Dev and Prod

Pros of Hadoop Management tools

● Tied to a specific vendor's proprietary system

● Tied to system version by Parcels

● Less flexibility for low-level management

Cons of Hadoop Management tools

THANK YOU

Ran Silberman

Email: [email protected]