Community 1.3.0 (Optimize both Yarn & Non Yarn Hadoop clusters)

Upload: buddy-marshall

Post on 21-Dec-2015

TRANSCRIPT

Page 1

Community 1.3.0 (Optimize both Yarn & Non Yarn Hadoop clusters)

Page 2

Agenda

• Big Data Trends

• What is Jumbune?

• Description of Components

Page 3

Big Data Trends

• Resource sharing/isolation frameworks: Yarn, Mesos, etc.

• Shared cluster workers (resources)

• Multiple execution engines: MapReduce, Spark, Hama, Storm, Giraph, etc.

• Data ETLing from all possible sources to the Data Lake

Page 4

Hadoop based solution life stages (as on ground) – Cyclic execution

[Diagram: cyclic flow through Business User, Data Analyst, MapReduce Dev, Logic & Data Test, Devops, Staging Data, Production]

• Bad Logic?

• Resource Utilization?

• Bad Data?

• Monitoring Needs

Page 5

Challenges in Analytical Solutions

1. No common platform across actors to detect root causes

2. Incremental imports may ingest bad data

3. Cluster resources are shared, so optimal utilization is key

4. Implementing models correctly in custom MR on the initial attempts is like hitting the bull’s eye

5. Bad logic or bad data?

Page 6

Intersecting solution Lifecycle Stages

[Diagram: Solution Development, Quality Test, Devops, Bulk & Incremental Data]

Page 7

Jumbune

• Flow Analyzer

• Data Validation

• Cluster Monitor

• Job Profiler

“A catalyst to accelerate realization of analytical solutions”

Page 8

Niche offerings

• In-depth, code-level analysis of cluster-wide flow

• Record-level data violation reports

• No deployment on workers: ultra-light agent installation on the Hadoop master only

• Ability to turn cluster monitoring on/off at will, which lessens resource load

• Customizable rack-aware monitoring

• Correlated profiling analysis of phases, throughput and resource consumption

• Ability to work across all Hadoop distributions

Page 9

Components - Recommended Environments

Dev

• Flow Debugger

• Data Validation

• MR Job Profiler

QA

• Data Validation

Stage + Perf

• MR Job Profiler

Prod

• Cluster Monitoring

• Data Validation

Page 11

MapReduce Flow Debugger

• Verifies the flow of input records through the user’s MapReduce implementation

• Drill-down visualization helps developers quickly identify the problem

• The only tool that helps developers find MapReduce implementation faults without any extra coding
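The idea of verifying record flow can be illustrated with a small in-memory sketch. This is not how Jumbune instruments jobs (the slides do not describe that); it only shows the kind of per-stage record counts that make a lossy mapper visible. The function `run_with_flow_counters`, the counter names and the deliberately faulty `mapper` are all hypothetical.

```python
from collections import Counter, defaultdict


def run_with_flow_counters(records, mapper, reducer):
    """Run a map/reduce pair in memory and count records at each stage."""
    counters = Counter()
    grouped = defaultdict(list)

    for record in records:
        counters["map.in"] += 1
        for key, value in mapper(record):
            counters["map.out"] += 1
            grouped[key].append(value)

    results = {}
    for key, values in grouped.items():
        counters["reduce.in.keys"] += 1
        for out_key, out_value in reducer(key, values):
            counters["reduce.out"] += 1
            results[out_key] = out_value

    return results, counters


def mapper(line):
    # Faulty on purpose: silently drops any word that is not all lowercase.
    for word in line.split():
        if word.islower():
            yield word, 1


def reducer(key, values):
    yield key, sum(values)


lines = ["a b", "A b", ""]
results, counters = run_with_flow_counters(lines, mapper, reducer)
```

Comparing `counters["map.out"]` with the number of words in the input (four here) shows that the mapper lost a record, which is the sort of question a flow view answers without adding logging to the job itself.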

Page 12

Data Validator

• Detects inconsistencies in data through:

– Null checks

– Data type checks

– Regular expression checks

• Generic way of specifying validation rules

• Provides a record-level report of the anomalies found

• Currently supports HDFS as the lake file system
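The three kinds of checks listed above can be sketched as follows. This is a minimal illustration over delimited text lines, not Jumbune's rule format or API: `FieldRule`, `Violation` and `validate` are hypothetical names, and the sample records and the e-mail pattern are made up for the example.

```python
import re
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class FieldRule(NamedTuple):
    """Hypothetical per-field rule: null, data type and regex checks."""
    nullable: bool = True
    dtype: Optional[Callable[[str], object]] = None  # e.g. int or float
    regex: Optional[str] = None


class Violation(NamedTuple):
    line_no: int   # 1-based line number of the offending record
    field: int     # 0-based field index
    check: str     # "null", "datatype" or "regex"
    value: str


def validate(lines: Iterable[str], rules: Dict[int, FieldRule],
             delimiter: str = ",") -> List[Violation]:
    """Return one Violation per failed check, in record order."""
    violations = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        for idx, rule in rules.items():
            value = fields[idx] if idx < len(fields) else ""
            if value == "":
                if not rule.nullable:
                    violations.append(Violation(line_no, idx, "null", value))
                continue  # nothing further to check on an empty field
            if rule.dtype is not None:
                try:
                    rule.dtype(value)
                except ValueError:
                    violations.append(Violation(line_no, idx, "datatype", value))
                    continue
            if rule.regex is not None and re.fullmatch(rule.regex, value) is None:
                violations.append(Violation(line_no, idx, "regex", value))
    return violations


records = [
    "1,alice,alice@example.com",
    "x,bob,bob@example.com",
    "3,,carol@example.com",
    "4,dave,not-an-email",
]
rules = {
    0: FieldRule(nullable=False, dtype=int),
    1: FieldRule(nullable=False),
    2: FieldRule(regex=r"[^@\s]+@[^@\s]+\.[a-z]+"),
}
report = validate(records, rules)
```

Each entry in `report` names the line, the field and the failed check, which is the shape of a record-level report; on a real cluster the lines would come from files in HDFS rather than from an in-memory list.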

Page 13

MR Job Profiling

• Per job, phase-wise:

– Performance for each JVM

– Data flow rate

– Resource usage

• Per job: heap sites for Mapper & Reducer

• Per job: CPU cycles for Mapper & Reducer

Page 14

Hadoop Cluster Monitoring

• Data centre & rack-aware node view of Yarn and non-Yarn daemons

• Dynamic interval-based monitoring

• Hadoop JMX and node resource statistics

• Per-file, node-wise replica placement (which nodes have replicas of a given file?)

• HDFS data placement view (is HDFS balanced?)
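The last two questions can be made concrete with a short sketch. It assumes the block locations and per-node usage figures have already been collected by some means (the slides do not say how Jumbune gathers them), so the input dictionaries and both function names are hypothetical. The balance test follows the criterion used by the HDFS balancer: each DataNode's utilization should differ from the overall cluster utilization by no more than a threshold, 10 percent by default.

```python
def replica_nodes(block_locations, path):
    """Nodes holding at least one replica of any block of the given file.

    block_locations maps a file path to a list of blocks, where each
    block is the list of node names that store a replica of it.
    """
    return sorted({node for block in block_locations[path] for node in block})


def is_balanced(used, capacity, threshold_pct=10.0):
    """Check each node's utilization against the cluster-wide utilization.

    used and capacity map node name to bytes (any consistent unit).
    Returns (balanced, deviations), with deviations in percentage points.
    """
    cluster_util = 100.0 * sum(used.values()) / sum(capacity.values())
    deviations = {
        node: 100.0 * used[node] / capacity[node] - cluster_util
        for node in used
    }
    balanced = all(abs(d) <= threshold_pct for d in deviations.values())
    return balanced, deviations


# Made-up sample data for illustration only.
block_locations = {
    "/data/a.csv": [["n1", "n2", "n3"], ["n2", "n3", "n4"]],
}
capacity = {"n1": 100, "n2": 100, "n3": 100}
skewed = {"n1": 80, "n2": 50, "n3": 20}
even = {"n1": 55, "n2": 50, "n3": 45}

nodes_for_file = replica_nodes(block_locations, "/data/a.csv")
skewed_ok, skewed_dev = is_balanced(skewed, capacity)
even_ok, even_dev = is_balanced(even, capacity)
```

In the skewed case, `n1` sits 30 points above the cluster average and `n3` 30 points below, so the cluster counts as unbalanced at the default threshold.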

Page 16

Let’s Collaborate

Website

• http://jumbune.org

Contribute

• http://github.com/impetus-opensource/jumbune

• http://jumbune.org/jira/JUM

Social

• Follow @jumbune

• Use #jumbune

• Jumbune Group: http://linkd.in/1mUmcYm

Forums

• Users: [email protected]

• Dev: [email protected]

• Issues: [email protected]

Downloads

• http://jumbune.org

• https://bintray.com/jumbune/downloads/jumbune