prdc2012

Method for Monitoring and Profiling of Hadoop using AspectJ

Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane

Graduate School of Natural Science & Technology,

Kanazawa University

PRDC2012@TOKIMESSE

Introduction

A large-scale distributed system is ...

“Flexible and available architecture for large scale computation and data processing on a network of commodity hardware” [P. Julio, 2009]

- e.g. Apache Hadoop

Large-scale distributed systems are being used in more and more settings

For a dependable distributed system ...

We have to consider and deal with ...

- Non-deterministic networks

- Fault tolerance

- Incomprehensible users

We also need runtime monitoring and analysis

Relying only on ahead-of-time static analysis or verification is difficult

How to monitor and debug

General methods of debugging or monitoring Hadoop are ...

• logging text messages

→ Difficult with a huge number of nodes

• checking metrics via Web interfaces, Ganglia, etc.

→ Meant for operators; not enough for developers

There are difficulties and requirements

Introduction

- Provide effective information for development

- Help developers to understand system behaviors and specifications

Proposal

1. The Method Level Monitor

2. The Adaptive Profiling

Outline of Talk

Introduction

- Distributed system’s difficulty

Proposal

- Monitor

- Profile Method

Experimental Results & Conclusion

2. PROPOSALS

The Runtime Monitor

The Adaptive Profiling Method

Outline of Proposed System

Hadoop

• MapReduce

• HDFS

• RPC

Profile

Count up frequency of instructions

Monitor

Record Trace using AspectJ

Monitor

• observes the system behavior at runtime

• passively logs executed instructions = makes the “Trace”

‣ using AspectJ

- “AspectJ is an implementation of aspect-oriented programming for Java”

‣ no modification is needed to applications
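The idea of tracing method calls without editing the application can be sketched in Python. This is only an analogue: it wraps methods at runtime, whereas the Monitor in these slides uses AspectJ weaving. The `monitor` helper, the `TRACE` list, and the `NameNode` stand-in class are all hypothetical names introduced for illustration.

```python
import functools
import time

TRACE = []  # in-memory stand-in for the Monitor's trace output (assumption)


def _traced(func, process_name, qualified_name):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Record the call before running it, similar to a "before" advice.
        TRACE.append((time.time(), process_name, qualified_name))
        return func(*args, **kwargs)
    return wrapper


def monitor(cls, process_name):
    """Wrap every public method of cls so each call appends a trace record."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            qualified = f"{cls.__name__}.{name}"
            setattr(cls, name, _traced(attr, process_name, qualified))
    return cls


# Hypothetical application class; its source is left untouched.
class NameNode:
    def get_file_info(self, path):
        return {"path": path}

    def send_heartbeat(self):
        return "ok"


monitor(NameNode, "namenodetrace")
node = NameNode()
node.get_file_info("/user/data")
node.send_heartbeat()
```

After these calls, `TRACE` holds one record per executed method, which is the kind of input the profiling step counts.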

Architecture of Hadoop & Monitor

[Figure: Hadoop cluster with a Monitor attached to every node]

Master: NameNode, JobTracker, Monitor

Slaves: Reduce, DataNode, TaskTracker, DataBlocks, Monitor

Master’s Trace

‣ NameNode Trace

‣ JobTracker Trace

‣ RPC Trace

Slaves’ Trace

‣ DataNode Trace

‣ TaskTracker Trace

‣ RPC Trace

Method of Profiling

• based on the frequency of instructions

• counts the instructions recorded in the “Trace”

• counts at each granularity

➡ each node

➡ each process

➡ each method
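Counting at the three granularities listed above can be sketched as follows. The record layout `(node, process, method)` and the sample records are assumptions made for illustration; the slides do not specify the Trace format.

```python
from collections import Counter

# Hypothetical trace records: (node, process, method).
trace = [
    ("192.168.1.10", "namenodetrace", "NameNode.sendHeartbeat"),
    ("192.168.1.10", "namenodetrace", "NameNode.sendHeartbeat"),
    ("192.168.1.10", "rpctrace", "ipc.Client.call"),
    ("192.168.1.11", "datanodetrace", "DataNode.run"),
]


def profile(records):
    """Count trace records per node, per process, and per method."""
    by_node = Counter()
    by_process = Counter()
    by_method = Counter()
    for node, process, method in records:
        by_node[node] += 1
        by_process[(node, process)] += 1
        by_method[(node, process, method)] += 1
    return by_node, by_process, by_method


by_node, by_process, by_method = profile(trace)
```

Keeping the node in the process and method keys lets a coarse count be compared with the finer counts that make it up.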


3. EXPERIMENT

Benchmark on the impact of the Monitor

Perform profiling

Visualize the profiling results

Benchmark - the impact of Monitor

use “terasort” - a sample sorting program using MapReduce

Data size [GB]   Monitor   Elapsed time                 Throughput [MB/sec]   Trace size [MB]
1                ◯         2m 25s (145 sec)             6.9                   2.4
1                ×         2m 2s (122 sec)              8.2                   0
10               ◯         8m 45s (525 sec)             19.0                  3.6
10               ×         7m 45s (465 sec)             21.5                  0
100              ◯         1h 21m 54s (4,914 sec)       20.4                  31.6
100              ×         1h 18m 37s (4,717 sec)       21.2                  0

Monitor: ◯ = enabled, × = disabled

Throughput [MB/sec] = Data size / Elapsed time

At 100 GB, throughput with the Monitor is 96.2% of the throughput without it (20.4 / 21.2)

Trace size increases by 6.43 KB/sec
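The throughput formula can be checked against the 100 GB rows of the benchmark. This is a small sketch; the function name is illustrative, and it assumes 1 GB = 1000 MB, which is the conversion the reported figures are consistent with.

```python
def throughput_mb_per_sec(data_size_gb, elapsed_sec):
    """Throughput [MB/sec] = Data size / Elapsed time, with 1 GB = 1000 MB."""
    return data_size_gb * 1000 / elapsed_sec


# 100 GB terasort runs from the benchmark.
with_monitor = throughput_mb_per_sec(100, 4914)
without_monitor = throughput_mb_per_sec(100, 4717)

# Trace growth for the monitored 100 GB run: 31.6 MB over 4,914 sec.
trace_growth_kb_per_sec = 31.6 * 1000 / 4914

print(f"with Monitor:    {with_monitor:.1f} MB/sec")
print(f"without Monitor: {without_monitor:.1f} MB/sec")
print(f"trace growth:    {trace_growth_kb_per_sec:.2f} KB/sec")
```

The rounded results match the reported 20.4 MB/sec, 21.2 MB/sec, and 6.43 KB/sec.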

A Part of Profiling

Tue Nov 13 12:30:08 JST 2012
from 1352777408766 until 10000 after
HOSTNAME ::> DAEMON & PROCESS = { METHODS }
--------------------------
sirius:177 ::>> [namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0, tasktrackertrace : 0, rpctrace : 113] ={
! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
! hdfs.server.namenode.FSNamesystem.getDatanode=1
! hdfs.server.namenode.NameNode.getBlockLocations=4
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! hdfs.server.namenode.NameNode.verifyVersion=3
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
! ipc.Client.Connection.PingInputStream.read=4
! ipc.Client.Connection.sendParam=2
! ipc.Client.call=1
! ipc.ConnectionHeader.readFields=4

Statistics for the master over the last 10 seconds
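For further processing, the per-host summary line of this output can be read back into per-process counts. The sketch below assumes the layout seen in the sample (`HOST ::>> [name : count, ...]`); the function name `parse_summary` is hypothetical and not part of the presented tool.

```python
import re

# Summary line copied from the sample profiling output.
line = ("sirius:177 ::>> [namenodetrace : 23, jobtrackertrace : 41, "
        "datanodetrace : 0, tasktrackertrace : 0, rpctrace : 113] ={")


def parse_summary(summary_line):
    """Split a summary line into its host label and per-process counts."""
    host, _, rest = summary_line.partition(" ::>> ")
    counts = {name: int(value)
              for name, value in re.findall(r"(\w+) : (\d+)", rest)}
    return host, counts


host, counts = parse_summary(line)
```

Here the master shows no `datanodetrace` or `tasktrackertrace` activity, which fits its role as NameNode and JobTracker host.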

Node Level Profiling

[Figure: number of occurrences vs. time (s), one series per node: 192.168.1.10, 192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15]

Node Level Profiling aggregates the frequencies of instructions within each node per unit of time.
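Aggregating per node and per unit of time can be sketched as a bucketed count. The event layout `(timestamp_sec, node)`, the sample events, and the `unit_sec` parameter are assumptions for illustration only.

```python
from collections import Counter


def node_level_profile(events, unit_sec=1):
    """Count trace events per (node, start of time bucket)."""
    counts = Counter()
    for timestamp_sec, node in events:
        bucket = int(timestamp_sec // unit_sec) * unit_sec
        counts[(node, bucket)] += 1
    return counts


# Hypothetical events: (seconds since start, node address).
events = [
    (0.2, "192.168.1.10"),
    (0.7, "192.168.1.10"),
    (1.1, "192.168.1.10"),
    (0.4, "192.168.1.11"),
    (2.5, "192.168.1.11"),
]

node_profile = node_level_profile(events, unit_sec=1)
```

Adding the process name to the key would give the process-level variant of the same count.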

Process Level Profiling about MASTER

[Figure: number of occurrences vs. time (s) on the Master; series: namenode, jobtracker, rpc]

Process Level Profiling aggregates the frequencies of instructions of each process within each node per unit of time.

Process Level Profiling about Slaves

[Figure: number of occurrences vs. time (s), one panel per slave (192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15); series: datanodetrace, tasktrackertrace, rpctrace; the Map phase and the Reduce phase are marked]

Imbalance of RPC

There are free resources; speculative executions should be performed.

Conclusion summary

• Proposal

- the lightweight method-level monitor using AspectJ

- the profiling method based on frequency of instruction

• Provide effective information for development

• Help developers to understand system behaviors and specifications

Future work

• Create an algorithm that uses the profiling results to determine the degree of deviation, which may indicate a possible failure.

Thank you for your kind attention
