Method for Monitoring and Profiling of Hadoop using AspectJ Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane Graduate School of Natural Science & Technology, Kanazawa University PRDC2012@TOKIMESSE

Upload: yusuke-shimizu

Post on 25-May-2015


Page 1: Prdc2012

Method for Monitoring and Profiling of Hadoop using AspectJ

Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane

Graduate School of Natural Science & Technology,

Kanazawa University

PRDC2012@TOKIMESSE

Page 2: Prdc2012

Introduction

A large-scale distributed system is ...

“Flexible and available architecture for large scale computation and data processing on a network of commodity hardware” [P. Julio, 2009]

- e.g. Apache Hadoop

Use of large-scale distributed systems is increasing

Page 3: Prdc2012

For Dependable Distributed Systems ...

We have to consider and deal with ...

- Non-deterministic network

- Fault tolerance

- Incomprehensible users

We also need runtime monitoring and analysis

Relying only on ahead-of-time static analysis or verification is difficult

Page 4: Prdc2012

How to monitor and debug

General methods of debugging or monitoring Hadoop are ...

• logging text messages

• checking metrics via Web Interfaces, Ganglia, etc.

Page 5: Prdc2012

There are difficulties and requirements

General methods of debugging or monitoring Hadoop are ...

• logging text messages

→ Difficult with a huge number of nodes

• checking metrics via Web Interfaces, Ganglia, etc.

→ Aimed at operators; not enough for developers

Page 6: Prdc2012

Introduction

- Provide effective information for development

- Help developers to understand system behaviors and specifications

Proposal

1. The Method Level Monitor

2. The Adaptive Profiling

Page 7: Prdc2012

Outline of Talk

Introduction

- Distributed system’s difficulty

Proposal

- Monitor

- Profile Method

Experimental Results & Conclusion

Page 8: Prdc2012

2. PROPOSALS

The Runtime Monitor

&

The Adaptive Profiling Method

Page 9: Prdc2012

Outline of Proposed System

Hadoop

•MapReduce

•HDFS

•RPC

Profile

Count up frequency of instructions

Monitor

Record Trace using AspectJ

Page 10: Prdc2012

Monitor

• observes the system behavior at runtime

• passively logs executed instructions = makes a “Trace”

‣ using AspectJ

- AspectJ is an implementation of aspect-oriented programming for Java

‣ no modification of applications is needed
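The idea of recording a trace without editing application code can be illustrated with a rough analogy. The sketch below is Python, not AspectJ, and it is not the authors' Monitor: it wraps the public methods of a class at runtime so that each call appends one trace record, loosely like "before" advice woven around method executions. The names `instrument`, `TRACE`, and the toy `NameNode` class are hypothetical.

```python
import functools
import time

TRACE = []  # hypothetical in-memory trace; a real monitor would write to a log


def _traced(func, trace_name, qualified_name):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Record the call first, roughly like "before" advice.
        TRACE.append((time.time(), trace_name, qualified_name))
        return func(*args, **kwargs)
    return wrapper


def instrument(cls, trace_name):
    """Wrap every public method of cls so each call appends a trace record."""
    for name, attr in list(vars(cls).items()):
        if name.startswith("_") or not callable(attr):
            continue
        setattr(cls, name, _traced(attr, trace_name, f"{cls.__name__}.{name}"))
    return cls


class NameNode:  # toy stand-in, not Hadoop's NameNode
    def get_file_info(self, path):
        return {"path": path}

    def send_heartbeat(self):
        return "ok"


# The class body above is left untouched; tracing is attached from outside.
instrument(NameNode, "namenodetrace")
node = NameNode()
node.get_file_info("/tmp/a")
node.send_heartbeat()
```

After the two calls, `TRACE` holds one record per executed method, tagged with the trace name, which is the kind of raw material the profiling step counts.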

Page 11: Prdc2012

Architecture of Hadoop & Monitor

[Figure: Hadoop architecture with Monitors. Master: NameNode, JobTracker, Monitor. Slaves: DataNode, TaskTracker, Map, Reduce, DataBlocks, Monitor. Master and Slaves are connected by RPC.]

Page 13: Prdc2012

Architecture of Hadoop & Monitor

[Figure: Hadoop architecture with Monitors. Master: NameNode, JobTracker, Monitor. Slaves: DataNode, TaskTracker, Map, Reduce, DataBlocks, Monitor. Master and Slaves are connected by RPC.]

Master’s Trace

‣ NameNode Trace

‣ JobTracker Trace

‣ RPC Trace

Slaves’ Trace

‣ DataNode Trace

‣ TaskTracker Trace

‣ RPC Trace

Page 14: Prdc2012

Method of Profiling

• based on the frequency of instructions

• counts up the instructions recorded in the “Trace”

• counts at each granularity

➡ each node

➡ each process

➡ each method
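Counting at the three granularities listed above can be sketched as follows. This is a minimal illustration, assuming each trace record has been reduced to a (node, process, method) triple; the record layout, the sample values, and the function name `profile` are assumptions made for the example, not the authors' data format.

```python
from collections import Counter

# Hypothetical trace records: (node, process, method).
trace = [
    ("192.168.1.10", "namenodetrace", "NameNode.getFileInfo"),
    ("192.168.1.10", "namenodetrace", "NameNode.sendHeartbeat"),
    ("192.168.1.10", "rpctrace", "ipc.Client.call"),
    ("192.168.1.11", "datanodetrace", "DataNode.run"),
    ("192.168.1.11", "rpctrace", "ipc.Client.call"),
    ("192.168.1.11", "rpctrace", "ipc.Client.call"),
]


def profile(records):
    """Count trace records per node, per process, and per method."""
    by_node, by_process, by_method = Counter(), Counter(), Counter()
    for node, process, method in records:
        by_node[node] += 1
        by_process[(node, process)] += 1
        by_method[(node, process, method)] += 1
    return by_node, by_process, by_method


by_node, by_process, by_method = profile(trace)
for key, count in sorted(by_process.items()):
    print(key, count)
```

Keeping the coarser key as a prefix of the finer key means the method-level counts always sum to the process-level and node-level counts.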

Page 15: Prdc2012

Outline of Talk

Introduction

- Distributed system’s difficulty

Proposal

- Monitor

- Profile Method

Experimental Results & Conclusion

Page 16: Prdc2012

3. EXPERIMENT

Benchmark on the impact of the Monitor

&

do Profiling

&

Visualize the profiling results

Page 17: Prdc2012

Benchmark - the impact of Monitor

Uses “terasort”, a sample sorting program using MapReduce

Trace size increases by 6.43 KB/sec

| Data size [GB] | Monitor | Elapsed time | Throughput [MB/sec] | Trace size [MB] | Throughput vs. no Monitor |
|---|---|---|---|---|---|
| 1 | ◯ | 2m 25s (145 sec) | 6.9 | 2.4 | 84.1% |
| 1 | × | 2m 2s (122 sec) | 8.2 | 0 | |
| 10 | ◯ | 8m 45s (525 sec) | 19.0 | 3.6 | 88.3% |
| 10 | × | 7m 45s (465 sec) | 21.5 | 0 | |
| 100 | ◯ | 1h 21m 54s (4,914 sec) | 20.4 | 31.6 | 96.2% |
| 100 | × | 1h 18m 37s (4,717 sec) | 21.2 | 0 | |

Throughput [MB/sec] = Data size / Elapsed time
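The benchmark figures can be re-derived from the elapsed times. The sketch below applies the stated formula to the values in the table; it assumes 1 GB = 1000 MB and 1 MB = 1000 KB, which is an inference from the fact that these factors reproduce the reported numbers. The function names are hypothetical.

```python
# (data size in GB, Monitor enabled, elapsed seconds, trace size in MB)
RUNS = [
    (1, True, 145, 2.4),
    (1, False, 122, 0.0),
    (10, True, 525, 3.6),
    (10, False, 465, 0.0),
    (100, True, 4914, 31.6),
    (100, False, 4717, 0.0),
]


def throughput_mb_per_sec(data_gb, elapsed_sec):
    # Throughput = Data size / Elapsed time, assuming 1 GB = 1000 MB.
    return data_gb * 1000 / elapsed_sec


def trace_rate_kb_per_sec(trace_mb, elapsed_sec):
    # Average trace growth, assuming 1 MB = 1000 KB.
    return trace_mb * 1000 / elapsed_sec


for data_gb, monitored, elapsed, trace_mb in RUNS:
    label = "with Monitor" if monitored else "without Monitor"
    rate = throughput_mb_per_sec(data_gb, elapsed)
    print(f"{data_gb} GB {label}: {rate:.1f} MB/sec")

print(f"{trace_rate_kb_per_sec(31.6, 4914):.2f} KB/sec of trace in the 100 GB run")
```

Under these assumptions the 6.43 KB/sec trace growth corresponds to the 100 GB run with the Monitor enabled (31.6 MB over 4,914 sec).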

Page 18: Prdc2012

A Part of Profiling

Tue Nov 13 12:30:08 JST 2012
from 1352777408766 until 10000 after
HOSTNAME ::> DAEMON & PROCESS = { METHODS }
--------------------------
sirius:177 ::>> [namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0, tasktrackertrace : 0, rpctrace : 113] ={
! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
! hdfs.server.namenode.FSNamesystem.getDatanode=1
! hdfs.server.namenode.NameNode.getBlockLocations=4
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! hdfs.server.namenode.NameNode.verifyVersion=3
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
! ipc.Client.Connection.PingInputStream.read=4
! ipc.Client.Connection.sendParam=2
! ipc.Client.call=1
! ipc.ConnectionHeader.readFields=4

Statistics for the last 10 seconds on the master

Page 19: Prdc2012

Node Level Profiling

[Figure: number of occurrences (y-axis, 0 to 800) over time (s), one series per node: 192.168.1.10, 192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15.]

Node Level Profiling aggregates the frequencies of instructions within each node per unit of time.
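Aggregation per node and per unit of time can be sketched as a bucketed count. The example assumes each trace record carries a millisecond timestamp and a node address, and it uses a 10-second window to match the 10-second statistics shown earlier. Aligning windows to multiples of the window length is a simplification chosen here, and apart from the first timestamp the sample records are invented for illustration.

```python
from collections import Counter

WINDOW_MS = 10_000  # assumed window length: 10 seconds


def node_level_profile(records, window_ms=WINDOW_MS):
    """Count trace records per (window start, node)."""
    counts = Counter()
    for timestamp_ms, node in records:
        # Round the timestamp down to the start of its window.
        window_start = timestamp_ms - timestamp_ms % window_ms
        counts[(window_start, node)] += 1
    return counts


# Hypothetical records: (timestamp in ms, node address).
records = [
    (1352777408766, "192.168.1.10"),
    (1352777409100, "192.168.1.10"),
    (1352777409500, "192.168.1.11"),
    (1352777419000, "192.168.1.10"),
]

counts = node_level_profile(records)
for (window_start, node), count in sorted(counts.items()):
    print(window_start, node, count)
```

Plotting one series per node from these counts gives a chart of occurrences over time like the one described on this slide.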

Page 20: Prdc2012

Process Level Profiling about MASTER

[Figure: number of occurrences (y-axis, 0 to 400) over time (s) on the Master, one series each for namenode, jobtracker, and rpc.]

Process Level Profiling aggregates the frequencies of instructions of each process within each node per unit of time.

Page 21: Prdc2012

Process Level Profiling about Slaves

[Figure: one panel per slave node (192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15), each plotting number of occurrences over time (s) with the series datanodetrace, tasktrackertrace, and rpctrace. The Map phase and the Reduce phase are marked.]

Annotations on the figure:

- Imbalance of RPC

- There are free resources; speculative execution should be done.

Page 22: Prdc2012

Conclusion summary

• Proposal

- the lightweight method-level monitor using AspectJ

- the profiling method based on frequency of instruction

• Provide effective information for development

• Help developers to understand system behaviors and specifications

Future work

• Create an algorithm that uses the profiling results to determine the degree of deviation, which may indicate the possibility of failure.

Page 23: Prdc2012

Thank you for your kind attention