prdc2012
Post on 25-May-2015
Method for Monitoring and Profiling of Hadoop using AspectJ
Yusuke Shimizu, Kouhei Sakurai, Satoshi Yamane
Graduate School of Natural Science & Technology,
Kanazawa University
PRDC2012@TOKIMESSE
Introduction
A large-scale distributed system is ...
“Flexible and available architecture for large scale computation and data processing on a
network of commodity hardware”[-- P. Julio, 2009]
- e.g. Apache Hadoop
The use of large-scale distributed systems is increasing
For a dependable distributed system ...
We have to consider and deal with ...
- Non-deterministic network
- Fault tolerance
- Incomprehensible users
We also need runtime monitoring and analysis
Relying only on ahead-of-time static analysis or verification is difficult
How to monitor and debug
There are difficulties and requirements
The general methods of debugging or monitoring Hadoop are ...
• logging text messages
→ Difficult with a huge number of nodes
• checking metrics via web interfaces, Ganglia, etc.
→ Meant for operators; not enough for developers
Introduction
- Provide effective information for development
- Help developers to understand system behaviors and specifications
Proposal
1. The Method Level Monitor
2. The Adaptive Profiling
Outline of Talk
Introduction
- Distributed system’s difficulty
Proposal
- Monitor
- Profile Method
Experimental Results & Conclusion
2. PROPOSALS
The Runtime Monitor
&
The Adaptive Profiling Method
Outline of Proposed System
Hadoop
•MapReduce
•HDFS
•RPC
Profile
Count up frequency of instructions
Monitor
Record Trace using AspectJ
Monitor
• observe the system behavior at runtime
• passively log executed instructions = make a “Trace”
‣ using AspectJ
- “AspectJ is an implementation of Aspect-Oriented Programming for Java”
‣ no modification of applications is needed
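The idea of recording a Trace without touching application code can be sketched in plain Java. This is not the authors' aspect (the slides do not show it), and it swaps AspectJ weaving for `java.lang.reflect.Proxy` so that it runs without the AspectJ toolchain. `HeartbeatService`, `SimpleHeartbeatService`, and `TraceMonitor` are hypothetical names invented for illustration, not Hadoop types.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a daemon interface; not a real Hadoop type.
interface HeartbeatService {
    String sendHeartbeat(String nodeName);
}

// Hypothetical application code. It contains no logging or monitoring calls.
class SimpleHeartbeatService implements HeartbeatService {
    public String sendHeartbeat(String nodeName) {
        return "ack:" + nodeName;
    }
}

// Intercepts every interface call, appends one Trace entry, then forwards the call.
// Assumption: a dynamic proxy stands in for AspectJ advice woven around method executions.
class TraceMonitor implements InvocationHandler {
    private final Object target;
    private final List<String> trace;

    TraceMonitor(Object target, List<String> trace) {
        this.target = target;
        this.trace = trace;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Trace entry format (assumed): "<timestamp> <class>.<method>"
        trace.add(System.currentTimeMillis() + " "
                + target.getClass().getSimpleName() + "." + method.getName());
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the application's own exception
        }
    }

    // Wraps the target so that callers use it through the monitored proxy.
    static <T> T attach(Class<T> iface, T target, List<String> trace) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface },
                new TraceMonitor(target, trace));
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<String>();
        HeartbeatService service =
                attach(HeartbeatService.class, new SimpleHeartbeatService(), trace);
        service.sendHeartbeat("node1");
        System.out.println(trace);
    }
}
```

With AspectJ the interception point would be declared as a pointcut and woven into the Hadoop classes, so no wrapping call like `attach` would appear in the application either.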
Architecture of Hadoop & Monitor
[Figure: a Master (NameNode, JobTracker) and Slaves (each with DataNode, TaskTracker, Map, Reduce, and DataBlocks), connected by RPC, with Monitors attached to the nodes]
Master’s Trace
‣ NameNode Trace
‣ JobTracker Trace
‣ RPC Trace
Slaves’ Trace
‣ DataNode Trace
‣ TaskTracker Trace
‣ RPC Trace
Method of Profiling
• based on the frequency of instructions
• count the instructions recorded in the “Trace”
• count at each granularity
➡ each node
➡ each process
➡ each method
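Counting at the three granularities can be sketched as follows. This is a minimal illustration, assuming each Trace entry carries a node name, a process (trace) name, and a method name. The class `FrequencyProfile` and the key format `node/process/method` are assumptions made for this example, not the authors' data model.

```java
import java.util.Map;
import java.util.TreeMap;

// Keeps one frequency table per granularity: node, process, method.
class FrequencyProfile {
    final Map<String, Integer> perNode = new TreeMap<String, Integer>();
    final Map<String, Integer> perProcess = new TreeMap<String, Integer>();
    final Map<String, Integer> perMethod = new TreeMap<String, Integer>();

    // Counts one Trace entry at all three granularities.
    void count(String node, String process, String method) {
        perNode.merge(node, 1, Integer::sum);
        perProcess.merge(node + "/" + process, 1, Integer::sum);
        perMethod.merge(node + "/" + process + "/" + method, 1, Integer::sum);
    }

    public static void main(String[] args) {
        FrequencyProfile profile = new FrequencyProfile();
        // Illustrative entries only; not measured data.
        profile.count("master", "namenodetrace", "NameNode.sendHeartbeat");
        profile.count("master", "namenodetrace", "NameNode.getFileInfo");
        profile.count("master", "rpctrace", "ipc.Client.call");
        System.out.println(profile.perNode);
        System.out.println(profile.perProcess);
        System.out.println(profile.perMethod);
    }
}
```

Prefixing the finer-grained keys with the coarser ones keeps the tables consistent: the per-process counts of a node sum to that node's count.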
3. EXPERIMENT
Benchmark the impact of the Monitor
&
Run the Profiling
&
Visualize the profiling results
Benchmark - the impact of Monitor
Uses “terasort”, a sample sorting program using MapReduce
Trace size increases by 6.43 KB/sec
Data size [GB] | Monitor | Elapsed time             | Throughput [MB/sec] | Trace size [MB] | Throughput relative to no Monitor
1              | ◯       | 2m 25s (145 sec)         | 6.9                 | 2.4             | 84.1%
1              | ×       | 2m 2s (122 sec)          | 8.2                 | 0               |
10             | ◯       | 8m 45s (525 sec)         | 19.0                | 3.6             | 88.3%
10             | ×       | 7m 45s (465 sec)         | 21.5                | 0               |
100            | ◯       | 1h 21m 54s (4,914 sec)   | 20.4                | 31.6            | 96.2%
100            | ×       | 1h 18m 37s (4,717 sec)   | 21.2                | 0               |
Throughput [MB/sec] = Data size / Elapsed time
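The throughput formula can be checked against the 1 GB rows with a small calculation. This sketch assumes 1 GB is counted as 1000 MB, which is the convention that reproduces the rounded figures on the slide. The class and method names are made up for this example.

```java
// Recomputes throughput and relative throughput from the benchmark figures.
class TerasortThroughput {
    // Throughput [MB/sec] = Data size / Elapsed time
    static double mbPerSec(double dataSizeMb, double elapsedSec) {
        return dataSizeMb / elapsedSec;
    }

    // Throughput with the Monitor as a fraction of throughput without it.
    static double relative(double withMonitor, double withoutMonitor) {
        return withMonitor / withoutMonitor;
    }

    public static void main(String[] args) {
        double dataSizeMb = 1000.0; // assumption: 1 GB = 1000 MB
        double withMonitor = mbPerSec(dataSizeMb, 145.0);    // Monitor on, 145 sec
        double withoutMonitor = mbPerSec(dataSizeMb, 122.0); // Monitor off, 122 sec
        System.out.printf("with Monitor:    %.1f MB/sec%n", withMonitor);
        System.out.printf("without Monitor: %.1f MB/sec%n", withoutMonitor);
        System.out.printf("relative:        %.1f%%%n",
                100.0 * relative(withMonitor, withoutMonitor));
    }
}
```

The relative figure shrinks toward 100% as the data size grows in the table, which suggests the Monitor's cost is mostly a fixed overhead rather than proportional to the data volume.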
A Part of Profiling
Tue Nov 13 12:30:08 JST 2012
from 1352777408766 until 10000 after
HOSTNAME ::> DAEMON & PROCESS = { METHODS }
--------------------------
sirius:177 ::>> [namenodetrace : 23, jobtrackertrace : 41, datanodetrace : 0, tasktrackertrace : 0, rpctrace : 113] ={
! hdfs.server.namenode.CorruptReplicasMap.numCorruptReplicas=5
! hdfs.server.namenode.FSNamesystem.getBlockLocations=3
! hdfs.server.namenode.FSNamesystem.getDatanode=1
! hdfs.server.namenode.NameNode.getBlockLocations=4
! hdfs.server.namenode.NameNode.getFileInfo=2
! hdfs.server.namenode.NameNode.sendHeartbeat=2
! hdfs.server.namenode.NameNode.verifyVersion=3
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.hasNext=2
! hdfs.server.namenode.UnderReplicatedBlocks.BlockIterator.next=1
! ipc.Client.Connection.PingInputStream.read=4
! ipc.Client.Connection.sendParam=2
! ipc.Client.call=1
! ipc.ConnectionHeader.readFields=4
Statistics for the last 10 seconds on the master
Node Level Profiling
[Figure: number of occurrences over time(s), one series per node: 192.168.1.10, 192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15]
Node Level Profiling aggregates the frequencies of instructions within each node per unit time.
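Aggregating per node and per unit time amounts to bucketing Trace timestamps into fixed windows. The sketch below assumes millisecond timestamps and a configurable window length. `NodeTimeline` and its methods are hypothetical names, and the slides do not say how the authors store their counts.

```java
import java.util.Map;
import java.util.TreeMap;

// Counts Trace entries per node and per time window.
class NodeTimeline {
    private final long startMillis;
    private final long windowMillis;
    // node -> (window index -> number of occurrences)
    private final Map<String, Map<Long, Integer>> counts =
            new TreeMap<String, Map<Long, Integer>>();

    NodeTimeline(long startMillis, long windowMillis) {
        this.startMillis = startMillis;
        this.windowMillis = windowMillis;
    }

    // Adds one occurrence to the window that contains the timestamp.
    void record(String node, long timestampMillis) {
        long window = Math.floorDiv(timestampMillis - startMillis, windowMillis);
        counts.computeIfAbsent(node, k -> new TreeMap<Long, Integer>())
              .merge(window, 1, Integer::sum);
    }

    // Returns the number of occurrences for a node in a window, or 0 if none.
    int count(String node, long window) {
        Map<Long, Integer> perWindow = counts.get(node);
        if (perWindow == null) {
            return 0;
        }
        Integer value = perWindow.get(window);
        return value == null ? 0 : value.intValue();
    }

    public static void main(String[] args) {
        NodeTimeline timeline = new NodeTimeline(0L, 10000L); // 10-second windows
        // Illustrative timestamps only; not measured data.
        timeline.record("192.168.1.10", 500L);
        timeline.record("192.168.1.10", 9999L);
        timeline.record("192.168.1.10", 10000L);
        System.out.println(timeline.count("192.168.1.10", 0L));
        System.out.println(timeline.count("192.168.1.10", 1L));
    }
}
```

Plotting `count(node, window)` against the window index for each node gives one series per node, in the manner of the node-level chart.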
Process Level Profiling about MASTER
[Figure: number of occurrences over time(s) on the Master, one series per process: namenode, jobtracker, rpc]
Process Level Profiling aggregates the frequencies of instructions of each process within each node per unit time.
Process Level Profiling about Slaves
[Figure: one panel per slave node (192.168.1.11, 192.168.1.12, 192.168.1.13, 192.168.1.14, 192.168.1.15), number of occurrences over time(s), series: datanodetrace, tasktrackertrace, rpctrace; the Map phase and the Reduce phase are marked]
Imbalance of RPC
There are free resources, so speculative execution should be performed.
Conclusion summary
• Proposal
- a lightweight method-level monitor using AspectJ
- a profiling method based on the frequency of instructions
• Provide effective information for development
• Help developers to understand system behaviors and specifications
Future work
• Create an algorithm for determining the degree of deviation, using the profiling results to indicate the possibility of failure.
Thank you for your kind attention