A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda
Network-Based Computing Laboratory
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
WBDB 2013
Outline
• Introduction and Motivation
• Problem Statement
• Design Considerations
• Micro-benchmark Suite
• Performance Evaluation
• Conclusion & Future Work
Big Data Technology
• Apache Hadoop is one of the most popular Big Data technologies
  – Provides a framework for large-scale, distributed data storage and processing
  – An open-source implementation of the MapReduce programming model
• Hadoop Distributed File System (HDFS) is the underlying file system for Hadoop MapReduce and the Hadoop database, HBase
• Hadoop Core provides common functionalities, e.g., Remote Procedure Call (RPC)

[Figure: Hadoop framework stack — MapReduce and HBase on top of HDFS, all built on Hadoop Core (RPC, ...)]
Adoption of Hadoop RPC
• Hadoop RPC is increasingly being used with data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance:
  – Metadata exchange
  – Managing compute nodes and tracking system status
  – Efficient data management operations: get block info, create blocks, etc.
  – Database operations: put, get, etc.
[Figure: Deployment architectures over high-performance networks — left: MapReduce & HDFS, with HDFS clients and Map/Reduce tasks connected to the HDFS NameNode and DataNodes (HDD/SSD); right: HBase, with HBase clients connected to HRegion servers and DataNodes (HDD/SSD)]
Common Protocols using OpenFabrics

[Figure: Protocol stacks available to an application, via either the sockets or the verbs interface —
• Sockets over kernel-space TCP/IP: 1/10/40 GigE (Ethernet adapter and switch) and IPoIB (InfiniBand adapter and switch)
• Sockets with hardware offload: 10/40 GigE-TOE (Ethernet adapter and switch)
• Sockets over user-space RDMA: SDP and RSockets (InfiniBand adapter and switch)
• Verbs over user-space RDMA: IB Verbs (InfiniBand adapter and switch), RoCE (RoCE adapter, Ethernet switch), and iWARP (iWARP adapter, Ethernet switch)]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?

• Sockets are not designed for high performance
  – Stream semantics often mismatch the needs of upper layers (Memcached, HBase, Hadoop)
  – Zero-copy not available for non-blocking sockets

[Figure: Three design stacks — Current Design: Application / Sockets / 1/10 GigE network; Enhanced Designs: Application / Accelerated Sockets (Verbs / hardware offload) / 10 GigE or InfiniBand; Our Approach: Application / OSU Design (verbs interface) / 10 GigE or InfiniBand]
Hadoop RPC over InfiniBand

[Figure: Applications use Hadoop RPC either through the default path (Java socket interface over 1/10 GigE or IPoIB networks) or through the OSU design (Java Native Interface (JNI) over IB Verbs on InfiniBand), selected via the rpc.ib.enabled parameter]

• Enables high-performance RDMA communication while supporting the traditional socket interface

Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. "High-Performance Design of Hadoop RPC with RDMA over InfiniBand." To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
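The rpc.ib.enabled switch is named on the slide; the snippet below is a hypothetical sketch of how such a toggle might be set in Hadoop's core-site.xml. The property placement, value semantics, and description text are assumptions, not confirmed by the source.

```xml
<!-- core-site.xml: hypothetical toggle for the RDMA-based RPC path -->
<property>
  <name>rpc.ib.enabled</name>
  <value>true</value>
  <description>Assumed semantics: use the verbs-based (RDMA) transport
  for Hadoop RPC instead of the default Java socket interface.</description>
</property>
```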
Hadoop RPC over IB: Gain in Latency and Throughput

• Hadoop RPC over IB ping-pong latency
  – 1 byte: 39 us; 4 KB: 52 us
  – 42%-49% and 46%-50% improvements compared with the performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively
• Hadoop RPC over IB throughput
  – 512 bytes and 48 clients: 135.22 Kops/sec
  – 82% and 64% improvements compared with the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively
[Figure: Left — ping-pong latency (us) vs. payload size (1 byte to 4 KB); right — throughput (Kops/sec) vs. number of clients (8 to 64); curves for RPC-10GigE, RPC-IPoIB (32 Gbps), and RPCoIB (32 Gbps)]
Available in Hadoop-RDMA Software

• High-performance design of Hadoop over RDMA-enabled interconnects
  – High-performance design with native InfiniBand support at the verbs level for the HDFS, MapReduce, and RPC components
  – Easily configurable for both native InfiniBand and traditional sockets-based support (Ethernet, and InfiniBand with IPoIB)
  – Current release: 0.9.0
    • Based on Apache Hadoop 0.20.2
    • Compliant with Apache Hadoop 0.20.2 APIs and applications
    • Tested with
      – Mellanox InfiniBand adapters (DDR, QDR, and FDR)
      – Various multi-core platforms
      – Different file systems with disks and SSDs
  – http://hadoop-rdma.cse.ohio-state.edu
Requirements of Hadoop RPC Benchmarks

• To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics
• A micro-benchmark suite for evaluating Hadoop RPC performance metrics under different configurations is important for tuning and understanding
• For Hadoop developers, such a micro-benchmark suite is helpful for evaluating and optimizing the performance of new designs
Problem Statement

• Can we design and implement a simple, standardized benchmark suite that lets all users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks/protocols?
• What will the performance of Hadoop RPC be when evaluated using this benchmark suite on high-performance networks?
Design Considerations

• The performance of RPC systems is usually measured by the metrics of latency and throughput
• The performance of Hadoop RPC is determined by:
  – Factors related to network configuration: faster interconnects and/or protocols can enhance Hadoop RPC performance
  – Controllable parameters at the RPC-engine level and benchmark level: handler count, client count, etc.
  – Data types: serialization and deserialization of different data types in the RPC system (BytesWritable, Text, etc.)
  – CPU utilization: the tradeoff between RPC subsystem performance and whole-system performance
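The data-type consideration above can be made concrete outside Hadoop. The sketch below is a minimal, hypothetical illustration (not the benchmark suite's code) of the length-prefixed wire format used by Writable types such as BytesWritable, assuming a 4-byte big-endian length header:

```python
import struct

def serialize_bytes_writable(data: bytes) -> bytes:
    """Length-prefixed serialization, analogous to Hadoop's
    BytesWritable wire format: a 4-byte big-endian length
    followed by the raw payload bytes."""
    return struct.pack(">i", len(data)) + data

def deserialize_bytes_writable(buf: bytes) -> bytes:
    """Inverse of serialize_bytes_writable: read the length
    header, then slice out exactly that many payload bytes."""
    (length,) = struct.unpack(">i", buf[:4])
    return buf[4:4 + length]
```

Timing round trips of these two calls over a range of payload sizes isolates the per-type (de)serialization overhead from the network cost.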
Micro-benchmark Suite

• Two different micro-benchmarks:
  – Latency: single server, single client
  – Throughput: single server, multiple clients
• A script framework for job launching and resource monitoring
• Calculates statistics such as Min, Max, and Average

Latency benchmark parameters:

| Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | Handlers | Verbose |
|------------|-----------------|------|-----------|--------------|--------------|-------------------|----------|---------|
| lat_client | √               | √    | √         | √            | √            | √                 |          | √       |
| lat_server | √               | √    |           |              |              |                   | √        | √       |

Throughput benchmark parameters:

| Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | No. of Clients | Handlers | Verbose |
|------------|-----------------|------|-----------|--------------|--------------|-------------------|----------------|----------|---------|
| thr_client | √               | √    | √         | √            | √            | √                 |                |          | √       |
| thr_server | √               | √    | √         |              |              |                   | √              | √        | √       |
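The two measurement patterns above (ping-pong latency with a single client, aggregate throughput with multiple clients) can be sketched outside Hadoop with a plain TCP echo server. This is a hypothetical analogue of the suite's design, not its actual code; ports, function names, and message framing are all illustrative assumptions:

```python
import socket
import statistics
import threading
import time

def run_echo_server(port, ready):
    """Echo server standing in for the RPC server side: accepts
    clients and echoes each message back on its own thread."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(64)
    ready.set()
    def serve(conn):
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)
        conn.close()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=serve, args=(conn,), daemon=True).start()

def _pingpong(port, payload, iterations):
    """One client's ping-pong loop; returns per-op round-trip times (us)."""
    sock = socket.socket()
    sock.connect(("127.0.0.1", port))
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        sock.sendall(payload)
        got = 0
        while got < len(payload):
            got += len(sock.recv(65536))
        samples.append((time.perf_counter() - t0) * 1e6)
    sock.close()
    return samples

def latency_benchmark(port, payload_size, iterations):
    """Single server, single client: report Min/Max/Average latency (us)."""
    samples = _pingpong(port, b"x" * payload_size, iterations)
    return {"min": min(samples), "max": max(samples),
            "avg": statistics.mean(samples)}

def throughput_benchmark(port, num_clients, payload_size, ops_per_client):
    """Single server, multiple clients: aggregate throughput in Kops/sec."""
    payload = b"x" * payload_size
    threads = [threading.Thread(target=_pingpong,
                                args=(port, payload, ops_per_client))
               for _ in range(num_clients)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    return num_clients * ops_per_client / elapsed / 1000.0
```

A driver script would start run_echo_server in a background thread, then call latency_benchmark(port, 4096, 1000) or throughput_benchmark(port, 48, 512, 1000), mirroring the lat_client/lat_server and thr_client/thr_server split in the tables above.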
Experimental Setup

• Hardware
  – Intel Westmere cluster
    • 8 nodes
    • Each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs and 24 GB main memory
    • Network: 1 GigE, 10 GigE, and IPoIB (32 Gbps)
• Software
  – Enterprise Linux Server release 6.1 (Santiago) with kernel version 2.6.32-131 and OpenFabrics version 1.5.3
  – Hadoop 0.20.2 and Sun Java SDK 1.7
RPC Latency for BytesWritable

• RPC latency decreases when the underlying interconnect is changed from 1 GigE to IPoIB or 10 GigE.
• With the 10 GigE interconnect, we observe better latency than IPoIB for small payload sizes; for large payload sizes, IPoIB performs better than 10 GigE.
  – IPoIB achieves a 27% gain over 10 GigE for a 64 MB payload size, whereas it performs 0.66% worse than 10 GigE for a 4 KB payload size.
[Figure: RPC latency for BytesWritable — left: small messages, latency (us) vs. payload size (1 byte to 4 KB); right: large messages, latency (ms) vs. payload size (128 KB to 64 MB); curves for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Latency for Text

• Similar performance characteristics are observed for RPC latency with the Text data type.
[Figure: RPC latency for Text — left: small messages, latency (us) vs. payload size (1 byte to 4 KB); right: large messages, latency vs. payload size (128 KB to 64 MB); curves for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Throughput for BytesWritable

• IPoIB performs better than 10 GigE as the payload size increases.
• At 4 KB, the improvement reaches 26% with seven handler threads. For small payload sizes, 10 GigE performs better than IPoIB by an average margin of 5-6%.
[Figure: RPC throughput for BytesWritable — left: 7 RPC server handlers; right: 16 RPC server handlers; throughput (Kops/sec) vs. payload size (1 byte to 4 KB) for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Throughput for BytesWritable

• Keeping the payload size fixed at 4 KB, we observe the trend with different handler counts and different networks:
  – IPoIB performs better than 10 GigE by 48%, 5%, 45%, and 47% for 1, 4, 16, and 32 handlers, respectively.
• The suite can also be used to monitor resource utilization by enabling a parameter in the script framework.
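The resource monitoring mentioned above can be approximated with a small sampler. The sketch below is a hypothetical, Linux-specific stand-in for what such a script-framework parameter might enable (it reads /proc/stat; it is not the suite's actual monitoring code):

```python
import time

def sample_cpu_utilization(interval=0.5):
    """Sample overall CPU utilization (%) over `interval` seconds by
    diffing the aggregate jiffy counters in /proc/stat (Linux-only),
    the way a periodic monitoring script might."""
    def read_counters():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait jiffies
        return idle, sum(fields)
    idle0, total0 = read_counters()
    time.sleep(interval)
    idle1, total1 = read_counters()
    busy = (total1 - total0) - (idle1 - idle0)
    return 100.0 * busy / (total1 - total0)
```

Sampling this in a loop while a benchmark runs yields a utilization-vs-time trace like the one plotted for the 4-handler experiment.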
[Figure: Left — throughput (Kops/sec) vs. handler count (1, 4, 16, 32) for a 4 KB payload on 1 GigE, 10 GigE, and IPoIB (32 Gbps); right — CPU utilization (%) over sampling points for the experiment with 4 handlers]
Conclusion and Future Work

• Designed and implemented a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC.
• Provided standard micro-benchmarks to measure the latency and throughput of Hadoop RPC with different data types.
• Illustrated the performance of Hadoop RPC using our benchmarks over different networks/protocols (1 GigE / 10 GigE / IPoIB).
• Will extend our benchmark suite to help users make performance comparisons among Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers.
• Will be made available to the Big Data community via an open-source release.
Thank You!
{luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu

Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/

MVAPICH Web Page
http://mvapich.cse.ohio-state.edu/

Hadoop-RDMA Web Page
http://hadoop-rdma.cse.ohio-state.edu/