A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda
Network-Based Computing Laboratory
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
WBDB 2013
Outline
• Introduction and Motivation
• Problem Statement
• Design Considerations
• Micro-benchmark Suite
• Performance Evaluation
• Conclusion & Future Work
Big Data Technology
• Apache Hadoop is one of the most popular Big Data technologies
  – Provides a framework for large-scale, distributed data storage and processing
  – An open-source implementation of the MapReduce programming model
• Hadoop Distributed File System (HDFS) is the underlying file system for Hadoop MapReduce and the Hadoop database, HBase
• Hadoop Core provides common functionalities, e.g., Remote Procedure Call (RPC)

[Figure: Hadoop framework stack — MapReduce and HBase on top of HDFS, all built on Hadoop Core (RPC, ...)]
Adoption of Hadoop RPC
• Hadoop RPC is increasingly being used with data-center middleware such as MapReduce, HDFS, and HBase because of its simplicity, productivity, and high performance:
  – Metadata exchange
  – Managing compute nodes and tracking system status
  – Efficient data management operations: get block info, create blocks, etc.
  – Database operations: put, get, etc.
[Figure: Deployment architectures over high-performance networks — left: MapReduce & HDFS, with HDFS clients and Map/Reduce tasks connected to the HDFS NameNode and DataNodes (HDD/SSD); right: HBase, with HBase clients connected to HRegion servers and DataNodes (HDD/SSD)]
Common Protocols using OpenFabrics

[Figure: Protocol stacks available to an application, via either the sockets or the verbs interface —
• Sockets over kernel-space TCP/IP: 1/10/40 GigE (Ethernet adapter and switch) and IPoIB (InfiniBand adapter and switch)
• Sockets with hardware offload: 10/40 GigE-TOE (Ethernet adapter and switch)
• Sockets over user-space RDMA: SDP and RSockets (InfiniBand adapter and switch)
• Verbs over user-space RDMA: IB Verbs (InfiniBand adapter and switch), RoCE (RoCE adapter, Ethernet switch), and iWARP (iWARP adapter, Ethernet switch)]
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?

• Sockets are not designed for high performance
  – Stream semantics often mismatch the needs of upper layers (Memcached, HBase, Hadoop)
  – Zero-copy not available for non-blocking sockets

[Figure: Three design stacks — Current Design: Application / Sockets / 1/10 GigE network; Enhanced Designs: Application / Accelerated Sockets (Verbs / hardware offload) / 10 GigE or InfiniBand; Our Approach: Application / OSU Design (verbs interface) / 10 GigE or InfiniBand]
Hadoop RPC over InfiniBand

[Figure: Applications use Hadoop RPC either through the default path (Java socket interface over 1/10 GigE or IPoIB networks) or through the OSU design (Java Native Interface (JNI) over IB Verbs on InfiniBand), selected via the rpc.ib.enabled parameter]

• Enables high-performance RDMA communication while supporting the traditional socket interface

Xiaoyi Lu, Nusrat Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. (DK) Panda. "High-Performance Design of Hadoop RPC with RDMA over InfiniBand." To be presented at the 42nd International Conference on Parallel Processing (ICPP 2013), Lyon, France, October 2013.
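The rpc.ib.enabled switch is named on the slide; the snippet below is a hypothetical sketch of how such a toggle might be set in Hadoop's core-site.xml. The property placement, value semantics, and description text are assumptions, not confirmed by the source.

```xml
<!-- core-site.xml: hypothetical toggle for the RDMA-based RPC path -->
<property>
  <name>rpc.ib.enabled</name>
  <value>true</value>
  <description>Assumed semantics: use the verbs-based (RDMA) transport
  for Hadoop RPC instead of the default Java socket interface.</description>
</property>
```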
Hadoop RPC over IB: Gain in Latency and Throughput

• Hadoop RPC over IB ping-pong latency
  – 1 byte: 39 us; 4 KB: 52 us
  – 42%-49% and 46%-50% improvements compared with the performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively
• Hadoop RPC over IB throughput
  – 512 bytes and 48 clients: 135.22 Kops/sec
  – 82% and 64% improvements compared with the peak performance of default Hadoop RPC on 10 GigE and IPoIB (32 Gbps), respectively
[Figure: Left — ping-pong latency (us) vs. payload size (1 byte to 4 KB); right — throughput (Kops/sec) vs. number of clients (8 to 64); curves for RPC-10GigE, RPC-IPoIB (32 Gbps), and RPCoIB (32 Gbps)]
Available in Hadoop-RDMA Software

• High-performance design of Hadoop over RDMA-enabled interconnects
  – High-performance design with native InfiniBand support at the verbs level for the HDFS, MapReduce, and RPC components
  – Easily configurable for both native InfiniBand and traditional sockets-based support (Ethernet, and InfiniBand with IPoIB)
  – Current release: 0.9.0
    • Based on Apache Hadoop 0.20.2
    • Compliant with Apache Hadoop 0.20.2 APIs and applications
    • Tested with
      – Mellanox InfiniBand adapters (DDR, QDR, and FDR)
      – Various multi-core platforms
      – Different file systems with disks and SSDs
  – http://hadoop-rdma.cse.ohio-state.edu
Requirements of Hadoop RPC Benchmarks

• To achieve optimal performance, Hadoop RPC needs to be tuned based on cluster and workload characteristics
• A micro-benchmark suite for evaluating Hadoop RPC performance metrics under different configurations is important for tuning and understanding
• For Hadoop developers, such a micro-benchmark suite is helpful for evaluating and optimizing the performance of new designs
Problem Statement

• Can we design and implement a simple, standardized benchmark suite that lets all users and developers in the Big Data community evaluate, understand, and optimize Hadoop RPC performance over a range of networks/protocols?
• What will the performance of Hadoop RPC be when evaluated using this benchmark suite on high-performance networks?
Design Considerations

• The performance of RPC systems is usually measured by the metrics of latency and throughput
• The performance of Hadoop RPC is determined by:
  – Factors related to network configuration: faster interconnects and/or protocols can enhance Hadoop RPC performance
  – Controllable parameters at the RPC-engine level and benchmark level: handler count, client count, etc.
  – Data types: serialization and deserialization of different data types in the RPC system (BytesWritable, Text, etc.)
  – CPU utilization: the tradeoff between RPC subsystem performance and whole-system performance
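The data-type consideration above can be made concrete outside Hadoop. The sketch below is a minimal, hypothetical illustration (not the benchmark suite's code) of the length-prefixed wire format used by Writable types such as BytesWritable, assuming a 4-byte big-endian length header:

```python
import struct

def serialize_bytes_writable(data: bytes) -> bytes:
    """Length-prefixed serialization, analogous to Hadoop's
    BytesWritable wire format: a 4-byte big-endian length
    followed by the raw payload bytes."""
    return struct.pack(">i", len(data)) + data

def deserialize_bytes_writable(buf: bytes) -> bytes:
    """Inverse of serialize_bytes_writable: read the length
    header, then slice out exactly that many payload bytes."""
    (length,) = struct.unpack(">i", buf[:4])
    return buf[4:4 + length]
```

Timing round trips of these two calls over a range of payload sizes isolates the per-type (de)serialization overhead from the network cost.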
Micro-benchmark Suite

• Two different micro-benchmarks:
  – Latency: single server, single client
  – Throughput: single server, multiple clients
• A script framework for job launching and resource monitoring
• Calculates statistics such as Min, Max, and Average

Latency benchmark parameters:

| Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | Handlers | Verbose |
|------------|-----------------|------|-----------|--------------|--------------|-------------------|----------|---------|
| lat_client | √               | √    | √         | √            | √            | √                 |          | √       |
| lat_server | √               | √    |           |              |              |                   | √        | √       |

Throughput benchmark parameters:

| Component  | Network Address | Port | Data Type | Min Msg Size | Max Msg Size | No. of Iterations | No. of Clients | Handlers | Verbose |
|------------|-----------------|------|-----------|--------------|--------------|-------------------|----------------|----------|---------|
| thr_client | √               | √    | √         | √            | √            | √                 |                |          | √       |
| thr_server | √               | √    | √         |              |              |                   | √              | √        | √       |
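The two measurement patterns above (ping-pong latency with a single client, aggregate throughput with multiple clients) can be sketched outside Hadoop with a plain TCP echo server. This is a hypothetical analogue of the suite's design, not its actual code; ports, function names, and message framing are all illustrative assumptions:

```python
import socket
import statistics
import threading
import time

def run_echo_server(port, ready):
    """Echo server standing in for the RPC server side: accepts
    clients and echoes each message back on its own thread."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(64)
    ready.set()
    def serve(conn):
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)
        conn.close()
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=serve, args=(conn,), daemon=True).start()

def _pingpong(port, payload, iterations):
    """One client's ping-pong loop; returns per-op round-trip times (us)."""
    sock = socket.socket()
    sock.connect(("127.0.0.1", port))
    samples = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        sock.sendall(payload)
        got = 0
        while got < len(payload):
            got += len(sock.recv(65536))
        samples.append((time.perf_counter() - t0) * 1e6)
    sock.close()
    return samples

def latency_benchmark(port, payload_size, iterations):
    """Single server, single client: report Min/Max/Average latency (us)."""
    samples = _pingpong(port, b"x" * payload_size, iterations)
    return {"min": min(samples), "max": max(samples),
            "avg": statistics.mean(samples)}

def throughput_benchmark(port, num_clients, payload_size, ops_per_client):
    """Single server, multiple clients: aggregate throughput in Kops/sec."""
    payload = b"x" * payload_size
    threads = [threading.Thread(target=_pingpong,
                                args=(port, payload, ops_per_client))
               for _ in range(num_clients)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    return num_clients * ops_per_client / elapsed / 1000.0
```

A driver script would start run_echo_server in a background thread, then call latency_benchmark(port, 4096, 1000) or throughput_benchmark(port, 48, 512, 1000), mirroring the lat_client/lat_server and thr_client/thr_server split in the tables above.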
Experimental Setup

• Hardware
  – Intel Westmere cluster
    • 8 nodes
    • Each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs and 24 GB main memory
    • Network: 1 GigE, 10 GigE, and IPoIB (32 Gbps)
• Software
  – Enterprise Linux Server release 6.1 (Santiago) with kernel version 2.6.32-131 and OpenFabrics version 1.5.3
  – Hadoop 0.20.2 and Sun Java SDK 1.7
RPC Latency for BytesWritable

• RPC latency decreases when the underlying interconnect is changed from 1 GigE to IPoIB or 10 GigE.
• With the 10 GigE interconnect, we observe better latency than IPoIB for small payload sizes; for large payload sizes, IPoIB performs better than 10 GigE.
  – IPoIB achieves a 27% gain over 10 GigE for a 64 MB payload size, whereas it performs 0.66% worse than 10 GigE for a 4 KB payload size.
[Figure: RPC latency for BytesWritable — left: small messages, latency (us) vs. payload size (1 byte to 4 KB); right: large messages, latency (ms) vs. payload size (128 KB to 64 MB); curves for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Latency for Text

• Similar performance characteristics are observed for RPC latency with the Text data type.
[Figure: RPC latency for Text — left: small messages, latency (us) vs. payload size (1 byte to 4 KB); right: large messages, latency vs. payload size (128 KB to 64 MB); curves for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Throughput for BytesWritable

• IPoIB performs better than 10 GigE as the payload size increases.
• At 4 KB, the improvement reaches 26% with seven handler threads. For small payload sizes, 10 GigE performs better than IPoIB by an average margin of 5-6%.
[Figure: RPC throughput for BytesWritable — left: 7 RPC server handlers; right: 16 RPC server handlers; throughput (Kops/sec) vs. payload size (1 byte to 4 KB) for 1 GigE, 10 GigE, and IPoIB (32 Gbps)]
RPC Throughput for BytesWritable

• Keeping the payload size fixed at 4 KB, we observe the trend with different handler counts and different networks:
  – IPoIB performs better than 10 GigE by 48%, 5%, 45%, and 47% for 1, 4, 16, and 32 handlers, respectively.
• The suite can also be used to monitor resource utilization by enabling a parameter in the script framework.
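The resource monitoring mentioned above can be approximated with a small sampler. The sketch below is a hypothetical, Linux-specific stand-in for what such a script-framework parameter might enable (it reads /proc/stat; it is not the suite's actual monitoring code):

```python
import time

def sample_cpu_utilization(interval=0.5):
    """Sample overall CPU utilization (%) over `interval` seconds by
    diffing the aggregate jiffy counters in /proc/stat (Linux-only),
    the way a periodic monitoring script might."""
    def read_counters():
        with open("/proc/stat") as f:
            fields = [int(x) for x in f.readline().split()[1:]]
        idle = fields[3] + fields[4]  # idle + iowait jiffies
        return idle, sum(fields)
    idle0, total0 = read_counters()
    time.sleep(interval)
    idle1, total1 = read_counters()
    busy = (total1 - total0) - (idle1 - idle0)
    return 100.0 * busy / (total1 - total0)
```

Sampling this in a loop while a benchmark runs yields a utilization-vs-time trace like the one plotted for the 4-handler experiment.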
[Figure: Left — throughput (Kops/sec) vs. handler count (1, 4, 16, 32) for a 4 KB payload on 1 GigE, 10 GigE, and IPoIB (32 Gbps); right — CPU utilization (%) over sampling points for the experiment with 4 handlers]
Conclusion and Future Work

• Designed and implemented a micro-benchmark suite to evaluate the performance of standalone Hadoop RPC.
• Provided standard micro-benchmarks to measure the latency and throughput of Hadoop RPC with different data types.
• Illustrated the performance of Hadoop RPC using our benchmarks over different networks/protocols (1 GigE / 10 GigE / IPoIB).
• Will extend our benchmark suite to help users make performance comparisons among Hadoop Writable RPC, Avro, Thrift, and Protocol Buffers.
• Will be made available to the Big Data community via an open-source release.
Thank You!
{luxi, rahmanmd, islamn, panda}@cse.ohio-state.edu

Network-Based Computing Laboratory
http://nowlab.cse.ohio-state.edu/

MVAPICH Web Page
http://mvapich.cse.ohio-state.edu/

Hadoop-RDMA Web Page
http://hadoop-rdma.cse.ohio-state.edu/