TRANSCRIPT
Big Data Meets HPC – Exploiting HPC Technologies for Accelerating Big Data Processing
Dhabaleswar K. (DK) Panda
The Ohio State University
E-mail: [email protected]
http://www.cse.ohio-state.edu/~panda
Keynote Talk at HPCAC-Switzerland (Mar 2016)
• Big Data has become one of the most important elements of business analytics
• Provides groundbreaking opportunities for
enterprise information management and
decision making
• The amount of data is exploding; companies
are capturing and digitizing more information
than ever
• The rate of information growth appears to be
exceeding Moore’s Law
Introduction to Big Data Applications and Analytics
• Commonly accepted 3V's of Big Data
– Volume, Velocity, Variety
– Michael Stonebraker: Big Data Means at Least Three Different Things, http://www.nist.gov/itl/ssd/is/upload/NIST-stonebraker.pdf
• 4/5V's of Big Data – 3V + Veracity, Value
4V Characteristics of Big Data
Courtesy: http://api.ning.com/files/tRHkwQN7s-Xz5cxylXG004GLGJdjoPd6bVfVBwvgu*F5MwDDUCiHHdmBW-JTEz0cfJjGurJucBMTkIUNdL3jcZT8IPfNWfN9/dv1.jpg
• Webpages (content, graph)
• Clicks (ad, page, social)
• Users (OpenID, FB Connect, etc.)
• e-mails (Hotmail, Y!Mail, Gmail, etc.)
• Photos, Movies (Flickr, YouTube, Video, etc.)
• Cookies / tracking info (see Ghostery)
• Installed apps (Android market, App Store, etc.)
• Location (Latitude, Loopt, Foursquare, Google Now, etc.)
• User generated content (Wikipedia & co, etc.)
• Ads (display, text, DoubleClick, Yahoo, etc.)
• Comments (Discuss, Facebook, etc.)
• Reviews (Yelp, Y!Local, etc.)
• Social connections (LinkedIn, Facebook, etc.)
• Purchase decisions (Netflix, Amazon, etc.)
• Instant Messages (YIM, Skype, Gtalk, etc.)
• Search terms (Google, Bing, etc.)
• News articles (BBC, NYTimes, Y!News, etc.)
• Blog posts (Tumblr, Wordpress, etc.)
• Microblogs (Twitter, Jaiku, Meme, etc.)
• Link sharing (Facebook, Delicious, Buzz, etc.)
Data Generation in Internet Services and Applications
Number of Apps in the Apple App Store, Android Market, Blackberry, and Windows Phone (2013)
• Android Market: <1200K
• Apple App Store: ~1000K
Courtesy: http://dazeinfo.com/2014/07/10/apple-inc-aapl-ios-google-inc-goog-android-growth-mobile-ecosystem-2014/
Velocity of Big Data – How Much Data Is Generated Every Minute on the Internet?
The global Internet population grew 18.5% from 2013 to 2015 and now represents 3.2 billion people.
Courtesy: https://www.domo.com/blog/2015/08/data-never-sleeps-3-0/
• Scientific Data Management, Analysis, and Visualization
• Applications examples
– Climate modeling
– Combustion
– Fusion
– Astrophysics
– Bioinformatics
• Data Intensive Tasks
– Run large-scale simulations on supercomputers
– Dump data on parallel storage systems
– Collect experimental / observational data
– Move experimental / observational data to analysis sites
– Visual analytics – help understand data visually
Not Only in Internet Services - Big Data in Scientific Domains
• Hadoop: http://hadoop.apache.org
– The most popular framework for Big Data Analytics
– HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc.
• Spark: http://spark-project.org
– Provides primitives for in-memory cluster computing; Jobs can load data into memory and query it repeatedly
• Storm: http://storm-project.net
– A distributed real-time computation system for real-time analytics, online machine learning, continuous
computation, etc.
• S4: http://incubator.apache.org/s4
– A distributed system for processing continuous unbounded streams of data
• GraphLab: http://graphlab.org
– Consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining
toolkits built on top of the GraphLab API.
• Web 2.0: RDBMS + Memcached (http://memcached.org)
– Memcached: A high-performance, distributed memory object caching system
Typical Solutions or Architectures for Big Data Analytics
Big Data Processing with Hadoop Components
• Major components included in this tutorial:
– MapReduce (Batch)
– HBase (Query)
– HDFS (Storage)
– RPC (Inter-process communication)
• Underlying Hadoop Distributed File System (HDFS) used by both MapReduce and HBase
• The model scales, but the high volume of communication during the intermediate phases can be further optimized
(Figure: Hadoop framework – User Applications on top of MapReduce and HBase, which sit over Hadoop Common (RPC) and HDFS.)
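The intermediate-phase communication mentioned above is easiest to see in code. Below is a minimal sketch of the classic WordCount job written against the stock Apache Hadoop MapReduce API; it only illustrates the batch model (map, shuffle, reduce) and is not part of the RDMA-Hadoop packages discussed later. Input and output paths are taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // intermediate (word, 1) pairs: this map output is
      }                             // what gets shuffled across the network to the reducers
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```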
(Figure: Spark architecture – a Driver holding the SparkContext coordinates with the Master and ZooKeeper, and Worker nodes run tasks over HDFS.)
• An in-memory data-processing framework
– Iterative machine learning jobs
– Interactive data analytics
– Scala based Implementation
– Standalone, YARN, Mesos
• Scalable and communication intensive
– Wide dependencies between Resilient Distributed Datasets (RDDs)
– MapReduce-like shuffle operations to repartition RDDs
– Sockets based communication
Spark Architecture Overview
http://spark.apache.org
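To make the wide dependencies and shuffle concrete, here is a minimal sketch against the Spark 1.x Java API (the packages described later target Spark 1.5.1). The RDD is cached in memory for repeated queries, and reduceByKey introduces the MapReduce-like shuffle mentioned above; the input path is hypothetical and this is only an illustration, not part of the RDMA-Spark package.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SparkShuffleSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("shuffle-sketch");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Load once and cache in memory so repeated queries avoid re-reading HDFS
    // (the input path is hypothetical).
    JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt").cache();

    // flatMap/mapToPair are narrow dependencies; reduceByKey is a wide dependency
    // that triggers a MapReduce-like shuffle to repartition the RDD across workers.
    JavaPairRDD<String, Integer> counts = lines
        .flatMap(line -> Arrays.asList(line.split(" ")))
        .mapToPair(word -> new Tuple2<>(word, 1))
        .reduceByKey((a, b) -> a + b);

    System.out.println("distinct words: " + counts.count());
    sc.stop();
  }
}
```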
Memcached Architecture
• Distributed Caching Layer
– Allows spare memory from multiple nodes to be aggregated
– General purpose
• Typically used to cache database queries, results of API calls
• Scalable model, but typical usage very network intensive
(Figure: Memcached servers – each node contributing main memory, CPUs, SSD, and HDD – aggregated over high-performance networks.)
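As a concrete illustration of the caching pattern described above, the sketch below uses a standard sockets-based Java client (spymemcached) in a cache-aside loop around a database lookup. The host names and the loadFromDatabase helper are hypothetical, and RDMA-Memcached itself is exposed through the libMemcached C interface rather than this Java client.

```java
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;

public class CacheAsideSketch {

  // Hypothetical stand-in for an expensive database query (e.g. against MySQL).
  static String loadFromDatabase(String key) {
    return "row-for-" + key;
  }

  public static void main(String[] args) throws Exception {
    // Spare memory on several nodes is aggregated into one logical cache;
    // keys are hashed across the listed servers (host names are hypothetical).
    MemcachedClient mc = new MemcachedClient(
        new InetSocketAddress("cache-node1", 11211),
        new InetSocketAddress("cache-node2", 11211));

    String key = "user:42:profile";
    Object cached = mc.get(key);            // one network round trip to the cache tier
    if (cached == null) {
      String value = loadFromDatabase(key); // cache miss: fall back to the database
      mc.set(key, 300, value);              // populate the cache with a 300-second TTL
      cached = value;
    }
    System.out.println(cached);
    mc.shutdown();
  }
}
```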
• Substantial impact on designing and utilizing data management and processing systems in multiple tiers
– Front-end data accessing and serving (Online)
• Memcached + DB (e.g. MySQL), HBase
– Back-end data analytics (Offline)
• HDFS, MapReduce, Spark
Data Management and Processing on Modern Clusters
• Introduced in Oct 2000
• High Performance Data Transfer
– Interprocessor communication and I/O
– Low latency (<1.0 microsec), high bandwidth (up to 12.5 GigaBytes/sec -> 100 Gbps), and low CPU utilization (5-10%)
• Multiple Operations
– Send/Recv
– RDMA Read/Write
– Atomic Operations (very unique)
• High-performance and scalable implementations of distributed locks, semaphores, and collective communication operations
• Leading to big changes in designing
– HPC clusters
– File systems
– Cloud computing systems
– Grid computing systems
Open Standard InfiniBand Networking Technology
Interconnects and Protocols in OpenFabrics Stack
(Figure: applications and middleware reach the network through either the sockets interface – kernel-space TCP/IP over 1/10/40/100 GigE, hardware-offloaded TCP/IP over 10/40 GigE-TOE, IPoIB, RSockets, or SDP over InfiniBand, and user-space TCP/IP over iWARP – or the verbs interface, with user-space RDMA over RoCE and native InfiniBand; each path has its own protocol, adapter, and switch.)
How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data Applications?
Bring HPC and Big Data processing into a “convergent trajectory”!
• What are the major bottlenecks in current Big Data processing middleware (e.g. Hadoop, Spark, and Memcached)?
• Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?
• Can RDMA-enabled high-performance interconnects benefit Big Data processing?
• Can HPC clusters with high-performance storage systems (e.g. SSD, parallel file systems) benefit Big Data applications?
• How much performance benefit can be achieved through enhanced designs?
• How to design benchmarks for evaluating the performance of Big Data middleware on HPC clusters?
Designing Communication and I/O Libraries for Big Data Systems: Challenges
(Figure: proposed stack – Applications run over Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached), which today uses the sockets programming model; can other protocols help? A Communication and I/O Library in between must address point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance, benchmarks, and possible upper-level changes, on top of commodity multi-/many-core systems with accelerators, modern networking technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs), and storage technologies (HDD, SSD, and NVMe-SSD).)
• Sockets not designed for high-performance
– Stream semantics often mismatch for upper layers
– Zero-copy not available for non-blocking sockets
Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?
(Figure: current design – Application → Sockets → 1/10/40/100 GigE network; our approach – Application → OSU Design → Verbs Interface → 10/40/100 GigE or InfiniBand.)
• RDMA for Apache Spark
• RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
– Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions
• RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
• RDMA for Memcached (RDMA-Memcached)
• OSU HiBD-Benchmarks (OHB)
– HDFS and Memcached Micro-benchmarks
• http://hibd.cse.ohio-state.edu
• User base: 155 organizations from 20 countries
• More than 15,500 downloads from the project site
• RDMA for Apache HBase and Impala (upcoming)
The High-Performance Big Data (HiBD) Project
Available for InfiniBand and RoCE
• High-Performance Design of Hadoop over RDMA-enabled Interconnects
– High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for HDFS, MapReduce, and
RPC components
– Enhanced HDFS with in-memory and heterogeneous storage
– High performance design of MapReduce over Lustre
– Plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH and HDP
– Easily configurable for different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre) and different protocols (native
InfiniBand, RoCE, and IPoIB)
• Current release: 0.9.9
– Based on Apache Hadoop 2.7.1
– Compliant with Apache Hadoop 2.7.1, HDP 2.3.0.0 and CDH 5.6.0 APIs and applications
– Tested with
• Mellanox InfiniBand adapters (DDR, QDR and FDR)
• RoCE support with Mellanox adapters
• Various multi-core platforms
• Different file systems with disks and SSDs and Lustre
– http://hibd.cse.ohio-state.edu
RDMA for Apache Hadoop 2.x Distribution
• High-Performance Design of Spark over RDMA-enabled Interconnects
– High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level
for Spark
– RDMA-based data shuffle and SEDA-based shuffle architecture
– Non-blocking and chunk-based data transfer
– Easily configurable for different protocols (native InfiniBand, RoCE, and IPoIB)
• Current release: 0.9.1
– Based on Apache Spark 1.5.1
– Tested with
• Mellanox InfiniBand adapters (DDR, QDR and FDR)
• RoCE support with Mellanox adapters
• Various multi-core platforms
• RAM disks, SSDs, and HDD
– http://hibd.cse.ohio-state.edu
RDMA for Apache Spark Distribution
• High-Performance Design of Memcached over RDMA-enabled Interconnects
– High performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs-level for
Memcached and libMemcached components
– High performance design of SSD-Assisted Hybrid Memory
– Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and
InfiniBand with IPoIB)
• Current release: 0.9.4
– Based on Memcached 1.4.24 and libMemcached 1.0.18
– Compliant with libMemcached APIs and applications
– Tested with
• Mellanox InfiniBand adapters (DDR, QDR and FDR)
• RoCE support with Mellanox adapters
• Various multi-core platforms
• SSD
– http://hibd.cse.ohio-state.edu
RDMA for Memcached Distribution
• Micro-benchmarks for Hadoop Distributed File System (HDFS)
– Sequential Write Latency (SWL) Benchmark, Sequential Read Latency (SRL)
Benchmark, Random Read Latency (RRL) Benchmark, Sequential Write Throughput
(SWT) Benchmark, Sequential Read Throughput (SRT) Benchmark
– Support benchmarking of
• Apache Hadoop 1.x and 2.x HDFS, Hortonworks Data Platform (HDP) HDFS, Cloudera
Distribution of Hadoop (CDH) HDFS
• Micro-benchmarks for Memcached
– Get Benchmark, Set Benchmark, and Mixed Get/Set Benchmark
• Current release: 0.8
• http://hibd.cse.ohio-state.edu
OSU HiBD Micro-Benchmark (OHB) Suite – HDFS & Memcached
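Conceptually, the sequential write and read latency benchmarks time operations of the following shape against the standard HDFS client API. This is only an illustrative sketch, not the OHB implementation; the NameNode address, file path, and sizes are assumptions made for the example.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLatencySketch {
  public static void main(String[] args) throws Exception {
    // NameNode address, file path, and sizes below are all assumptions for illustration.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
    Path file = new Path("/benchmarks/ohb-sketch.dat");
    byte[] buf = new byte[4 * 1024 * 1024];                 // 4 MB write/read unit

    // Sequential write: stream 1 GB into HDFS and time it (what an SWL-style test measures).
    long start = System.nanoTime();
    try (FSDataOutputStream out = fs.create(file, true)) {
      for (int i = 0; i < 256; i++) out.write(buf);
    }
    System.out.printf("sequential write: %.2f s%n", (System.nanoTime() - start) / 1e9);

    // Sequential read: read the file back and time it (what an SRL-style test measures).
    start = System.nanoTime();
    try (FSDataInputStream in = fs.open(file)) {
      while (in.read(buf) != -1) { /* drain the stream */ }
    }
    System.out.printf("sequential read: %.2f s%n", (System.nanoTime() - start) / 1e9);
  }
}
```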
• HHH: Heterogeneous storage devices with hybrid replication schemes are supported in this mode of operation to have better fault-tolerance as well
as performance. This mode is enabled by default in the package.
• HHH-M: A high-performance in-memory based setup has been introduced in this package that can be utilized to perform all I/O operations in-
memory and obtain as much performance benefit as possible.
• HHH-L: With parallel file systems integrated, HHH-L mode can take advantage of the Lustre available in the cluster.
• MapReduce over Lustre, with/without local disks: Besides HDFS-based solutions, this package also provides support to run MapReduce jobs on top of Lustre alone. Here, two different modes are introduced: with local disks and without local disks.
• Running with Slurm and PBS: Supports deploying RDMA for Apache Hadoop 2.x with Slurm and PBS in different running modes (HHH, HHH-M, HHH-
L, and MapReduce over Lustre).
Different Modes of RDMA for Apache Hadoop 2.x
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• Enables high performance RDMA communication, while supporting traditional socket interface
• JNI Layer bridges Java based HDFS with communication library written in native code
Design Overview of HDFS with RDMA
(Figure: Applications → HDFS; write operations go through the OSU design and the Java Native Interface (JNI) to verbs over RDMA-capable networks (IB, iWARP, RoCE, ..), while other operations use the Java socket interface over 1/10/40/100 GigE / IPoIB networks.)
• Design Features
– RDMA-based HDFS write
– RDMA-based HDFS replication
– Parallel replication support
– On-demand connection setup
– InfiniBand/RoCE support
N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda , High Performance RDMA-Based Design of HDFS
over InfiniBand , Supercomputing (SC), Nov 2012
N. Islam, X. Lu, W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS, HPDC '14, June 2014
Triple-H
Heterogeneous Storage
• Design Features
– Three modes
• Default (HHH)
• In-Memory (HHH-M)
• Lustre-Integrated (HHH-L)
– Policies to efficiently utilize the heterogeneous
storage devices
• RAM, SSD, HDD, Lustre
– Eviction/Promotion based on data usage
pattern
– Hybrid Replication
– Lustre-Integrated mode:
• Lustre-based fault-tolerance
Enhanced HDFS with In-Memory and Heterogeneous Storage
(Figure: Triple-H data placement policies, hybrid replication, and eviction/promotion across RAM disk, SSD, HDD, and Lustre.)
N. Islam, X. Lu, M. W. Rahman, D. Shankar, and D. K. Panda, Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters
with Heterogeneous Storage Architecture, CCGrid ’15, May 2015
Design Overview of MapReduce with RDMA
(Figure: Applications → MapReduce (Job Tracker, Task Tracker, Map, Reduce), which uses either the Java socket interface over 1/10/40/100 GigE / IPoIB networks or the OSU design through the Java Native Interface (JNI) and verbs over RDMA-capable networks (IB, iWARP, RoCE, ..).)
• Enables high performance RDMA communication, while supporting traditional socket interface
• JNI Layer bridges Java based MapReduce with communication library written in native code
• Design Features
– RDMA-based shuffle
– Prefetching and caching map output
– Efficient Shuffle Algorithms
– In-memory merge
– On-demand Shuffle Adjustment
– Advanced overlapping
• map, shuffle, and merge
• shuffle, merge, and reduce
– On-demand connection setup
– InfiniBand/RoCE support
M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in
MapReduce over High Performance Interconnects, ICS, June 2014
• A hybrid approach to achieve maximum possible overlapping in MapReduce across all phases compared to other approaches
– Efficient Shuffle Algorithms
– Dynamic and Efficient Switching
– On-demand Shuffle Adjustment
Advanced Overlapping among Different Phases
(Figure: default architecture vs. enhanced overlapping with in-memory merge vs. advanced hybrid overlapping.)
M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A
Hybrid Approach to Exploit Maximum Overlapping in MapReduce
over High Performance Interconnects, ICS, June 2014
• Design Features
– JVM-bypassed buffer management
– RDMA or send/recv based adaptive communication
– Intelligent buffer allocation and adjustment for serialization
– On-demand connection setup
– InfiniBand/RoCE support
Design Overview of Hadoop RPC with RDMA
(Figure: Applications → Hadoop RPC; the default path uses the Java socket interface over 1/10/40/100 GigE / IPoIB networks, while the OSU design goes through the Java Native Interface (JNI) to verbs over RDMA-capable networks (IB, iWARP, RoCE, ..).)
• Enables high performance RDMA communication, while supporting traditional socket interface
• JNI Layer bridges Java based RPC with communication library written in native code
X. Lu, N. Islam, M. W. Rahman, J. Jose, H. Subramoni, H. Wang, and D. K. Panda, High-Performance Design of Hadoop RPC with
RDMA over InfiniBand, Int'l Conference on Parallel Processing (ICPP '13), October 2013.
(Charts: RandomWriter and TeraGen execution time (s) vs. data size (80-120 GB), IPoIB (FDR) vs. the OSU design.)
Performance Benefits – RandomWriter & TeraGen in TACC-Stampede
Cluster with 32 Nodes with a total of 128 maps
• RandomWriter
– 3-4x improvement over IPoIB
for 80-120 GB file size
• TeraGen
– 4-5x improvement over IPoIB
for 80-120 GB file size
(RandomWriter: reduced by 3x; TeraGen: reduced by 4x.)
(Charts: Sort and TeraSort execution time (s) vs. data size (80-120 GB), IPoIB (FDR) vs. OSU-IB (FDR).)
Performance Benefits – Sort & TeraSort in TACC-Stampede
Cluster with 32 Nodes with a total of 128 maps and 64 reduces
• Sort with single HDD per node
– 40-52% improvement over IPoIB
for 80-120 GB data
• TeraSort with single HDD per node
– 42-44% improvement over IPoIB
for 80-120 GB data
(Sort: reduced by 52%; TeraSort: reduced by 44%.)
Cluster with 32 Nodes with a total of 128 maps and 57 reduces
Evaluation of HHH and HHH-L with Applications
(Charts: CloudBurst execution time – HDFS (FDR) 60.24 s vs. HHH (FDR) 48.3 s; MR-MSPolyGraph execution time (s) vs. concurrent maps per host (4-8) for HDFS, Lustre, and HHH-L – reduced by 79%.)
• MR-MSPolygraph on OSU RI with
1,000 maps
– HHH-L reduces the execution time
by 79% over Lustre, 30% over HDFS
• CloudBurst on TACC Stampede
– With HHH: 19% improvement over
HDFS
Evaluation with Spark on SDSC Gordon (HHH vs. Tachyon/Alluxio)
• For 200GB TeraGen on 32 nodes
– Spark-TeraGen: HHH has 2.4x improvement over Tachyon; 2.3x over HDFS-IPoIB (QDR)
– Spark-TeraSort: HHH has 25.2% improvement over Tachyon; 17% over HDFS-IPoIB (QDR)
(Charts: TeraGen and TeraSort execution time (s) vs. cluster size : data size (8:50, 16:100, 32:200 GB) for IPoIB (QDR), Tachyon, and OSU-IB (QDR) – reduced by 2.4x and 25.2% respectively.)
N. Islam, M. W. Rahman, X. Lu, D. Shankar, and D. K. Panda, Performance Characterization and Acceleration of In-Memory File
Systems for Hadoop and Spark Applications on HPC Clusters, IEEE BigData ’15, October 2015
Design Overview of Shuffle Strategies for MapReduce over Lustre
• Design Features
– Two shuffle approaches
• Lustre read based shuffle
• RDMA based shuffle
– Hybrid shuffle algorithm to take benefit from both shuffle approaches
– Dynamically adapts to the better shuffle approach for each shuffle request based on profiling values for each Lustre read operation
– In-memory merge and overlapping of different phases are kept similar to RDMA-enhanced MapReduce design
(Figure: Map 1-3 write intermediate data to the intermediate data directory on Lustre; Reduce 1-2 obtain map output via Lustre read or RDMA and perform in-memory merge/sort and reduce.)
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, High Performance Design of YARN
MapReduce on Modern HPC Clusters with Lustre and RDMA, IPDPS, May 2015
• For 500GB Sort in 64 nodes
– 44% improvement over IPoIB (FDR)
Performance Improvement of MapReduce over Lustre on TACC-Stampede
• For 640GB Sort in 128 nodes
– 48% improvement over IPoIB (FDR)
(Charts: left – job execution time (sec) vs. data size (300-500 GB) on 64 nodes; right – job execution time from 20 GB on 4 nodes up to 640 GB on 128 nodes; IPoIB (FDR) vs. OSU-IB (FDR).)
M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, MapReduce over Lustre: Can RDMA-
based Approach Benefit?, Euro-Par, August 2014.
• Local disk is used as the intermediate data directory
(Reduced by 44% at 500 GB on 64 nodes; reduced by 48% at 640 GB on 128 nodes.)
• For 80GB Sort in 8 nodes
– 34% improvement over IPoIB (QDR)
Case Study - Performance Improvement of MapReduce over Lustre on SDSC-Gordon
• For 120GB TeraSort in 16 nodes
– 25% improvement over IPoIB (QDR)
• Lustre is used as the intermediate data directory
(Charts: left – Sort job execution time (sec) vs. data size (40-80 GB), reduced by 34%; right – TeraSort job execution time vs. data size (40-120 GB), reduced by 25%; IPoIB (QDR) vs. OSU-Lustre-Read (QDR), OSU-RDMA-IB (QDR), and OSU-Hybrid-IB (QDR).)
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
HBase-RDMA Design Overview
• JNI Layer bridges Java based HBase with communication library written in native code
• Enables high performance RDMA communication, while supporting traditional socket interface
(Figure: Applications → HBase, which uses either the Java socket interface over 1/10/40/100 GigE / IPoIB networks or the OSU-IB design through the Java Native Interface (JNI) and IB verbs over RDMA-capable networks (IB, iWARP, RoCE, ..).)
HBase – YCSB Read-Write Workload
• HBase Get latency (QDR, 10GigE)
– 64 clients: 2.0 ms; 128 Clients: 3.5 ms
– 42% improvement over IPoIB for 128 clients
• HBase Put latency (QDR, 10GigE)
– 64 clients: 1.9 ms; 128 Clients: 3.5 ms
– 40% improvement over IPoIB for 128 clients
(Charts: YCSB read and write latency (us) vs. number of clients (8-128) for 10GigE, IPoIB (QDR), and OSU-IB (QDR).)
J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS’12
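For reference, the YCSB read and write operations measured above boil down to single-row Get and Put calls against an HBase table. The sketch below uses the HBase 1.x Java client API; the table name, column family, row key, and values are illustrative rather than the exact YCSB configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseGetPutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();          // reads hbase-site.xml from the classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("usertable"))) {  // illustrative table name

      // Put: a single-row update, the operation behind the YCSB write workload.
      Put put = new Put(Bytes.toBytes("user1000"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("field0"), Bytes.toBytes("value0"));
      table.put(put);

      // Get: a single-row lookup, the operation behind the YCSB read workload.
      Result result = table.get(new Get(Bytes.toBytes("user1000")));
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("field0"));
      System.out.println(Bytes.toString(value));
    }
  }
}
```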
HBase – YCSB Get Latency and Throughput on SDSC-Comet
• HBase Get average latency (FDR)
– 4 client threads: 38 us
– 59% improvement over IPoIB for 4 client threads
• HBase Get total throughput
– 4 client threads: 102 Kops/sec
– 2.4x improvement over IPoIB for 4 client threads
(Charts: Get average latency (ms) and total throughput (Kops/sec) vs. number of client threads (1-4), IPoIB (FDR) vs. OSU-IB (FDR) – 59% lower latency and 2.4x higher throughput.)
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• Design Features
– RDMA based shuffle
– SEDA-based plugins
– Dynamic connection management and sharing
– Non-blocking data transfer
– Off-JVM-heap buffer management
– InfiniBand/RoCE support
Design Overview of Spark with RDMA
• Enables high performance RDMA communication, while supporting traditional socket interface
• JNI Layer bridges Scala based Spark with communication library written in native code
X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early
Experiences, Int'l Symposium on High Performance Interconnects (HotI'14), August 2014
• InfiniBand FDR, SSD, 64 Worker Nodes, 1536 Cores, (1536M 1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node.
– SortBy: Total time reduced by up to 80% over IPoIB (56Gbps)
– GroupBy: Total time reduced by up to 57% over IPoIB (56Gbps)
Performance Evaluation on SDSC Comet – SortBy/GroupBy
(Charts: SortByTest and GroupByTest total time (sec) vs. data size (64-256 GB) on 64 worker nodes / 1536 cores, IPoIB vs. RDMA – reduced by up to 80% and 57% respectively.)
• InfiniBand FDR, SSD, 64 Worker Nodes, 1536 Cores, (1536M 1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node.
– Sort: Total time reduced by 38% over IPoIB (56Gbps)
– TeraSort: Total time reduced by 15% over IPoIB (56Gbps)
Performance Evaluation on SDSC Comet – HiBench Sort/TeraSort
(Charts: HiBench Sort and TeraSort total time (sec) vs. data size (64-256 GB) on 64 worker nodes / 1536 cores, IPoIB vs. RDMA – reduced by 38% and 15% respectively.)
• InfiniBand FDR, SSD, 32/64 Worker Nodes, 768/1536 Cores, (768/1536M 768/1536R)
• RDMA-based design for Spark 1.5.1
• RDMA vs. IPoIB with 768/1536 concurrent tasks, single SSD per node.
– 32 nodes/768 cores: Total time reduced by 37% over IPoIB (56Gbps)
– 64 nodes/1536 cores: Total time reduced by 43% over IPoIB (56Gbps)
Performance Evaluation on SDSC Comet – HiBench PageRank
(Charts: HiBench PageRank total time (sec) for the Huge, BigData, and Gigantic data sets on 32 worker nodes / 768 cores and 64 worker nodes / 1536 cores, IPoIB vs. RDMA – reduced by 37% and 43% respectively.)
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• Server and client perform a negotiation protocol
– Master thread assigns clients to appropriate worker thread
• Once a client is assigned a verbs worker thread, it can communicate directly and is “bound” to that thread
• All other Memcached data structures are shared among RDMA and Sockets worker threads
• Memcached Server can serve both socket and verbs clients simultaneously
• Memcached applications need not be modified; uses verbs interface if available
Memcached-RDMA Design
(Figure: Memcached-RDMA design – sockets and RDMA clients negotiate with the master thread (1) and are then bound to a sockets or verbs worker thread (2); all worker threads share the Memcached data structures (memory, slabs, items).)
(Charts: Memcached Get latency (us) vs. message size (1 B-4 KB), OSU-IB (FDR) vs. IPoIB (FDR); Memcached throughput (thousands of transactions per second) vs. number of clients (16-4080).)
• Memcached Get latency
– 4 bytes OSU-IB: 2.84 us; IPoIB: 75.53 us
– 2K bytes OSU-IB: 4.49 us; IPoIB: 123.42 us
• Memcached Throughput (4bytes)
– 4080 clients OSU-IB: 556 Kops/sec, IPoIB: 233 Kops/s
– Nearly 2X improvement in throughput
RDMA-Memcached Performance (FDR Interconnect)
Experiments on TACC Stampede (Intel SandyBridge Cluster, IB: FDR)
(Latency reduced by nearly 20x; throughput improved by 2x.)
• Illustration with Read-Cache-Read access pattern using a modified mysqlslap load testing tool
• Memcached-RDMA can
- improve query latency by up to 66% over IPoIB (32Gbps)
- improve throughput by up to 69% over IPoIB (32Gbps)
Micro-benchmark Evaluation for OLDP workloads
(Charts: OLDP query latency (sec) and throughput (Kq/s) vs. number of clients (64-400), Memcached-IPoIB (32Gbps) vs. Memcached-RDMA (32Gbps).)
D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and D. K. Panda, Can RDMA Benefit On-Line Data Processing Workloads
with Memcached and MySQL, ISPASS’15
• ohb_memlat & ohb_memthr latency & throughput micro-benchmarks
• Memcached-RDMA can
- improve query latency by up to 70% over IPoIB (32Gbps)
- improve throughput by up to 2X over IPoIB (32Gbps)
- No overhead in using hybrid mode when all data can fit in memory
Performance Benefits of Hybrid Memcached (Memory + SSD) on SDSC-Gordon
(Charts: throughput (million transactions/sec) vs. number of clients (64-1024) and average latency (us) vs. message size, IPoIB (32Gbps) vs. RDMA-Mem (32Gbps) vs. RDMA-Hybrid (32Gbps) – up to 2x throughput.)
– Memcached latency test with Zipf distribution, server with 1 GB memory, 32 KB key-value pair size, total
size of data accessed is 1 GB (when data fits in memory) and 1.5 GB (when data does not fit in memory)
– When data fits in memory: RDMA-Mem/Hybrid gives 5x improvement over IPoIB-Mem
– When data does not fit in memory: RDMA-Hybrid gives 2x-2.5x over IPoIB/RDMA-Mem
Performance Evaluation on IB FDR + SATA/NVMe SSDs
(Chart: Set/Get latency (us) for IPoIB-Mem, RDMA-Mem, RDMA-Hybrid-SATA, and RDMA-Hybrid-NVMe, with data fitting and not fitting in memory; latency is broken down into slab allocation (SSD write), cache check+load (SSD read), cache update, server response, client wait, and miss penalty.)
– RDMA-Accelerated Communication for Memcached Get/Set
– Hybrid ‘RAM+SSD’ slab management for higher data retention
– Non-blocking API extensions
• memcached_(iset/iget/bset/bget/test/wait)
• Achieve near in-memory speeds while hiding bottlenecks of network and SSD I/O
• Ability to exploit communication/computation overlap
• Optional buffer re-use guarantees
– Adaptive slab manager with different I/O schemes for higher throughput.
Accelerating Hybrid Memcached with RDMA, Non-blocking Extensions and SSDs
D. Shankar, X. Lu, N. S. Islam, M. W. Rahman, and D. K. Panda, High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits, IPDPS, May 2016
(Figure: on the client, the libMemcached library and an RDMA-enhanced communication library issue blocking API and non-blocking API request/reply flows to the server's RDMA-enhanced communication library and hybrid RAM+SSD slab manager.)
– Data does not fit in memory: Non-blocking Memcached Set/Get API extensions can achieve
• >16x latency improvement vs. blocking API over RDMA-Hybrid/RDMA-Mem w/ penalty
• >2.5x throughput improvement vs. blocking API over default/optimized RDMA-Hybrid
– Data fits in memory: Non-blocking extensions perform similar to RDMA-Mem/RDMA-Hybrid and >3.6x improvement over IPoIB-Mem
Performance Evaluation with Non-Blocking Memcached API
(Chart: average Set/Get latency (us) for IPoIB-Mem, RDMA-Mem, H-RDMA-Def, H-RDMA-Opt-Block, H-RDMA-Opt-NonB-i, and H-RDMA-Opt-NonB-b, broken down into miss penalty (backend DB access overhead), client wait, server response, cache update, cache check+load (memory and/or SSD read), and slab allocation (with SSD write on out-of-memory).)
H = Hybrid Memcached over SATA SSD; Opt = Adaptive slab manager; Block = Default blocking API; NonB-i = Non-blocking iset/iget API; NonB-b = Non-blocking bset/bget API w/ buffer re-use guarantee
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• Design Features
– Memcached-based burst-buffer
system
• Hides latency of parallel file
system access
• Read from local storage and
Memcached
– Data locality achieved by writing data
to local storage
– Different approaches of integration
with parallel file system to guarantee
fault-tolerance
Accelerating I/O Performance of Big Data Analytics through RDMA-based Key-Value Store
(Figure: Map/Reduce tasks and DataNodes issue I/O through an I/O forwarding module to local disk (data locality), the Memcached-based burst buffer system, and Lustre (fault-tolerance).)
Evaluation with PUMA Workloads
Gains on OSU RI with our approach (Mem-bb) on 24 nodes
• SequenceCount: 34.5% over Lustre, 40% over HDFS
• RankedInvertedIndex: 27.3% over Lustre, 48.3% over HDFS
• HistogramRating: 17% over Lustre, 7% over HDFS
(Chart: execution time (s) for SeqCount, RankedInvIndex, and HistoRating workloads with HDFS (32Gbps), Lustre (32Gbps), and Mem-bb (32Gbps).)
N. S. Islam, D. Shankar, X. Lu, M. W. Rahman, and D. K. Panda, Accelerating I/O Performance of Big Data Analytics with RDMA-based Key-Value Store, ICPP '15, September 2015
• RDMA-based Designs and Performance Evaluation
– HDFS
– MapReduce
– RPC
– HBase
– Spark
– Memcached (Basic, Hybrid and Non-blocking APIs)
– HDFS + Memcached-based Burst Buffer
– OSU HiBD Benchmarks (OHB)
Acceleration Case Studies and Performance Evaluation
• The current benchmarks provide some insight into performance behavior
• However, they do not provide any information to the designer/developer on:
– What is happening at the lower-layer?
– Where the benefits are coming from?
– Which design is leading to benefits or bottlenecks?
– Which component in the design needs to be changed and what will be its impact?
– Can performance gain/loss at the lower-layer be correlated to the performance
gain/loss observed at the upper layer?
Are the Current Benchmarks Sufficient for Big Data?
(Figure: the same communication and I/O library stack as before, now with RDMA protocols alongside sockets.)
Challenges in Benchmarking of RDMA-based Designs
(Figure annotations: current benchmarks cover the upper layers; no benchmarks exist for the lower layers – can the two be correlated?)
• A comprehensive suite of benchmarks to
– Compare performance of different MPI libraries on various networks and systems
– Validate low-level functionalities
– Provide insights to the underlying MPI-level designs
• Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth
• Extended later to
– MPI-2 one-sided
– Collectives
– GPU-aware data movement
– OpenSHMEM (point-to-point and collectives)
– UPC
• Has become an industry standard
• Extensively used for design/development of MPI libraries, performance comparison of MPI libraries and even in procurement of large-scale systems
• Available from http://mvapich.cse.ohio-state.edu/benchmarks
• Available in an integrated manner with MVAPICH2 stack
OSU MPI Micro-Benchmarks (OMB) Suite
(Figure: the same communication and I/O library stack with RDMA protocols.)
Iterative Process – Requires Deeper Investigation and Design for Benchmarking Next Generation Big Data Systems and Applications
(Figure annotations: applications-level benchmarks and micro-benchmarks.)
• HDFS Benchmarks
– Sequential Write Latency (SWL) Benchmark
– Sequential Read Latency (SRL) Benchmark
– Random Read Latency (RRL) Benchmark
– Sequential Write Throughput (SWT) Benchmark
– Sequential Read Throughput (SRT) Benchmark
• Memcached Benchmarks
– Get Benchmark
– Set Benchmark
– Mixed Get/Set Benchmark
• Available as a part of OHB 0.8
OSU HiBD Benchmarks (OHB)
N. S. Islam, X. Lu, M. W. Rahman, J. Jose, and D. K. Panda, A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Int'l Workshop on Big Data Benchmarking (WBDB '12), December 2012
D. Shankar, X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks, BPOE-5 (2014)
X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks, Int'l Workshop on Big Data Benchmarking (WBDB '13), July 2013
To be Released
• Upcoming Releases of RDMA-enhanced Packages will support
– HBase
– Impala
• Upcoming Releases of OSU HiBD Micro-Benchmarks (OHB) will support
– MapReduce
– RPC
• Advanced designs with upper-level changes and optimizations
– Memcached with Non-blocking API
– HDFS + Memcached-based Burst Buffer
On-going and Future Plans of OSU High Performance Big Data (HiBD) Project
• Discussed challenges in accelerating Hadoop, Spark and Memcached
• Presented initial designs to take advantage of InfiniBand/RDMA for HDFS,
MapReduce, RPC, Spark, and Memcached
• Results are promising
• Many other open issues need to be solved
• Will enable the Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner
Concluding Remarks
Funding Acknowledgments
Funding Support by
Equipment Support by
Personnel Acknowledgments
Current Students
– A. Augustine (M.S.)
– A. Awan (Ph.D.)
– S. Chakraborthy (Ph.D.)
– C.-H. Chu (Ph.D.)
– N. Islam (Ph.D.)
– M. Li (Ph.D.)
Past Students
– P. Balaji (Ph.D.)
– S. Bhagvat (M.S.)
– A. Bhat (M.S.)
– D. Buntinas (Ph.D.)
– L. Chai (Ph.D.)
– B. Chandrasekharan (M.S.)
– N. Dandapanthula (M.S.)
– V. Dhanraj (M.S.)
– T. Gangadharappa (M.S.)
– K. Gopalakrishnan (M.S.)
– G. Santhanaraman (Ph.D.)
– A. Singh (Ph.D.)
– J. Sridhar (M.S.)
– S. Sur (Ph.D.)
– H. Subramoni (Ph.D.)
– K. Vaidyanathan (Ph.D.)
– A. Vishnu (Ph.D.)
– J. Wu (Ph.D.)
– W. Yu (Ph.D.)
Past Research Scientist
– S. Sur
Current Post-Doc
– J. Lin
– D. Banerjee
Current Programmer
– J. Perkins
Past Post-Docs
– H. Wang
– X. Besseron
– H.-W. Jin
– M. Luo
– W. Huang (Ph.D.)
– W. Jiang (M.S.)
– J. Jose (Ph.D.)
– S. Kini (M.S.)
– M. Koop (Ph.D.)
– R. Kumar (M.S.)
– S. Krishnamoorthy (M.S.)
– K. Kandalla (Ph.D.)
– P. Lai (M.S.)
– J. Liu (Ph.D.)
– M. Luo (Ph.D.)
– A. Mamidala (Ph.D.)
– G. Marsh (M.S.)
– V. Meshram (M.S.)
– A. Moody (M.S.)
– S. Naravula (Ph.D.)
– R. Noronha (Ph.D.)
– X. Ouyang (Ph.D.)
– S. Pai (M.S.)
– S. Potluri (Ph.D.)
– R. Rajachandrasekar (Ph.D.)
– K. Kulkarni (M.S.)
– M. Rahman (Ph.D.)
– D. Shankar (Ph.D.)
– A. Venkatesh (Ph.D.)
– J. Zhang (Ph.D.)
– E. Mancini
– S. Marcarelli
– J. Vienne
Current Research Scientists Current Senior Research Associate
– H. Subramoni
– X. Lu
Past Programmers
– D. Bureddy
- K. Hamidouche
Current Research Specialist
– M. Arnold
Second International Workshop on High-Performance Big Data Computing (HPBDC)
HPBDC 2016 will be held with the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, Illinois USA, May 27th, 2016
Keynote Talk: Dr. Chaitanya Baru, Senior Advisor for Data Science, National Science Foundation (NSF); Distinguished Scientist, San Diego Supercomputer Center (SDSC)
Panel Moderator: Jianfeng Zhan (ICT/CAS)
Panel Topic: Merge or Split: Mutual Influence between Big Data and HPC Techniques
Six Regular Research Papers and Two Short Research Papers
http://web.cse.ohio-state.edu/~luxi/hpbdc2016
HPBDC 2015 was held in conjunction with ICDCS’15
http://web.cse.ohio-state.edu/~luxi/hpbdc2015
Thank You!
The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/