TRANSCRIPT
High Performance MPI on IBM 12x InfiniBand Architecture
Abhinav Vishnu, Brad Benton¹, and Dhabaleswar K. Panda
vishnu, panda @ [email protected]
Presentation Road-Map
• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work
Introduction and Motivation
• Demand for more compute power is driven by parallel applications
  – Molecular Dynamics (NAMD), Car Crash Simulations (LS-DYNA), …
• Cluster sizes have been increasing steadily to meet these demands
  – 9K processors (Sandia Thunderbird, ASCI Q)
  – Larger-scale clusters are planned using upcoming multi-core architectures
• MPI is used as the primary programming model for writing these applications
Emergence of InfiniBand
• Interconnects with very low latency and very high throughput have become available
  – InfiniBand, Myrinet, Quadrics, …
• InfiniBand
  – High performance and an open standard
  – Advanced features
• PCI-Express based InfiniBand adapters are becoming popular
  – 8X (1X ~ 2.5 Gbps) with Double Data Rate (DDR) support
  – MPI designs for these adapters are emerging
• Compared to PCI-Express, GX+ I/O bus based adapters are also emerging
  – 4X and 12X link support
InfiniBand Adapters
[Figure: HCA and chipset connected over an I/O bus interface, with the host on one side and two ports (P1, P2) to the network on the other. I/O bus options: PCI-X (4x bidirectional), PCI-Express (16x bidirectional), and GX+ (>24x bidirectional bandwidth); port links: 4x and 12x (SDR/DDR).]
MPI designs for PCI-Express based adapters are coming up; IBM 12x InfiniBand adapters on GX+ are coming up.
Problem Statement
• How do we design an MPI with low overhead for the IBM 12x InfiniBand Architecture?
• What are the performance benefits of the enhanced design over existing designs?
  – Point-to-point communication
  – Collective communication
  – MPI applications
Presentation Road-Map
• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work
Overview of InfiniBand
• An interconnect technology to connect I/O nodes and processing nodes
• InfiniBand provides multiple transport semantics
  – Reliable Connection: supports reliable notification and Remote Direct Memory Access (RDMA)
  – Unreliable Datagram: data delivery is not reliable; send/recv is supported
  – Reliable Datagram: currently not implemented by vendors
  – Unreliable Connection: notification is not supported
• InfiniBand uses a queue pair (QP) model for data transfer
  – Send queue (for send operations)
  – Receive queue (not involved in RDMA operations)
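
As a concrete illustration of the QP model, below is a minimal sketch of posting an RDMA write on an already-connected Reliable Connection QP using the OpenFabrics verbs API. The connection setup (device, PD, CQ, QP creation, and the exchange of the remote address and rkey) is assumed to have happened earlier and is not shown.

```c
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Sketch: post one RDMA write on a connected RC QP and wait for its
 * completion. qp, cq, mr, and the remote address/rkey are assumed to
 * come from an earlier connection-setup phase (not shown). */
static int rdma_write(struct ibv_qp *qp, struct ibv_cq *cq,
                      struct ibv_mr *mr, void *local_buf, uint32_t len,
                      uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = len,
        .lkey   = mr->lkey,              /* local key of registered region */
    };
    struct ibv_send_wr wr, *bad_wr = NULL;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* no recv posted remotely */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;  /* request a completion */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))         /* enqueue on the send queue */
        return -1;

    struct ibv_wc wc;                            /* poll the CQ until done */
    int n;
    while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
        ;
    return (n < 0 || wc.status != IBV_WC_SUCCESS) ? -1 : 0;
}
```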
Multipathing Configurations
[Figure: multipathing options — multiple adapters and multiple ports (multi-rail configurations) connected through switches; a combination of these is also possible. Multi-rail can also exploit multiple send/recv engines.]
Presentation Road-Map
• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work
MPI Design for 12x Architecture
[Figure: design overview — the ADI layer hands eager and rendezvous point-to-point and collective messages to a Communication Scheduler, which applies scheduling policies (including EPC) across multiple QPs per port at the InfiniBand layer; a Completion Notifier and a Communication Marker handle completion notification.]
Jiuxing Liu, Abhinav Vishnu, and Dhabaleswar K. Panda, "Building Multi-rail InfiniBand Clusters: MPI-Level Design and Performance Evaluation," SuperComputing 2004.
Discussion on Scheduling Policies
[Figure: policy taxonomy — Reverse Multiplexing, Even Striping, Binding, Round Robin, and Enhanced Pt-to-Pt and Collective (EPC). Striping policies incur overhead from multiple stripes and multiple completions; the policy choice depends on blocking vs. non-blocking point-to-point communication and on collective communication.]
EPC Characteristics
• For small messages, a round-robin policy is used
  – Striping leads to overhead for small messages
• For larger messages, the policy depends on the communication type (a sketch of this selection follows the list):
  – Point-to-point, blocking: striping
  – Point-to-point, non-blocking: round robin
  – Collective: striping
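
To make the policy table concrete, the following is a minimal C sketch of how such a scheduler might select a policy per message. The enum names, function name, and size threshold are hypothetical illustrations, not MVAPICH internals.

```c
#include <stddef.h>

/* Hypothetical names for illustration; the slides do not show
 * MVAPICH's internal interfaces. */
enum msg_kind { PT2PT_BLOCKING, PT2PT_NONBLOCKING, COLLECTIVE };
enum policy   { ROUND_ROBIN, STRIPING };

#define EPC_STRIPE_THRESHOLD 8192  /* assumed cutoff, for illustration only */

/* EPC-style selection: small messages always go round robin over the
 * QPs (striping overhead dominates); larger messages are striped except
 * for non-blocking point-to-point, which stays round robin. */
static enum policy epc_select(enum msg_kind kind, size_t bytes)
{
    if (bytes < EPC_STRIPE_THRESHOLD)
        return ROUND_ROBIN;
    switch (kind) {
    case PT2PT_BLOCKING:    return STRIPING;
    case PT2PT_NONBLOCKING: return ROUND_ROBIN;
    case COLLECTIVE:        return STRIPING;
    }
    return ROUND_ROBIN;  /* unreachable; keeps compilers happy */
}
```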
MVAPICH/MVAPICH2
• We have used MVAPICH as our MPI framework for the enhanced design
• MVAPICH/MVAPICH2
  – High-performance MPI-1/MPI-2 implementations over InfiniBand and iWARP
  – Have powered many supercomputers in the TOP500 supercomputing rankings
  – Currently used by more than 450 organizations (academia and industry worldwide)
  – http://nowlab.cse.ohio-state.edu/projects/mpi-iba
• The enhanced design is available with MVAPICH
  – Will become available with MVAPICH2 in upcoming releases
Presentation Road-Map
• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work
Experimental Test-Bed
• The experimental test-bed consists of:
  – Power5-based systems with SLES9 SP2
  – GX+ at 950 MHz clock speed
  – Linux kernel version 2.6.9
  – 2.8 GHz processors with 8 GB of memory
  – TS120 switch for connecting the adapters
• One port per adapter and a single adapter are used for communication
  – The objective is to see the benefit of using only one physical port
Ping-Pong Latency Test
• EPC adds insignificant overhead to the small-message latency
• Large-message latency is reduced by 41% using EPC with the IBM 12x architecture
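
For reference, a ping-pong latency test of this kind is typically structured as below: rank 0 sends, rank 1 echoes, and the one-way latency is half the averaged round-trip time. This is a generic sketch (the message size, warm-up, and iteration counts are illustrative), not the exact benchmark code used in the talk.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WARMUP 100
#define ITERS  1000

int main(int argc, char **argv)
{
    int rank, len = 4;                   /* message size; loop over sizes in practice */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(len);
    double t0 = 0.0;

    for (int i = 0; i < WARMUP + ITERS; i++) {
        if (i == WARMUP) t0 = MPI_Wtime();    /* start timing after warm-up */
        if (rank == 0) {
            MPI_Send(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)                       /* one-way latency = half the round trip */
        printf("%d bytes: %.2f us\n", len,
               (MPI_Wtime() - t0) * 1e6 / (2.0 * ITERS));
    free(buf);
    MPI_Finalize();
    return 0;
}
```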
Small-Message Throughput
• Unidirectional bandwidth doubles for small messages using EPC
• Bidirectional bandwidth does not improve with an increasing number of QPs, due to the copy-bandwidth limitation
Large-Message Throughput
• EPC improves unidirectional and bidirectional throughput significantly for medium-sized messages
• We can achieve a peak unidirectional bandwidth of 2731 MB/s and a peak bidirectional bandwidth of 5421 MB/s
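
Unidirectional bandwidth numbers like these are usually measured with a window-based test in which the sender keeps many non-blocking sends in flight before synchronizing, as in this generic sketch (the window, message size, and iteration values are illustrative):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW 64
#define ITERS  20

int main(int argc, char **argv)
{
    int rank, len = 1 << 20;             /* 1 MB messages; loop over sizes in practice */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(len);
    MPI_Request req[WINDOW];
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            for (int w = 0; w < WINDOW; w++)   /* keep WINDOW sends in flight */
                MPI_Isend(buf, len, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(NULL, 0, MPI_CHAR, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            for (int w = 0; w < WINDOW; w++)
                MPI_Irecv(buf, len, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[w]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(NULL, 0, MPI_CHAR, 0, 1, MPI_COMM_WORLD);  /* ack */
        }
    }
    if (rank == 0) {
        double t = MPI_Wtime() - t0;
        printf("%.0f MB/s\n", (double)len * WINDOW * ITERS / t / 1e6);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```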
Collective Communication
• MPI_Alltoall shows significant benefits for large messages
• MPI_Bcast shows more benefits for very large messages
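
Collective performance of this kind is commonly measured by timing repeated invocations across all ranks; a generic sketch for MPI_Alltoall follows (the message size and iteration count are illustrative):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, nprocs, len = 65536, iters = 100;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *sbuf = malloc((size_t)len * nprocs);
    char *rbuf = malloc((size_t)len * nprocs);

    MPI_Barrier(MPI_COMM_WORLD);          /* start all ranks together */
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++)
        MPI_Alltoall(sbuf, len, MPI_CHAR, rbuf, len, MPI_CHAR, MPI_COMM_WORLD);
    double t = (MPI_Wtime() - t0) / iters;

    if (rank == 0)
        printf("alltoall %d bytes/peer: %.1f us\n", len, t * 1e6);
    free(sbuf); free(rbuf);
    MPI_Finalize();
    return 0;
}
```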
NAS Parallel Benchmarks
• For class A and class B problem sizes, the x1 configuration shows improvement
• There is no degradation for other configurations on Fourier Transform (FT)
NAS Parallel Benchmarks
• Integer Sort (IS) shows 7-11% improvement for x1 configurations
• Other NAS Parallel Benchmarks do not show performance degradation
Presentation Road-Map
• Introduction and Motivation
• Background
• Enhanced MPI design for IBM 12x Architecture
• Performance Evaluation
• Conclusions and Future Work
Conclusions
• We presented an enhanced design for the IBM 12x InfiniBand Architecture
  – EPC (Enhanced Point-to-Point and Collective communication)
• We have implemented our design and evaluated it with micro-benchmarks, collectives, and MPI application kernels
• IBM 12x HCAs can significantly improve communication performance
  – 41% for the ping-pong latency test
  – 63-65% for uni-directional and bi-directional bandwidth tests
  – 7-13% improvement in performance for NAS Parallel Benchmarks
  – Peak unidirectional and bidirectional bandwidths of 2731 MB/s and 5421 MB/s, respectively
Future Directions
• We plan to evaluate EPC with multi-rail configurations on upcoming multi-core systems
  – Multi-port configurations
  – Multi-HCA configurations
• Scalability studies of using multiple QPs on large-scale clusters
  – Impact of QP caching
  – Network fault tolerance
Acknowledgements
Our research is supported by the following organizations:
• Current funding support by [sponsor logos]
• Current equipment support by [sponsor logos]
Web Pointers
http://nowlab.cse.ohio-state.edu/
MVAPICH Web Page: http://mvapich.cse.ohio-state.edu
E-mail: vishnu, [email protected], [email protected]