TRANSCRIPT
Paving the Road to Exascale
February 2016
Interconnect Your Future
The Ever Growing Demand for Higher Performance
[Performance development timeline, 2000-2020: Terascale to Petascale to Exascale; transitions from single-core to many-core and from SMP to clusters; "Roadrunner" marked as the first Petascale system]
[Co-Design diagram: hardware (HW), software (SW) and application (APP) designed together]
The Interconnect is the Enabling Technology
Co-Design Architecture to Enable Exascale Performance
CPU-Centric: limited to main CPU usage; results in performance limitation
Co-Design: creating synergies between software, in-CPU computing, in-network computing and in-storage computing; enables higher performance and scale
The Intelligence is Moving to the Interconnect
[Diagram: in the past the intelligence resided in the CPU; in the future it moves to the interconnect]
Intelligent Interconnect Delivers Higher Datacenter ROI
[Diagram: with network functions running on the CPU, less compute is left for users' applications; a smart network offloads those functions, moving the intelligence into the interconnect, freeing CPU computing for applications and increasing datacenter value]
Breaking the Application Latency Wall
Today: Network device latencies are on the order of 100 nanoseconds
Challenge: Enabling the next order of magnitude improvement in application performance
Solution: Creating synergies between software and hardware – intelligent interconnect
Intelligent Interconnect Paves the Road to Exascale Performance
10 years ago: communication framework ~100 microseconds, network ~10 microseconds
Today: communication framework ~10 microseconds, network ~0.1 microseconds
Future (co-design): communication framework ~1 microsecond, network ~0.05 microseconds
Introducing Switch-IB 2, the World's First Smart Switch
The world's fastest switch, with <90 nanosecond latency
36 ports, 100Gb/s per port, 7.2Tb/s throughput, 7.02 billion messages/sec
Adaptive routing, congestion control, support for multiple topologies
Built for scalable compute and storage infrastructures
10X higher performance with the new SHArP switch technology
SHArP (Scalable Hierarchical Aggregation Protocol) Technology
Delivering 10X performance improvement for MPI and SHMEM/PGAS communications
Switch-IB 2 enables the switch network to operate as a co-processor
SHArP enables Switch-IB 2 to manage and execute MPI operations in the network
SHArP Performance Advantage
MiniFE is a finite element mini-application
• Implements kernels that represent implicit finite-element applications
10X to 25X performance improvement for the MPI AllReduce collective (a minimal sketch of such a collective call follows)
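To make the offload target concrete, the following sketch shows the kind of MPI collective SHArP accelerates: a plain MPI_Allreduce over a small buffer. This is a generic MPI example, not code from the slides; the buffer size and reduction operation are illustrative choices. With a SHArP-capable fabric and MPI library, the same unmodified call can have its aggregation performed inside the switch network rather than on the hosts.

/* Minimal MPI AllReduce sketch (illustrative; buffer size and
 * reduction operation are arbitrary choices). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    double local[8], global[8];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    for (int i = 0; i < 8; i++)
        local[i] = (double)rank + i;          /* per-rank contribution */

    /* Sum the contributions of all ranks; every rank receives the result.
     * This is the operation a SHArP-enabled switch can execute in-network. */
    MPI_Allreduce(local, global, 8, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global[0] = %f across %d ranks\n", global[0], nranks);

    MPI_Finalize();
    return 0;
}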
Multi-Host Socket Direct™ – Low-Latency Socket Communication
Each CPU has direct network access
QPI avoidance for I/O – improves performance
Enables GPUDirect / peer direct on both sockets
Solution is transparent to software
[Diagram: two-socket server; each CPU connects directly to the adapter, bypassing the QPI link for I/O]
Multi-Host Socket Direct performance (Multi-Host Evaluation Kit):
50% lower CPU utilization
20% lower latency
Lower application latency, free up CPU (an adapter NUMA-locality check is sketched below)
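Whether a rank actually benefits from Socket Direct depends on running on the socket that owns the PCIe path to its adapter port. The sketch below is a generic Linux check, not a Mellanox tool: it reads the adapter's NUMA node from sysfs so the process can be pinned accordingly. The device name mlx5_0 and the exact sysfs layout are assumptions about a typical ConnectX installation.

/* Hypothetical helper: report which NUMA node an InfiniBand HCA is
 * attached to, so ranks can be pinned to the matching CPU socket.
 * The device name "mlx5_0" is an assumption; adjust for your system. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/class/infiniband/mlx5_0/device/numa_node";
    FILE *f = fopen(path, "r");
    int node = -1;

    if (!f) {
        perror("open numa_node");
        return 1;
    }
    if (fscanf(f, "%d", &node) != 1)
        node = -1;                     /* -1 means no NUMA affinity reported */
    fclose(f);

    printf("HCA mlx5_0 is attached to NUMA node %d\n", node);
    printf("Pin ranks to that node, e.g.: numactl --cpunodebind=%d <app>\n", node);
    return 0;
}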
Introducing ConnectX-4 Lx Programmable Adapter
Scalable, Efficient, High-Performance and Flexible Solution
Security
Cloud/Virtualization
Storage
High Performance Computing
Precision Time Synchronization
Networking + FPGA
Mellanox acceleration engines and FPGA programmability on one adapter
Mellanox InfiniBand: Proven and Most Scalable HPC Interconnect
"Summit" and "Sierra" Systems
Paving the Road to Exascale
System Example: NASA Ames Research Center Pleiades
20K InfiniBand nodes
Mellanox end-to-end scalable EDR, FDR and QDR InfiniBand
Supports a variety of scientific and engineering projects
• Coupled atmosphere-ocean models
• Future space vehicle design
• Large-scale dark matter halos and galaxy evolution
• High-resolution climate simulations
Leveraging InfiniBand backward and future compatibility
100Gb/s Interconnect Solutions Designed for High Performance
Transceivers, active optical and copper cables: VCSELs, silicon photonics and copper (10 / 25 / 40 / 50 / 56 / 100Gb/s)
InfiniBand switch: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec/port)
Adapter: 100Gb/s, 0.7us latency, 150 million messages per second (10 / 25 / 40 / 50 / 56 / 100Gb/s)
Ethernet switch: 32 100GbE ports, 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 6.4Tb/s throughput
Leading Supplier of End-to-End Interconnect Solutions
ICs, adapter cards, switches/gateways, Metro/WAN, cables/modules, software and services
Store, Analyze – Enabling the Use of Data
At the Speeds of 10, 25, 40, 50, 56 and 100 Gigabit per Second
Comprehensive End-to-End InfiniBand and Ethernet Portfolio
The Performance Power of EDR 100G InfiniBand
28%-63% Increase in Overall System Performance
End-to-End Interconnect Solutions for All Platforms
Highest Performance and Scalability for
X86, Power, GPU, ARM and FPGA-based Compute and Storage Platforms
10, 20, 25, 40, 50, 56 and 100Gb/s Speeds
x86, OpenPOWER, GPU, ARM, FPGA
Smart Interconnect to Unleash The Power of All Compute Architectures
Technology Roadmap – One-Generation Lead over the Competition
[Roadmap timeline, 2000-2020: Mellanox generations at 20Gb/s, 40Gb/s, 56Gb/s, 100Gb/s and 200Gb/s spanning Terascale, Petascale and Exascale; Mellanox-connected milestones include the Virginia Tech (Apple) system, 3rd on the TOP500 in 2003, and "Roadrunner", the 1st Petascale system]
Maximize Performance via Accelerator and GPU Offloads
GPUDirect RDMA Technology
GPUs are Everywhere!
GPUDirect RDMA / Sync
[Diagram: GPUDirect RDMA / Sync path – the network adapter accesses GPU memory directly over PCIe, bypassing the CPU and system memory]
GPUDirect™ RDMA (GPUDirect 3.0), using PeerDirect™
Eliminates CPU bandwidth and latency bottlenecks
Uses remote direct memory access (RDMA) transfers between GPUs
Resulting in significantly improved MPI efficiency between GPUs in remote nodes (a minimal CUDA-aware MPI sketch follows below)
Based on PCIe PeerDirect technology
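As a concrete illustration of improved MPI efficiency between GPUs on remote nodes, the following sketch is a generic CUDA-aware MPI exchange: device pointers are passed straight to MPI_Send/MPI_Recv, and with GPUDirect RDMA the adapter can read and write the GPU buffers directly instead of staging them through host memory. This is not code from the slides; the message size and tag are arbitrary, and it assumes a CUDA-aware MPI build (for example MVAPICH2-GDR or a suitably built Open MPI).

/* Minimal CUDA-aware MPI sketch (assumes a CUDA-aware MPI library).
 * Device pointers are handed directly to MPI; with GPUDirect RDMA the
 * adapter moves the data to/from GPU memory without a host-memory copy. */
#include <mpi.h>
#include <cuda_runtime.h>

#define N (1 << 20)                      /* illustrative message size */

int main(int argc, char **argv)
{
    int rank;
    float *gpu_buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaSetDevice(0);                    /* one GPU per rank assumed */
    cudaMalloc((void **)&gpu_buf, N * sizeof(float));
    cudaMemset(gpu_buf, 0, N * sizeof(float));

    if (rank == 0)
        MPI_Send(gpu_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   /* device pointer */
    else if (rank == 1)
        MPI_Recv(gpu_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(gpu_buf);
    MPI_Finalize();
    return 0;
}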
Performance of MPI with GPUDirect RDMA (source: Prof. DK Panda)
[Chart: GPU-GPU internode MPI latency (lower is better) – 88% lower latency, down to 2.18 usec]
[Chart: GPU-GPU internode MPI bandwidth (higher is better) – ~10X (9.3X) increase in throughput]
Mellanox GPUDirect RDMA Performance Advantage
HOOMD-blue is a general-purpose Molecular Dynamics simulation code accelerated on GPUs
GPUDirect RDMA allows direct peer-to-peer GPU communications over InfiniBand
• Unlocks performance between GPU and InfiniBand
• This provides a significant decrease in GPU-GPU communication latency
• Provides complete CPU offload from all GPU communications across the network
102% improvement – 2X application performance!
GPUDirect Sync (GPUDirect 4.0)
GPUDirect RDMA (3.0) – direct data path between the GPU and Mellanox interconnect
• Control path still uses the CPU
- CPU prepares and queues communication tasks on GPU
- GPU triggers communication on HCA
- Mellanox HCA directly accesses GPU memory
GPUDirect Sync (GPUDirect 4.0)
• Both data path and control path go directly between the GPU and the Mellanox interconnect
[Chart: 2D stencil benchmark – average time per iteration (us) vs. number of nodes/GPUs (2 and 4); RDMA+PeerSync is 27% and 23% faster than RDMA only]
Maximum performance for GPU clusters
Remote GPU Access through rCUDA
[Diagram: client side runs the CUDA application on top of the rCUDA library and network interface; server side runs the rCUDA daemon on top of the CUDA driver + runtime and network interface, exposing GPU servers as GPU-as-a-Service]
rCUDA provides remote access from every node to any GPU in the system (a minimal example of the unmodified CUDA code it can redirect is shown below)
[Diagram: cluster nodes with CPUs and virtual GPUs drawing on a pool of physical GPUs]
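Because the rCUDA client library sits beneath the standard CUDA runtime API, an ordinary CUDA program needs no source changes to use a remote GPU: the library forwards each runtime call to the rCUDA daemon on a GPU server. The sketch below is a generic CUDA runtime example of the kind of code that can be redirected this way; it is not taken from the slides, and the sizes are illustrative (plain C, linked against the CUDA runtime).

/* Ordinary CUDA runtime API calls in plain C (link with -lcudart).
 * Under rCUDA, the client-side library intercepts these calls and
 * executes them on a remote GPU served by the rCUDA daemon. */
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int ndev = 0;
    const size_t n = 1 << 20;             /* illustrative buffer size */
    float *h = malloc(n * sizeof(float));
    float *d = NULL;

    cudaGetDeviceCount(&ndev);             /* with rCUDA, remote GPUs appear here */
    printf("visible GPUs: %d\n", ndev);

    cudaMalloc((void **)&d, n * sizeof(float));   /* allocation may live remotely */
    cudaMemset(d, 0, n * sizeof(float));
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("h[0] = %f\n", h[0]);           /* expect 0.0 */
    cudaFree(d);
    free(h);
    return 0;
}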
All HPC Clouds to Use Mellanox Solutions
HPC Clouds
Virtualization for HPC: Mellanox SR-IOV
HPC Clouds – Performance Demands Mellanox Solutions
San Diego Supercomputer Center “Comet” System (2015) to
Leverage Mellanox InfiniBand to Build HPC Cloud
Thank You