Interconnect Your Future - files.gpfsug.org/presentations/2018/sc18/sc18 - mellanox...
© 2018 Mellanox Technologies | Confidential
Interconnect Your Future
Paving the Road to Exascale
November 2018
Highest-Performance 200Gb/s Interconnect Solutions
Transceivers
Active Optical and Copper Cables
(10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)

40 HDR (200Gb/s) InfiniBand Ports
80 HDR100 InfiniBand Ports
Throughput of 16Tb/s, <90ns Latency

200Gb/s Adapter, 0.6us latency
215 million messages per second
(10 / 25 / 40 / 50 / 56 / 100 / 200Gb/s)

16 400GbE, 32 200GbE, 128 25/50GbE Ports
(10 / 25 / 40 / 50 / 100 / 200 GbE)
Throughput of 6.4Tb/s

MPI, SHMEM/PGAS, UPC
For Commercial and Open Source Applications
Leverages Hardware Accelerations

System on Chip and SmartNIC
Programmable adapter
Smart Offloads
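The aggregate throughput figures above can be checked with simple arithmetic. The sketch below assumes that the 16Tb/s InfiniBand figure counts both directions of every port, while the 6.4Tb/s Ethernet figure counts one direction. That reading is an assumption, since the slide does not state its counting convention, and the helper name `aggregate_tbps` is purely illustrative.

```python
def aggregate_tbps(ports: int, gbps_per_port: int, directions: int = 1) -> float:
    """Sum of port speeds in Tb/s, counted over `directions` (1 or 2)."""
    return ports * gbps_per_port * directions / 1000

# InfiniBand switch: 40 HDR ports (200Gb/s) or 80 HDR100 ports (100Gb/s).
# Assumption: the slide's figure counts both directions.
hdr = aggregate_tbps(40, 200, directions=2)
hdr100 = aggregate_tbps(80, 100, directions=2)

# Ethernet switch: the three port configurations listed on the slide.
# Assumption: the slide's figure counts one direction.
eth = [aggregate_tbps(16, 400), aggregate_tbps(32, 200), aggregate_tbps(128, 50)]

print(hdr, hdr100, eth)
```

Under these assumptions both InfiniBand configurations give 16Tb/s and all three Ethernet configurations give 6.4Tb/s, matching the slide.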
The Need for Intelligent and Faster Interconnect
CPU-Centric (Onload): Must Wait for the Data, Creates Performance Bottlenecks
Data-Centric (Offload): Analyze Data as it Moves! Higher Performance and Scale
[Figure: CPU and GPU nodes attached to an Onload Network versus a network with In-Network Computing]
Faster Data Speeds and In-Network Computing Enable Higher Performance and Scale
Data Centric Architecture to Overcome Latency Bottlenecks
CPU-Centric (Onload): Communications Latencies of 30-40us
Data-Centric (Offload): Communications Latencies of 3-4us
[Figure: CPU and GPU nodes in the onload and offload architectures]
Intelligent Interconnect Paves the Road to Exascale Performance
Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
Reliable Scalable General Purpose Primitive
  In-network Tree based aggregation mechanism
  Large number of groups
  Multiple simultaneous outstanding operations
Applicable to Multiple Use-cases
  HPC Applications using MPI / SHMEM
  Distributed Machine Learning applications
Scalable High Performance Collective Offload
  Barrier, Reduce, All-Reduce, Broadcast and more
  Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND
  Integer and Floating-Point, 16/32/64 bits
[Figure: SHARP Tree, showing the SHARP Tree Root, SHARP Tree Aggregation Nodes (process running on HCA) and SHARP Tree Endnodes (process running on HCA)]
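The tree-based aggregation idea can be illustrated with a small software model. This is only a sketch of the general technique (reduce toward a root, then hand the result back to every end node). It is not the SHARP protocol or any Mellanox API, and the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class TreeNode:
    """Hypothetical stand-in for a node in an aggregation tree."""
    name: str
    value: float = 0.0                      # only meaningful on end nodes
    children: List["TreeNode"] = field(default_factory=list)

def reduce_up(node: TreeNode, op: Callable[[float, float], float]) -> float:
    """Combine values from the end nodes toward the root."""
    if not node.children:                   # end node: contributes its own value
        return node.value
    partials = [reduce_up(child, op) for child in node.children]
    result = partials[0]
    for p in partials[1:]:                  # aggregation node: fold child results
        result = op(result, p)
    return result

def all_reduce(root: TreeNode, op: Callable[[float, float], float]) -> Dict[str, float]:
    """Reduce to the root, then deliver the result to every end node."""
    total = reduce_up(root, op)
    out: Dict[str, float] = {}

    def broadcast_down(node: TreeNode) -> None:
        if not node.children:
            out[node.name] = total
        for child in node.children:
            broadcast_down(child)

    broadcast_down(root)
    return out

# Four end nodes under two aggregation nodes under one root.
leaves = [TreeNode(f"end{i}", value=float(i + 1)) for i in range(4)]
agg0 = TreeNode("agg0", children=leaves[:2])
agg1 = TreeNode("agg1", children=leaves[2:])
root = TreeNode("root", children=[agg0, agg1])

print(all_reduce(root, lambda a, b: a + b))   # Sum operation
print(all_reduce(root, max))                  # Max operation on the same tree
```

Each end node sends one value up and receives one value back, regardless of how many end nodes take part, which is the property the slide attributes to in-network aggregation.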
SHARP AllReduce Performance Advantages (128 Nodes)
SHARP enables 75% Reduction in Latency
Providing Scalable Flat Latency
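A latency reduction and a speedup factor describe the same measurement in two ways. The short sketch below converts one into the other; the function name is illustrative and the only input taken from the slide is the 75% figure.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by a fractional latency reduction (0 <= reduction < 1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)

# 75% lower latency means the operation takes a quarter of the original time.
print(speedup_from_reduction(0.75))
```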
SHARP AllReduce Performance Advantages (1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology)
SHARP Enables Highest Performance
SHARP Performance – Application (OSU)
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/
Source: Prof. DK Panda, Ohio State University
SHARP Accelerates AI Performance
The CPU in a parameter server becomes the bottleneck
Performs the Gradient Averaging
Replaces all physical parameter servers
Accelerates AI Performance
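Gradient averaging is an all-reduce: sum each gradient element across workers and divide by the worker count. The sketch below models that with plain Python lists standing in for gradient tensors. It is an illustration of the arithmetic only and does not use Horovod, TensorFlow, or any SHARP interface; `allreduce_average` is a hypothetical name.

```python
from typing import List

def allreduce_average(worker_grads: List[List[float]]) -> List[List[float]]:
    """Element-wise sum across workers, divided by the worker count.

    Every worker receives the same averaged gradient, so no single
    parameter-server CPU has to collect and redistribute the data.
    """
    n_workers = len(worker_grads)
    n_params = len(worker_grads[0])
    summed = [sum(g[i] for g in worker_grads) for i in range(n_params)]
    averaged = [s / n_workers for s in summed]
    return [list(averaged) for _ in range(n_workers)]

grads = [
    [1.0, 2.0, 3.0],   # worker 0
    [3.0, 2.0, 1.0],   # worker 1
    [2.0, 2.0, 2.0],   # worker 2
    [2.0, 2.0, 2.0],   # worker 3
]
result = allreduce_average(grads)
print(result[0])
```

In the offloaded case the slide describes, the summation step would happen in the network rather than on a host CPU; the code here only shows what is being computed.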
SHARP Performance Advantage for AI
SHARP provides 16% Performance Increase for deep learning (initial results)
TensorFlow with Horovod running ResNet50 benchmark, HDR InfiniBand (ConnectX-6, Quantum)
SHIELD - Self Healing Interconnect Technology
The ability to overcome network failures, locally, by the switches
Software-based solutions suffer from long delays detecting network failures: 5-30 seconds for clusters of 1K to 10K nodes
Accelerates network recovery time by 5000X
The higher the speed or scale the greater the recovery value
Available with EDR and HDR switches and beyond
Enables Unbreakable Data Centers
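To put the 5000X figure in concrete terms, the sketch below divides the quoted software detection delays by that factor. Applying the factor uniformly across the whole 5-30 second range is an assumption; the slide does not state which baseline the 5000X was measured against.

```python
def accelerated_recovery_ms(software_seconds: float, speedup: float = 5000.0) -> float:
    """Recovery time in milliseconds after dividing the software delay by `speedup`."""
    return software_seconds * 1000.0 / speedup

# Assumption: the 5000X factor applies to both ends of the 5-30 second range.
low = accelerated_recovery_ms(5.0)
high = accelerated_recovery_ms(30.0)
print(low, high)
```

Under that assumption the recovery time falls from seconds to single-digit milliseconds.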
SHIELD: Consider a Flow From A to B
[Figure: Data flowing through the fabric from Server A to Server B]
SHIELD: The Simple Case: Local Fix
[Figure: Data from Server A to Server B, rerouted locally by the switch next to the failed link]
SHIELD: The Remote Case - Using Fault Recovery Notifications
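[Figure: Data from Server A to Server B; a Fault Recovery Notification (FRN) travels back toward the source and the Data is rerouted]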
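The local and remote cases can be mimicked with a toy model. The sketch below is a depth-first search over a static table of candidate next hops, standing in for switch behavior. It is not the SHIELD implementation, whose internals the slides do not describe, and the fabric layout and all names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy fabric: candidate next hops per (switch, destination).
ROUTES: Dict[Tuple[str, str], List[str]] = {
    ("leafA", "B"): ["spine1", "spine2"],
    ("spine1", "B"): ["leafB"],
    ("spine2", "B"): ["leafB"],
    ("leafB", "B"): ["B"],
}

def deliver(src_switch: str, dst: str, failed_links: Set[frozenset]) -> Optional[List[str]]:
    """Return the path taken, or None if no path survives.

    Each switch first tries a local fix (another healthy port). If it has
    none, it reports back to the previous switch, which plays the role of
    a Fault Recovery Notification in this sketch.
    """
    def walk(switch: str, path: List[str]) -> Optional[List[str]]:
        if switch == dst:
            return path
        for nxt in ROUTES.get((switch, dst), []):
            if frozenset((switch, nxt)) in failed_links:
                continue                      # port is down: try the next local port
            result = walk(nxt, path + [nxt])
            if result is not None:
                return result
            # Falling through models a notification from `nxt`: it had no
            # healthy port toward dst, so this switch avoids it.
        return None

    return walk(src_switch, [src_switch])

healthy = deliver("leafA", "B", set())
local_fix = deliver("leafA", "B", {frozenset(("leafA", "spine1"))})
remote_fix = deliver("leafA", "B", {frozenset(("spine1", "leafB"))})
print(healthy, local_fix, remote_fix, sep="\n")
```

In the local case the switch adjacent to the failure picks another port itself; in the remote case the upstream switch has to be told before it can choose a different path.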
Network Topologies
Supporting a Variety of Topologies
Torus, Dragonfly, Fat Tree, Hypercube
Traditional Dragonfly vs Dragonfly+
[Figure: a traditional Dragonfly topology next to a Dragonfly+ topology]
Dragonfly+ Topology
Several “groups”, connected using all-to-all links
The topology inside each group can be any topology
Reduces total cost of network (fewer long cables)
Utilizes Adaptive Routing for efficient operations
Simplifies future system expansion
[Figure: Full graph connecting every group to all other groups. Group 1 holds nodes 1 to H, Group 2 holds nodes H+1 to 2H, and Group G ends at node GH.]
1200-Nodes Dragonfly+ Systems Example
[Figure: three groups (G1, G2, G3). Each group has switches labeled 1 to 20 with 20 HCAs each, and a second tier of switches labeled 1.1 to 1.20, 2.1 to 2.20 and 3.1 to 3.20.]
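The node count in the example follows from the figure labels. The sketch below assumes each group has 20 host-facing switches with 20 HCAs each, which is a reading of the labels rather than a stated specification. The count of group pairs comes from the full-graph property; the number of cables per pair is not given on the slide and is left out.

```python
def dragonfly_plus_size(groups: int, leaf_switches: int, hcas_per_leaf: int) -> dict:
    """Host and group-pair counts for a Dragonfly+ system (illustrative helper)."""
    hosts_per_group = leaf_switches * hcas_per_leaf
    return {
        "hosts_per_group": hosts_per_group,
        "total_hosts": groups * hosts_per_group,
        # Full graph: every unordered pair of groups is directly connected.
        "group_pairs": groups * (groups - 1) // 2,
    }

# Assumption: 3 groups, 20 host-facing switches per group, 20 HCAs per switch.
example = dragonfly_plus_size(groups=3, leaf_switches=20, hcas_per_leaf=20)
print(example)
```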
Future Expansion of Dragonfly+ Based System
Topology expansion of a Fat Tree, or a regular/Aries-like Dragonfly, requires one of the following:
  Reduction of early-phase bisection bandwidth due to reservation of ports on the network switches
  Re-cabling the long cables
Dragonfly+ is the only topology that allows system expansion at zero cost
  While maintaining bisection bandwidth
  No port reservation
  No re-cabling
[Figure: Phase 1 Dragonfly+ system; each group has switches labeled 1 to 20 with 20 HCAs each]
Phase 1: 11 x 400 = 4400 hosts
Future Expansion of Dragonfly+ Based System
[Figure: the Phase 1 system with additional groups attached; each group has switches labeled 1 to 20 with 20 HCAs each]
Re-cable the central racks, a change local to the rack
Phase 1: 11 x 400 = 4400 hosts
Phase 2: +10 x 400 = 8400 hosts
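The two phases can be tallied as follows. The host counts come straight from the slide (400 hosts per group). The group-pair counts are a derived illustration that assumes the full graph between groups is kept after expansion; the slide does not give link counts.

```python
HOSTS_PER_GROUP = 400   # from the slide: each group contributes 400 hosts

def expansion(phases):
    """Yield (groups, hosts, group_pairs) after each phase of added groups."""
    groups = 0
    for added in phases:
        groups += added
        # Assumption: every pair of groups stays directly connected.
        yield groups, groups * HOSTS_PER_GROUP, groups * (groups - 1) // 2

plan = list(expansion([11, 10]))   # Phase 1 adds 11 groups, Phase 2 adds 10 more
print(plan)
```

Phase 2 reaches 8400 hosts in total: the 4400 from Phase 1 plus 4000 from the ten added groups.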
Thank You