Dynamically Allocating GPUs to Host Nodes (Servers)
Saeed Iqbal, Shawn Gao and Alaa Yousif
Dell HPC
Introduction
How can we use GPUs in servers to build solutions?
How can we use GPUs in servers? There are two fundamental options:
External GPUs
Internal GPUs
How can we use GPUs in servers? There are two fundamental options:
External GPUs
– Number of GPUs is flexible
– GPUs can be shared among users
– GPUs are easy to replace/service
– Targeted toward installations with large numbers of GPUs
Internal GPUs
– Number of GPUs is fixed
– Less GPU-related cabling
– Each GPU has fixed bandwidth to the CPUs
– Targeted toward both small and large GPU installations
Overview of the Solution Components: C410X
Basically, it's "room and board" for 16 GPUs.
Features:
Theoretical maximum of 16.5 TFLOPS
Connects up to 8 hosts
Connects up to 16 PCIe Gen-2 devices (GPUs) to hosts
Connects a maximum of 8 devices to a given host
High-density 3U chassis
Flexibility in selecting the number of GPUs
Individually serviceable modules
N+1 1400 W power supplies (3+1)
N+1 92 mm cooling fans (7+1)
PCIe switches (8x PEX 8647, 4x PEX 8696)
Overview of the Solution Components: C6220
Features:
High density: four compute nodes in 2U of space
Each node: dual Intel Sandy Bridge-EP (E5-2600) processors
16 DIMMs, up to 256 GB per node
Internal storage: 24 TB SATA or 36 TB SAS
1 PCIe Gen3 x8 mezzanine (daughter card): FDR IB, QDR IB, or 10GigE
1 PCIe Gen3 x16 slot (half-length, half-height)
Embedded BMC with IPMI 2.0 support
Chassis design: hot-pluggable individual nodes
Up to 12 x 3.5" drives (3 per node) or 24 x 2.5" drives (6 per node)
N+1 power supplies (1100 W or 1400 W)
Host-to-GPU Mapping Options on the C410X: connect 2, 4, or 8 GPUs per host
The three mapping options available on the C410X
How to change the mapping?
Use the Web User Interface
– Connect to the C410X using a laptop
– Has to be done individually on each C410X
– Easy for small installations
Use the command line (CLI)
– Connect to the C410X's BMC and use IPMITool
– Can be scripted for automation by a job scheduler/workload manager (see the sketch after this list)
– Can handle multiple C410X chassis through the attached compute nodes
– Targeted toward both small and large installations
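As a rough illustration of the CLI route, the loop below checks several C410X chassis from one node using only standard ipmitool subcommands over the LAN. The BMC addresses and credentials are placeholders, not values from this talk.

#!/bin/bash
# Sketch: script the CLI route across several C410X chassis from one node.
# The BMC addresses and credentials below are placeholders.
BMC_USER=admin
BMC_PASS=secret

for bmc in 10.0.0.21 10.0.0.22 10.0.0.23; do
    echo "== C410X BMC ${bmc} =="
    ipmitool -I lanplus -H "${bmc}" -U "${BMC_USER}" -P "${BMC_PASS}" chassis status
done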
Details of the Web User Interface
Dynamic Mapping
Baseboard Management Controller (BMC)
Industry-standard support for IPMI v2.0
Out-of-band monitoring and control of servers
Helps generate the FRU information report (main board part number, product name, manufacturer, and so on)
Health status/hardware monitoring report
View and clear the event log
Event notification through Platform Event Traps (PET)
Platform Event Filtering (PEF) to take selected actions for selected events
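For example, the FRU report and the event log mentioned above can be pulled out of band with standard ipmitool subcommands. A minimal sketch; the BMC address and credentials are placeholders.

# Sketch: FRU inventory and System Event Log, read remotely from the BMC.
IPMI="ipmitool -I lanplus -H 10.0.0.21 -U admin -P secret"   # placeholder BMC/credentials
${IPMI} fru print    # FRU report: board part number, product name, manufacturer, ...
${IPMI} sel elist    # view the event log
${IPMI} sel clear    # clear the event log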
IPMITool: a utility for managing and configuring devices
Open standard for monitoring, logging, recovery, and hardware control
Independent of CPU, BIOS, and OS
IPMITool is a simple CLI to the remote BMC using the IPMI v1.5/2.0 protocol (IPMI: Intelligent Platform Management Interface)
Read/print the Sensor Data Repository (SDR) values
Display the System Event Log (SEL)
Print Field Replaceable Unit (FRU) inventory information
Read/set LAN configuration parameters
Remote chassis power control
ipmitool ships as its own package in the RHEL distribution
Available from http://ipmitool.sourceforge.net/ (version 1.8.11)
By Duncan Laurie <[email protected]>
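The remaining capabilities in the list map onto ipmitool subcommands in the same way; a short sketch, with the same placeholder BMC address and credentials as above.

# Sketch: sensor data, LAN configuration, and remote chassis power control.
IPMI="ipmitool -I lanplus -H 10.0.0.21 -U admin -P secret"   # placeholder BMC/credentials
${IPMI} sdr list               # read the Sensor Data Repository
${IPMI} lan print 1            # read LAN configuration parameters (channel 1)
${IPMI} chassis power status   # remote chassis power control
${IPMI} chassis power cycle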
The "port_map.sh" Script from Dell.com
# ./port_map.sh <bmc_ip> <bmc_un> <bmc_pw>
# ./port_map.sh 198.168.12.146 <username> <password>
The current iPass-port-to-PCIe-port mapping is listed. For example, if the iPass1 port is configured as 1:4:
iPass1 <==> PCIE1 PCIE2 PCIE15 PCIE16
iPass5 <==> None
Change? (n/j/1/2/3/4/5/6/7/8):
To configure the iPass1 port as 1:2, enter "1":
iPass1 <==> PCIE1 PCIE15
iPass5 <==> PCIE2 PCIE16
The PCIe port assignments for iPass1 & iPass5 will be updated accordingly.
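To drive this from a script instead of typing at the prompt, the answer can be supplied on standard input. This is only a sketch: it assumes port_map.sh reads the "Change?" answer from stdin, and the chassis addresses and credentials are placeholders.

# Sketch: push the same 1:2 mapping choice ("1") to several chassis.
# Assumes port_map.sh accepts the "Change?" answer on stdin.
BMC_USER=admin     # placeholder credentials
BMC_PASS=secret
for bmc in 10.0.0.21 10.0.0.22; do
    echo "1" | ./port_map.sh "${bmc}" "${BMC_USER}" "${BMC_PASS}"
done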
Putting it together: C410X + BMC + IPMITool
Against the C410X BMC:
1. Get the current mapping
2. Change the mapping to the new one
3. Reboot the C410X
4. Wait until the C410X is back up
On the master node:
1. Calculate the new mappings for all compute nodes
2. Send the new mapping to the compute nodes
3. Wait until the C410X has the new mapping
4. Reboot the compute node
(Diagram: master node, Gigabit Ethernet fabric, compute nodes, iPass cables to the C410X, and scripts using IPMITool driving the BMC.)
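A minimal sketch of this sequence as run against one C410X, combining the port_map.sh call with standard ipmitool chassis-power subcommands. The BMC address and credentials are placeholders, the stdin-driven port_map.sh call is the same assumption as above, and "1" is just an example mapping choice.

#!/bin/bash
# Sketch: apply a new mapping, power-cycle the C410X, wait for it, then reboot this node.
C410X_BMC=10.0.0.21   # placeholder C410X BMC address
BMC_USER=admin        # placeholder credentials
BMC_PASS=secret
IPMI="ipmitool -I lanplus -H ${C410X_BMC} -U ${BMC_USER} -P ${BMC_PASS}"

# 1-2. Get the current mapping and change it (assumes port_map.sh reads its answer from stdin).
echo "1" | ./port_map.sh "${C410X_BMC}" "${BMC_USER}" "${BMC_PASS}"

# 3. Reboot the C410X so the new PCIe mapping takes effect.
${IPMI} chassis power cycle

# 4. Wait until the C410X reports it is powered on again.
sleep 30   # give the power cycle time to start before polling
until ${IPMI} chassis power status | grep -q "is on"; do
    sleep 10
done

# Finally, reboot this compute node so it re-enumerates its GPUs.
reboot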
Demo 1 of 2
Putting it together: C410X + BMC + IPMITool
"Sandwich configurations": several configurations are possible!
(Diagram: master node, compute nodes, and C410X BMCs arranged in a "sandwich" configuration.)
Putting it together: C410X + BMC + IPMITool
(Diagram: master node and compute nodes in a 64-GPU / 32-node configuration.)
Possible Combinations
There are 25 possible ways the 16 GPUs can be mapped to the 8 servers attached to a C410X, ranging from all eight servers getting 2 GPUs each to two servers getting 8 GPUs each. As the table below shows, each group of four servers shares 8 GPUs that can be split in 5 ways, so there are 5 x 5 = 25 combinations. The columns give the number of GPUs assigned to each server:
S3 S7 S4 S8 S1 S5 S2 S6
8 0 0 0 8 0 0 0
4 0 4 0 8 0 0 0
2 2 4 0 8 0 0 0
4 0 2 2 8 0 0 0
2 2 2 2 8 0 0 0
8 0 0 0 4 0 4 0
4 0 4 0 4 0 4 0
2 2 4 0 4 0 4 0
4 0 2 2 4 0 4 0
2 2 2 2 4 0 4 0
8 0 0 0 2 2 4 0
4 0 4 0 2 2 4 0
2 2 4 0 2 2 4 0
4 0 2 2 2 2 4 0
2 2 2 2 2 2 4 0
8 0 0 0 4 0 2 2
4 0 4 0 4 0 2 2
2 2 4 0 4 0 2 2
4 0 2 2 4 0 2 2
2 2 2 2 4 0 2 2
8 0 0 0 2 2 2 2
4 0 4 0 2 2 2 2
2 2 4 0 2 2 2 2
4 0 2 2 2 2 2 2
2 2 2 2 2 2 2 2
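The 25 rows above are just the cross product of the five possible splits of each 8-GPU group; a small sketch that regenerates the same list:

# Sketch: regenerate the 25 combinations as the cross product of the
# five possible splits of each group of four servers (8 GPUs per group).
splits=("8 0 0 0" "4 0 4 0" "2 2 4 0" "4 0 2 2" "2 2 2 2")
echo "S3 S7 S4 S8 S1 S5 S2 S6"
for right in "${splits[@]}"; do      # S1 S5 S2 S6
    for left in "${splits[@]}"; do   # S3 S7 S4 S8
        echo "${left} ${right}"
    done
done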
Use Cases
Use Case 1: HPC Data Centers
"The number of GPUs a given application requires differs"
– A large number of users submit parallel jobs
– Each job requests a certain number of GPUs per node
– The job scheduler takes these requests into account when scheduling
– The job scheduler tries to find nodes with the correct number of GPUs
– If such nodes are unavailable, it triggers a dynamic allocation (see the sketch below)
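As a rough illustration of the scheduler hook, a prolog-style sketch that compares the GPUs visible on a node against the job's request and triggers a remap when they differ. GPUS_REQUESTED and remap_c410x.sh are hypothetical names; nvidia-smi -L is used only to count the GPUs the node can see.

#!/bin/bash
# Sketch of a job-prolog check: does this node already have the requested GPU count?
# GPUS_REQUESTED would be supplied by the scheduler; remap_c410x.sh is a hypothetical
# wrapper around the port_map.sh / ipmitool sequence shown earlier.
GPUS_REQUESTED=${GPUS_REQUESTED:-4}
GPUS_PRESENT=$(nvidia-smi -L | wc -l)   # count the GPUs currently visible to the node

if [ "${GPUS_PRESENT}" -ne "${GPUS_REQUESTED}" ]; then
    # Trigger a dynamic allocation before the job starts.
    ./remap_c410x.sh "${GPUS_REQUESTED}"
fi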
Use Case 2: HPC Cloud Providers (PaaS)
"The nodes are provisioned with the correct number of GPUs at each instant"
– Users request specific platform features (number of GPUs, time)
– Nodes are provisioned with the required number of GPUs, then control is transferred to the user
– At the end, the GPUs are detached and shared with other nodes
Example requests handled by the workload manager:
1. 4 nodes 4 GPU/node for 8 hours
2. 8 nodes 2 GPU/node for 2 hours
3. 4 nodes 8 GPU/node for 6 hours
4. 16 nodes 2 GPU/node for 16 hours
5. 8 nodes 2 GPU/node for 8 hours
6. 32 nodes 4 GPU/node for 12 hours
7. 64 nodes 2 GPU/node for 24 hours
Demo 2 of 2
Questions
Reference
IPMI
http://www.intel.com/design/servers/ipmi/ani/index.htm
C410X BMC
http://support.dell.com/support/edocs/SYSTEMS/cp_pe_c410x/en/BMC/BMC.pdf
Script from Dell.com/support
http://www.dell.com/support/drivers/us/en/19/DriverDetails/DriverFileFormats?c=us&s=dhs&cs=19&l=en&DriverId=R302138