Dynamically Allocating GPUs to Host Nodes (Servers)
Saeed Iqbal, Shawn Gao and Alaa Yousif
Dell HPC
Introduction
How can we use GPUs in servers to build solutions?
How can we use GPUs in servers? There are two fundamental options:
External GPUs
Internal GPUs
How can we use GPUs in servers? There are two fundamental options:
External GPUs
– Number of GPUs is flexible
– GPUs can be shared among users
– GPUs are easy to replace/service
– Targeted toward installations with large numbers of GPUs
Internal GPUs
– Number of GPUs is fixed
– Less GPU-related cabling
– Each GPU has fixed bandwidth to the CPUs
– Targeted toward both small and large GPU installations
Overview of the Solution Components: C410X
Basically, it's "room and board" for 16 GPUs.
Features:
Theoretical maximum of 16.5 TFLOPS
Connects up to 8 hosts
Connects up to 16 PCIe Gen-2 devices (GPUs) to hosts
Connects a maximum of 8 devices to a given host
High-density 3U chassis
Flexibility in selecting the number of GPUs
Individually serviceable modules
N+1 1400 W power supplies (3+1)
N+1 92 mm cooling fans (7+1)
PCIe switches (8x PEX 8647, 4x PEX 8696)
Overview of the Solution Components: C6220
Features:
High density: four compute nodes in 2U of space
Each node: dual Intel Sandy Bridge-EP (E5-2600) processors
16 DIMMs, up to 256 GB per node
Internal storage: 24 TB SATA or 36 TB SAS
1 PCIe Gen3 x8 mezzanine (daughter card): FDR IB, QDR IB, or 10GigE
1 PCIe Gen3 x16 slot (half-length, half-height)
Embedded BMC with IPMI 2.0 support
Chassis design: hot-pluggable individual nodes
Up to 12 x 3.5" drives (3 per node) or 24 x 2.5" drives (6 per node)
N+1 power supplies (1100 W or 1400 W)
Host-to-GPU Mapping Options on the C410X: connect 2, 4, or 8 GPUs per host
The three mapping options available on the C410X
How to change the mapping?
Use the Web User Interface
– Connect to the C410X using a laptop
– Has to be done individually on each C410X
– Easy for small installations
Use the command line (CLI)
– Connect to the C410X's BMC and use IPMITool
– Can be scripted for automation by a job scheduler/workload manager (see the sketch after this list)
– Can handle multiple C410X chassis through the attached compute nodes
– Targeted toward both small and large installations
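As a rough illustration of the CLI route, the loop below checks several C410X chassis from one node using only standard ipmitool subcommands over the LAN. The BMC addresses and credentials are placeholders, not values from this talk.

#!/bin/bash
# Sketch: script the CLI route across several C410X chassis from one node.
# The BMC addresses and credentials below are placeholders.
BMC_USER=admin
BMC_PASS=secret

for bmc in 10.0.0.21 10.0.0.22 10.0.0.23; do
    echo "== C410X BMC ${bmc} =="
    ipmitool -I lanplus -H "${bmc}" -U "${BMC_USER}" -P "${BMC_PASS}" chassis status
done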
Details of the Web User Interface
Dynamic Mapping
Baseboard Management Controller (BMC)
Industry-standard support for IPMI v2.0
Out-of-band monitoring and control of servers
Helps generate the FRU information report (main board part number, product name, manufacturer, and so on)
Health status/hardware monitoring report
View and clear the event log
Event notification through Platform Event Traps (PET)
Platform Event Filtering (PEF) to take selected actions for selected events
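For example, the FRU report and the event log mentioned above can be pulled out of band with standard ipmitool subcommands. A minimal sketch; the BMC address and credentials are placeholders.

# Sketch: FRU inventory and System Event Log, read remotely from the BMC.
IPMI="ipmitool -I lanplus -H 10.0.0.21 -U admin -P secret"   # placeholder BMC/credentials
${IPMI} fru print    # FRU report: board part number, product name, manufacturer, ...
${IPMI} sel elist    # view the event log
${IPMI} sel clear    # clear the event log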
IPMITool: a utility for managing and configuring devices
Open standard for monitoring, logging, recovery, and hardware control
Independent of CPU, BIOS, and OS
IPMITool is a simple CLI to the remote BMC using the IPMI v1.5/2.0 protocol (IPMI: Intelligent Platform Management Interface)
Read/print the Sensor Data Repository (SDR) values
Display the System Event Log (SEL)
Print Field Replaceable Unit (FRU) inventory information
Read/set LAN configuration parameters
Remote chassis power control
ipmitool ships as its own package in the RHEL distribution
Available from http://ipmitool.sourceforge.net/ (version 1.8.11)
By Duncan Laurie <[email protected]>
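The remaining capabilities in the list map onto ipmitool subcommands in the same way; a short sketch, with the same placeholder BMC address and credentials as above.

# Sketch: sensor data, LAN configuration, and remote chassis power control.
IPMI="ipmitool -I lanplus -H 10.0.0.21 -U admin -P secret"   # placeholder BMC/credentials
${IPMI} sdr list               # read the Sensor Data Repository
${IPMI} lan print 1            # read LAN configuration parameters (channel 1)
${IPMI} chassis power status   # remote chassis power control
${IPMI} chassis power cycle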
The "port_map.sh" Script from Dell.com
# ./port_map.sh <bmc_ip> <bmc_un> <bmc_pw>
# ./port_map.sh 198.168.12.146 <username> <password>
The current iPass-port-to-PCIe-port mapping is listed. For example, if the iPass1 port is configured as 1:4:
iPass1 <==> PCIE1 PCIE2 PCIE15 PCIE16
iPass5 <==> None
Change? (n/j/1/2/3/4/5/6/7/8):
To configure the iPass1 port as 1:2, enter "1":
iPass1 <==> PCIE1 PCIE15
iPass5 <==> PCIE2 PCIE16
The PCIe port assignments for iPass1 & iPass5 will be updated accordingly.
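To drive this from a script instead of typing at the prompt, the answer can be supplied on standard input. This is only a sketch: it assumes port_map.sh reads the "Change?" answer from stdin, and the chassis addresses and credentials are placeholders.

# Sketch: push the same 1:2 mapping choice ("1") to several chassis.
# Assumes port_map.sh accepts the "Change?" answer on stdin.
BMC_USER=admin     # placeholder credentials
BMC_PASS=secret
for bmc in 10.0.0.21 10.0.0.22; do
    echo "1" | ./port_map.sh "${bmc}" "${BMC_USER}" "${BMC_PASS}"
done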
Putting it together: C410X + BMC + IPMITool
Against the C410X BMC:
1. Get the current mapping
2. Change the mapping to the new one
3. Reboot the C410X
4. Wait until the C410X is back up
On the master node:
1. Calculate the new mappings for all compute nodes
2. Send the new mapping to the compute nodes
3. Wait until the C410X has the new mapping
4. Reboot the compute node
(Diagram: master node, Gigabit Ethernet fabric, compute nodes, iPass cables to the C410X, and scripts using IPMITool driving the BMC.)
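A minimal sketch of this sequence as run against one C410X, combining the port_map.sh call with standard ipmitool chassis-power subcommands. The BMC address and credentials are placeholders, the stdin-driven port_map.sh call is the same assumption as above, and "1" is just an example mapping choice.

#!/bin/bash
# Sketch: apply a new mapping, power-cycle the C410X, wait for it, then reboot this node.
C410X_BMC=10.0.0.21   # placeholder C410X BMC address
BMC_USER=admin        # placeholder credentials
BMC_PASS=secret
IPMI="ipmitool -I lanplus -H ${C410X_BMC} -U ${BMC_USER} -P ${BMC_PASS}"

# 1-2. Get the current mapping and change it (assumes port_map.sh reads its answer from stdin).
echo "1" | ./port_map.sh "${C410X_BMC}" "${BMC_USER}" "${BMC_PASS}"

# 3. Reboot the C410X so the new PCIe mapping takes effect.
${IPMI} chassis power cycle

# 4. Wait until the C410X reports it is powered on again.
sleep 30   # give the power cycle time to start before polling
until ${IPMI} chassis power status | grep -q "is on"; do
    sleep 10
done

# Finally, reboot this compute node so it re-enumerates its GPUs.
reboot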
Demo 1 of 2
Putting it together: C410X + BMC + IPMITool
"Sandwich configurations": several configurations are possible!
(Diagram: master node, compute nodes, and C410X BMCs arranged in a "sandwich" configuration.)
Putting it together: C410X + BMC + IPMITool
(Diagram: master node and compute nodes in a 64-GPU / 32-node configuration.)
Possible Combinations
There are 25 possible ways the 16 GPUs can be mapped to the 8 servers attached to a C410X, ranging from all eight servers getting 2 GPUs each to two servers getting 8 GPUs each. As the table below shows, each group of four servers shares 8 GPUs that can be split in 5 ways, so there are 5 x 5 = 25 combinations. The columns give the number of GPUs assigned to each server:
S3 S7 S4 S8 S1 S5 S2 S6
8 0 0 0 8 0 0 0
4 0 4 0 8 0 0 0
2 2 4 0 8 0 0 0
4 0 2 2 8 0 0 0
2 2 2 2 8 0 0 0
8 0 0 0 4 0 4 0
4 0 4 0 4 0 4 0
2 2 4 0 4 0 4 0
4 0 2 2 4 0 4 0
2 2 2 2 4 0 4 0
8 0 0 0 2 2 4 0
4 0 4 0 2 2 4 0
2 2 4 0 2 2 4 0
4 0 2 2 2 2 4 0
2 2 2 2 2 2 4 0
8 0 0 0 4 0 2 2
4 0 4 0 4 0 2 2
2 2 4 0 4 0 2 2
4 0 2 2 4 0 2 2
2 2 2 2 4 0 2 2
8 0 0 0 2 2 2 2
4 0 4 0 2 2 2 2
2 2 4 0 2 2 2 2
4 0 2 2 2 2 2 2
2 2 2 2 2 2 2 2
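The 25 rows above are just the cross product of the five possible splits of each 8-GPU group; a small sketch that regenerates the same list:

# Sketch: regenerate the 25 combinations as the cross product of the
# five possible splits of each group of four servers (8 GPUs per group).
splits=("8 0 0 0" "4 0 4 0" "2 2 4 0" "4 0 2 2" "2 2 2 2")
echo "S3 S7 S4 S8 S1 S5 S2 S6"
for right in "${splits[@]}"; do      # S1 S5 S2 S6
    for left in "${splits[@]}"; do   # S3 S7 S4 S8
        echo "${left} ${right}"
    done
done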
Use Cases
Use Case 1: HPC Data Centers
"The number of GPUs a given application requires differs"
– A large number of users submit parallel jobs
– Each job requests a certain number of GPUs per node
– The job scheduler takes these requests into account when scheduling
– The job scheduler tries to find nodes with the correct number of GPUs
– If such nodes are unavailable, it triggers a dynamic allocation (see the sketch below)
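As a rough illustration of the scheduler hook, a prolog-style sketch that compares the GPUs visible on a node against the job's request and triggers a remap when they differ. GPUS_REQUESTED and remap_c410x.sh are hypothetical names; nvidia-smi -L is used only to count the GPUs the node can see.

#!/bin/bash
# Sketch of a job-prolog check: does this node already have the requested GPU count?
# GPUS_REQUESTED would be supplied by the scheduler; remap_c410x.sh is a hypothetical
# wrapper around the port_map.sh / ipmitool sequence shown earlier.
GPUS_REQUESTED=${GPUS_REQUESTED:-4}
GPUS_PRESENT=$(nvidia-smi -L | wc -l)   # count the GPUs currently visible to the node

if [ "${GPUS_PRESENT}" -ne "${GPUS_REQUESTED}" ]; then
    # Trigger a dynamic allocation before the job starts.
    ./remap_c410x.sh "${GPUS_REQUESTED}"
fi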
Use Case 2: HPC Cloud Providers (PaaS)
"The nodes are provisioned with the correct number of GPUs at each instant"
– Users request specific platform features (number of GPUs, time)
– Nodes are provisioned with the required number of GPUs, then control is transferred to the user
– At the end, the GPUs are detached and shared with other nodes
Example requests handled by the workload manager:
1. 4 nodes 4 GPU/node for 8 hours
2. 8 nodes 2 GPU/node for 2 hours
3. 4 nodes 8 GPU/node for 6 hours
4. 16 nodes 2 GPU/node for 16 hours
5. 8 nodes 2 GPU/node for 8 hours
6. 32 nodes 4 GPU/node for 12 hours
7. 64 nodes 2 GPU/node for 24 hours
Demo 2 of 2
Questions
Reference
IPMI
http://www.intel.com/design/servers/ipmi/ani/index.htm
C410X BMC
http://support.dell.com/support/edocs/SYSTEMS/cp_pe_c410x/en/BMC/BMC.pdf
Script from Dell.com/support
http://www.dell.com/support/drivers/us/en/19/DriverDetails/DriverFileFormats?c=us&s=dhs&cs=19&l=en&DriverId=R302138