
Page 1

March 2014

InfiniBand Meets OpenSHMEM

Page 2

Leading Supplier of End-to-End Interconnect Solutions

[Diagram: end-to-end portfolio – servers/compute and storage front/back-end connect through Virtual Protocol Interconnect adapters (56G IB & FCoIB, 10/40/56GbE & FCoE) and VPI switches/gateways (56G InfiniBand, 10/40/56GbE), extending to metro/WAN; the product line spans host/fabric software, ICs, switches/gateways, adapter cards, and cables/modules]

Comprehensive End-to-End InfiniBand and Ethernet Portfolio

Page 3

Virtual Protocol Interconnect (VPI) Technology

[Diagram: VPI switch configurations – 64 ports 10GbE; 36 ports 40/56GbE; 48 ports 10GbE + 12 ports 40/56GbE; 36 ports IB up to 56Gb/s; 8 VPI subnets – with a switch OS layer and the Unified Fabric Manager. VPI adapters (mezzanine card, LOM, adapter card 3.0) run Ethernet at 10/40/56 Gb/s and InfiniBand at 10/20/40/56 Gb/s, with acceleration engines serving networking, storage, clustering, and management applications, from data center to campus and metro connectivity]

Standard Protocols of InfiniBand and Ethernet on the Same Wire!

Page 4

OpenSHMEM / PGAS

Mellanox ScalableHPC Communication Library to Accelerate Applications

MXM

• Reliable Messaging

• Hybrid Transport Mechanism

• Efficient Memory Registration

• Receive Side Tag Matching

FCA

• Topology Aware Collective Optimization

• Hardware Multicast

• Separate Virtual Fabric for Collectives

• CORE-Direct Hardware Offload

MPI Berkeley UPC

[Charts: Barrier and Reduce collective latency (µs) vs. number of processes (PPN=8, up to ~2500 processes), with and without FCA]

Page 5

Overview

Extreme scale programming-model challenges

Challenges for scaling OpenSHMEM to extreme scale

InfiniBand enhancements

• Dynamically Connected Transport (DC)

• Cross-Channel synchronization

• Non-contiguous data transfer

• On Demand Paging

Mellanox ScalableSHMEM

Page 6

Exascale-Class Computer Platforms – Communication Challenges

Very large functional unit count ~10,000,000

• Implication to communication stack: Need scalable communication capabilities

- Point-to-point

- Collective

Large on-”node” functional unit count ~500

• Implication to communication stack: Scalable HCA architecture

Deeper memory hierarchies

• Implication to communication stack: Cache aware network access

Smaller amounts of memory per functional unit

• Implication to communication stack: Low latency, high b/w capabilities

May have functional unit heterogeneity

• Implication to communication stack: Support for data heterogeneity

Component failures part of “normal” operation

• Implication to communication stack: Resilient and redundant stack

Data movement is expensive

• Implication to communication stack: Optimize data movement

Page 7

Challenges of a Scalable OpenSHMEM

Scalable Communication interface

• Interface objects should scale sub-linearly with system size

• Not a current issue, but keep in mind moving forward

Support for asynchronous communication

• Non-blocking communication interfaces (a usage sketch follows this list)

• Sparse and topology-aware interfaces?

System noise issues

• Interfaces that support communication delegation (e.g., offloading)

Minimize data motion

• Semantics that support data “aggregation”
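To make the non-blocking interfaces called for above concrete, here is a minimal sketch in C. It uses the non-blocking put and quiet routines (shmem_putmem_nbi / shmem_quiet) that were standardized later, in OpenSHMEM 1.3; at the time of this talk such calls were still proposals, so treat this as an illustration of the intended usage rather than of the API discussed here.

#include <stddef.h>
#include <shmem.h>

/* Overlap communication with computation using a non-blocking put.
 * shmem_putmem_nbi() only initiates the transfer; shmem_quiet()
 * later guarantees completion of all outstanding puts. */
void exchange(void *remote_buf, const void *local_buf, size_t nbytes, int peer)
{
    shmem_putmem_nbi(remote_buf, local_buf, nbytes, peer);

    /* ... independent computation proceeds while the put is in flight ... */

    shmem_quiet();   /* all previously issued non-blocking puts are now complete */
}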

Page 8

Scalable Performance Enhancements

Page 9

Dynamically Connected Transport

A Scalable Transport

Page 10

New Transport

Challenges being addressed:

• Scalable communication protocol

• High-performance communication

• Asynchronous communication

Current status: Transports in widest use

• RC

- High Performance: Supports RDMA and Atomic Operations

- Scalability limitations: One connection per destination

• UD

- Scalable: One QP services multiple destinations

- Limited communication support: No support for RDMA and Atomic Operations, unreliable

Need a scalable transport that also supports RDMA and atomic operations

DC – the best of both worlds

• High Performance: Supports RDMA and Atomic Operations, Reliable

• Scalable: One QP services multiple destinations

Page 11

IB Reliable Transports Model

Per-process and per-node QP counts (example: n = 4K nodes, p = 16 processes per node):

• RC: n*p QPs per process, n*p*p (~1M) per node

• XRC: n QPs per process, ~n*p (~64K) per node

• RD (+XRC): n QPs per process with shared contexts, ~n (~4K) per node

QoS/Multipathing: 2 to 8 times the above

Resource sharing (XRC/RD) causes processes to impact each other

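Working the example numbers through (arithmetic added here for concreteness): with n = 4096 nodes and p = 16 processes per node, RC requires n*p*p = 4096 × 16 × 16 = 1,048,576 QP contexts per node (~1M), XRC requires n*p = 4096 × 16 = 65,536 (~64K), and shared RD contexts bring the per-node count down to roughly n = 4096 (~4K).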
Page 12

The DC Model

Dynamic Connectivity

Each DC Initiator can be used to reach any remote DC Target

No resource sharing between processes

• process controls how many (and can adapt to load)

• process controls usage model (e.g. SQ allocation policy)

• no inter-process dependencies

Resource footprint

• Function of HCA capability

• Independent of system size

Fast Communication Setup Time

cs = concurrency of the sender, cr = concurrency of the responder

[Diagram: each of the p processes on a node (process 0 … p-1) allocates cs DC initiators and cr DC targets; per-node total shown as p*(cs+cr)/2]

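To illustrate the footprint claim with numbers (illustrative values, not from the slide): taking the diagram's per-node total of p*(cs+cr)/2 with p = 16 processes and a chosen concurrency of cs = cr = 8, a node needs 16 × (8+8) / 2 = 128 DC objects, and that count stays the same whether the system has 4K or 100K nodes, since it depends only on HCA capability and the concurrency each process selects.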
Page 13

Connect-IB – Exascale Scalability

[Chart: host memory consumption (MB, log scale from 1 to 1,000,000,000) vs. cluster size (8, 2K, 10K, 100K nodes) for successive generations – InfiniHost with RC (2002), InfiniHost-III with SRQ (2005), ConnectX with XRC (2008), Connect-IB with DCT (2012)]

Page 14

Dynamically Connected Transport

Key objects

• DC Initiator: Initiates data transfer

• DC Target: Handles incoming data

Page 15

Reliable Connection Transport Mode

Page 16

Dynamically Connected Transport Mode

Page 17

Cross-Channel Synchronization

Page 18

Challenges Being Addressed

Scalable Collective communication

Asynchronous communication

Communication resources (not computational resources) are used to manage communication

Avoids some of the effects of system noise

Page 19

High Level Objectives

Provide a synchronization mechanism between QPs

Provide mechanisms that allow communication dependencies to be managed without additional host intervention

Support asynchronous progress of multi-staged communication protocols

Page 20

Motivating Example - HPC

Collective communications optimization

Communication pattern involving multiple processes

Optimized collectives involve a communicator-wide, data-dependent communication pattern, e.g., communication initiation depends on prior completion of other communication operations

Data needs to be manipulated at intermediate stages of a collective operation (reduction operations)

Collective operations limit application scalability

• Performance, scalability, and system noise

Page 21

Scalability of Collective Operations

[Figure: two panels – "Ideal Algorithm" and "Impact of System Noise" – showing how noise delays the numbered steps of a collective]

Page 22

Scalability of Collective Operations - II

[Figure: two panels – "Offloaded Algorithm" and "Nonblocking Algorithm" – with communication processing marked]

Page 23

Network Managed Multi-stage Communication

Key Ideas

• Create a local description of the communication pattern

• Pass the description to the communication subsystem

• Manage the communication operations on the network, freeing the CPU to do meaningful computation

• Check for full-operation completion

Current Assumptions

• Data delivery is detected by new Completion Queue Events

• Use Completion Queue to identify the data source

• Completion order is used to associate data with a specific operation

• Use RDMA write with immediate data to generate Completion Queue events

Page 24

Key New Features

New QP trait – Managed QP: WQEs on such a QP must be enabled by WQEs from other QPs

Synchronization primitives:

• Wait work queue entry: waits until a specified completion queue (CQ) reaches a specified producer index value

• Enable tasks: a WQE on one QP can "enable" a WQE on a second QP

Submit lists of tasks to multiple QPs in a single post – sufficient to describe chained operations (such as collective communication)

Can set up a special completion queue to monitor list completion (request a CQE from the relevant task)

Page 25

Setting up CORE-Direct QP’s

Create QP with the ability to use the CORE-Direct primitives

Decide whether managed QPs will be used; if so, create the QP that will take the enable tasks. Most likely this is a centralized resource handling both the enable and the wait tasks

Decide on a Completion Queue strategy

Set up all needed QPs

Page 26

Initiating CORE-Direct Communication

A task list is created; each task specifies the following (a conceptual sketch follows this list):

• Target QP for the task

• Operation: send / wait / enable

• For wait, the number of completions to wait for

- Number of completions is specified relative to the beginning of the task list

- Number of completions can be positive, zero, or negative (wait on previously posted tasks)

• For enable, the number of send tasks to enable on the target QP

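To make the task-list structure above concrete, here is a small conceptual sketch in C. The type and constant names (cd_task, CD_OP_WAIT, and so on) are hypothetical stand-ins for the fields a CORE-Direct task carries; the real interface is exposed through vendor-specific verbs extensions not shown in these slides.

#include <infiniband/verbs.h>

/* Hypothetical task descriptor, for illustration only. */
enum cd_op { CD_OP_SEND, CD_OP_WAIT, CD_OP_ENABLE };

struct cd_task {
    struct ibv_qp *target_qp;  /* QP the task is posted to / acts upon            */
    enum cd_op     op;         /* send, wait, or enable                           */
    int            count;      /* wait: completions to wait for, counted relative
                                  to the start of the list (may be zero or negative);
                                  enable: number of send tasks to enable           */
};

/* Build the two-task step used at each stage of an offloaded exchange:
 * wait for the peer's data to arrive, then enable the pre-posted send
 * for the next stage. */
static void build_step(struct cd_task step[2],
                       struct ibv_qp *qp_from_peer,
                       struct ibv_qp *qp_to_peer)
{
    step[0] = (struct cd_task){ .target_qp = qp_from_peer, .op = CD_OP_WAIT,   .count = 1 };
    step[1] = (struct cd_task){ .target_qp = qp_to_peer,   .op = CD_OP_ENABLE, .count = 1 };
}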
Page 27

Posting of Tasks

Page 28

CORE-Direct Task-List Completion

Can specify which task will generate a completion in the "Collective" completion queue

Single CQ signals full list (collective) completion

CPU is not needed for progress

Page 29

Example – Four Process Recursive Doubling

[Figure: four processes (1–4) exchanging in two recursive-doubling steps – step 1 pairs processes at distance 1, step 2 pairs them at distance 2]

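For reference, the host-driven form of this four-process pattern is just standard recursive doubling, sketched below; with CORE-Direct the same wait/send chain is described as a task list and progressed by the HCA instead of the CPU. The notify_peer/wait_for_peer callbacks are placeholders for whatever signaling the runtime uses (for example an RDMA write with immediate data and a CQ poll).

/* Recursive-doubling barrier over npes PEs (assumes npes is a power of two). */
void rd_barrier(int me, int npes,
                void (*notify_peer)(int peer),
                void (*wait_for_peer)(int peer))
{
    for (int dist = 1; dist < npes; dist <<= 1) {
        int peer = me ^ dist;   /* partner for this round           */
        notify_peer(peer);      /* tell the partner we have arrived */
        wait_for_peer(peer);    /* wait until the partner arrives   */
    }
}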
Page 30

Four Process Barrier Example – Using Managed Queues – Rank 0

Page 31

Four Process Barrier Example – No Managed Queues – Rank 0

Page 32

User-Mode Memory Registration

Page 33

Key Features

Supports combining contiguous registered memory regions into a single memory region; the HW treats them as one contiguous region (and handles the underlying non-contiguity)

For a given memory region, supports non-contiguous access to memory using a regular structure representation – base pointer, element length, stride, repeat count (sketched conceptually after this list)

• Can combine these from multiple different memory keys

Memory descriptors are created by posting WQEs to fill in the memory key

Supports local and remote non-contiguous memory access

• Eliminates the need for some memory copies

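The regular (strided) access pattern above can be pictured with the following conceptual sketch; the struct and helper are illustrative only, not the UMR verbs API, and simply show how base pointer, element length, stride, and repeat count describe a region the HCA can expose as if it were contiguous.

#include <stddef.h>
#include <stdint.h>

/* Conceptual "regular" non-contiguous region: repeat_count blocks of
 * element_len bytes, successive blocks spaced stride bytes apart. */
struct regular_region {
    uint8_t *base;          /* base pointer                       */
    size_t   element_len;   /* bytes taken from each block        */
    size_t   stride;        /* distance between successive blocks */
    size_t   repeat_count;  /* number of blocks                   */
};

/* What the hardware effectively computes when the contiguous view is
 * accessed: byte i maps to block i / element_len at offset i % element_len. */
static uint8_t *regular_byte(const struct regular_region *r, size_t i)
{
    return r->base + (i / r->element_len) * r->stride + (i % r->element_len);
}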
Page 34

Combining Contiguous Memory Regions

Page 35

Non-Contiguous Memory Access – Regular Access

Page 36

On-Demand Paging

Page 37

Memory Registration

Apps register Memory Regions (MRs) for IO

• Referenced memory must be part of the process address space at registration time

• A memory key is returned to identify the MR

Registration operation

• Pins down the MR

• Hands off the virtual-to-physical mapping to HW

[Diagram: application on top of the user-space and kernel driver stacks; the MR with its key and the SQ/RQ/CQ reside in HW]

Page 38

Memory Registration – continued

Fast path

• Applications post IO operations directly to HCA

• HCA accesses memory using the translations referenced by the memory key

[Diagram: on the fast path the application posts directly to the SQ/RQ in HW, bypassing the kernel driver stack; the HCA resolves addresses through the MR key]

Wow !!! But…

Page 39

Challenges

Size of registered memory must fit physical memory

Applications must have memory locking privileges

Continuously synchronizing the translation tables between the address space and the HCA is hard

• Address space changes (malloc, mmap, stack)

• NUMA migration

• fork()

Registration is a costly operation

• No locality of reference

Page 40

Achieving High Performance

Requires careful design

Dynamic registration

• Naïve approach induces significant overheads

• Pin-down cache logic is complex and not complete (a minimal lookup sketch follows this list)

Pinned bounce buffers

• Application level memory management

• Copying adds overhead
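To give a sense of what the pin-down (registration) cache mentioned above involves, here is a deliberately minimal sketch assuming a single-threaded caller and exact-match lookups. A real cache must also handle partially overlapping ranges, invalidation when the application frees or remaps memory, capacity eviction, and thread safety, which is exactly the complexity the bullet refers to.

#include <infiniband/verbs.h>
#include <stdlib.h>

struct reg_entry {
    void             *addr;
    size_t            len;
    struct ibv_mr    *mr;
    struct reg_entry *next;
};

static struct reg_entry *cache_head;

/* Return a cached MR covering [addr, addr+len), registering on a miss.
 * Exact-match only; overlap handling, invalidation, and eviction omitted. */
static struct ibv_mr *cached_reg_mr(struct ibv_pd *pd, void *addr, size_t len)
{
    for (struct reg_entry *e = cache_head; e; e = e->next)
        if (e->addr == addr && e->len == len)
            return e->mr;                       /* cache hit: no pinning cost */

    struct ibv_mr *mr = ibv_reg_mr(pd, addr, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return NULL;

    struct reg_entry *e = malloc(sizeof *e);
    if (e) {
        *e = (struct reg_entry){ addr, len, mr, cache_head };
        cache_head = e;
    }
    return mr;
}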

Page 41

On Demand Paging

MR pages are never pinned by the OS

• Paged in when HCA needs them

• Paged out when reclaimed by the OS

HCA translation tables may contain non-present pages

• Initially, a new MR is created with non-present pages

• Virtual memory mappings don’t necessarily exist

Page 42

Semantics

ODP memory registration

• Specify the IBV_ACCESS_ON_DEMAND access flag (see the registration sketch after this list)

Work request processing

• WQEs in HW ownership must reference mapped memory

- From ibv_post_send()/ibv_post_recv() until ibv_poll_cq()

• RDMA operations must target mapped memory

• Access attempts to unmapped memory trigger an error

Transport

• RC semantics unchanged

• UD responder drops packets while page faults are resolved

- Standard semantics cannot be achieved unless wire is back-pressured
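A minimal sketch of the ODP registration step, using the standard libibverbs call with the IBV_ACCESS_ON_DEMAND flag named above (protection-domain setup and fuller error handling are assumed):

#include <stddef.h>
#include <stdio.h>
#include <infiniband/verbs.h>

/* Register an on-demand-paging MR: pages are not pinned up front and are
 * faulted in by the HCA on first access, as described above. */
struct ibv_mr *register_odp_mr(struct ibv_pd *pd, void *buf, size_t len)
{
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE  |
                                   IBV_ACCESS_REMOTE_READ  |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_ON_DEMAND);
    if (!mr)
        perror("ibv_reg_mr (ODP not supported on this HCA/kernel?)");
    return mr;
}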

Page 43

Execution Time Breakdown (Send Requestor)

[Pie charts: execution-time breakdown of a send-requestor page fault across schedule-in, WQE read, get-user-pages, PTE update, TLB flush, QP resume, and miscellaneous. Left: 4 KB page fault, 135 µs total; right: 4 MB page fault, 1 ms total, dominated by get-user-pages (~83%)]

Page 44

Mellanox ScalableSHMEM

Page 45

Mellanox’s OpenSHMEM Implementation

Implemented within the context of the Open MPI project

Exploits Open MPI's component architecture, re-using components used for the MPI implementation

Adds OpenSHMEM specific components

Uses InfiniBand-optimized point-to-point and collective communication modules

• Hardware Multicast

• CORE-Direct

• Dynamically Connected Transport

• Enhanced Hardware scatter/gather (UMR) – coming soon

• On Demand Paging – coming soon
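For orientation, a minimal OpenSHMEM program run on top of such an implementation looks like the sketch below. It uses the portable OpenSHMEM API only (the shmem_init/shmem_finalize names are the ones later standardized; the 2014-era start_pes entry point is equivalent), so nothing here is Mellanox-specific.

#include <stdio.h>
#include <shmem.h>

int main(void)
{
    static long src, dst;            /* symmetric (static) variables */

    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    src = me;
    /* One-sided put of our PE number into the symmetric variable on the next PE. */
    shmem_long_put(&dst, &src, 1, (me + 1) % npes);

    shmem_barrier_all();             /* synchronizes and completes outstanding puts */
    printf("PE %d received %ld\n", me, dst);

    shmem_finalize();
    return 0;
}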

Page 46

Leveraging the work already done for the MPI implementation

Similarities

• Atomics, collective operations

• one-sided operations (put/get)

• Job start and runtime support (mapping/binding/…)

Differences

• SPMD

• No communicators (yet)

• No user-defined data types

• Limited set of collectives, and slightly different semantics (Barrier)

• Application can put/get data from pre-allocated heap or static variables

• No file I/O support

Page 47

OMPI + OSHMEM

Many OMPI frameworks reused (runtime, platform support, job start, BTL, BML, MTL, profiling, autotools)

OSHMEM-specific frameworks added, keeping the MCA plugin architecture (SCOLL, SPML, atomics, synchronization and ordering enforcement)

OSHMEM supports the Mellanox p2p and collective accelerators (MXM, FCA) as well as the OMPI-provided transports (TCP, OpenIB, Portals, …)

Page 48

SHMEM Implementation Architecture

Page 49

SHMEM Put Latency

[Chart: put latency (µs, roughly 1.4–2.0) vs. message size from 1 to 4096 bytes, for ConnectX-3 and Connect-IB with RC and UD transports]

Page 50

SHMEM Put Latency

[Chart: put latency (µs, 0–200) vs. message size from 8 KB to 1 MB, for ConnectX-3 and Connect-IB with RC and UD transports]

Page 51

SHMEM 8 Byte Message Rate

[Chart: message rate (millions of messages per second, 0–70) vs. message size from 1 to 8192 bytes, for ConnectX-3 and Connect-IB with RC and UD transports]

Page 52

SHMEM Barrier – Latency

[Chart: barrier latency (µs, 0–20) vs. number of hosts (2–63, 16 processes per node) for ConnectX-3 and Connect-IB]

Page 53

SHMEM All-to-All Benchmark (Message Size – 4096 Bytes)

[Chart: bandwidth per PE (MB/s, 0–400) vs. number of hosts (8–63, 16 PEs per node) for ConnectX-3 and Connect-IB]

Page 54

SHMEM Atomic Updates

[Chart: atomic update rate (million operations per second, 0–6) for Connect-IB UD and ConnectX-3 UD]

Page 55

For more information: [email protected]