Intel Confidential — Do Not Forward

SC’16 Intel OPA LNET Update


LNET Intel OPA


LNET

Designed to meet the needs of large-scale computing clusters

Optimized for very large node counts, high throughput

Works with most networks, supports RDMA

Intel® Omni-Path, Ethernet, InfiniBand*, ELAN, Myrinet*, etc.

LNET is independent of the Lustre file system

Abstracts network details from Lustre

Implemented as a set of kernel modules


LNET (continued)

Networks are given unique names

o2ib0, tcp0, tcp1

Lustre Network Identifier (NID) defines interfaces

10.1.145.16@o2ib0

Includes native support for multiple networks

Accomplished via the Lustre Network Driver (LND)

InfiniBand* via the o2ib verbs interface, with RDMA support

Ethernet via TCP/IP interface

Lustre -> Network RPC API -> LNET -> LND -> Linux Driver
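As a rough illustration of the NID format shown on this slide, the sketch below splits a NID such as 10.1.145.16@o2ib0 into its interface address and network name, then splits the network name into an LND type and an instance number. The names `Nid` and `parse_nid` are hypothetical helpers written for this example and are not part of Lustre. Treating a bare name such as `tcp` as instance 0 is an assumption.

```python
from typing import NamedTuple


class Nid(NamedTuple):
    address: str   # interface address, e.g. "10.1.145.16"
    lnd: str       # LND type, e.g. "o2ib" or "tcp"
    instance: int  # network instance number, e.g. 0


def parse_nid(nid: str) -> Nid:
    """Split 'address@network' into address, LND type and instance."""
    address, sep, network = nid.partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    # The trailing digits of the network name are the instance number.
    lnd = network.rstrip("0123456789")
    if not lnd:
        raise ValueError(f"network name has no LND type: {network!r}")
    digits = network[len(lnd):]
    # Assumption: a bare name such as "tcp" is treated as instance 0.
    return Nid(address, lnd, int(digits) if digits else 0)


if __name__ == "__main__":
    print(parse_nid("10.1.145.16@o2ib0"))
    print(parse_nid("10.10.0.20@tcp1"))
```

The split mirrors the layering on the slide: the part after `@` selects the LNet network (and therefore the LND), while the part before it is the address that the LND hands to the Linux driver.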


Intel® Omni-Path Architecture

Building on the industry’s best technologies

Highly leverages existing Aries and Intel® True Scale fabric

Adds innovative new features and capabilities to improve performance, reliability, and QoS

Re-uses existing OpenFabrics Alliance* software


Robust product offerings and ecosystem

End-to-end Intel product line

Strong ecosystem with 70+ Fabric Builders members

Software: open source host software and Fabric Manager

HFI Adapters: single port, x8 and x16; x8 adapter (58 Gb/s), x16 adapter (100 Gb/s)

Edge Switches: 1U form factor, 24 and 48 port; 24-port edge switch, 48-port edge switch

Director Switches: QSFP-based, 192 and 768 port; 192-port director switch (7U chassis), 768-port director switch (20U chassis)

Cables: third-party vendors; passive copper, active optical

Silicon: OEM custom designs, HFI and switch ASICs; switch silicon up to 48 ports (1200 GB/s total b/w), HFI silicon up to 2 ports (50 GB/s total b/w)


LNET Intel® OPA Considerations

Base OS Support

Red Hat Enterprise Linux 7

SUSE Linux Enterprise Server 12

Lustre 2.7+ is required on the server side for this OS support

Intel Fabric Suite (IFS) delivers driver updates

IFS updates base OS OFED components only as required

Enables concurrent use of other in-kernel drivers


LNET Intel® OPA Considerations (continued)

Intel® OPA gen 1 supports RDMA verbs in OFED

LNET uses the existing InfiniBand* LND driver

Only LND and driver tuning is required for operation

LND settings are automated at LNET install time


Intel OPA Development In Lustre

New ko2iblnd-opa driver

Intel OPA abstraction layer for LNet

ko2iblnd-opa default settings

options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1

Allow the use of 2 different o2ib devices

Uses same LND driver

Different settings for IB and Intel OPA

Configure via Dynamic LNet Configuration
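To make the ko2iblnd-opa defaults easier to compare against a site's InfiniBand settings, the sketch below parses a modprobe-style `options` line into a module name and a dictionary of integer settings. The function name `parse_module_options` is a hypothetical helper for this example. It assumes every value on the line is an integer, which holds for the defaults quoted on this slide but not for module options in general.

```python
from typing import Dict, Tuple

OPA_DEFAULTS = (
    "options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 "
    "concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 "
    "fmr_flush_trigger=512 fmr_cache=1"
)


def parse_module_options(line: str) -> Tuple[str, Dict[str, int]]:
    """Parse 'options <module> key=value ...' into (module, settings)."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "options":
        raise ValueError(f"not an options line: {line!r}")
    module = fields[1]
    settings: Dict[str, int] = {}
    for pair in fields[2:]:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        # Assumption: all values are integers, as in the slide's defaults.
        settings[key] = int(value)
    return module, settings


if __name__ == "__main__":
    module, settings = parse_module_options(OPA_DEFAULTS)
    print(module)
    for key, value in sorted(settings.items()):
        print(f"  {key} = {value}")
```

Parsing the IB and OPA lines with the same helper and diffing the two dictionaries is one simple way to see which tunables differ between the two o2ib devices.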


Integration into Existing Fabrics

Challenge: Intel® OPA is not directly compatible with InfiniBand* (IB). Intel® OPA cannot plug into an IB switch.

Solution: LNET Routers

[Diagram: OPA Lustre components on the OPA fabric, connected through LNet routers to IB Lustre components on the InfiniBand fabric]


LNET Routers Overview

LNET Routers: Lustre Software + Standard Hardware

Use off the shelf Hardware

Software is a part of standard LNET/Lustre

Clustered Deployment Recommended

Supported configurations with Red Hat* 7.2 and Lustre 2.7+

Intel® OPA -> Ethernet

Intel® OPA -> FDR (use in-kernel drivers for IB)

Intel® OPA -> EDR (use in-kernel drivers for IB)

See Intel® Enterprise Edition for Lustre* software Configuration For LNET Routers


LNet Routing


LNet Routing

Clients and servers are endpoints, not routers

• Multi-homed servers can have multiple networks – Dual-rail, etc.

• Routers should be dedicated nodes with multiple networks

Module parameters identify routers

• All routing setups must be bi-directional, very much like other routing setups

Initial route decision is based on destination NID

• NID on a local network, send directly

• NID on a remote network, consult routing table

LNet routing table is in /proc

Routing decisions are based on hop count

• If there is a pool of routers, message is sent to the router with the shortest queue
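The decision steps on this slide can be sketched roughly as follows: send directly when the destination NID is on a local network, otherwise look up the routes for the remote network, prefer the lowest hop count, and break ties by the shortest queue. This is only a reading of the slide, not the LNet implementation. The `Route` class, its `queue_len` field, and the function `next_hop` are hypothetical names for this example, and the tie-breaking order is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Route:
    remote_net: str     # network reached through this router, e.g. "tcp1"
    gateway: str        # router NID on a local network, e.g. "10.20.0.20@tcp2"
    hops: int = 1       # hop count to the remote network
    queue_len: int = 0  # messages currently queued for this router


def net_of(nid: str) -> str:
    """Return the network part of a NID ('address@network')."""
    return nid.partition("@")[2]


def next_hop(dest_nid: str, local_nets: Set[str], routes: Iterable[Route]) -> str:
    """Pick the NID that a message for dest_nid should be sent to."""
    dest_net = net_of(dest_nid)
    if dest_net in local_nets:
        # Destination is on a local network: send directly.
        return dest_nid
    candidates = [r for r in routes if r.remote_net == dest_net]
    if not candidates:
        raise LookupError(f"no route to network {dest_net!r}")
    # Assumption: lowest hop count first, then the shortest queue.
    best = min(candidates, key=lambda r: (r.hops, r.queue_len))
    return best.gateway


if __name__ == "__main__":
    table = [
        Route("tcp1", "10.20.0.20@tcp2", hops=1, queue_len=5),
        Route("tcp1", "10.20.0.21@tcp2", hops=1, queue_len=2),
    ]
    print(next_hop("10.10.0.5@tcp1", {"tcp2"}, table))
    print(next_hop("10.20.0.7@tcp2", {"tcp2"}, table))
```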


LNet Routing (cont)

LNet Routers often connect different hardware (IB to Intel OPA, etc.)

• Dedicated hardware tends to be expensive

• Can run on any node(s) with both network interfaces

• LNet routers only route LNet traffic

Routers are fairly simple

• Have connections to more than one LNet network

• Have forwarding enabled, to forward traffic between LNet networks

Routing includes a health check

• Disabled by default, but always best to enable

• Router checker can revive dead routers

Watch LNet router statistics

• Use the /usr/sbin/routerstat command


LNet Routing - Configuration

Example using networks:

• Servers on LAN1 – 10.10.0.0/24

• Clients on LAN2 – 10.20.0.0/24

• Router on LAN1 and LAN2 at 10.10.0.20 and 10.20.0.29

Servers:

options lnet networks="tcp1(eth1)" routes="tcp2 10.10.0.20@tcp1"

Router:

options lnet networks="tcp1(eth1),tcp2(eth2)" forwarding="enabled"

Clients:

options lnet networks="tcp2(eth1)" routes="tcp1 10.20.0.29@tcp2"

Print the configured routes:

# lctl route_list
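Because routes must exist in both directions, a quick sanity check on a planned configuration can catch a missing return route before it is deployed. The sketch below models the server and client settings from this example and checks that each side can reach the other's network through a gateway on one of its own networks. `NodeConfig`, `can_reach`, and `is_bidirectional` are hypothetical names for this illustration. It checks only the endpoints' route entries and assumes the router itself is configured with both networks and forwarding enabled.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NodeConfig:
    networks: Set[str]  # local LNet networks
    # Maps a remote network to the gateway NID used to reach it.
    routes: Dict[str, str] = field(default_factory=dict)


def can_reach(node: NodeConfig, remote_net: str) -> bool:
    """True if node is on remote_net or has a usable route to it."""
    if remote_net in node.networks:
        return True
    gateway = node.routes.get(remote_net)
    if gateway is None:
        return False
    # The gateway must sit on one of the node's own networks.
    return gateway.partition("@")[2] in node.networks


def is_bidirectional(a: NodeConfig, b: NodeConfig) -> bool:
    """True if a can reach every network of b and b every network of a."""
    return (all(can_reach(a, net) for net in b.networks)
            and all(can_reach(b, net) for net in a.networks))


if __name__ == "__main__":
    servers = NodeConfig({"tcp1"}, {"tcp2": "10.10.0.20@tcp1"})
    clients = NodeConfig({"tcp2"}, {"tcp1": "10.20.0.29@tcp2"})
    print(is_bidirectional(servers, clients))
```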


LNet Routers – Pooling of Routers

Routers support a ‘pool’ model

• Routers discover each other and function as a pool (cluster)

• Monitor peer health and communicate state

• Route traffic around a failed peer

• Balance overall load across multiple routers

Router pooling does add some complexity versus a single router

• Routers feed back state information to the client

• Clients process the state of each router in the pool

• Use data to load balance traffic across the entire pool of routers

Router pooling is easy to configure

• Clients are configured to know NIDs of all the routers


LNet Routers – Pool Configuration

Example configuration using networks:

• Servers on LAN1 – 10.10.0.0/24

• Clients on LAN2 – 10.20.0.0/24

• Routers on LAN1 and LAN2 at 10.10.0.20-29 and 10.20.0.20-29

Servers:

options lnet networks="tcp1(eth1)" routes="tcp2 10.10.0.[20-29]@tcp1"

Routers:

options lnet networks="tcp1(eth1),tcp2(eth2)" forwarding="enabled"

Clients:

options lnet networks="tcp2(eth1)" routes="tcp1 10.20.0.[20-29]@tcp2"
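The bracket notation in these route entries is shorthand for a list of router NIDs. As a small illustration, the sketch below expands a pattern such as 10.20.0.[20-29]@tcp2 into the ten NIDs it stands for. The function `expand_nid_range` is a hypothetical helper that handles only a single `[low-high]` range; the route syntax accepted by LNet is richer than this, so treat it as a way to visualize the pool rather than a parser for real configurations.

```python
import re
from typing import List

_RANGE_RE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_nid_range(pattern: str) -> List[str]:
    """Expand one '[low-high]' range in a NID pattern into explicit NIDs."""
    match = _RANGE_RE.search(pattern)
    if match is None:
        # No range present: the pattern already names a single NID.
        return [pattern]
    low, high = int(match.group(1)), int(match.group(2))
    if low > high:
        raise ValueError(f"empty range in {pattern!r}")
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{value}{suffix}" for value in range(low, high + 1)]


if __name__ == "__main__":
    for nid in expand_nid_range("10.20.0.[20-29]@tcp2"):
        print(nid)
```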


LNet Routing – More Configurations

Configure "Router Checker" options on "clients":

options lnet networks="tcp2(eth2)" \
        auto_down=1 \
        live_router_check_interval=60 \
        dead_router_check_interval=60 \
        check_routers_before_use=1 \
        forwarding=disabled \
        accept=none
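To give a feel for what the router checker options control, the sketch below simulates a client tracking a pool of routers: live and dead routers are re-checked on separate intervals, and routers start out unusable until a first successful check when checking before use is enabled. This is a toy model based on a reading of the parameter names, not the LNet implementation. The `RouterChecker` class and its methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class RouterState:
    nid: str
    alive: bool
    last_check: float = float("-inf")  # never checked yet


class RouterChecker:
    def __init__(self, nids: Iterable[str], live_interval: float = 60,
                 dead_interval: float = 60, check_before_use: bool = True):
        self.live_interval = live_interval
        self.dead_interval = dead_interval
        # Assumption: with check_before_use, routers start as down.
        self.routers: Dict[str, RouterState] = {
            nid: RouterState(nid, alive=not check_before_use) for nid in nids
        }

    def due(self, now: float) -> List[str]:
        """NIDs whose next health check is due at time 'now' (seconds)."""
        result = []
        for router in self.routers.values():
            interval = self.live_interval if router.alive else self.dead_interval
            if now - router.last_check >= interval:
                result.append(router.nid)
        return result

    def record(self, nid: str, now: float, reachable: bool) -> None:
        """Store the outcome of a health check; a success revives a router."""
        router = self.routers[nid]
        router.alive = reachable
        router.last_check = now

    def usable(self) -> List[str]:
        """NIDs of routers currently considered alive."""
        return [nid for nid, router in self.routers.items() if router.alive]


if __name__ == "__main__":
    checker = RouterChecker(["10.20.0.20@tcp2", "10.20.0.21@tcp2"])
    checker.record("10.20.0.20@tcp2", now=0, reachable=True)
    checker.record("10.20.0.21@tcp2", now=0, reachable=False)
    print(checker.usable())
```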


Wrap up

Lustre on Intel® OPA is in production today

LNET routers provide flexible deployment options

Learn More

www.intel.com/Lustre
