
Page 1: Infiniband in the Data Center

Infiniband in the Data Center

Steven Carter
Cisco Systems

[email protected]

Makia Minich, Nageswara Rao
Oak Ridge National Laboratory

{minich,rao}@ornl.gov

Page 2: Infiniband in the Data Center

Agenda

Overview

The Good, The Bad, and The Ugly

IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences

IB WAN Case Study: Department of Energy’s UltraScience Network

Page 3: Infiniband in the Data Center

Overview

Data movement requirements are exploding in HPC data centers.

Moving hundreds of gigabytes per second around the data center requires more than the Ethernet community currently provides

TCP/IP performs poorly in the wide area on the high-bandwidth links required to move data between data centers

This is a high-level overview of the pros and cons of using Infiniband in the data center, with two case studies to reinforce them

Page 4: Infiniband in the Data Center

Agenda

Overview

The Good, The Bad, and The Ugly

IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences

IB WAN Case Study: Department of Energy’s UltraScience Network

Page 5: Infiniband in the Data Center

The Good

Cool Name (Marketing gets an A+ -- who doesn’t want infinite bandwidth?)

Unified Fabric/IO Virtualization:

– Low-latency interconnect: microseconds rather than milliseconds; not necessarily important in a data center

– Storage – using SRP (SCSI RDMA Protocol) or iSER (iSCSI Extensions for RDMA)

– IP – using IPoIB; newer versions support Connected Mode, giving better throughput (a configuration sketch follows this list)

– Gateways – Gateways give access to legacy Ethernet (careful) and Fibre Channel networks
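
As a concrete illustration of the IPoIB point above, here is a minimal sketch (not from the slides) that switches an IPoIB interface into Connected Mode and raises its MTU from C; the interface name ib0 and the 65520-byte MTU are assumptions, and in practice the same thing is usually done with a couple of shell commands:

/* Minimal sketch, assuming the kernel exposes the standard
 * /sys/class/net/<if>/mode attribute for IPoIB and the interface
 * is named ib0; must run as root. */
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <unistd.h>

int main(void)
{
    /* 1. Ask the IPoIB driver for Connected Mode instead of datagram mode. */
    FILE *mode = fopen("/sys/class/net/ib0/mode", "w");
    if (!mode) { perror("open ib0 mode"); return 1; }
    fputs("connected\n", mode);
    fclose(mode);

    /* 2. Raise the MTU; Connected Mode allows a much larger MTU (here 65520),
     *    which is where most of the IPoIB throughput gain comes from. */
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "ib0", IFNAMSIZ - 1);
    ifr.ifr_mtu = 65520;
    if (s < 0 || ioctl(s, SIOCSIFMTU, &ifr) < 0) { perror("SIOCSIFMTU"); return 1; }
    close(s);
    return 0;
}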

Page 6: Infiniband in the Data Center

The Good (Cont.)

Faster link speeds:

– 1x Single Data Rate (SDR) = 2.5 Gb/s signalling (2 Gb/s of data after 8b/10b encoding)

– 4 1x links can be aggregated into a single 4x link

– 3 4x links can be aggregated into a single 12x link (single 12x link also available)

– Double Data Rate (DDR) currently available, Quad Data Rate (QDR) on the horizon

– Many link speeds available: 8 Gb/s, 16 Gb/s, 24 Gb/s, 32 Gb/s, 48 Gb/s, etc.
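
As a quick back-of-the-envelope check on those numbers (my arithmetic, not from the slides), the usable data rate is the lane count times the 2.5 Gb/s SDR lane rate, times the data-rate multiplier (1 for SDR, 2 for DDR), times the 8/10 encoding efficiency:

\[
\text{data rate} = N_{\text{lanes}} \times 2.5\ \text{Gb/s} \times m \times \tfrac{8}{10}
\]
\[
\text{4x SDR: } 4 \times 2.5 \times 1 \times 0.8 = 8\ \text{Gb/s}, \qquad
\text{4x DDR: } 16\ \text{Gb/s}, \qquad
\text{12x DDR: } 48\ \text{Gb/s}
\]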

Page 7: Infiniband in the Data Center

The Good (Cont.)

HCA does much of the heavy lifting:

– Much of the protocol is handled on the Host Channel Adapter (HCA), freeing the CPU

– Remote Direct Memory Access (RDMA) gives the ability to transfer data between hosts with very little CPU overhead

– RDMA capability is EXTREMELY important because it lets the same hardware move significantly more data with far less CPU involvement (see the sketch below)
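
To show what “the HCA does the heavy lifting” looks like in practice, here is a hedged libibverbs sketch that posts a one-sided RDMA WRITE. It assumes a reliable-connected queue pair has already been created and connected and that the peer’s buffer address and rkey were exchanged out of band; those setup steps are omitted:

/* Illustrative fragment, not the authors' code: post one RDMA WRITE.
 * Assumes `qp` is an already-connected RC queue pair and the remote
 * side has shared `remote_addr` and `rkey` out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int post_rdma_write(struct ibv_pd *pd, struct ibv_qp *qp,
                    void *local_buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    /* Register the local buffer once; the HCA then DMAs it directly,
       so neither host copies the data with its CPU. */
    struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: remote CPU not involved */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* The CPU only posts this descriptor; the HCA moves the bytes. */
    return ibv_post_send(qp, &wr, &bad_wr);
}

The CPU’s work ends at posting the descriptor and later reaping a completion from the completion queue; the transfer itself is entirely offloaded.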

Page 8: Infiniband in the Data Center

The Good (Cont.)

Nearly 10x lower cost for similar bandwidth:

– Because of its simplicity, IB switches cost less. Oddly enough, IB HCAs are more complex than 10G NICs, but are also less expensive.

– Roughly $500 per switch port and $500 for a dual-port DDR HCA (see the arithmetic below)

– Because of RDMA, there is a cost savings in infrastructure as well (i.e. you can do more with fewer hosts)
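
Using only the prices quoted above, and counting one switch port plus one HCA per attached host (ignoring cables and the HCA’s second port), the rough cost per usable gigabit is:

\[
\frac{\$500_{\text{switch port}} + \$500_{\text{HCA}}}{16\ \text{Gb/s (4x DDR data rate)}} \approx \$62\ \text{per Gb/s per host}
\]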

Higher port density switches:

– Switches available with 288 (or more) full-rate ports in a single chassis

Page 9: Infiniband in the Data Center

The Bad

IB sounds too much like IP (Can quickly degrade into a “Who’s on first” routine)

IB is not well understood by networking folks

Lacks some of the features of Ethernet important in the Data Center:

– Router – there is no way to natively connect two separate fabrics. The IB Subnet Manager (SM) is integral to the operation of the network (it detects hosts, programs routes into the switches, etc.). Without a router, you cannot have two different SMs for separate operational or administrative domains (this can be worked around at the application layer).

– Firewall – no way to dictate who talks to whom by protocol (partitions exist, but are too coarse-grained; see the sketch after this list)

– Protocol analyzers – they exist but are hard to come by, and it is difficult to “roll your own” because the protocol is embedded in the HCA
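
To make the granularity point concrete, here is a hedged sketch of OpenSM partition configuration (partition names, pkeys, and port GUIDs are invented). Membership is assigned per partition key and per port, so there is no way to express rules such as “these hosts may speak SRP to each other but not IPoIB”:

# Hypothetical /etc/opensm/partitions.conf entries (illustrative only):
Default=0x7fff, ipoib : ALL=full ;
Storage=0x8001 : 0x0002c90300001234=full, 0x0002c90300005678=full ;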

Page 10: Infiniband in the Data Center

The Ugly

Cabling options:

•Heavy-gauge copper cables with clunky CX4 connectors

•Short distance (< 20 meters)

•If mishandled, they have a propensity to fail

•Heavy connectors can become disengaged

•Electrical to optical converter

•Long distance (up to 150 meters)

•Uses multi-core ribbon fiber (hard to debug)

•Expensive

•Heavy connectors can become disengaged

Page 11: Infiniband in the Data Center

The Ugly (Continued)

Cabling options (continued):

•Electrical-to-optical converter built onto the cable

•Long distance (up to 100 meters)

•Uses multi-core ribbon fiber (hard to debug)

•More cost effective than other solutions

•Heavy connectors can become disengaged

Page 12: Infiniband in the Data Center

Agenda

Overview

The Good, The Bad, and The Ugly

IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences

IB WAN Case Study: Department of Energy’s UltraScience Network

Page 13: Infiniband in the Data Center

Case Study: ORNL Center for Computational Sciences (CCS)

The Department of Energy established the Leadership Computing Facility at ORNL’s Center for Computational Sciences to field a 1PF supercomputer

The design chosen, the Cray XT series, includes an internal Lustre filesystem capable of sustaining reads and writes of 240GB/s

The problem with making the filesystem part of the machine is that it limits the flexibility of the Lustre filesystem and increases the complexity of the Cray

The problem with decoupling the filesystem from the machine is the high cost of connecting it via 10GE at the required speeds

Page 14: Infiniband in the Data Center

CCS IB Network Roadmap Summary

[Diagram: an Infiniband core, O(100 GB/s), scaled to match the central file system and data transfer, connects Jaguar, Baker, Viz, and the Lustre storage; an Ethernet core, O(10 GB/s), scaled to match wide-area connectivity and the archive, serves the High-Performance Storage System (HPSS); a gateway links the two cores.]

Page 15: Infiniband in the Data Center

XT3 LAN Testing

[Diagram: Spider (Linux cluster) connected through a Voltaire 9024 switch to Rizzo (XT3).]

• ORNL showed the first successful Infiniband implementation on the XT3
• Using Infiniband in the XT3’s I/O nodes running a Lustre router resulted in a >50% improvement in performance and a significant decrease in CPU utilization

Page 16: Infiniband in the Data Center

Observations

XT3's performance is good (better than 10GE) for RDMA

The XT3's poorer performance compared to the generic x86_64 host is likely a result of its PCI-X HCA (known to be sub-optimal)

In its role as a Lustre router, an IB-connected I/O node delivers significantly better performance, allowing CCS to achieve the required throughput with fewer nodes than would be needed using 10GE (a routing-configuration sketch follows)
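
As one hedged illustration of that router role, the sketch below shows roughly how Lustre LNET routing is expressed through kernel module options; the interface names, addresses, and the Portals network name are hypothetical, and the exact LND names depend on the Cray software stack of that era:

# On an XT3 I/O node acting as an LNET router between the internal
# Portals network and the external Infiniband fabric (hypothetical):
options lnet networks="o2ib0(ib0),ptl0" forwarding=enabled

# On a Spider (Linux cluster) node, reach the Portals side through the
# routers' Infiniband NIDs (addresses invented for illustration):
options lnet networks="o2ib0(ib0)" routes="ptl0 192.168.10.[1-8]@o2ib0"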

Page 17: Infiniband in the Data Center

Agenda

Overview

The Good, The Bad, and The Ugly

IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences

IB WAN Case Study: Department of Energy’s UltraScience Network

Page 18: Infiniband in the Data Center

IB over WAN testing

Placed 2 x Obsidian Longbow devices between Voltaire 9024 and Voltaire 9288

Provisioned loopback circuits of various lengths on the DOE UltraScience Network and ran tests.

RDMA test results:
Local (Longbow to Longbow): 7.5 Gb/s
ORNL <-> ORNL (0.2 miles): 7.5 Gb/s
ORNL <-> Chicago (1400 miles): 7.46 Gb/s
ORNL <-> Seattle (6600 miles): 7.23 Gb/s
ORNL <-> Sunnyvale (8600 miles): 7.2 Gb/s

[Diagram: end-to-end test path: cluster, Voltaire 9024, Obsidian Longbow, Ciena CD-CI (ORNL), OC-192 SONET across the DOE UltraScience Network, Ciena CD-CI (SNV), Obsidian Longbow, Voltaire 9288, cluster; 4x Infiniband SDR on the cluster ends.]

Page 19: Infiniband in the Data Center

Sunnyvale loopback (8600 miles) – RC problem

Page 20: Infiniband in the Data Center

Observations

The Obsidian Longbows appear to be extending sufficient link-level credits (see the estimate after this list)

The native IB transport does not appear to suffer from the same wide-area shortcomings as TCP (i.e., full rate with no tuning)

With the Arbel-based HCAs, we saw problems:

– RC only performs well at large message sizes

– There seems to be a maximum number of messages allowed in flight (~250)

– RC performance does not increase rapidly enough even when the message cap is not an issue

The problems seem to be fixed with the new Hermon-based HCAs…
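
To put the credit and message-cap observations in perspective, here is a rough bandwidth-delay estimate for the longest loop; it treats the quoted 8600 miles as the total loop length and assumes light propagates in fiber at about two-thirds of c, both assumptions of this writeup rather than figures from the slides:

\[
\text{RTT} \approx \frac{8600\ \text{mi} \times 1609\ \text{m/mi}}{2 \times 10^{8}\ \text{m/s}} \approx 69\ \text{ms},
\qquad
\text{in-flight data} \approx 7.2\ \text{Gb/s} \times 69\ \text{ms} \approx 62\ \text{MB}
\]

Sustaining full rate therefore requires the Longbows to buffer and extend credits for tens of megabytes of data, and with only ~250 messages allowed in flight each message must be on the order of 250 KB, which is consistent with RC performing well only at large message sizes.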

Page 21: Infiniband in the Data Center

Obsidian’s Results – Arbel vs. Hermon

[Plots: Arbel to Hermon; Hermon to Arbel]

Page 22: Infiniband in the Data Center

Summary

Infiniband has the potential to make a great data center interconnect because it provides a unified fabric, faster link speeds, a mature RDMA implementation, and lower cost

IB does not appear to have the same intrinsic wide-area problems as IP/Ethernet, making it a good candidate for transferring data between data centers

Page 23: Infiniband in the Data Center

The End

Questions? Comments? Criticisms?

For more information:

Steven Carter
Cisco Systems

[email protected]

Makia Minich, Nageswara Rao
Oak Ridge National Laboratory
{minich,rao}@ornl.gov