Mellanox Technologies Maximize World-Class Cluster Performance
April 2008 – Gilad Shainer, Director of Technical Marketing



TRANSCRIPT

Page 1: Mellanox Technologies Maximize World-Class Cluster Performance

CONFIDENTIAL

April, 2008

Mellanox Technologies: Maximize World-Class Cluster Performance

Gilad Shainer – Director of Technical Marketing

Page 2: Mellanox Technologies Maximize World-Class Cluster Performance


Mellanox Technologies

Fabless semiconductor supplier founded in 1999
• Business in California
• R&D and operations in Israel
• Global sales offices and support
• 250+ employees worldwide

Leading server and storage interconnect products
• InfiniBand and Ethernet leadership

Shipped over 2.8M 10 & 20Gb/s ports as of Dec 2007

$106M raised in Feb 2007 IPO on NASDAQ (MLNX)
• Dual listed on Tel Aviv Stock Exchange (TASE: MLNX)
• Profitable since 2005

Revenues: FY06 = $48.5M, FY07 = $84.1M
• 73% yearly growth
• 1Q08 guidance ~$24.8M

Customers include Cisco, Dawning, Dell, Fujitsu, Fujitsu-Siemens, HP, IBM, NEC, NetApp, Sun, Voltaire

Page 3: Mellanox Technologies Maximize World-Class Cluster Performance


Interconnect: A Competitive Advantage

Adapter ICs & cards, switch ICs, cables
Reference designs, software, end-to-end validation

[Diagram: blade/rack servers and storage connected end to end through adapters and a switch]

Providing end-to-end products

Page 4: Mellanox Technologies Maximize World-Class Cluster Performance


InfiniBand in the TOP500

[Chart: Number of Clusters in Top500]

InfiniBand shows the highest yearly growth, 52% compared to Nov 06
• InfiniBand strengthens its leadership as the high-speed interconnect of choice
• 4+ times the total number of all other high-speed interconnects (1Gb/s+)

InfiniBand 20Gb/s connects 40% of the InfiniBand clusters
• Reflects the ever-growing performance demands

InfiniBand makes the most powerful clusters
• 3 of the top 5 (#3, #4, #5) and 7 of the Top20 (#14, #16, #18, #20)
• The leading interconnect for the Top100

Page 5: Mellanox Technologies Maximize World-Class Cluster Performance


TOP500 – InfiniBand performance trends: 180% CAGR, 220% CAGR

[Chart: InfiniBand Clusters – CPU Count; number of InfiniBand-connected CPUs, Nov 05 through Nov 07]
[Chart: InfiniBand Clusters – Performance; aggregate performance (Gflops), Nov 05 through Nov 07]
[Chart: InfiniBand in the Top500; number of clusters per list (Nov 05, Nov 06, Nov 07), IB SDR vs IB DDR]

Ever-growing demand for compute resources

Explosive growth of IB 20Gb/s; IB 40Gb/s anticipated in the Nov 08 list

InfiniBand is the optimized interconnect for multi-core environments: maximum scalability, efficiency and performance

Page 6: Mellanox Technologies Maximize World-Class Cluster Performance


ConnectX: Leading IB and 10GigE Adapters

[Diagram: one ConnectX device connecting to an InfiniBand switch as a 10/20/40Gb/s IB adapter (dual port, PCIe Gen1/2, hardware IOV) or to an Ethernet switch as a 10GigE adapter (dual port, PCIe Gen1/2, hardware IOV, FCoE, CEE)]

Server and storage interconnect
• Highest InfiniBand and 10GigE performance

Single chip or slot optimizes cost, power, footprint, reliability
One device for 40Gb/s IB, FCoIB, 10GigE, CEE, FCoE
• One SW stack for offload, virtualization, RDMA, storage

Page 7: Mellanox Technologies Maximize World-Class Cluster Performance


ConnectX Multi-core MPI Scalability

[Chart: Mellanox ConnectX MPI Latency – Multi-core Scaling; latency (usec) vs. number of CPU cores / parallel processes (1–16)]
[Chart: ConnectX IB QDR versus DDR Bandwidth (PCIe Gen2); MB/s vs. message size (2B to 2MB), IB Write bandwidth and bidirectional bandwidth for DDR and QDR]

Scalability to 64+ cores per node and to 20K+ nodes per cluster
• Guarantees the same low latency regardless of the number of cores
• Guarantees linear scalability for real applications
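The latency scaling above is the kind of result produced by an MPI ping-pong microbenchmark run with one process per core. The following is a minimal sketch of such a measurement, not the benchmark behind the slide's numbers; it assumes any MPI library (e.g. MVAPICH or Open MPI) running over the InfiniBand fabric.

    /* Minimal MPI ping-pong latency sketch (illustrative; not the benchmark
     * behind the slide's numbers). Ranks are paired (0-1, 2-3, ...); each pair
     * exchanges 8-byte messages and the even rank reports half the average
     * round-trip time in microseconds. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int iters = 10000, warmup = 1000, msg_size = 8;
        char buf[8] = {0};
        int rank, nprocs;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int partner = rank ^ 1;              /* pair ranks 0-1, 2-3, ...        */
        if (partner < nprocs) {              /* last rank idles if count is odd */
            double t0 = 0.0;
            for (int i = 0; i < iters + warmup; i++) {
                if (i == warmup)
                    t0 = MPI_Wtime();        /* start timing after warm-up      */
                if (rank % 2 == 0) {
                    MPI_Send(buf, msg_size, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, msg_size, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                } else {
                    MPI_Recv(buf, msg_size, MPI_CHAR, partner, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    MPI_Send(buf, msg_size, MPI_CHAR, partner, 0, MPI_COMM_WORLD);
                }
            }
            if (rank % 2 == 0) {
                double usec = (MPI_Wtime() - t0) * 1e6 / (2.0 * iters);
                printf("rank %d <-> %d: %d-byte latency %.2f usec\n",
                       rank, partner, msg_size, usec);
            }
        }
        MPI_Finalize();
        return 0;
    }

Launching one process per core (for example, mpirun -np 16 with the compiled sketch) runs eight concurrent pairs, which is the access pattern a multi-core latency scaling test like the one above exercises.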

Page 8: Mellanox Technologies Maximize World-Class Cluster Performance


ConnectX EN 10GigE Performance

World-leading 10GigE performance: TCP, UDP, CPU utilization

Performance testing on the same HW platform; bandwidth measured with an MTU of 9600B, CPU utilization measured for Rx

[Chart: UDP Bandwidth Comparison; bandwidth (Gb/s) vs. message size (1–256 bytes), Vendor A, Vendor B, Mellanox ConnectX EN]
[Chart: 10GigE TCP CPU Utilization (%); CPU utilization vs. message size (64–32768 bytes), Vendor A, Vendor A TOE, Vendor B, Mellanox ConnectX EN]
[Chart: 10GigE TCP Bandwidth (Gb/s); bandwidth vs. message size (64–32768 bytes), Vendor A, Vendor A TOE, Vendor B, Mellanox ConnectX EN]
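As a rough illustration of what the TCP bandwidth comparison above measures, below is a minimal socket-level throughput sketch. It is not the tool used for the slide's results (those came from testing on identical hardware); the port number, buffer size and transfer size are arbitrary choices, and error checking is omitted for brevity.

    /* Minimal TCP throughput sketch (illustrative only). Run "server" on one
     * node and "client <server-ip>" on another; the client streams ~4 GiB and
     * reports the payload rate in Gb/s. Error checking omitted for brevity. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define PORT     5001
    #define BUF_SIZE (64 * 1024)
    #define TOTAL    (4LL << 30)              /* ~4 GiB per run */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(int argc, char **argv)
    {
        static char buf[BUF_SIZE];
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_port = htons(PORT);

        if (argc >= 2 && strcmp(argv[1], "server") == 0) {
            int lfd = socket(AF_INET, SOCK_STREAM, 0);
            addr.sin_addr.s_addr = INADDR_ANY;
            bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
            listen(lfd, 1);
            int fd = accept(lfd, NULL, NULL);
            long long total = 0;
            ssize_t n;
            while ((n = recv(fd, buf, sizeof(buf), 0)) > 0)
                total += n;                   /* drain the stream */
            printf("received %lld bytes\n", total);
            close(fd);
            close(lfd);
        } else if (argc >= 3 && strcmp(argv[1], "client") == 0) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            inet_pton(AF_INET, argv[2], &addr.sin_addr);
            connect(fd, (struct sockaddr *)&addr, sizeof(addr));
            memset(buf, 0xab, sizeof(buf));
            long long sent = 0;
            double t0 = now();
            while (sent < TOTAL) {
                ssize_t n = send(fd, buf, sizeof(buf), 0);
                if (n <= 0)
                    break;                    /* connection trouble: stop early */
                sent += n;
            }
            printf("~%.2f Gb/s\n", sent * 8.0 / (now() - t0) / 1e9);
            close(fd);
        } else {
            fprintf(stderr, "usage: %s server | client <server-ip>\n", argv[0]);
            return 1;
        }
        return 0;
    }

The reported Gb/s is the application-level payload rate, which is what the charts above plot against message size; this sketch uses a fixed 64KB send size rather than sweeping sizes.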

Page 9: Mellanox Technologies Maximize World-Class Cluster Performance


~3 terabits per second of switching capacity in a single silicon device!

InfiniScale IV: Unprecedented Scalability

• Up to 36 40Gb/s or 12 120Gb/s InfiniBand ports
• 60-70ns switching latency
• Adaptive routing and congestion control
• Systems available in the latter part of 2008

Page 10: Mellanox Technologies Maximize World-Class Cluster Performance


InfiniBand Addresses the Needs of Petascale Computing

Balanced random network streaming
• “One to One” random streaming
• Solution: dynamic routing (InfiniScale IV)

Balanced known network streaming
• “One to One” known streaming
• Solution: static routing (now)

Unbalanced network streaming
• “Many to One” streaming
• Solution: congestion control (now)

Faster network streaming propagation
• Network speed capabilities
• Solution: InfiniBand QDR (InfiniScale IV)

40/80/120Gb/s

IB designed to handle all communications in HW

Page 11: Mellanox Technologies Maximize World-Class Cluster Performance


Hardware Congestion Control

Congestion spots cause catastrophic loss of throughput
• Old techniques are not adequate today
• Over-provisioning – applications demand high throughput
• Algorithmic predictability – virtualization drives multiple simultaneous algorithms

InfiniBand HW congestion control
• No a priori network assumptions needed
• Automatic hot-spot discovery
• Data traffic adjustments
• No bandwidth oscillation or other stability side effects

Ensures maximum effective bandwidth

Simulation results: 32-port, 3-stage fat-tree network; high input load, large hot-spot degree

“Solving Hot Spot Contention Using InfiniBand Architecture Congestion Control” – IBM Research; IBM Systems and Technology Group; Technical University of Valencia, Spain

[Diagram: traffic before and after congestion control – a switch marks FECN on packets crossing a congested port, the destination HCA returns BECN to the source HCA, which adjusts its IPD index and timer]
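A toy sketch of the source-side reaction shown in the diagram: the switch sets FECN on packets that cross a congested port, the destination HCA returns BECN, and the source HCA raises its inter-packet delay (IPD) index, while a timer lowers the index again once BECNs stop. The rate table, step sizes and timer period below are invented for illustration; real values come from the congestion-control tables programmed on the fabric, not from this sketch.

    /* Toy model of the source-side congestion-control reaction: each BECN
     * raises the inter-packet delay (IPD) index one step, and a recovery
     * timer lowers it again once BECNs stop arriving. The rate table and
     * timer period are invented for illustration. */
    #include <stdio.h>

    #define IPD_LEVELS     16
    #define RECOVERY_TICKS 8

    /* Hypothetical injection rate (fraction of link rate) per IPD index. */
    static const double rate_of_ipd[IPD_LEVELS] = {
        1.00, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30,
        0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.06, 0.05
    };

    struct cc_state {
        int ipd_index;   /* current throttle level toward one destination */
        int timer;       /* ticks left until the index decays one step    */
    };

    static void on_becn(struct cc_state *s)
    {
        if (s->ipd_index < IPD_LEVELS - 1)
            s->ipd_index++;                  /* back off one step per BECN  */
        s->timer = RECOVERY_TICKS;           /* restart the recovery timer  */
    }

    static void on_timer_tick(struct cc_state *s)
    {
        if (s->timer > 0 && --s->timer == 0 && s->ipd_index > 0) {
            s->ipd_index--;                  /* quiet period: speed back up */
            s->timer = RECOVERY_TICKS;
        }
    }

    int main(void)
    {
        struct cc_state s = { 0, 0 };

        /* A burst of BECNs while a hot spot exists, then silence. */
        for (int t = 0; t < 40; t++) {
            if (t < 6)
                on_becn(&s);
            on_timer_tick(&s);
            printf("tick %2d: IPD index %2d, injection rate %.2f of link\n",
                   t, s.ipd_index, rate_of_ipd[s.ipd_index]);
        }
        return 0;
    }

Running it shows the injection rate stepping down while BECNs arrive and recovering gradually afterwards, which is the throttle-and-recover behavior the bullets above describe.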

Page 12: Mellanox Technologies Maximize World-Class Cluster Performance


Simulation model (Mellanox): 972-node case, hot-spot traffic

[Chart: Hot Spot Traffic – Average Performance; average offered load (Bps) vs. message size (kB), static vs. adaptive routing, 972 nodes]

InfiniBand Adaptive Routing

Fast path modifications
No throughput overhead
Maximum flexibility for routing algorithms
• Random-based decision
• Least-loaded-based decision
• Greedy-random-based solution: least loaded out of a random set

Maximizes “One to One” random-traffic network efficiency
Dynamically re-routes traffic to alleviate congested ports
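A minimal sketch of the “least loaded out of a random set” (greedy random) choice described above, assuming a hypothetical per-port load counter; the port count, sample size and load metric are illustrative and not the InfiniScale IV implementation.

    /* Sketch of the "greedy random" port choice: examine a small random
     * subset of the candidate uplinks and forward on the least-loaded one.
     * Port count, sample size and the load metric are illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define NUM_PORTS 12   /* candidate uplinks toward the destination    */
    #define SAMPLE     3   /* random candidates examined per decision     */

    /* Hypothetical per-port load, e.g. output queue occupancy. */
    static int port_load[NUM_PORTS];

    static int pick_port(void)
    {
        int best = rand() % NUM_PORTS;       /* first random candidate        */
        for (int i = 1; i < SAMPLE; i++) {
            int p = rand() % NUM_PORTS;
            if (port_load[p] < port_load[best])
                best = p;                    /* keep the least-loaded option  */
        }
        return best;
    }

    int main(void)
    {
        srand((unsigned)time(NULL));

        /* Send a burst toward one destination and check how evenly the
         * greedy-random choice spreads it across the uplinks. */
        for (int pkt = 0; pkt < 1200; pkt++)
            port_load[pick_port()]++;

        printf("packets per port:");
        for (int q = 0; q < NUM_PORTS; q++)
            printf(" %d", port_load[q]);
        printf("\n");
        return 0;
    }

Sampling a few random candidates and taking the least loaded one spreads “One to One” random traffic across the uplinks much more evenly than a purely random pick, while keeping the per-packet decision cheap enough for the fast path.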

Page 13: Mellanox Technologies Maximize World-Class Cluster Performance


InfiniBand QDR 40Gb/s Technology

Superior performance for HPC applications
• Highest bandwidth, 1us node-to-node latency
• Low CPU overhead, MPI offloads
• Designed for current and future multi-core environments

Addresses the needs of Petascale computing
• Adaptive routing, congestion control, large-scale switching
• Fast network streaming propagation

Consolidated I/O for server and storage
• Optimizes cost and reduces power consumption

Maximizing cluster productivity
• Efficiency, scalability and reliability

Page 14: Mellanox Technologies Maximize World-Class Cluster Performance


Commercial HPC Demands High Bandwidth

A 4-node InfiniBand cluster demonstrates higher performance than a GigE cluster of any size

[Chart: LS-DYNA Productivity; jobs per day vs. number of cores (8, 16, 20, 32, 64) for InfiniBand and GigE, with the % difference between IB and GigE annotated (11%, 39%, 89%, 199%, 1034%)]
[Chart: LS-DYNA Profiling; total data transferred (bytes, log scale) vs. MPI message size (bytes), 16 cores and 32 cores]

Scalability mandates InfiniBand QDR and beyond
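A message-size profile like the LS-DYNA one above can be collected with the standard PMPI interposition mechanism. The sketch below wraps only MPI_Send and tallies bytes per decade-sized bucket; it is a generic profiling technique, not the tool used for the slide's data, and it assumes an MPI-3-style const-qualified MPI_Send prototype. A complete profile would also wrap non-blocking sends and collectives.

    /* Sketch of a PMPI interposer that tallies bytes sent per message-size
     * bucket. Only MPI_Send is wrapped; a real profile would also cover
     * MPI_Isend, collectives, etc. */
    #include <mpi.h>
    #include <stdio.h>

    #define BUCKETS 10   /* decade buckets: <10B, <100B, ..., larger */

    static long long bytes_per_bucket[BUCKETS];

    static int bucket_of(long long bytes)
    {
        int b = 0;
        long long limit = 10;
        while (b < BUCKETS - 1 && bytes >= limit) {
            limit *= 10;
            b++;
        }
        return b;
    }

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        MPI_Type_size(type, &size);
        long long bytes = (long long)count * size;
        bytes_per_bucket[bucket_of(bytes)] += bytes;
        return PMPI_Send(buf, count, type, dest, tag, comm);   /* real send */
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {                     /* print rank 0's profile only */
            long long limit = 10;
            for (int b = 0; b < BUCKETS; b++, limit *= 10)
                printf("< %lld B: %lld bytes total\n", limit, bytes_per_bucket[b]);
        }
        return PMPI_Finalize();
    }

Compiled into the application (or preloaded as a shared library), the per-bucket totals are printed at MPI_Finalize; plotting them against message size gives a profile in the spirit of the chart above.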

Page 15: Mellanox Technologies Maximize World-Class Cluster Performance


Mellanox Cluster Center

http://www.mellanox.com/applications/clustercenter.php

Neptune cluster
• 32 nodes
• Dual-core AMD Opteron CPUs

Helios cluster
• 32 nodes
• Quad-core Intel Clovertown CPUs

Vulcan cluster
• 32 nodes
• Quad-core AMD Barcelona CPUs

Utilizing “Fat Tree” network architecture (CBB)
• Non-blocking switch topology
• Non-blocking bandwidth

InfiniBand 20Gb/s (40Gb/s May 2008)
InfiniBand-based storage
• NFS over RDMA, SRP
• GlusterFS cluster file system (Z Research)

Page 16: Mellanox Technologies Maximize World-Class Cluster Performance


Summary

Market-wide adoption of InfiniBand
• Servers/blades, storage and switch systems
• Data centers, high-performance computing, embedded
• Performance, price, power, reliability, efficiency, scalability
• Mature software ecosystem

4th-generation adapter extends connectivity to IB and Ethernet
• Market-leading performance, capabilities and flexibility
• Multiple 1000+ node clusters already deployed

4th-generation switch available May 08
• 40Gb/s server and storage connections, 120Gb/s switch-to-switch links
• 60-70nsec latency, 36 ports in a single switch chip, 3 Terabits/second

Driving key trends in the market
• Clustering/blades, low latency, I/O consolidation, multi-core CPUs and virtualization

Page 17: Mellanox Technologies Maximize World-Class Cluster Performance

CONFIDENTIAL

Thank You


Gilad Shainer [email protected]

Page 18: Mellanox Technologies Maximize World-Class Cluster Performance


Accelerating HPC Applications

[Chart: PAM-CRASH; elapsed time (sec) vs. CPU count (16, 32, 64), GigE vs. InfiniBand, lower is better]
[Chart: LS-DYNA neon_refined_revised; rating vs. number of servers (8, 16), Mellanox InfiniHost III Ex SDR vs. QLogic InfiniPath SDR]
[Chart: Vector Distribution Benchmark; time (sec), GigE vs. InfiniBand, lower is better (57%)]
[Chart: Schlumberger ECLIPSE; elapsed runtime (seconds) vs. number of servers (16, 32, 64), Myrinet vs. InfiniBand, lower is better]