

Cluster Networks

Introduction

Communication has a significant impact on application performance.

Interconnection networks therefore have a vital role in cluster systems.

As usual, the driver is performance: an increase in compute power typically demands a proportional improvement in the communication service - lower latency and higher bandwidth.


Cluster Networks

The issues with cluster interconnects are similar to those in conventional networks:

Latency & Bandwidth

Topology type (bus, ring, torus, hypercube, etc.).

Routing

Direct connections (point-to-point) or indirect connections.

NIC (Network Interface Card) capabilities.

Physical medium (twisted pair, fibre optic).

Balance between performance and cost


Interconnection Topologies

In standard LANs we have two general structures:

Shared network (bus)

• All messages are broadcast… each processor listens to every message.

• Requires complex access control (e.g. CSMA/CD).

CSMA/CD: Carrier Sense Multiple Access with Collision Detection

• Collisions can occur: requires back-off policies and retransmissions.

• Suitable when the offered load is low - inappropriate for high performance applications.

• Very little reason to use this form of network today.

Switched network

• Permits point-to-point communications between sender & receiver.

• Fast internal transport provides high aggregate bandwidth.

• Multiple messages are sent simultaneously.


Metrics to evaluate network topology

Useful metrics for a switched network topology:

Scalability: how the number of switches grows with the number of nodes.

Degree: the number of links to/from a node - a measure of aggregate bandwidth.

Diameter: the length of the shortest path between the two most distant nodes - a measure of latency.

Bisection width: the minimum number of links that must be cut in order to divide the topology into two independent networks of the same size (± one node). Essentially a measure of bottleneck bandwidth - the higher it is, the better the network performs under load.
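To make the diameter metric concrete, here is a small illustrative C sketch (the node count N = 16 is an arbitrary choice, not taken from the slides) that builds an N-node ring and measures its diameter by breadth-first search; for a ring the answer should come out as N/2, matching the summary table later in these slides.

/* Illustrative sketch: compute the diameter of an N-node ring by BFS.
 * For a ring the result should equal N/2 (as in the topology summary table). */
#include <stdio.h>
#include <string.h>

#define N 16   /* arbitrary example size */

/* BFS from 'src' over the ring; returns the largest hop count found. */
static int max_distance_from(int src) {
    int dist[N], queue[N], head = 0, tail = 0, worst = 0;
    memset(dist, -1, sizeof dist);
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        int nbrs[2] = { (u + 1) % N, (u + N - 1) % N };  /* the two ring neighbours */
        for (int i = 0; i < 2; i++) {
            int v = nbrs[i];
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                if (dist[v] > worst) worst = dist[v];
                queue[tail++] = v;
            }
        }
    }
    return worst;
}

int main(void) {
    int diameter = 0;
    for (int s = 0; s < N; s++) {        /* diameter = worst case over all sources */
        int d = max_distance_from(s);
        if (d > diameter) diameter = d;
    }
    printf("ring of %d nodes: diameter = %d (expected N/2 = %d)\n", N, diameter, N / 2);
    return 0;
}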


Interconnection Topologies

Crossbar switch:

Low latency and high throughput.

Switch scalability is poor - O(N^2).

Lots of wiring…


Interconnection Topologies

Linear Arrays and Rings

Consider networks with switch scaling costs better than O(N^2).

In one dimension, we have simple linear arrays.

O(N) switches.

These can wrap around to make a ring or 1D torus.

Latency is high (the diameter grows linearly with N).

2D/3D Cartesian applications will perform poorly with this network.



Interconnection Topologies

2D Meshes

Can wrap around to form a 2D torus.

Switch scaling: O(N)

Average degree: 4

Diameter: O(2√N)

Bisection width: O(√N)


Interconnection Topologies

Hypercubes:

K dimensions; number of switches N = 2^K.

Diameter: O(K).

Good bisection width: O(2^(K-1)) = O(N/2).
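One way to see where these figures come from: label each of the N = 2^K switches with a K-bit address and connect two switches exactly when their addresses differ in one bit. A route can then correct one differing bit per hop, so the hop count is the Hamming distance between the labels and never exceeds K. The C sketch below is purely illustrative (the choice of K = 4 and the example node labels are assumptions, not values from the slides).

/* Illustrative sketch: hypercube addressing.
 * Nodes carry K-bit labels; two nodes are neighbours iff their labels differ
 * in exactly one bit, so a route can fix one bit per hop and the hop count
 * equals the Hamming distance (at most K = diameter). */
#include <stdio.h>

#define K 4                     /* dimension: N = 2^K = 16 nodes */

static int hamming(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    int bits = 0;
    while (x) { bits += x & 1u; x >>= 1; }
    return bits;
}

int main(void) {
    unsigned src = 0x0, dst = 0xF;          /* opposite corners of the 4-cube */

    printf("neighbours of node 0x%X:", src);
    for (int d = 0; d < K; d++)             /* flip each bit in turn */
        printf(" 0x%X", src ^ (1u << d));
    printf("\n");

    printf("hops from 0x%X to 0x%X = %d (diameter = K = %d)\n",
           src, dst, hamming(src, dst), K);
    return 0;
}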


Interconnection Topologies

Binary Tree:

Scaling:

• N = 2^d processor nodes (where d = depth)

• 2^(d+1) - 1 switches

Degree: 3

Diameter: O(2d), i.e. twice the depth

Bisection width: O(1)


Interconnection Topologies

Fat trees:

Similar in diameter to a binary tree.

Bisection width (which equates to the bottleneck bandwidth) is greatly improved because link capacity increases towards the root.


Interconnection Topologies

Summary of topologies:

Topology     Degree     Diameter    Bisection width
1D array     2          N - 1       1
1D ring      2          N/2         2
2D mesh      4          2√N         √N
2D torus     4          √N          2√N
Hypercube    log2(N)    log2(N)     N/2
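As a quick sanity check on how these expressions scale, the following illustrative C sketch simply evaluates the bisection-width column for a few machine sizes (the chosen values of N are arbitrary); it shows how quickly the hypercube pulls ahead of the ring and the 2D torus as the node count grows.

/* Illustrative sketch: evaluate the bisection-width formulas from the table
 * (1D ring: 2, 2D torus: 2*sqrt(N), hypercube: N/2) for a few machine sizes. */
#include <stdio.h>
#include <math.h>

int main(void) {
    int sizes[] = { 64, 256, 1024, 4096 };   /* arbitrary example node counts */
    printf("%8s %10s %10s %12s\n", "N", "ring", "2D torus", "hypercube");
    for (int i = 0; i < 4; i++) {
        double n = sizes[i];
        printf("%8d %10.0f %10.0f %12.0f\n", sizes[i],
               2.0,                 /* 1D ring bisection width  */
               2.0 * sqrt(n),       /* 2D torus bisection width */
               n / 2.0);            /* hypercube bisection width */
    }
    return 0;
}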


Switching

Operational modes:

Store-and-forward:
• Each switch receives an entire packet before forwarding it to the next switch - useful in a non-dedicated environment (i.e. a LAN).
• Buffers are usually of finite size, so packets may be dropped under heavy load.
• Also imposes a larger in-switch latency.
• Can detect errors in the packets.

Wormhole routing (also called cut-through switching):
• The packet is divided into small "flits" (flow control digits).
• The switch examines the first flit (the header), which contains the destination address, sets up a circuit and forwards the flit immediately.
• Subsequent flits of the message are forwarded as they arrive (near wire speed).
• Reduces latency and buffer overhead.
• Messaging occurs at a speed close to that of directly connected processors.
• Less error detection.
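A rough back-of-the-envelope model shows why cut-through helps: store-and-forward pays the full packet serialisation time at every hop, while wormhole routing pays it once plus roughly one flit time per hop. The C sketch below is only a simplified model with made-up example numbers (link speed, packet size, flit size and hop count are assumptions, not figures from any particular product); it ignores per-switch processing and wire delay.

/* Illustrative sketch: simple latency model, store-and-forward vs cut-through.
 * Store-and-forward pays the full packet serialisation time at every hop;
 * cut-through pays it once, plus roughly one flit time per hop. */
#include <stdio.h>

int main(void) {
    double bw_bytes_per_us = 125.0;   /* example: roughly a 1 Gbit/s link */
    double packet_bytes    = 1500.0;  /* example packet size */
    double flit_bytes      = 8.0;     /* example flit size */
    int    hops            = 5;       /* example number of switches traversed */

    double packet_time = packet_bytes / bw_bytes_per_us;  /* us to serialise one packet */
    double flit_time   = flit_bytes / bw_bytes_per_us;    /* us to serialise one flit */

    double store_forward = hops * packet_time;
    double cut_through   = packet_time + hops * flit_time;

    printf("store-and-forward: %.1f us\n", store_forward);
    printf("cut-through:       %.1f us\n", cut_through);
    return 0;
}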


Cluster Network Products

Cluster interconnects include, among others:

Gigabit Ethernet

Myrinet

Quadrics

InfiniBand


Interconnects in the Top500 list – 11/2009


Interconnects in the Top500 list – 11/2008


Cluster Network Technologies

Gigabit Ethernet:

The technology has matured and now offers very good performance at a very low cost.

Latency performance is moderate - many Ethernet switches are designed for general LANs (store-and-forward), where reducing latency is not necessarily the primary incentive (latency is of the order of milliseconds).

Zero-copy, OS-bypass message passing can be supported with a programmable NIC and direct memory access (DMA).


Cluster Network Technologies

Myrinet:

Uses fibre-optic cabling.

Uses a fat-tree structure

Low latency (7-10 µs) with a peak bandwidth of 4 Gbit/s.

Provides zero-copy message passing and can offload packet processing to the NIC.

Uses cut-through/worm-hole switching to reduce latency.

More expensive than Ethernet

(a) Twisted-pair cable used in Ethernet; (b) fibre-optic cable


Zero-copy protocol


Cluster Network Technologies

Quadrics:

A product of a strategic partnership between Quadrics and Compaq (used in ASCI Q).

Uses a fat quad-tree topology

Very low latency of 2-5 µs; bandwidth is about 2 Gbit/s.


Cluster Network Technologies

InfiniBand:

Developed by an industry consortium (the InfiniBand Trade Association), with Intel among its driving members.

Basic link speed of 2.5 Gbit/s.

Cut-through/worm-hole switches are used.

Latency is about 200 nanoseconds.


BlueGene/L

Source: IBM

No. 1 in the Top500 list from 2005 to 2007


BlueGene/L – networking

The BlueGene system employs various network types.

Central is the torus interconnection network:

3D torus with wrap-around.

Each node connects to six neighbours (links are bidirectional).

Routing is achieved in hardware (the wrap-around is illustrated in the sketch below).

Each link runs at 1.4 Gbit/s, giving 1.4 × 6 × 2 = 16.8 Gbit/s aggregate bandwidth per node (6 links × 2 directions).
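To illustrate the wrap-around, here is a small C sketch that computes the six neighbours of a node at coordinates (x, y, z) in an X × Y × Z torus using modular arithmetic. The dimension sizes used are arbitrary examples, not BlueGene/L's actual configuration.

/* Illustrative sketch: the six neighbours of a node in a 3D torus with wrap-around.
 * Each node (x, y, z) has one neighbour in the +/- direction of each dimension,
 * with coordinates taken modulo the dimension size. */
#include <stdio.h>

#define DX 8   /* example torus dimensions (not BlueGene/L's actual sizes) */
#define DY 8
#define DZ 8

static void print_neighbours(int x, int y, int z) {
    printf("(%d,%d,%d) -> ", x, y, z);
    printf("(%d,%d,%d) (%d,%d,%d) ",  (x + 1) % DX, y, z, (x + DX - 1) % DX, y, z);
    printf("(%d,%d,%d) (%d,%d,%d) ",  x, (y + 1) % DY, z, x, (y + DY - 1) % DY, z);
    printf("(%d,%d,%d) (%d,%d,%d)\n", x, y, (z + 1) % DZ, x, y, (z + DZ - 1) % DZ);
}

int main(void) {
    print_neighbours(0, 0, 0);   /* corner node: wrap-around links reach coordinate DX-1 etc. */
    print_neighbours(3, 4, 5);   /* interior node */
    return 0;
}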


BlueGene/L

The other three networks:

Binary combining tree

• Used for collective/global operations - reductions, sums, products, barriers, etc.

• Low latency (2 µs)

Gigabit Ethernet I/O network
• Supports file I/O

• An I/O node is responsible for performing I/O operations for 128 processors

Diagnostic & control network
• Booting nodes, monitoring processors.

Each chip has the above four network interfaces (torus, tree, I/O, diagnostics).

Note that specialised networks are used for different purposes - quite different from many other HPC cluster architectures.


BlueGene/L

Message Passing:

The BlueGene team focused a good deal of effort on developing an efficient MPI implementation to reduce latency in the software stack.

Using the MPICH code base as a starting point:
• The MPI library was enhanced to exploit the machine architecture.

• For example, using the combining tree for reductions and broadcasts (see the MPI sketch below).
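For reference, this is the style of collective that the combining tree accelerates: a minimal, generic MPI sketch (ordinary MPICH-style code, not BlueGene-specific source) in which every rank contributes to a global sum via MPI_Reduce and the result is then redistributed with MPI_Bcast - the kind of reduction and broadcast mentioned above. On an ordinary cluster these calls travel over the normal interconnect; on BlueGene/L the MPI library can route them over the dedicated tree network instead.

/* Minimal generic MPI sketch (not BlueGene-specific): a reduction followed by
 * a broadcast - the collective operations that BlueGene/L maps onto its
 * binary combining tree network. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)(rank + 1);   /* each rank's contribution */
    double total = 0.0;

    /* Global sum gathered at rank 0 (a reduction). */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    /* Rank 0 then broadcasts the result back to everyone. */
    MPI_Bcast(&total, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d of %d sees total = %.1f\n", rank, size, total);

    MPI_Finalize();
    return 0;
}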

Reading paper:

“Filtering Failure Logs for a BlueGene/L Prototype”


ASCI Q

The Q supercomputing system at Los Alamos National Laboratory (LANL)

A product of the Accelerated Strategic Computing Initiative (ASCI) program

Used for simulation and computational modelling

No. 2 in the 2002 Top500 supercomputer list


ASCI Q

“Classical” cluster architecture.

1024 SMPs (AlphaServer ES45s from HP) are put in one segment
• Each with four EV-68 1.25 GHz CPUs with 16 MB cache

The whole system has 3 segments
• The three segments can operate independently or as a single system

• Aggregate 60 TeraFLOPS capability.

• 33 Terabytes of memory

664 TB of global storage

Interconnection using:
• Quadrics switch interconnect (QsNet)

• High-bandwidth (250 MB/s) and low-latency (5 µs) network.

Top500 list: http://www.top500.org/system/6071


Earth Simulator

Built by NEC, located in the Earth Simulator Centre in Japan

Used for running global climate models to evaluate the effects of global warming

No. 1 from 2002 to 2004


Earth Simulator

640 nodes, each with 8 vector processors and 16 GB of memory

Two nodes are installed in one cabinet

In total:

5120 processors (NEC SX-5)

10 TB of memory

700 TB of disk storage and 1.6 PB of tape storage

Computing capacity: 36 TFlop/s

Networking: crossbar interconnection (very expensive)
Bandwidth: 16 GB/s between any two nodes
Latency: 5 µs

Dual-level parallelism: OpenMP within a node, MPI between nodes (a minimal sketch follows below)
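A minimal sketch of that dual-level style in C (generic code, not the Earth Simulator's actual software; the loop and work size are arbitrary examples): OpenMP threads share the work inside each node, and MPI combines the per-node results across nodes.

/* Minimal sketch of dual-level parallelism: MPI between nodes, OpenMP within a node.
 * Each MPI rank sums its own slice of work using OpenMP threads, then the partial
 * sums are combined across ranks with an MPI reduction. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1000000;           /* arbitrary per-rank work size */
    double local = 0.0, global = 0.0;

    /* In-node parallelism: OpenMP threads share this loop. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < n; i++)
        local += 1.0 / (double)(i + 1);

    /* Out-of-node parallelism: combine the per-rank results with MPI. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (threads per rank: %d)\n", global, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}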

Physical installation: the machine resides on the 3rd floor; cables on the 2nd; power generation & cooling on the 1st and ground floors.