csl718 : multiprocessors

Anshul Kumar, CSE IITD. CSL718 : Multiprocessors. Interconnection Mechanisms, Performance Models. 20th April, 2006


Page 1: CSL718 : Multiprocessors

Anshul Kumar, CSE IITD

CSL718 : Multiprocessors

Interconnection Mechanisms

Performance Models

20th April, 2006

Page 2: CSL718 : Multiprocessors

Connecting Processors and Memories

• Shared Buses
• Interconnection Networks
– Static Networks
– Dynamic Networks

[Figure: processors (P) and memory modules (M) joined by an Interconnection Network; a second diagram adds a Global Interconnection Network.]

Page 3: CSL718 : Multiprocessors

Shared Bus

Each processor sees this picture: alternating periods of processing and bus access.

bus utilization by one processor: $\rho = \dfrac{\text{bus transaction time}}{\text{processing time} + \text{bus transaction time}}$

prob of a processor using the bus = $\rho$

prob of a processor not using the bus = $1 - \rho$

prob of none of the n processors using the bus = $(1 - \rho)^n$

prob of at least one processor using the bus = $1 - (1 - \rho)^n$

achieved BW on a relative scale = $1 - (1 - \rho)^n$

required BW = $n\rho$

available BW = 1
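The shared-bus estimate above can be tried numerically. The following is a minimal sketch, assuming independent processors that each request the bus with probability rho in a cycle; the function name `bus_bandwidth` and the dictionary keys are illustrative choices, not part of the lecture.

```python
def bus_bandwidth(rho, n):
    """Shared-bus model without re-submission of rejected requests.

    rho: probability that one processor uses the bus in a cycle
    n:   number of processors
    """
    required = n * rho                   # required BW = n * rho
    achieved = 1.0 - (1.0 - rho) ** n    # prob that at least one processor uses the bus
    return {
        "required": required,
        "achieved": achieved,
        "per_processor": achieved / n,   # share of the bus seen by each processor
    }


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, bus_bandwidth(0.5, n))
```

The achieved bandwidth saturates at 1 (the available BW) while the required bandwidth keeps growing with n, which is the gap the slides go on to analyse.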

Page 4: CSL718 : Multiprocessors

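Effect of re-submitted requests

A processor is modelled as a two-state Markov chain: state A (active, prob = $q_A$) and state W (waiting for a rejected request to be accepted, prob = $q_W$). $P_A$ is the probability that a request is accepted.

Transitions: A → A with prob $(1-\rho) + \rho P_A$; A → W with prob $\rho(1-P_A)$; W → A with prob $P_A$; W → W with prob $1-P_A$.

$$q_A + q_W = 1, \qquad q_W P_A = q_A\,\rho\,(1 - P_A)$$

$$q_A = \frac{P_A}{P_A + \rho(1-P_A)}, \qquad q_W = \frac{\rho(1-P_A)}{P_A + \rho(1-P_A)}$$

actual request rate:

$$\rho_a = \rho\,q_A + q_W = \frac{\rho}{\rho + P_A(1-\rho)}$$

also

$$BW = 1 - (1-\rho_a)^n = n\,\rho_a P_A \quad\Rightarrow\quad P_A = \frac{1-(1-\rho_a)^n}{n\,\rho_a}$$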
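The actual request rate and the acceptance probability depend on each other, so they are usually found by iteration. Below is a minimal sketch, assuming the relations rho_a = rho / (rho + P_A (1 - rho)), BW = 1 - (1 - rho_a)^n and P_A = BW / (n rho_a); the function name, tolerance and iteration cap are assumptions made for illustration.

```python
def bus_with_resubmission(rho, n, tol=1e-12, max_iter=10_000):
    """Iterate rho_a -> P_A -> rho_a until the actual request rate settles.

    rho: request probability of a processor that is not waiting
    n:   number of processors sharing the bus
    Returns (rho_a, p_a, bw).
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    rho_a = rho  # first guess: no re-submissions
    for _ in range(max_iter):
        bw = 1.0 - (1.0 - rho_a) ** n        # achieved BW at the current rate
        p_a = bw / (n * rho_a)               # prob that a request is accepted
        new_rho_a = rho / (rho + p_a * (1.0 - rho))
        converged = abs(new_rho_a - rho_a) < tol
        rho_a = new_rho_a
        if converged:
            break
    # Recompute BW and P_A so that they match the returned rho_a.
    bw = 1.0 - (1.0 - rho_a) ** n
    p_a = bw / (n * rho_a)
    return rho_a, p_a, bw


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, bus_with_resubmission(0.5, n))
```

Starting from rho_a = rho, the iterates only increase and stay below 1, so the loop settles on the self-consistent rate; with a single processor nothing is ever rejected and rho_a stays equal to rho.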

Page 5: CSL718 : Multiprocessors

[Plot: Shared Bus : BW per proc. x-axis: BW required (req probability), 0 to 1; y-axis: BW achieved; two sets of curves for n = 2, 3, 4.]

Page 6: CSL718 : Multiprocessors

[Plot: Shared Bus : utilization. x-axis: req probability, 0 to 1; y-axis: utilization; two sets of curves for n = 2, 3, 4.]

Page 7: CSL718 : Multiprocessors

Waiting time

If a request is rejected $i$ times and accepted on attempt $i+1$, waiting time = $i\,T_{bus}$

Probability of this = $(1-P_A)^i P_A$

Expected value of waiting time:

$$T_w = \sum_{i=1}^{\infty} i\,T_{bus}\,(1-P_A)^i P_A = T_{bus}\,P_A(1-P_A)\sum_{i=1}^{\infty} i\,(1-P_A)^{i-1} = T_{bus}\,P_A(1-P_A)\,\frac{1}{P_A^2} = \frac{1-P_A}{P_A}\,T_{bus}$$
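The closed form for the expected waiting time can be checked against the series it comes from. This is a small sketch, assuming a request rejected i times waits i bus cycles and is accepted with probability P_A on every attempt; the function names and the truncation length are illustrative.

```python
def expected_wait_closed(p_a, t_bus=1.0):
    """Closed form: T_w = (1 - P_A) / P_A * T_bus."""
    return (1.0 - p_a) / p_a * t_bus


def expected_wait_series(p_a, t_bus=1.0, terms=10_000):
    """Truncated sum over i of (i * T_bus) * (1 - P_A)^i * P_A."""
    return sum(i * t_bus * (1.0 - p_a) ** i * p_a for i in range(1, terms))


if __name__ == "__main__":
    for p_a in (0.25, 0.5, 0.9):
        print(p_a, expected_wait_closed(p_a), expected_wait_series(p_a))
```

For acceptance probabilities that are not very close to zero the truncated series agrees with the closed form to many decimal places.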

Page 8: CSL718 : Multiprocessors

Switched Networks

BUS
• Shared media
• Lower cost
• Lower throughput
• Scalability poor

Switched Network
• Switched paths
• Higher cost
• Higher throughput
• Scalability better

Page 9: CSL718 : Multiprocessors

Interconnection Networks

• Topology : who is connected to whom

• Direct / Indirect : where is switching done

• Static / Dynamic : when is switching done

• Circuit switching / packet switching : how are connections established

• Store & forward / worm hole routing : how is the path determined

• Centralized / distributed : how is switching controlled

• Synchronous/asynchronous : mode of operation

Page 10: CSL718 : Multiprocessors

Direct and Indirect Networks

[Figure: DIRECT network: four nodes, each containing a processor (P), memory (M) and switch (S), joined by links. INDIRECT network: four PM nodes, each joined by a link to a central SWITCH.]

Page 11: CSL718 : Multiprocessors

Static and Dynamic Networks

• Static Networks
– fixed point to point connections
– usually direct
– each node pair may not have a direct connection
– routing through nodes

• Dynamic Networks
– connections established as per need
– usually indirect
– path can be established between any pair of nodes
– routing through switches

Page 12: CSL718 : Multiprocessors

Static Network Topologies

[Figures: Linear, Star, 2D-Mesh, Tree]

Non-uniform connectivity

Page 13: CSL718 : Multiprocessors

Static Network Topologies - contd.

[Figures: Ring, Fully Connected, Torus]

Uniform connectivity

Page 14: CSL718 : Multiprocessors

Illiac IV Mesh Network

[Figure: nine nodes numbered 0 to 8, drawn as a 3 × 3 mesh (rows 0 1 2, 3 4 5, 6 7 8) and, equivalently, as a chordal ring.]

Neighbors of node r: $(r \pm 1) \bmod 9$ and $(r \pm 3) \bmod 9$
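The neighbor rule for the nine-node Illiac mesh is easy to enumerate. The sketch below assumes the rule (r ± 1) mod N and (r ± side) mod N with N = side × side; the slide only shows the 3 × 3 case, so the `side` parameter is a generalization added here for illustration.

```python
def illiac_neighbors(r, side=3):
    """Neighbors of node r in an Illiac-style mesh with side * side nodes.

    side=3 gives the nine-node case: (r +/- 1) mod 9 and (r +/- 3) mod 9.
    """
    n = side * side
    if not 0 <= r < n:
        raise ValueError("node number out of range")
    return sorted({(r + 1) % n, (r - 1) % n, (r + side) % n, (r - side) % n})


if __name__ == "__main__":
    for node in range(9):
        print(node, illiac_neighbors(node))
```

Every node gets exactly four neighbors, and nodes r and r ± 1 are always adjacent, which is why the same network can be redrawn as a ring with chords.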

Page 15: CSL718 : Multiprocessors

Fat Tree Network

Page 16: CSL718 : Multiprocessors

Dynamic Networks

k × k cross-bar switch

2 × 2 switch: simplest cross-bar; building block for multi-stage dynamic networks

Settings of a 2 × 2 switch: straight, exchange, upper broadcast, lower broadcast

Page 17: CSL718 : Multiprocessors

Baseline Network

[Figure: 8 × 8 baseline network with inputs and outputs labelled 000 to 111.]

Blocking can occur

Page 18: CSL718 : Multiprocessors

Benes Network

non-blocking

Page 19: CSL718 : Multiprocessors

Switching Mechanism

• Circuit Switching (connection oriented communication)
– A circuit is established between the source and the destination

• Packet Switching (connectionless communication)
– Information is divided into packets and each packet is sent independently from node to node

Page 20: CSL718 : Multiprocessors

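Routing in Networks

[Figure: a node with an incoming message and an outgoing message; each message consists of a header and payload/data. Timelines compare store & forward routing with worm hole routing.]

With header length $H$, payload length $l$, link bandwidth $BW$ and a path of $n$ links:

store & forward routing: latency = $n\left(\dfrac{H}{BW} + \dfrac{l}{BW}\right)$

worm hole routing: latency = $n\,\dfrac{H}{BW} + \dfrac{l}{BW}$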
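The difference between the two routing schemes shows up as soon as a path has more than one link. The following is a minimal sketch, assuming store & forward latency n (H + l) / BW and worm hole latency n H / BW + l / BW, with n taken to be the number of links on the path; the function and parameter names are illustrative.

```python
def store_and_forward_latency(n_links, header, payload, bw):
    """Whole message (header + payload) is received at each node before moving on."""
    return n_links * (header + payload) / bw


def wormhole_latency(n_links, header, payload, bw):
    """Only the header pays the per-link delay; the payload is pipelined behind it."""
    return n_links * header / bw + payload / bw


if __name__ == "__main__":
    for hops in (1, 2, 4, 8):
        print(hops,
              store_and_forward_latency(hops, 8, 120, 1.0),
              wormhole_latency(hops, 8, 120, 1.0))
```

On a single link the two values coincide; as the path grows, store & forward latency grows with the full message length while worm hole latency grows only with the header length.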

Page 21: CSL718 : Multiprocessors

Routing in presence of congestion

• Worm hole routing
– When message header is blocked, many links get blocked with the message

• Solution: cut-through routing
– When message header is blocked, tail is allowed to move, compressing the message into a single node

Page 22: CSL718 : Multiprocessors

Routing Options

• Deterministic routing: always same path followed

• Adaptive routing: best path selected to minimize congestion

• Source based routing: message specifies path to destination

• Destination based routing: message specifies only destination address

Page 23: CSL718 : Multiprocessors

Some Performance Parameters

[Timing diagram: at the sender, overhead followed by Tx time (= bytes/BW); then time of flight; at the receiver, Tx time (= bytes/BW) followed by overhead. Transport latency spans time of flight plus Tx time; total latency also includes the sender and receiver overheads.]

Page 24: CSL718 : Multiprocessors

Other Parameters

• Throughput Bandwidth (no credit for header)

• Bisection bandwidth = BW across a bisection

• Node degree

• Network Diameter

• Cost

• Fault Tolerance

Page 25: CSL718 : Multiprocessors

Multidimensional Grid/Mesh

k-ary n-cube

Size = $k \times k \times \dots \times k$ (n times) = $k^n$

Diameter
= $(k-1)\,n$ without end-around connections
= $k\,n/2$ with end-around connections (for even $k$; $n\lfloor k/2 \rfloor$ in general)

For a (binary) hypercube: $k = 2$
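The size and diameter formulas for a k-ary n-cube can be checked by brute force on small cases. This is a sketch under stated assumptions: nodes are coordinate tuples, links join nodes that differ by one in a single coordinate, and the wrap-around diameter is written as n * floor(k / 2) so that odd k is covered too. The helper names are illustrative.

```python
from collections import deque


def cube_size(k, n):
    """Number of nodes in a k-ary n-cube."""
    return k ** n


def cube_diameter(k, n, wraparound):
    """Diameter with or without end-around connections."""
    return n * (k // 2) if wraparound else n * (k - 1)


def corner_eccentricity(k, n, wraparound):
    """Breadth-first search from node (0, ..., 0); returns the largest distance found.

    The corner node is a farthest-apart endpoint in the mesh, and every node of
    the torus looks the same, so this value equals the diameter in both cases.
    """
    start = (0,) * n
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dim in range(n):
            for step in (-1, 1):
                c = node[dim] + step
                if wraparound:
                    c %= k
                elif not 0 <= c < k:
                    continue  # no link beyond the edge of the mesh
                nxt = node[:dim] + (c,) + node[dim + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return max(dist.values())


if __name__ == "__main__":
    for k, n in ((2, 3), (3, 2), (4, 2)):
        print(k, n, cube_size(k, n),
              cube_diameter(k, n, False), cube_diameter(k, n, True))
```

For k = 2 both variants give diameter n, which is the familiar hypercube result.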

Page 26: CSL718 : Multiprocessors

Grid/Mesh Performance - 1

$r$ is prob of a message request in a cycle

$k_d$ is av. no. of hops along one dimension

$n$ is number of dimensions

Message arrival rate: $\lambda = r\,n\,k_d$

Page 27: CSL718 : Multiprocessors

Grid/Mesh Performance - 2

Service rate: $\mu = \dfrac{2n}{T_s}$

Server occupancy: $\rho = \dfrac{\lambda}{\mu} = \dfrac{r\,k_d\,T_s}{2}$

Probability of request along a link: $p = \dfrac{\rho}{2n}$

Page 28: CSL718 : Multiprocessors

Grid/Mesh Performance - 3

k-ary n-cube

Waiting time at a node: use the $M_B/D/1$ open queue model

$$T_w = \frac{\rho - p}{2(1-\rho)}\,T_s$$
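The grid/mesh waiting-time estimate can be put together in a few lines. The sketch below rests on assumptions that should be checked against the original slides: message arrival rate lambda = r n k_d per node, service rate mu = 2n / T_s (2n links per node), occupancy rho = lambda / mu, per-source request probability p = rho / (2n), and the M_B/D/1 waiting time T_w = (rho - p) / (2 (1 - rho)) T_s. All names are illustrative.

```python
def mesh_link_wait(r, k_d, n, t_s):
    """Waiting time at a node of a k-ary n-cube under the assumptions above.

    r:   prob of a message request in a cycle
    k_d: average number of hops along one dimension
    n:   number of dimensions
    t_s: service time of a link
    Returns (rho, p, t_w).
    """
    lam = r * n * k_d            # assumed message arrival rate
    mu = 2 * n / t_s             # assumed service rate: 2n links, each 1 / T_s
    rho = lam / mu               # server occupancy
    if rho >= 1.0:
        raise ValueError("links are saturated (rho >= 1)")
    p = rho / (2 * n)            # assumed prob of a request from one source
    t_w = (rho - p) / (2.0 * (1.0 - rho)) * t_s   # M_B/D/1 waiting time
    return rho, p, t_w


if __name__ == "__main__":
    print(mesh_link_wait(r=0.1, k_d=2, n=2, t_s=4))
```

The waiting time grows without bound as the occupancy approaches 1, so the function refuses saturated inputs instead of returning a negative or infinite delay.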

Page 29: CSL718 : Multiprocessors

Switch Performance

k × m cross-bar switch

Let $r$ = prob of a request at an input port during one service cycle.

Here it is assumed that each message (or packet) requires the same service time $T$.

Prob of simultaneous requests on $i$ ports: $q(i) = \binom{k}{i}\,r^i (1-r)^{k-i}$

Expected no. of requests accepted out of $i$ requests:

$E(i)$ = num of output ports × fraction of address patterns including a specific output port

$$E(i) = m \cdot \frac{m^i - (m-1)^i}{m^i} = m\left[1 - \left(\frac{m-1}{m}\right)^i\right]$$

Page 30: CSL718 : Multiprocessors

Switch Performance – contd.

Expected BW (on relative scale):

$$\begin{aligned}
BW &= \sum_{i=0}^{k} E(i)\,q(i) \\
&= \sum_{i=0}^{k} m\left[1 - \left(\frac{m-1}{m}\right)^i\right]\binom{k}{i}\,r^i (1-r)^{k-i} \\
&= m\sum_{i=0}^{k}\binom{k}{i}\,r^i (1-r)^{k-i} - m\sum_{i=0}^{k}\binom{k}{i}\left(\frac{(m-1)\,r}{m}\right)^i (1-r)^{k-i} \\
&= m - m\left(\frac{(m-1)\,r}{m} + 1 - r\right)^k \\
&= m - m\left(1 - \frac{r}{m}\right)^k
\end{aligned}$$
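The binomial sum for the expected bandwidth of a k × m cross-bar collapses to a closed form, and the two can be compared numerically. This is a minimal sketch, assuming q(i) = C(k, i) r^i (1 - r)^(k - i) and E(i) = m (1 - ((m - 1) / m)^i); the function names are illustrative.

```python
from math import comb


def crossbar_bw_sum(k, m, r):
    """Expected BW as the sum over i of E(i) * q(i)."""
    total = 0.0
    for i in range(k + 1):
        q_i = comb(k, i) * r ** i * (1.0 - r) ** (k - i)   # prob of i simultaneous requests
        e_i = m * (1.0 - ((m - 1) / m) ** i)               # expected requests accepted
        total += e_i * q_i
    return total


def crossbar_bw_closed(k, m, r):
    """Closed form: m - m * (1 - r / m)^k."""
    return m - m * (1.0 - r / m) ** k


if __name__ == "__main__":
    for k, m, r in ((4, 4, 0.5), (8, 4, 0.3), (16, 16, 1.0)):
        print(k, m, r, crossbar_bw_sum(k, m, r), crossbar_bw_closed(k, m, r))
```

The two functions agree up to rounding, which is a quick check on the algebra of the derivation.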

Page 31: CSL718 : Multiprocessors

Switch Performance – contd.

Requested bandwidth = $k\,r$

Expected BW (if there were no output port conflicts) = $k\,r$

Expected BW (because of output port conflicts) = $m - m\left(1 - \dfrac{r}{m}\right)^k$

This is less than $k\,r$ as well as $m$ (assuming that $r \le 1$).

Prob of acceptance of requests: $P_A = \dfrac{BW}{k\,r}$

We now consider the effect of request re-submission due to conflicts. We need to compute the revised request rate due to re-submission and also compute delays because of waiting.

Page 32: CSL718 : Multiprocessors

Effect of re-submitted requests

actual request rate: $r' = r\,q_A + q_W$

(using Markov graph with states A and W)

$$r' = \frac{r}{r + P_A(1-r)}$$

$$BW = m - m\left(1 - \frac{r'}{m}\right)^k, \qquad P_A = \frac{BW}{k\,r'}$$

waiting time: $T_w = \dfrac{1-P_A}{P_A}\,T$, where $T$ = cycle time = $\dfrac{H + l}{\text{BW of link}}$
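For the cross-bar, the revised request rate, the acceptance probability and the waiting time have to be solved together. The sketch below assumes r' = r / (r + P_A (1 - r)), BW = m - m (1 - r'/m)^k, P_A = BW / (k r') and T_w = (1 - P_A) / P_A × T, and solves them by simple iteration; the function name, tolerance and iteration cap are assumptions for illustration.

```python
def switch_with_resubmission(r, k, m, t_cycle=1.0, tol=1e-12, max_iter=10_000):
    """Self-consistent request rate for a k x m cross-bar with re-submission.

    r:       request probability of an input port that is not waiting
    k, m:    number of input and output ports
    t_cycle: service (cycle) time T
    Returns (r_actual, p_a, bw, t_w).
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must lie in (0, 1]")
    r_act = r  # first guess: no re-submissions
    for _ in range(max_iter):
        bw = m - m * (1.0 - r_act / m) ** k
        p_a = bw / (k * r_act)
        new_r = r / (r + p_a * (1.0 - r))
        converged = abs(new_r - r_act) < tol
        r_act = new_r
        if converged:
            break
    # Recompute BW and P_A so that they match the returned rate.
    bw = m - m * (1.0 - r_act / m) ** k
    p_a = bw / (k * r_act)
    t_w = (1.0 - p_a) / p_a * t_cycle
    return r_act, p_a, bw, t_w


if __name__ == "__main__":
    print(switch_with_resubmission(0.5, 8, 8))
```

With a single input and a single output there are no conflicts, so the rate stays at r and the waiting time is zero; larger switches push the actual rate above r.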

Page 33: CSL718 : Multiprocessors

Effect of buffering

There are two possibilities

• Buffering before switching (k buffers, one at each input port)

• Buffering after switching (m buffers, one at each output port)

Page 34: CSL718 : Multiprocessors

Switch with input buffers

Rate of messages at input and output of each queue is same in steady state: r per cycle.

Service time includes delays due to conflicts, $\dfrac{1-P_A}{P_A}\,T$, calculated as earlier. This has an exponential distribution (recall the analysis for a shared bus).

M/M/1 open queue model can be used to calculate queuing delay. Details are omitted.

Page 35: CSL718 : Multiprocessors

Switch with output buffers

Here we assume that all the messages destined for the same output are queued in the same buffer, in some order. That is, there are no rejections and no re-submissions.

For each queue,

Messages arriving per service cycle = $\rho = \dfrac{k\,r}{m}$

Prob of a request coming from one of the k sources = $p = \dfrac{r}{m}$

Apply the $M_B/D/1$ model for finding queuing delay $T_w$:

$$T_w = \frac{\rho - p}{2(1-\rho)}\,T$$
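The output-buffer case reduces to a single formula once the occupancy and the per-source probability are known. This is a minimal sketch, assuming rho = k r / m, p = r / m and the M_B/D/1 waiting time T_w = (rho - p) / (2 (1 - rho)) T; the function and parameter names are illustrative.

```python
def output_buffer_wait(r, k, m, t_service=1.0):
    """Queuing delay at one output buffer of a k x m switch.

    r:         prob of a request at an input port per service cycle
    k, m:      number of input and output ports
    t_service: service time T of a message
    Returns (rho, p, t_w).
    """
    rho = k * r / m      # messages arriving per service cycle at one output queue
    p = r / m            # prob that a given source sends to this output
    if rho >= 1.0:
        raise ValueError("output queue is saturated (rho >= 1)")
    t_w = (rho - p) / (2.0 * (1.0 - rho)) * t_service
    return rho, p, t_w


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(r, output_buffer_wait(r, k=4, m=4))
```

With a single source (k = 1) the occupancy equals p and the delay is zero, since messages from one source can never collide in the queue.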

Page 36: CSL718 : Multiprocessors

References

• D. Sima, T. Fountain, P. Kacsuk, "Advanced Computer Architectures : A Design Space Approach", Addison Wesley, 1997.

• K. Hwang, "Advanced Computer Architecture : Parallelism, Scalability, Programmability", McGraw Hill, 1993.