50th Annual Allerton Conference, 2012

On the Capacity of Bufferless Networks-on-Chip

Alex Shpiner, Erez Kantor, Pu Li, Israel Cidon and Isaac Keslassy

Faculty of Electrical Engineering, Technion, Haifa, Israel




Network-on-Chip (NoC)

From buses and dedicated wires to a packet-based network infrastructure.


Collision


Buffering

Drawbacks:
• Dynamic and static energy.
• Chip area.
• Complexity of the design.

Deflecting

Drawbacks:
• No latency guarantee.
• No bandwidth guarantee.
• Not the shortest path.

Scheduling


The Objective

A scheduling algorithm for bufferless networks that maximizes throughput and guarantees QoS.


Complete-Exchange Periodic Traffic

In a period: Every node sends one unicast data packet to every other node.
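As a minimal sketch of this traffic pattern, the packets of one period can be enumerated as source-destination pairs. The function name `complete_exchange` and the 0-based node numbering are illustrative assumptions, not taken from the talk.

```python
def complete_exchange(n):
    """Return all (source, destination) pairs for one period on n nodes.

    Nodes are numbered 0..n-1 (an assumption made for this sketch).
    """
    return [(src, dst) for src in range(n) for dst in range(n) if src != dst]


packets = complete_exchange(4)
# Each of the 4 nodes sends one packet to each of the 3 other nodes.
print(len(packets))
```

One period therefore carries n(n-1) packets in total.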


Computation step: autonomous processing.
Communication step: every core sends a unicast data packet to every other core.

Applications:
• Bulk Synchronous Parallel (BSP) programming.
• Numerical parallel computing (FFT, matrix transpose, …).
• End-to-end congestion control.

[Figure: timeline for Core 0 to Core 3 along a time axis, alternating computation and communication steps.]

Contributions

• Optimal scheduling algorithm for line and ring.
• Optimal scheduling algorithm for torus.
• Constant approximation and bounds for mesh.


Related Work

Bufferless NoC designs
• Deflecting [Moscibroda et al. '09]
• Dropping [Gomez et al. '08]

TDM-based NoCs
• Aethereal [Goossens et al. '05]: provides architecture, not scheduling.
• Nostrum [Millberg et al. '04]: uses buffers.

Direct routing
• NP-hard for general traffic [Busch et al. '06]


Problem Definition

1. Line, ring, torus or mesh network topology.
2. Complete-exchange periodic traffic pattern.
3. No buffering, deflecting or dropping packets.
4. Equal propagation times and capacity on links.
5. Equal packet sizes.
6. Shortest routing.


Find a schedule that maximizes throughput, which is equivalent to minimizing the period time.


Degree-Two NoC Scheduling (DTNS) Algorithm

Each node i, at each time slot t, for each direction:

1. If the node received a packet for retransmission at t-1, then it retransmits that packet at t.

2. Otherwise, it injects the packet with the farthest destination among all packets waiting to be sent from the node.

[Figure: example on a four-node line with packets 1→3, 1→2, 2→4, 2→3, 1→4 and 3→4.]
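The two rules can be sketched as a small simulation for one direction on a line. This is an illustrative reading of the slide, not the authors' implementation: nodes are numbered from 0, only rightward traffic is modeled, each link carries one packet per time slot, and the function name `dtns_line_slots` is hypothetical. It returns the number of slots until every rightward packet of one period has been delivered.

```python
def dtns_line_slots(n):
    """Simulate the two DTNS rules for rightward traffic on an n-node line."""
    # waiting[i]: destinations of packets not yet injected at node i,
    # kept in ascending order so that pop() yields the farthest one.
    waiting = [list(range(i + 1, n)) for i in range(n)]
    # arriving[i]: destination of a packet that reached node i in the
    # previous slot and still has to be forwarded, or None.
    arriving = [None] * n
    remaining = n * (n - 1) // 2  # rightward packets in one period
    slots = 0
    while remaining > 0:
        next_arriving = [None] * n
        for i in range(n - 1):
            if arriving[i] is not None:
                dest = arriving[i]        # rule 1: retransmit
            elif waiting[i]:
                dest = waiting[i].pop()   # rule 2: inject farthest
            else:
                continue                  # link (i, i+1) idles this slot
            if dest == i + 1:
                remaining -= 1            # delivered at the next node
            else:
                next_arriving[i + 1] = dest
        arriving = next_arriving
        slots += 1
    return slots


print(dtns_line_slots(4))
```

Because a node injects only when it has nothing to retransmit, no two packets ever compete for the same link in the same slot. For four nodes the sketch delivers all six rightward packets in four slots, which equals the number of packets that must cross the central link.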


DTNS Period Length

n-Line: [formula] time slots.

Almost achieves capacity limit.
• Impossible to spread traffic uniformly: central link is a bottleneck.

n-Ring: [formula] time slots if n is even; [formula] time slots if n is odd.

Achieves capacity limit for odd n.
• For even n achieves capacity with overlapping.
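The bottleneck remark can be checked by counting how many packets of one period must cross each link in a given direction, since a link carries at most one packet per slot in that direction. The sketch below assumes shortest-path routing and 0-based node numbering; the function names are illustrative, and the ring version is restricted to odd n, where shortest paths are unique.

```python
def line_link_loads(n):
    """Rightward load of link (k, k+1) on an n-node line, for k = 0..n-2."""
    loads = [0] * (n - 1)
    for src in range(n):
        for dst in range(src + 1, n):
            for k in range(src, dst):  # links crossed on the way to dst
                loads[k] += 1
    return loads


def ring_link_loads(n):
    """Clockwise load of link (k, k+1 mod n) on an n-node ring, odd n only."""
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    loads = [0] * n
    for src in range(n):
        # Destinations at clockwise distance 1..(n-1)/2 are reached clockwise.
        for dist in range(1, (n - 1) // 2 + 1):
            for hop in range(dist):
                loads[(src + hop) % n] += 1
    return loads


print(line_link_loads(4))  # [3, 4, 3]
print(ring_link_loads(5))  # [3, 3, 3, 3, 3]
```

On a line the load peaks at the central link, whereas on an odd ring every link carries the same load, so the traffic can be spread uniformly.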


Torus NoC Scheduling (TNS) Algorithm

Inject simultaneously in four directions.

Long-then-short routing.

Dist(x1, x2)=min{|x1-x2|, N-|x1-x2|}
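A minimal sketch of the distance function and of one possible reading of long-then-short routing follows. The interpretation that a packet first travels along the dimension with the larger distance, the tie-break in favor of the first dimension, and the names `ring_dist` and `long_then_short_order` are all assumptions made for illustration.

```python
def ring_dist(x1, x2, n):
    """Dist(x1, x2) = min{|x1 - x2|, N - |x1 - x2|} on a ring of size n."""
    d = abs(x1 - x2)
    return min(d, n - d)


def long_then_short_order(src, dst, n):
    """Return the order in which the two dimensions are traversed.

    Assumption: the dimension with the larger distance goes first;
    ties are broken in favor of the first dimension.
    """
    (a, b), (c, d) = src, dst
    first_dist = ring_dist(a, c, n)
    second_dist = ring_dist(b, d, n)
    return ("first", "second") if first_dist >= second_dist else ("second", "first")


print(ring_dist(0, 5, 6))                         # wraps around: 1
print(long_then_short_order((0, 0), (1, 3), 6))   # second dimension is longer
```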


A period consists of phases. A phase consists of epochs.

For a packet from (a,b) to (c,d):

Phase i = max{Dist(a,c), Dist(b,d)}

Epoch j:
• for clockwise: j = min{Dist(a,c), Dist(b,d)}
• for counter-clockwise: j-i = min{Dist(a,c), Dist(b,d)}
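The phase index depends only on the larger of the two per-dimension distances. The sketch below, with hypothetical helper names and 0-based coordinates, groups the packets sent by one node of an N×N torus by phase; epoch assignment is left out.

```python
from collections import Counter


def ring_dist(x1, x2, n):
    """Dist(x1, x2) = min{|x1 - x2|, N - |x1 - x2|}."""
    d = abs(x1 - x2)
    return min(d, n - d)


def phase(src, dst, n):
    """Phase i = max{Dist(a, c), Dist(b, d)} for a packet (a,b) -> (c,d)."""
    (a, b), (c, d) = src, dst
    return max(ring_dist(a, c, n), ring_dist(b, d, n))


def packets_per_phase(n, src=(0, 0)):
    """Count the packets sent by src in each phase on an n x n torus."""
    counts = Counter()
    for c in range(n):
        for d in range(n):
            if (c, d) != src:
                counts[phase(src, (c, d), n)] += 1
    return dict(sorted(counts.items()))


print(packets_per_phase(5))  # {1: 8, 2: 16}
```

On a 5×5 torus each node has 8 destinations in phase 1 and 16 in phase 2, which together cover all 24 other nodes.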


TNS Period Length

Torus: [formula] time slots if N is odd; [formula] time slots if N is even.

Achieves capacity limit for odd N.
• For even N achieves capacity limit with overlapping.


Mesh

Lower bound for period length: [formula] time slots if N is even; [formula] time slots if N is odd.
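One standard way to obtain a lower bound of this kind is a bisection argument, sketched below. It is not necessarily the derivation used in the talk. The sketch assumes an N×N mesh cut between two adjacent columns, one packet per link per slot in each direction, and a hypothetical function name `mesh_bisection_bound`.

```python
def mesh_bisection_bound(N):
    """Lower bound on the period length from a vertical cut of an N x N mesh.

    Every packet from the left part to the right part must use one of the
    N left-to-right links that cross the cut, one packet per link per slot.
    """
    left_cols = N // 2
    right_cols = N - left_cols
    crossing_packets = (left_cols * N) * (right_cols * N)
    # crossing_packets is always a multiple of N, so the division is exact.
    return crossing_packets // N


for N in (2, 4, 5):
    print(N, mesh_bisection_bound(N))
```

For even N this evaluates to N³/4, since half of the N² nodes lie on each side of the cut.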


TNS Algorithm in Mesh

[Figure: diagram with side lengths labeled N and 2N.]

Upper bound for period length: [formula]


Bounds for Mesh Scheduling Period Length

$$\frac{n\sqrt{n}}{4} \le S_M(n = N \cdot N) \le n\sqrt{n} + \frac{1}{2}\sqrt{n}$$

Constant approximation.
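Reading the bounds as n√n/4 ≤ S_M ≤ n√n + ½√n with n = N·N, a short sketch shows how far apart they are. This reading of the upper bound is an assumption about the garbled slide text, and `mesh_bounds` is a hypothetical helper name.

```python
import math


def mesh_bounds(N):
    """Return (lower, upper) bounds on the mesh period length for n = N * N.

    Assumes the bounds n*sqrt(n)/4 and n*sqrt(n) + sqrt(n)/2.
    """
    n = N * N
    lower = n * math.sqrt(n) / 4
    upper = n * math.sqrt(n) + 0.5 * math.sqrt(n)
    return lower, upper


for N in (4, 16, 64):
    lower, upper = mesh_bounds(N)
    print(N, lower, upper, upper / lower)
```

Under this reading the ratio of the two bounds is 4 + 2/n, so it stays bounded by a constant as the mesh grows.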


Evaluation


Throughput = number of packets / period length
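As a worked example of this ratio, the sketch below assumes that the packet count is the n(n-1) packets of one complete-exchange period. The period length used in the example is a hypothetical value, not a result from the talk.

```python
def throughput(num_packets, period_length):
    """Packets delivered per time slot over one period."""
    return num_packets / period_length


n = 4
packets_per_period = n * (n - 1)  # complete exchange: 12 packets
hypothetical_period = 4           # illustrative value only
print(throughput(packets_per_period, hypothetical_period))
```

Since the number of packets per period is fixed by the traffic pattern, a shorter period translates directly into higher throughput.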


Summary

Use bufferless NoCs to reduce chip power and area consumption.

Rely on knowledge of periodic traffic for scheduling to increase capacity. Complete-exchange traffic.

• Line, ring: DTNS optimal scheduling.
• Torus: TNS optimal scheduling.
• Mesh: bounds for TNS application.


Thank you.