kilo-noc: a network-on-chip architecture for scalability and service guarantees

Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees Boris Grot The University of Texas at Austin

Post on 25-Feb-2016

TRANSCRIPT

Page 1: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Boris Grot
The University of Texas at Austin

Page 2: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Technology Trends

[Figure: transistor count (log scale, 1,000 to 10,000,000,000) versus year of introduction (1970 to 2010) for the 4004, 8086, 286, 386, 486, Pentium, Pentium 4, Pentium D, Core i7, and Xeon Nehalem-EX. Annotations: Perf/$ and Perf/Watt.]

Page 3: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Technology Applications

Page 4: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Networks-on-Chip (NOCs)

The backbone of highly integrated chips
  Transport of memory, operand, and control traffic
  Structured, packet-based, multi-hop networks
  Increasing importance with greater levels of integration

Major impact on chip performance, energy, and area
  TRIPS: 28% performance loss on SPEC 2K in the NOC
  Intel Polaris: 28% of chip power consumption in the NOC

"Moving data is more expensive [energy-wise] than operating on it" - William Dally, SC '10

Page 5: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

On-chip vs Off-chip Interconnects

[Comparison along: topology, routing, flow control, pins, bandwidth, power, area]

Page 6: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Future NOC Requirements

100's to 1000's of network clients
  Cores, caches, accelerators, I/O ports, ...
Efficient topologies
  High performance, small footprint
Intelligent routing
  Performance through better load balance
Light-weight flow control
  High performance, low buffer requirements
Service guarantees
  Cloud computing and real-time apps demand QOS support

Publication tags on the slide: HPCA '09, HPCA '08, MICRO '09, under submission, under submission

Page 7: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Outline

Introduction
Service Guarantees in Networks-on-Chip
  Motivation
  Desiderata, prior work
  Preemptive Virtual Clock
  Evaluation highlights
Efficient Topologies for On-chip Interconnects
Kilo-NOC: A Network for 1000+ Nodes
Summary and Future Work

Page 8: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Why On-chip Quality-of-Service?

Shared on-chip resources
  Memory controllers, accelerators, network-on-chip
  ... require QOS support: fairness, service differentiation, performance isolation

End-point QOS solutions are insufficient
  Data has to traverse the on-chip network
  Need QOS support at the interconnect level

Hard guarantees in NOCs

Page 9: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

NOC QOS Desiderata

Fairness
Isolation of flows
Bandwidth efficiency
Low overhead: delay, area, energy

Page 10: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Conventional QOS Disciplines

Fixed schedule
  Pros: algorithmic and implementation simplicity
  Cons: inefficient BW utilization; per-flow queuing
  Example: Round Robin
Rate-based
  Pros: fine-grained scheduling; BW efficient
  Cons: complex scheduling; per-flow queuing
  Example: Weighted Fair Queuing (WFQ) [SIGCOMM '89]
Frame-based
  Pros: good throughput at modest complexity
  Cons: throughput-complexity trade-off; per-flow queuing
  Example: Rotating Combined Queuing (RCQ) [ISCA '96]

Per-flow queuing
  o Area overhead
  o Energy overhead
  o Delay overhead
  o Scheduling complexity

Page 11: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Preemptive Virtual Clock (PVC) [HPCA '09]

Goal: high-performance, cost-effective mechanism for fairness and service differentiation in NOCs.

Full QOS support
  Fairness, prioritization, performance isolation
Modest area and energy overhead
  Minimal buffering in routers & source nodes
High performance
  Low latency, good BW efficiency

Page 13: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Scheduling

Combines rate-based and frame-based features
Rate-based: evolved from Virtual Clock [SIGCOMM '90]
  Routers track each flow's bandwidth consumption
  Cheap priority computation: f(provisioned rate, consumed BW)
  Problem: history effect
Framing: PVC's solution to the history effect
  Frame rollover clears all BW counters
  Fixed frame duration
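The scheduling idea above can be sketched in a few lines of Python. This is a minimal illustration, not the router logic from the talk: the slide only says priority is some f(provisioned rate, consumed BW), so the ratio used here, the class names, and the per-cycle `tick` are all assumptions made for the example. It shows the history effect (a flow that consumed bandwidth earlier loses priority) and how a fixed-length frame rollover erases it.

```python
from dataclasses import dataclass, field


@dataclass
class FlowState:
    rate: float        # provisioned share of link bandwidth (assumed 0 < rate <= 1)
    consumed: int = 0  # flits forwarded so far in the current frame


@dataclass
class PvcRouterTable:
    frame_len: int                       # fixed frame duration, in cycles
    flows: dict = field(default_factory=dict)
    cycle: int = 0

    def add_flow(self, flow_id, rate):
        self.flows[flow_id] = FlowState(rate)

    def priority(self, flow_id):
        # Assumed form of f(provisioned rate, consumed BW): consumption
        # scaled by the provisioned rate. Smaller value = higher priority.
        state = self.flows[flow_id]
        return state.consumed / state.rate

    def forward(self, flow_id, flits=1):
        # Account for bandwidth the flow has just used.
        self.flows[flow_id].consumed += flits

    def tick(self, cycles=1):
        # Advance time; at every frame boundary clear all BW counters.
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.frame_len == 0:
                for state in self.flows.values():
                    state.consumed = 0

    def pick(self, contenders):
        # Choose the contender with the highest priority (lowest value).
        return min(contenders, key=self.priority)
```

With two equally provisioned flows, forwarding 40 flits of flow X makes `pick` prefer flow Y until the frame rolls over, after which both counters are zero again.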

Page 14: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Scheduling

Combines rate-based and frame-based features
Rate-based: evolved from Virtual Clock [SIGCOMM '90]
  Routers track each flow's bandwidth consumption
  Cheap priority computation: f(provisioned rate, consumed BW)
  Problem: history effect

[Diagram: Flow X. On frame rollover, BW counters reset and priorities reset.]

Page 16: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Freedom from Priority Inversion

PVC: simple routers without per-flow buffering and no BW reservation
Problem: high-priority packets may be blocked by lower-priority packets (priority inversion)
Solution: preemption of lower-priority packets
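To make the preemption step concrete, here is a small Python sketch of a shared (not per-flow) buffer that resolves priority inversion by discarding its lowest-priority occupant. The function name `admit`, the tuple packet format, and the "smaller value = higher priority" convention are assumptions for illustration; the slides do not specify the router's actual arbitration logic.

```python
def admit(buffer, capacity, packet, priority_of):
    """Try to place `packet` into a shared buffer of `capacity` slots.

    Returns (accepted, preempted): `preempted` is the packet that was
    discarded to make room, or None. A smaller priority_of() value means
    higher priority (an assumption of this sketch).
    """
    if len(buffer) < capacity:
        buffer.append(packet)
        return True, None

    # Buffer is full: find the lowest-priority occupant.
    victim = max(buffer, key=priority_of)
    if priority_of(victim) > priority_of(packet):
        # Priority inversion: the newcomer outranks a buffered packet,
        # so the victim is preempted (dropped) and must be retransmitted.
        buffer.remove(victim)
        buffer.append(packet)
        return True, victim

    # No inversion: the newcomer simply waits upstream.
    return False, None
```

A caller would forward the returned victim to whatever recovery mechanism is in place, for example a NACK to the victim's source.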

Page 17: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Preemption Recovery

Retransmission of dropped packets
Buffer outstanding packets at the source node
ACK/NACK protocol via a dedicated network
  All packets acknowledged
  Narrow, low-complexity network
  Lower overhead than timeout-based recovery
  64-node network: a 30-flit backup buffer per node suffices
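The recovery scheme can be pictured as a source-side window of unacknowledged packets. The sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, packet IDs are simple counters, and the only number taken from the slide is the 30-flit backup buffer. It shows why every packet must be acknowledged: an ACK is what frees backup space, and a NACK finds the copy still buffered for retransmission.

```python
from collections import OrderedDict


class SourceNode:
    """Source-side backup buffer for ACK/NACK recovery (illustrative only)."""

    def __init__(self, backup_flits=30):
        self.backup_flits = backup_flits   # backup capacity, in flits
        self.outstanding = OrderedDict()   # packet id -> size in flits
        self.used = 0
        self.next_id = 0

    def can_send(self, size):
        return self.used + size <= self.backup_flits

    def send(self, size):
        # A packet may only be injected if a copy fits in the backup buffer.
        if not self.can_send(size):
            return None                    # stall until an ACK frees space
        pid = self.next_id
        self.next_id += 1
        self.outstanding[pid] = size
        self.used += size
        return pid

    def on_ack(self, pid):
        # Delivered: release the backup copy.
        self.used -= self.outstanding.pop(pid)

    def on_nack(self, pid):
        # Preempted in the network: the copy is still buffered, so hand
        # it back for retransmission without allocating new space.
        return pid, self.outstanding[pid]
```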

Page 18: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Preemption Throttling

Relaxed definition of priority inversion
  Reduces preemption frequency
  Small fairness penalty
Per-flow bandwidth reservation
  Flits within the reserved quota are non-preemptible
  Reserved quota is a function of rate and frame size
Coarsened priority classes
  Mask out lower-order bits of each flow's BW counter
  Induces coarser priority classes
  Enables a fairness/throughput trade-off
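Both throttling mechanisms reduce to simple arithmetic, sketched below in Python. The slide states only that the reserved quota is "a function of rate and frame size"; the product used here is an assumed form, and all function names are hypothetical. The masking function shows how dropping low-order counter bits merges nearby counter values into one priority class, so small differences in consumption no longer trigger preemptions.

```python
def reserved_quota(rate, frame_len):
    # Assumed form: flits a flow may send per frame without risk of
    # preemption, for a link that moves one flit per cycle.
    return int(rate * frame_len)


def is_preemptible(flits_sent_this_frame, rate, frame_len):
    # Flits inside the reserved quota are non-preemptible.
    return flits_sent_this_frame >= reserved_quota(rate, frame_len)


def priority_class(bw_counter, masked_bits):
    # Mask out the lower-order bits of the bandwidth counter so that
    # counters differing only in those bits compare as equal.
    return bw_counter & ~((1 << masked_bits) - 1)
```

For instance, with three masked bits, counters 37 and 39 fall into the same class while 40 starts the next one; masking zero bits gives the finest-grained (highest fairness) setting.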

Page 19: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC: Guarantees

Minimum bandwidth
  Based on reserved quota
Fairness
  Subject to BW counter resolution
Worst-case latency
  A packet that enters the source buffer in frame N is guaranteed delivery by the end of frame N+1
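The worst-case latency guarantee translates directly into a deadline in cycles. The helper below is a small worked example, assuming cycles are numbered from 0 and frames are back to back with a fixed length; the function name is hypothetical.

```python
def delivery_deadline(entry_cycle, frame_len):
    """Last cycle by which a packet entering the source buffer at
    `entry_cycle` is guaranteed to be delivered (end of frame N+1)."""
    frame = entry_cycle // frame_len        # frame N in which the packet arrives
    return (frame + 2) * frame_len - 1      # final cycle of frame N+1
```

Under these assumptions the bound is at most two frame durations: a packet arriving at the very start of a frame waits longest, while one arriving at the end of a frame waits roughly one frame.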

Page 20: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

[Diagram labels: PARSEC, Stream]

Page 21: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

Baseline NOC
  No QOS support
Globally Synchronized Frames (GSF)
  J. Lee, et al. ISCA 2008
  Frame-based scheme adapted for on-chip implementation
  Source nodes enforce bandwidth quotas via self-throttling
  Multiple frames in-flight for performance
  Network prioritizes packets based on frame number
Preemptive Virtual Clock (PVC)
  Highest fairness setting (unmasked bandwidth counters)
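As a rough picture of the GSF behavior listed above, the following Python sketch models one source node that tags each injection with a frame number and throttles itself when its quota for every in-flight frame is spent. It is a simplified reading of the bullet points, not the published GSF design: the class name, the per-frame `quota`, the `window` of in-flight frames, and the `on_head_advance` callback are assumptions for illustration.

```python
class GsfSource:
    """Frame-based self-throttling at a source node (illustrative only)."""

    def __init__(self, quota, window):
        self.quota = quota    # flits this source may inject per frame
        self.window = window  # number of frames allowed in flight
        self.head = 0         # oldest active frame (advanced network-wide)
        self.inject = 0       # frame this source currently injects into
        self.credit = quota   # flits left in the injection frame

    def try_inject(self, flits=1):
        # Move to a later frame when the current one is used up, but
        # never beyond the window of in-flight frames.
        while self.credit < flits:
            if self.inject + 1 >= self.head + self.window:
                return None   # quota exhausted everywhere: self-throttle
            self.inject += 1
            self.credit = self.quota
        self.credit -= flits
        return self.inject    # frame tag; earlier frames get priority

    def on_head_advance(self):
        # The oldest frame has drained, which opens one more future frame.
        self.head += 1
        if self.inject < self.head:
            self.inject = self.head
            self.credit = self.quota
```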

Page 22: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

[Figure: PARSEC network slowdown (y-axis, 1 to 8) for No QOS, GSF, and PVC. Value labels on the chart: 40, 50, 46, 46, 34, 55.]

Page 24: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

PVC Summary

Full QOS support
  Fairness & service differentiation
  Strong performance isolation
High performance
  Simple routers: low latency
  Good bandwidth efficiency
Modest area and energy overhead
  3.4 KB of storage per node (1.8x no-QOS router)
  12-20% extra energy per packet

Will it scale to 1000 nodes?

Page 25: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Outline

Introduction
Service Guarantees in Networks-on-Chip
Efficient Topologies for On-chip Interconnects
  Mesh-based networks
  Toward low-diameter topologies
  Multidrop Express Channels
Kilo-NOC: A Network for 1000+ Nodes
Summary and Future Work

Page 26: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

NOC Topologies

Topology is the principal determinant of network performance, cost, and energy efficiency

Topology desiderata
  Rich connectivity reduces router traversals
  High bandwidth reduces latency and contention
  Low router complexity reduces area and delay

On-chip constraints
  2D substrates limit implementable topologies
  Logic area/energy constrains use of wire resources
  Power constraints restrict routing choices

Page 28: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

2-D Mesh

Pros
  Low design & layout complexity
  Simple, fast routers
Cons
  Large diameter
  Energy & latency impact
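The "large diameter" drawback is easy to quantify. The Python sketch below computes the diameter and the average hop count of a k x k mesh, counting one hop per link traversed under minimal (Manhattan) routing; the function names are illustrative and uniform traffic between distinct node pairs is an assumption.

```python
def mesh_diameter(k):
    # Longest minimal path: corner to opposite corner of a k x k mesh.
    return 2 * (k - 1)


def mesh_average_hops(k):
    # Average Manhattan distance over all ordered pairs of distinct nodes.
    nodes = [(x, y) for x in range(k) for y in range(k)]
    total = sum(abs(ax - bx) + abs(ay - by)
                for ax, ay in nodes
                for bx, by in nodes)
    return total / (len(nodes) * (len(nodes) - 1))
```

For a 32 x 32 mesh (1024 nodes) `mesh_diameter` gives 62 hops, which is the scaling problem the low-diameter topologies in this part of the talk aim to avoid.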

Page 29: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Concentrated Mesh (Balfour & Dally, ICS '06)

Pros
  Multiple terminals at each node
  Fast nearest-neighbor communication via the crossbar
  Hop count reduction proportional to concentration degree
Cons
  Benefits limited by crossbar complexity

Page 30: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Flattened Butterfly (Kim et al., Micro '07)

Objectives:
  Improve connectivity
  Exploit the wire budget

Page 31: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Flattened Butterfly

Point-to-point links
Nodes fully connected in each dimension

Page 32: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Flattened Butterfly

Pros
  Excellent connectivity
  Low diameter: 2 hops
Cons
  High channel count: k^2/2 per row/column
  Low channel utilization
  Control complexity

Page 33: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Multidrop Express Channels (MECS) [Grot et al., Micro '09]

Objectives:
  Connectivity
  More scalable channel count
  Better channel utilization

Page 34: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Multidrop Express Channels (MECS)

Point-to-multipoint channels
  Single source
  Multiple destinations
Drop points:
  Propagate further
  -OR-
  Exit into a router

Page 35: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Multidrop Express Channels (MECS)

Page 36: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Multidrop Express Channels (MECS)

Pros
  One-to-many topology
  Low diameter: 2 hops
  k channels per row/column
  I/O asymmetry
Cons
  I/O asymmetry
  Control complexity
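The channel counts quoted for the two low-diameter topologies can be compared with a short calculation. In this sketch the flattened butterfly count uses the exact number of node pairs in a fully connected row, k(k-1)/2, which the slides approximate as k^2/2, and the MECS count is simply the slide's figure of k per row or column. Function names are illustrative.

```python
def fbfly_channels_per_row(k):
    # Fully connected row of k nodes: one bidirectional link per pair,
    # k(k-1)/2, roughly k^2/2 for large k.
    return k * (k - 1) // 2


def mecs_channels_per_row(k):
    # Figure quoted on the slide: k channels per row/column.
    return k


def channel_ratio(k):
    # How many times more channels the flattened butterfly needs per row.
    return fbfly_channels_per_row(k) / mecs_channels_per_row(k)
```

The gap grows linearly with k: for a 32-node row, as in a 1024-node network laid out as 32 x 32, this is 496 channels versus 32.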

Page 37: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

MECS Summary

MECS: a novel one-to-many topology
  Excellent connectivity
  Effective wire utilization
  Good fit for planar substrates

Results summary
  MECS: lowest latency, high energy efficiency
  Mesh-based topologies: best throughput
  Flattened butterfly: smallest router area

Page 38: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Outline

Introduction
Service Guarantees in Networks-on-Chip
Efficient Topologies for On-chip Interconnects
Kilo-NOC: A Network for 1000+ Nodes
  Requirements and obstacles
  Topology-centric Kilo-NOC architecture
  Evaluation highlights
Summary and Future Work

Page 40: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Scaling to a kilo-node NOC

Goal: a NOC architecture that scales to 1000+ clients with good efficiency and strong guarantees

MECS scalability obstacles
  Buffer requirements (more ports, deeper buffers): area, energy, latency overheads

PVC scalability obstacles
  Flow state, other storage: area, energy overheads
  Preemption overheads: energy, latency overheads
  Prioritization and arbitration: latency overheads

Kilo-NOC: addresses topology and QOS scalability bottlenecks

This talk: reducing QOS overheads

Page 41: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

NOC QOS: Conventional Approach

Multiple virtual machines (VMs) sharing a die
  Shared resources (e.g., memory controllers)
  VM-private resources (cores, caches)

[Diagram: 4x4 grid of nodes, each marked Q, partitioned among VM #1, VM #2, and VM #3]

Page 45: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

NOC QOS: Conventional Approach

NOC contention scenarios:
  Shared resource accesses: memory access
  Intra-VM traffic: shared cache access
  Inter-VM traffic: VM page sharing

[Diagram: 4x4 grid of nodes, each marked Q, partitioned among VM #1, VM #2, and VM #3]

Network-wide guarantees without network-wide QOS support

Page 46: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Kilo-NOC QOS: Topology-centric Approach

Dedicated, QOS-enabled regions
  Rest of die: QOS-free
A richly-connected topology (MECS)
  Traffic isolation
Special routing rules
  Ensure interference freedom

[Diagram: die with a QOS-enabled region (nodes marked Q) and a QOS-free remainder hosting VM #1, VM #2, and VM #3]
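The slides do not spell out the routing rules, so the following Python sketch shows only one possible rule set that is consistent with the bullets above; it should not be read as the Kilo-NOC rules themselves. The assumptions are: the QOS-enabled region is a single column `qos_col`, `vm_of` maps a node to its VM (or None for shared nodes), and a MECS channel lets a packet pass over intermediate routers in a row or column without stopping at them. Under these assumptions a packet only ever stops at routers of its own VM or in the QOS-enabled column.

```python
def mecs_route(src, dst, vm_of, qos_col):
    """Return the routers a packet stops at (hypothetical rule set).

    src, dst: (x, y) node coordinates.
    vm_of:    function mapping a node to its VM id, None for shared nodes.
    qos_col:  x coordinate of the QOS-enabled column.
    """
    def dedup(stops):
        # Drop consecutive repeats (e.g. when no turn is needed).
        out = [stops[0]]
        for stop in stops[1:]:
            if stop != out[-1]:
                out.append(stop)
        return out

    (sx, sy), (dx, dy) = src, dst
    turn = (dx, sy)  # turn node for X-then-Y dimension-order routing

    # Intra-VM traffic may use plain dimension-order routing as long as
    # the turn happens inside the same VM (or in the QOS column).
    if vm_of(src) == vm_of(dst) and (dx == qos_col or vm_of(turn) == vm_of(src)):
        return dedup([src, turn, dst])

    # All other traffic (shared-resource accesses, inter-VM traffic)
    # turns only inside the QOS-enabled column.
    return dedup([src, (qos_col, sy), (qos_col, dy), dst])
```

With column 0 as the QOS-enabled region, a packet between two VMs travels along its row to column 0, along column 0 to the destination row, and then out to the destination, so the only contended routers it touches are the ones that have QOS hardware.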

Page 51: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

[Diagram: PVC-enabled MECS topology with Stream nodes (S) and memory controllers (MC)]

Page 52: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

[Diagram: MECS topology with nodes labeled M2, M4, and Ma, Stream nodes (S), and memory controllers (MC)]

With & without network-wide PVC QOS

Page 53: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Performance Isolation

[Figure: PARSEC network slowdown (y-axis, 1.0 to 2.0) for MECS, MECS + PVC, and K-MECS]

Page 54: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Summary: Scaling NOCs to 1000+ nodes

Objectives: good performance, high energy- and area-efficiency, service guarantees

MECS topology
  Point-to-multipoint interconnect fabric
  Rich connectivity: improves performance and efficiency

PVC QOS scheme
  Preemptive architecture: reduces buffer requirements
  Strong guarantees, performance isolation

Page 55: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Summary: Scaling NOCs to 1000+ nodes

Topology-aware QOS architecture
  Limits the extent of QOS support to a fraction of the die
  Reduces network cost, improves performance
  Enables efficiency-boosting optimizations in QOS-free regions of the chip

Kilo-NOC compared to MECS+PVC:
  NOC area reduction of 47%
  NOC energy reduction of 26-53%

Page 56: Kilo-NOC: A Network-on-Chip Architecture for Scalability and Service Guarantees

Acknowledgement

Faculty
  Steve Keckler (advisor)
  Doug Burger
  Onur Mutlu
  Emmett Witchel

Collaborators
  Paul Gratz
  Joel Hestness

Special Thanks
  The awesome CART group
