
Post on 21-Dec-2015


Rotary Router: An Efficient Architecture for CMP Interconnection Networks

Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio

University of Cantabria, Spain

ISCA’07

Presented by Tina Miriam John

Outline

Introduction

The Rotary Router

Avoiding Anomalies

Performance Evaluation

Implementation Practicality

Conclusion

Introduction

CMP – Most effective way to deal with increasing design complexity.

The interconnect must offer low latency and high bandwidth with low power consumption and area requirements.

Existing low cost router architectures cause Head of Line (HOL) blocking.

Centralized internal storage not feasible in CMP framework.

With deterministic routing algorithms, real traffic patterns do not use network resources in a balanced way.

Small packets, as found in CMP networks, reduce the effectiveness of increasing bandwidth.

Figures: General Router Structure; Rotary Router sketch

Minimizes effects of small packets and takes advantage of them.

No appreciable HOL blocking.

Uses topology dependent adaptive routing.

General Router Structure

Two independent rings : packets circulate either clockwise or anti-clockwise.

Each ring built with a group of Dual-port FIFO Buffers (DFB).

Packets circulate using DFBs of the ring, until they reach a profitable output port.

No centralized arbitration is employed; arbitration is done independently at each router output port, regardless of the number of input ports.

Router Building Blocks

Input Stage

Made of a FIFO buffer and a demux.

Computes the profitable output ports for each entering packet.

Selects the ring direction for packet movement so as to minimize delay.

Delay depends on the number of DFBs traversed and the time spent at each DFB.
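The direction choice made by the Input stage can be sketched roughly as below. This is only an illustration under assumptions that are not on the slides: ports are numbered 0 to n-1 around the ring, the clockwise ring visits them in increasing order, and delay is approximated by the number of DFBs traversed alone (time spent at each DFB is ignored). The function names `ring_hops` and `choose_direction` are hypothetical.

```python
def ring_hops(src, dst, n_ports, clockwise):
    """Number of ring segments crossed going from port src to port dst.

    Assumption: the clockwise ring visits ports in increasing index order.
    """
    if clockwise:
        return (dst - src) % n_ports
    return (src - dst) % n_ports


def choose_direction(src, profitable, n_ports):
    """Pick the ring whose nearest profitable output port is fewest hops away.

    Ties go to the clockwise ring, an arbitrary choice made for this sketch.
    Returns a (direction, hops) pair.
    """
    cw = min(ring_hops(src, p, n_ports, True) for p in profitable)
    ccw = min(ring_hops(src, p, n_ports, False) for p in profitable)
    if cw <= ccw:
        return ("clockwise", cw)
    return ("anti-clockwise", ccw)
```

For example, with five ports, a packet entering at port 0 whose only profitable output is port 4 would be sent anti-clockwise (one hop instead of four). A fuller model would also weigh the occupancy of each DFB on the path, since the slides state that time spent at each DFB contributes to delay.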

Output Stage

Responsible for getting packets out of the rings and sending them to a neighbor router.

Made of two buffers and a mux.

Applies the flow control mechanism between contiguous routers.

Router Building Blocks

Buffering Segment Stage

Made up of two DFBs connecting every two router ports.

Each DFB has two pairs of R/W ports.

One pair builds a ring in which the packets turn.

The other pair connects the buffer to the Input and Output stages.

Decodes the routing information generated by the Input stage and placed in the packet header.

Flow Control and Routing Algorithm

Virtual Cut Through – Controls advance of packets among routers.

Bubble flow control – Regulates packet injection into rings.

Occupation based flow control – Manages advance of packets in rings inside router.
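As a rough illustration of the first mechanism, virtual cut-through lets a packet advance to the next router only when the receiving buffer has room for the whole packet. The sketch below shows that condition alone; the `InputBuffer` class, its field names, and the flit-based sizing are assumptions for illustration and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class InputBuffer:
    capacity: int      # total slots, in flits (hypothetical unit)
    occupied: int = 0  # slots currently in use

    def free(self):
        """Free slots remaining in this buffer."""
        return self.capacity - self.occupied

    def accepts_vct(self, packet_len):
        """Virtual cut-through condition: room for the entire packet."""
        return self.free() >= packet_len

    def receive(self, packet_len):
        """Admit the packet if the VCT condition holds; report whether it advanced."""
        if not self.accepts_vct(packet_len):
            return False
        self.occupied += packet_len
        return True
```

With an 8-flit buffer already holding 4 flits, a 4-flit packet is admitted, after which even a 1-flit packet has to wait upstream.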

Avoiding Anomalies

Deadlock and Livelock

Bubble flow control prevents input ports from exhausting buffering space in the internal rings of the router.

Packets can always move between routers because a hole is guaranteed in any ring.

Delays appearance of congested situations and removes HOL blocking effect.

Starvation

Injection traffic needs three holes to enter a router; in-transit traffic requires only two.

In-transit traffic starvation is reduced by balancing buffer occupation among input ports.

This is done by modifying flow control, increasing the required number of holes to inject a packet into the ring.
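The hole-based admission rule can be sketched as follows. The thresholds (three holes for injection traffic, two for in-transit traffic) are the ones quoted above; everything else is an assumption for illustration, including the function name `may_enter_ring`, the abstract `free_holes` count (the slides do not say exactly which buffers are counted), and the `extra_holes` knob standing in for the flow-control modification.

```python
# Holes required per traffic class, as stated on the slide.
HOLES_REQUIRED = {"injection": 3, "in_transit": 2}


def may_enter_ring(free_holes, traffic_class, extra_holes=0):
    """Return True if a packet of the given class may enter.

    free_holes:  free packet-sized slots seen by the entering packet (abstract count).
    extra_holes: hypothetical additional requirement, modelling a stricter
                 flow-control setting that raises the number of holes needed.
    """
    return free_holes >= HOLES_REQUIRED[traffic_class] + extra_holes
```

With two free holes, in-transit traffic may enter while injection traffic must wait, which is how the rule favours packets already in the network.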

Performance Evaluation

Synthetic Workloads

Maximum Normalized Throughput: (a) 4x4 torus (b) 8x8 torus

Performance Evaluation

Synthetic Workloads

(a) Random Traffic (b) Transpose Matrix Traffic

Performance Evaluation

Real Workloads

(a) Normalized Execution Time (b) Main Simulation Parameters

Implementation Practicality

Delay and Area

(a) Structure of DFB (b) Atomic modules of DFB

Implementation Viability

Power

(a) Power consumption for 8x8 torus network (b) Mobility of packets

Conclusion

A novel router architecture targeting CMP systems.

Utilizes a decentralized and scalable structure based on rings.

Eliminates HOL blocking, improves performance and provides a deadlock avoidance mechanism.

Reasonable costs in terms of area and power consumption.

References

W. Dally, B. Towles, “Principles and Practices of Interconnection Networks”, Morgan Kaufmann, 2004.

P. Kermani, L. Kleinrock, “Virtual Cut-Through: A New Computer Communication Switching Technique”, Computer Networks, Vol. 3, pp. 267-286, September 1979.

V. Puente, J. A. Gregorio, J. M. Prellezo, R. Beivide, J. Duato, C. Izu, “Adaptive Bubble Router: a Design to Improve Performance in Torus Networks”, International Conference on Parallel Processing (ICPP), 1999.

Y. Tamir, G. L. Frazier, “Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches”, IEEE Trans. on Computers, Vol. 41, No. 6, pp. 725-737, June 1992.

Thanks!!!
