Post on 21-Dec-2015
Rotary Router: An Efficient Architecture for CMP Interconnection Networks
Pablo Abad, Valentín Puente, Pablo Prieto, and Jose Angel Gregorio
University of Cantabria, Spain
ISCA'07
Presented by Tina Miriam John
Outline
Introduction
The Rotary Router
Avoiding Anomalies
Performance Evaluation
Implementation Practicality
Conclusion
Introduction
CMP (chip multiprocessor): the most effective way to deal with increasing design complexity.
CMP interconnection networks need low latency, high bandwidth, and low power consumption and area.
Existing low-cost router architectures cause Head of Line (HOL) blocking.
Centralized internal storage is not feasible in the CMP framework.
With deterministic routing algorithms, real traffic patterns do not use network resources in a balanced way.
The small packets typical of CMP networks make bandwidth increases less effective.
Figures: (a) General router structure (b) Rotary Router sketch
Minimizes effects of small packets and takes advantage of them.
No appreciable HOL blocking.
Uses topology dependent adaptive routing.
General Router Structure
Two independent rings: packets circulate either clockwise or anti-clockwise.
Each ring is built from a group of Dual-port FIFO Buffers (DFBs).
Packets circulate through the DFBs of a ring until they reach a profitable output port.
No centralized arbitration is employed; arbitration is done independently at each router output port, regardless of the number of input ports.
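As a rough illustration of packets circulating in a ring until they reach a profitable output port, the Python sketch below models one ring with a single slot per port. The names `Packet` and `step_ring`, and the one-slot-per-DFB simplification, are assumptions made for illustration only; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    ident: int
    profitable_ports: frozenset  # output ports that move the packet toward its destination


def step_ring(
    ring: List[Optional[Packet]],
    output_free: List[bool],
    clockwise: bool = True,
) -> Tuple[List[Optional[Packet]], List[Tuple[int, int]]]:
    """Advance the ring by one cycle (hypothetical model, one slot per port).

    Each output port decides on its own whether to take the packet in
    front of it; there is no central arbiter. Packets that are not taken
    rotate one position in the ring direction.
    """
    n = len(ring)
    remaining = list(ring)
    departed = []
    for port in range(n):
        pkt = remaining[port]
        if pkt is not None and port in pkt.profitable_ports and output_free[port]:
            departed.append((pkt.ident, port))  # packet leaves through this output
            remaining[port] = None
    shift = 1 if clockwise else -1
    rotated: List[Optional[Packet]] = [None] * n
    for port in range(n):
        if remaining[port] is not None:
            rotated[(port + shift) % n] = remaining[port]
    return rotated, departed
```

In this model a packet whose profitable output is busy simply keeps turning in the ring, so it does not block the packets behind it.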
Router Building Blocks
Input Stage
Made of a FIFO buffer and a demux.
Computes profitable output ports for each entering packet.
Selects the ring direction for packet movement so as to minimize delay.
Delay depends on the number of DFBs traversed and the time spent at each DFB.
Output Stage
Responsible for getting packets out of the rings and sending them to a neighbor router.
Made of two buffers and a mux.
Applies the flow control mechanism between contiguous routers.
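The ring-direction choice made by the Input Stage can be sketched as follows. The slides only say that delay depends on the number of DFBs traversed and the time spent at each one; the cost model used here (one cycle per DFB plus one cycle per packet already queued in it) and the names `estimated_delay` and `choose_direction` are assumptions for illustration, not the paper's actual selection function.

```python
from typing import List


def estimated_delay(occupancy: List[int], start: int, target: int, step: int) -> int:
    """Estimate the delay from port `start` to port `target` along one ring.

    Assumed cost model: each DFB traversed costs one cycle plus one
    cycle per packet already waiting in it.
    """
    n = len(occupancy)
    delay, pos = 0, start
    while pos != target:
        pos = (pos + step) % n
        delay += 1 + occupancy[pos]
    return delay


def choose_direction(
    cw_occupancy: List[int], ccw_occupancy: List[int], start: int, target: int
) -> str:
    """Pick the ring with the smaller estimated delay (ties go clockwise)."""
    cw = estimated_delay(cw_occupancy, start, target, +1)
    ccw = estimated_delay(ccw_occupancy, start, target, -1)
    return "clockwise" if cw <= ccw else "anticlockwise"
```

With empty rings the shorter path wins; a congested DFB on the short path can make the longer way around the better choice.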
Router Building Blocks
Buffering Segment Stage
Made up of two DFBs connecting every two router ports.
Each DFB has two pairs of R/W ports.
One pair builds a ring in which the packets turn.
The other pair connects the buffer to the Input and Output stages.
Decodes the routing information generated by the Input stage and placed in the packet header.
Flow Control and Routing Algorithm
Virtual Cut-Through: controls the advance of packets among routers.
Bubble flow control: regulates packet injection into the rings.
Occupation-based flow control: manages the advance of packets in the rings inside the router.
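For the first of these mechanisms, the defining condition of virtual cut-through is that a packet may advance to the next router only when the receiving buffer has room for the whole packet. A minimal sketch of that check follows; the function name `vct_can_forward` and the flit-based accounting are assumptions for illustration.

```python
def vct_can_forward(free_flits_downstream: int, packet_length_flits: int) -> bool:
    """Virtual cut-through check (hypothetical helper).

    The packet may start moving to the next router only if the
    downstream buffer can hold it entirely, so a blocked packet never
    stays spread across several routers.
    """
    if packet_length_flits <= 0:
        raise ValueError("packet length must be positive")
    return free_flits_downstream >= packet_length_flits
```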
Avoiding Anomalies
Deadlock and Livelock
Bubble flow control prevents input ports from exhausting buffering space in the internal rings of the router.
Packets can always move between routers because a hole is guaranteed in every ring.
Delays appearance of congested situations and removes HOL blocking effect.
Starvation
Injection traffic needs three holes to enter a router; in-transit traffic requires only two.
In-transit traffic starvation is reduced by balancing buffer occupation among input ports.
This is done by modifying the flow control, increasing the number of holes required to inject a packet into the ring.
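The hole thresholds above can be written as a small admission check. The thresholds (three holes for injection, two for in-transit traffic) come from the slide; the names `may_enter_ring` and `holes_after_entry`, and the idea of counting holes as packet-sized free slots in a ring, are assumptions for illustration.

```python
IN_TRANSIT_HOLES = 2  # holes required by traffic arriving from a neighbor router
INJECTION_HOLES = 3   # holes required by traffic injected at this router


def may_enter_ring(free_holes: int, is_injection: bool) -> bool:
    """Return True if a packet may enter a ring with `free_holes` free slots."""
    required = INJECTION_HOLES if is_injection else IN_TRANSIT_HOLES
    return free_holes >= required


def holes_after_entry(free_holes: int, is_injection: bool) -> int:
    """Holes left once the packet has entered; unchanged if entry is refused."""
    if may_enter_ring(free_holes, is_injection):
        return free_holes - 1
    return free_holes
```

Because an admitted packet always leaves at least one hole behind, the ring never fills completely, and because injection needs one hole more than in-transit traffic, in-transit packets get priority when space is scarce.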
Performance Evaluation
Synthetic Workloads
Maximum Normalized Throughput: (a) 4x4 torus (b) 8x8 torus
Performance Evaluation
Synthetic Workloads
(a) Random Traffic (b) Transpose Matrix Traffic
Performance Evaluation
Real Workloads
(a) Normalized Execution Time (b) Main Simulation Parameters
Implementation Practicality
Delay and Area
(a) Structure of DFB (b) Atomic modules of DFB
Implementation Viability
Power
(a) Power consumption for 8x8 torus network (b) Mobility of packets
Conclusion
A novel router architecture targeting CMP systems.
Utilizes a decentralized and scalable structure based on rings.
Eliminates HOL blocking, improves performance and provides a deadlock avoidance mechanism.
Reasonable costs in terms of area and power consumption.
References
W. Dally, B. Towles, "Principles and Practices of Interconnection Networks". Morgan Kaufmann, 2004.
P. Kermani, L. Kleinrock, "Virtual Cut-Through: A New Computer Communication Switching Technique". Computer Networks, Vol. 3, pp. 267-286, September 1979.
V. Puente, J. A. Gregorio, J. M. Prellezo, R. Beivide, J. Duato, C. Izu, "Adaptive Bubble Router: a Design to Improve Performance in Torus Networks". International Conference on Parallel Processing (ICPP), 1999.
Y. Tamir, G. L. Frazier, "Dynamically-Allocated Multi-Queue Buffers for VLSI Communication Switches". IEEE Trans. on Computers, Vol. 41, No. 6, pp. 725-737, June 1992.
Thanks!!!