Presenter: Ching-Hua Huang, 2012/4/16. A Low-latency GALS Interface Implementation. Yuan-Teng Chang;...


TRANSCRIPT

  • Slide 1

Presenter: Ching-Hua Huang
2012/4/16

A Low-latency GALS Interface Implementation
Yuan-Teng Chang; Wei-Che Chen; Hung-Yue Tsai; Wei-Min Cheng; Chang-Jiu Chen; Fu-Chiung Cheng
Dept. of Comput. Sci., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Circuits and Systems (APCCAS), 2010 IEEE Asia Pacific Conference on

National Sun Yat-sen University Embedded System Laboratory

  • Slide 2

With VLSI technology improving rapidly, SoC has become the most important VLSI application. However, clock distribution and low power have become the two most important issues in SoC design. In addition, integrating IPs that operate correctly with different clocks is also a very important issue. Asynchronous circuits may resolve these problems by removing the clock signal, but it is too hard to implement a whole circuit asynchronously. The GALS (Globally-Asynchronous Locally-Synchronous) design methodology balances this trade-off by separating the synchronous designs with asynchronous interfaces. Thus, each part of the circuit can operate with its own clock, and the parts communicate via asynchronous channels. GALS provides reliable communication between different modules. However, the latency of the GALS interface may seriously degrade performance, so reducing that latency is significant. In this paper, we implemented a small and simple stretchable-clock based GALS wrapper with low latency in Verilog HDL and synthesized the design with the TSMC 0.13 µm cell library. We also showed that the wrapper operates correctly with modules whose clock frequencies differ greatly. In addition, we recommend adding a FIFO storage element on the transmission path.

  • Slide 3

What's the problem
Integrating IPs that can perform operations correctly with different clocks.
Synchronous circuits work by a clock signal: some drawbacks
Asynchronous circuits work by handshake protocols: high implementation costs and difficulties
GALS (Globally-Asynchronous Locally-Synchronous) design methodology: integrates the advantages of both synchronous and asynchronous circuits
The latency of GALS seriously degrades performance.
A stretchable-clock based GALS wrapper with low latency.

  • Slide 4

Related work
[This paper]
[1,2,3,4] Some drawbacks of synchronous circuits
1. clock skew
2. difficulty in clock distribution
3. worst-case performance
4. not modular
5. sensitive to variations in physical parameters
6. synchronization failure
7. noise (EMI)
[7] GALS systems
[8] GALS has large latency
How to deal with these drawbacks
GALS first appeared in 1984 [6]: to integrate the advantages of both synchronous and asynchronous circuits
Reducing the latency of the asynchronous interface
[9] 1. Pausible clock generator 2. Stretchable clock generator
The major difference between them is the way to stop the clock
[5] Asynchronous circuits: handshake protocols, high implementation costs and difficulties
1. Input controller 2. Output controller
GALS methodology was proposed

  • Slide 5

Proposed method
The new STG (Signal Transition Graph): composed of REQ, ACK, stretch, and WR (or RD)
The proposed new wrapper: input controller and output controller

  • Slide 6

1. Stoppable clock generator
2. The most commonly used approach so far
3. Uses an odd number of inverters to generate the local clock signal of the locally synchronous module
[Figure: truth table with inputs A, B and output Y; signals Ri, Ai, lclk, rclk]

  • Slide 7

1. The basic idea is similar to the above approach: stop the clock when data transfer occurs
2. The major difference from the above approach is the way to stop the clock
The symbol "C" represents a C-element, a self-timed latch
[Figure: two truth tables with inputs A, B and output Y, one of which includes a Hold state]

  • Slide 8

[Figure: signal values in the wrapper]
If the receiver needs to receive data

  • Slide 9

[Figure: signal values in the wrapper]

  • Slide 10

If a First-In-First-Out (FIFO) buffer is placed on the transmission path,
the sender can put the data into the FIFO and get the acknowledge earlier. Thus the sender continues computation instead of waiting for the receiver.
The latch is controlled by ACK; data has to be stored correctly in the latch during the time from ACK+ to ACK-.

  • Slide 11

Implemented the proposed design at gate level in Verilog HDL
Synopsys Design Compiler: used to synthesize our gate-level design with the TSMC 0.13 µm cell library
Compared area and latency with two different GALS models proposed in [11]

  • Slide 12

Experimental Results
clk sender = 555 MHz, clk receiver = 133 MHz
clk sender = 133 MHz, clk receiver = 555 MHz

  • Slide 13

This paper proposes a new GALS wrapper
Based on the four-phase handshake protocol
Consists of an input controller and an output controller
The area and latency are improved.
Compared to the C-element based design:
The area of the new wrapper is only 30.8% of that design's
The latency of the new wrapper is only 39.7% of that design's
Compared to the standard cell based design:
The area of the new wrapper is only 63.5% of that design's
The latency of the new wrapper is only 55% of that design's

  • Slide 14

This paper lists the GALS history and design principles
Such as the GALS concept: synchronous, asynchronous, GALS
To ensure correct operation, the synchronous modules must be stopped when data transfer occurs
Improved my understanding of GALS
The control of the asynchronous wrapper
STG
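
The four-phase handshake named on Slide 13 and the C-element mentioned on Slide 7 can be illustrated with a small behavioural sketch. This is not the wrapper from the paper, whose STG also involves the stretch and WR/RD signals; it only models the generic REQ+, ACK+, REQ-, ACK- sequence and the textbook C-element rule. The class names `CElement` and `FourPhaseChannel`, the `trace` list, and the choice to capture data at ACK+ are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class CElement:
    """Textbook Muller C-element: output follows the inputs when they agree, else holds."""
    y: int = 0

    def update(self, a: int, b: int) -> int:
        if a == b:       # both inputs 0 or both 1: output takes that value
            self.y = a
        return self.y    # inputs differ: previous output is held


@dataclass
class FourPhaseChannel:
    """Hypothetical REQ/ACK channel that records each signal transition in order."""
    req: int = 0
    ack: int = 0
    latch: Any = None
    trace: List[str] = field(default_factory=list)

    def transfer(self, data: Any) -> Any:
        # Phase 1: the sender presents valid data and raises REQ.
        self.req = 1
        self.trace.append("REQ+")
        # Phase 2: the receiver captures the data (assumed here) and raises ACK.
        self.latch = data
        self.ack = 1
        self.trace.append("ACK+")
        # Phase 3: the sender sees ACK and lowers REQ.
        self.req = 0
        self.trace.append("REQ-")
        # Phase 4: the receiver lowers ACK; the channel is idle again.
        self.ack = 0
        self.trace.append("ACK-")
        return self.latch


if __name__ == "__main__":
    channel = FourPhaseChannel()
    channel.transfer(0xA5)
    print(channel.trace)
```

Both signals return to zero after every transfer, which is what distinguishes a four-phase protocol from a two-phase one; the cost is four transitions per data item, which is one reason interface latency matters in GALS designs.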