Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines

Post on 26-Feb-2016

DESCRIPTION

Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines. Arash Saifhashemi and Peter A. Beerel, University of Southern California, USC Asynchronous CAD/VLSI Group (async.usc.edu). (Thanks to a grant from Intel and NSF)

TRANSCRIPT

Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines

Arash Saifhashemi

Peter A. Beerel

University of Southern California

USC Asynchronous CAD/VLSI Group (async.usc.edu)

(Thanks to a grant from Intel and NSF)

Patmos 2012, Sep 2012, Newcastle upon Tyne

Asynchronous Circuit Design: Today's Applications

• 3D networks-on-chip (STMicroelectronics)
• Ethernet switches (Intel SRD)
• Ultra-high-speed FPGAs (Achronix)
• Process variation
• Low-power chip design (Encryption – Tiempo, …)

Basic challenges: Automation

Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starting at a high-level specification written in SVC (SystemVerilogCSP)

Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G): 1.2 B transistors, 90% asynchronous, 13% Proteus

Tiempo TAM16: clockless 16-bit microcontroller

STMicroelectronics WIOMING 3D-IC (July 2012)

Achronix FPGA: 1.7 M LUTs, 2.1 Gbps IO

The Proteus Flow

[Figure: Proteus flow diagram. Recoverable labels: SystemVerilog, SVC2RTL, Design Goals, Synth. RTL, Constraints, Synthesis, Proteus/Sync Library, Image Netlist, Clock Gating, ClockFree, Async Netlist, Verilog, Netlist, Clock Tree Synthesis, Physical Design, Final Layout]

Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance

Tool Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip

The Problem
• Limited and manual power optimization

Conditional Communication in Proteus

[Figure: conditional RECEIVE and SEND with control values 0 and 1. Labels: "Not received", "Dummy value", "Not sent"]

Example: ALU

SVC Description

No conditionality in high-level description

Reconverging fanouts

Unnecessary calculation

Adding Isolation Cells

• All inputs/outputs are unconditional
• Operand isolation
  • AND-based isolation cells
  • Generated by synchronous RTL synthesizer
  • Does not prevent switching in asynchronous circuits

Isolation cells are not effective in asynchronous circuits

Three-valued logic

• Formal justification of conditioning
• Three-valued logic image model
  • Each iteration is modeled by a clock cycle
  • Each variable can be 0, 1, or N (no token)

Status of each channel

One iteration

3VL Unconditional Functions

Unconditional functions

• Can be represented using only three-valued AND, OR, and NOT operators

• Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …

Lemma 1: the output is N iff at least one of the inputs is N.
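
Lemma 1 can be checked mechanically on a small model. The following is a minimal Python sketch, assuming that an unconditional gate is the strict extension of its Boolean function to {0, 1, N}, so it produces no token whenever any input carries no token. The names `N`, `lift`, `GATES`, and `satisfies_lemma_1` are illustrative and do not come from the slides.

```python
from itertools import product

N = "N"  # assumed encoding of "no token"; 0 and 1 are ordinary data values

def lift(boolean_fn):
    """Extend a Boolean function to {0, 1, N}: any N input yields N (assumption)."""
    def gate(*inputs):
        if any(v == N for v in inputs):
            return N
        return boolean_fn(*inputs)
    return gate

# A few gates of the kind found in a typical cell library, with their arity.
GATES = {
    "NAND": (lift(lambda a, b: 1 - (a & b)), 2),
    "NOR": (lift(lambda a, b: 1 - (a | b)), 2),
    "XOR": (lift(lambda a, b: a ^ b), 2),
    "AOI": (lift(lambda a, b, c: 1 - ((a & b) | c)), 3),
}

def satisfies_lemma_1(gate, arity):
    """True if the output is N exactly when at least one input is N."""
    return all(
        (gate(*xs) == N) == (N in xs)
        for xs in product((0, 1, N), repeat=arity)
    )

for name, (gate, arity) in GATES.items():
    print(name, satisfies_lemma_1(gate, arity))
```

Under this assumption every listed gate passes the check by construction; the check becomes informative once a non-strict node (for example a conditional one) is added to the table.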

SEND/RECEIVE Operators

• Conditional communication
• RECEIVE and SEND are modeled as the Ⓡ and Ⓢ operators

Behave like buffers when E=1
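
The two operators can be sketched as small Python functions. This is a hedged model, not the slides' formal definition: it assumes that a SEND with enable 0 produces no token ("Not sent"), that a RECEIVE with enable 0 skips its input and outputs a dummy value ("Not received", "Dummy value"), and that the dummy value is 0. The function names `send` and `recv` are illustrative.

```python
N = "N"  # assumed encoding of "no token"

def send(x, e):
    """x Ⓢ e: forward x when e = 1; send nothing (N) when e = 0 (assumption)."""
    if e == N or x == N:
        return N  # no enable token or no data token: nothing can be sent
    return x if e == 1 else N

def recv(x, e, dummy=0):
    """x Ⓡ e: forward x when e = 1; when e = 0, skip the receive and
    output a dummy value (assumed to be 0)."""
    if e == N:
        return N
    if e == 0:
        return dummy
    return x  # e == 1: behaves like a buffer, so N stays N

# With the enable at 1, both operators behave like buffers.
for x in (0, 1):
    print(x, send(x, 1), recv(x, 1))
```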

SEND Reconditioning

Assuming y = f(x) is unconditional and e ∉ TFO(y), where TFO(y) is the transitive fanout of y

Lemma 2:

Application: SEND cells can be moved through logic

• Similar to retiming in synchronous circuits

Less switching when e=0

Fewer SENDs
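
The idea that a SEND can move across an unconditional gate can be illustrated with an exhaustive check. The sketch below assumes a strict three-valued NAND and a SEND that outputs N when its enable is 0; it compares one SEND at the gate output against SENDs on both gate inputs. It illustrates the kind of equivalence involved and is not claimed to be the exact statement of Lemma 2. All names are illustrative.

```python
from itertools import product

N = "N"  # assumed encoding of "no token"

def send(x, e):
    """x Ⓢ e: forward x when e = 1, otherwise send nothing (assumption)."""
    if e == N or x == N:
        return N
    return x if e == 1 else N

def nand(a, b):
    """Strict three-valued NAND: any N input yields N (assumption)."""
    if a == N or b == N:
        return N
    return 1 - (a & b)

def send_at_output(a, b, e):
    return send(nand(a, b), e)           # one SEND after the gate

def send_at_inputs(a, b, e):
    return nand(send(a, e), send(b, e))  # one SEND per gate input

equivalent = all(
    send_at_output(a, b, e) == send_at_inputs(a, b, e)
    for a, b, e in product((0, 1, N), repeat=3)
)
print(equivalent)
```

In this model the two placements are indistinguishable at the gate output. Placing the SENDs on the inputs keeps tokens away from the gate when e = 0, while placing a single SEND at the output uses fewer SEND cells.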

Observability in 3V Networks

Local Observability Partial Care (LOPC)

• OPC(f, C, xj) of input xj of a node representing a function f is the condition under which f's output is not affected as xj changes in C ⊆ {0,1,N}

Global Observability Partial Care (GOPC)

• GOPC(C, x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}

• Example: OPC(Mux, {0,1}, i1) = s^{1} · i2^{0,1}
  • Changes of i1 in {0,1} are not observable when s = 1 and i2 = 0 or i2 = 1

• OPC(f, C, x) implies GOPC(C, x)
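
The mux example can be reproduced by enumeration. The sketch below assumes a 2:1 mux in which s = 1 selects i2 (as the example implies) and restricts the side inputs s and i2 to binary values. The helper name `opc_mux_i1` is illustrative.

```python
from itertools import product

def mux(s, i1, i2):
    """2:1 multiplexer; s = 1 selects i2, following the slide's example (assumption)."""
    return i2 if s == 1 else i1

def opc_mux_i1(change_set=(0, 1)):
    """Return the (s, i2) assignments under which changing i1 within
    change_set leaves the mux output unchanged."""
    care = []
    for s, i2 in product((0, 1), repeat=2):
        outputs = {mux(s, i1, i2) for i1 in change_set}
        if len(outputs) == 1:  # output does not depend on i1
            care.append((s, i2))
    return care

print(opc_mux_i1())  # [(1, 0), (1, 1)]
```

The result lists exactly the assignments with s = 1 and i2 in {0,1}, matching the condition given in the example.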

GOPC Conditioning

When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning

Lemma 3: if e^{0} → GOPC({0,1}, x1), then f(x) = (f(x) Ⓢ e) Ⓡ e
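
The effect of inserting a SEND followed by a RECEIVE can be illustrated on a tiny network. The sketch below is an assumption-laden model: the primary output selects x1 when e = 1 and another value c when e = 0, so x1 is not observable when e = 0; SEND outputs N and RECEIVE outputs a dummy 0 when e = 0. The names `select`, `original`, and `conditioned` are hypothetical.

```python
from itertools import product

N = "N"  # assumed encoding of "no token"

def send(x, e):
    """x Ⓢ e: forward x when e = 1, otherwise send nothing (assumption)."""
    if e == N or x == N:
        return N
    return x if e == 1 else N

def recv(x, e, dummy=0):
    """x Ⓡ e: forward x when e = 1, otherwise output a dummy value (assumption)."""
    if e == N:
        return N
    return x if e == 1 else dummy

def select(e, x1, c):
    """Primary output: x1 is observed only when e = 1."""
    return x1 if e == 1 else c

def original(e, x1, c):
    return select(e, x1, c)

def conditioned(e, x1, c):
    link = send(x1, e)  # carries no token when e = 0
    return select(e, recv(link, e), c)

same_output = all(
    original(e, x1, c) == conditioned(e, x1, c)
    for e, x1, c in product((0, 1), repeat=3)
)
print(same_output)
print(send(1, 0))  # the isolated link is N when e = 0
```

In this model the primary output is unchanged for every binary input, while the link between the SEND and the RECEIVE carries no token whenever e = 0.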

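SEND Reconditioning

[Figure: SEND reconditioning example with channel values 0, 1, and N annotated on the network]

Conditioning

[Figure: conditioned network with & and + nodes. Label: "No Activity"]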

Inserting Isolating Nodes and Recognizing Enable Domains

Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths

Node u is in e's Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an isolating node controlled by e
• Detected using a DFS search
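
The enable-domain test above can be sketched as a memoized depth-first search over fanins. This is a minimal illustration under stated assumptions: the netlist is a DAG given as a dictionary from each node to its fanin list, primary inputs have no fanins, and an isolating node counts as part of its own domain. The function name `enable_domain` and the example netlist are hypothetical.

```python
def enable_domain(fanins, isolating, e):
    """Return the nodes for which every path from a primary input passes
    through an isolating node controlled by e.

    fanins:    dict mapping node -> list of fanin nodes ([] for primary inputs)
    isolating: dict mapping isolating node -> the enable that controls it
    """
    memo = {}

    def covered(u):
        if u in memo:
            return memo[u]
        if isolating.get(u) == e:
            result = True   # every path into u ends at this isolating node
        elif not fanins[u]:
            result = False  # reached a primary input with no isolating node
        else:
            result = all(covered(v) for v in fanins[u])  # DFS over fanins
        memo[u] = result
        return result

    return {u for u in fanins if covered(u)}

# Hypothetical netlist: a multiplier behind isolating nodes, an adder that is not.
FANINS = {
    "a": [], "b": [], "c": [],
    "iso_a": ["a"], "iso_b": ["b"],
    "mul": ["iso_a", "iso_b"],
    "add": ["a", "b"],
    "out": ["mul", "add", "c"],
}
ISOLATING = {"iso_a": "e", "iso_b": "e"}

print(sorted(enable_domain(FANINS, ISOLATING, "e")))  # ['iso_a', 'iso_b', 'mul']
```

Memoization keeps the search linear in the number of edges; a recursive formulation is used for brevity and would need an explicit stack for very deep netlists.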

Pre-layout Analysis

• Wu : power of receiving data on all inputs and sending the output (unconditional nodes)

• K: power of conditional nodes

• rf: activity factor

Total power

Power of each domain

Domain power after isolation (n inputs)

Benefit of isolating each domain

Post-layout Experimental Results

• Case study: 32-bit ALU, placed and routed
• Back-annotated switching activity using a VCD file
• Results:
  • Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
  • 53% power reduction when only isolating MUL (rf = 0.25)
  • Area cost of isolating MUL is about 4%, with no performance penalty

Conclusions and Future Work

Conditional communication in async. circuits is not free

• Creates area and performance overheads
• Requires manual or automatic optimization

Asynchronous circuits can/should leverage sync. tools

• This paper is the first to use three-valued logic and observability don't cares for power optimization of asynchronous circuits

Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (Reconditioning)
