Introduction to asynchronous circuit design: specification and synthesis

Posted on 07-Jan-2016




Part III: Advanced topics on synthesis of control circuits from STGs

Outline

• Logic decomposition
  – Hazard-free decomposition
  – Signal insertion
  – Technology mapping

• Optimization based on timing information
  – Relative timing
  – Timing assumptions and constraints
  – Automatic generation of timing assumptions

Design flow

Specification (STG)
  → reachability analysis → State Graph
  → state encoding → SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist

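No hazards

Figure: a single gate with inputs a, b, c and output x, shown on the state trace (a b c x)
1000 →(b+)→ 1100 →(a-)→ 0100 →(c+)→ 0110
The output x stays at 0 along the whole trace.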

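Decomposition may lead to hazards

Figure: the same trace after the gate is decomposed into two smaller gates: a and b drive a new internal signal z, and z and c drive x. Unlike x, the new internal signal z is not guaranteed to switch cleanly along this trace: decomposition can introduce hazards that the original gate did not have.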
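To make the hazard concrete, the trace can be replayed at gate level. The sketch below assumes, purely for illustration, that the original gate is a 3-input AND (x = a·b·c) and that it is decomposed as z = a·b, x = z·c; the slides do not state the gate types. It models only the case where gates are slower than the environment and reports any gate whose excitation is withdrawn before it fires.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Gates = Dict[str, Callable[[State], int]]


def disabled_excitations(gates: Gates, trace: List[Tuple[str, int]], init: State) -> List[str]:
    """Replay input events with all gates assumed slower than the environment
    (an excited gate keeps its old value). Report every gate whose excitation
    is withdrawn by an input event before the gate has fired: a hazard."""
    state = dict(init)
    hazards = []
    for signal, value in trace:
        excited_before = {g for g, fn in gates.items() if fn(state) != state[g]}
        state[signal] = value
        excited_after = {g for g, fn in gates.items() if fn(state) != state[g]}
        for g in sorted(excited_before - excited_after):
            hazards.append(f"{g} disabled by {signal}{'+' if value else '-'}")
    return hazards


# ASSUMED gate types (not given on the slides).
single: Gates = {"x": lambda s: s["a"] & s["b"] & s["c"]}
decomposed: Gates = {
    "z": lambda s: s["a"] & s["b"],  # new internal signal
    "x": lambda s: s["z"] & s["c"],
}

# Trace from the slides: abcx = 1000 -b+-> 1100 -a--> 0100 -c+-> 0110
trace = [("b", 1), ("a", 0), ("c", 1)]
print(disabled_excitations(single, trace, {"a": 1, "b": 0, "c": 0, "x": 0}))
print(disabled_excitations(decomposed, trace, {"a": 1, "b": 0, "c": 0, "z": 0, "x": 0}))
```

With these assumed gates, the single gate is never excited, while z is excited by b+ and then disabled by a-.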

Decomposition

• Acknowledgement

• Global acknowledgement

• Generating candidates

• Hazard-free signal insertion

– Event insertion

– Signal insertion

Global acknowledgement

Figure: two 3-input gates, z with inputs a, b, c and y with inputs a, b, d, operating under the event sequences
d- b+ d+ y+ a- y- c+ d-
c- d+ z- b- z+ c+ a+ c-

How about 2-input gates? The following slides try several decompositions of the same specification:
– z built from a and a new 2-input gate on b, c (y unchanged);
– y built from a and a new 2-input gate on b, d (z unchanged);
– one shared 2-input gate on a, b, whose output feeds both z (together with c) and y (together with d).

Strategy for logic decomposition

• Each decomposition defines a new internal signal

• Method: insert new internal signals such that
  – after resynthesis, some large gates are decomposed
  – the new specification is hazard-free

• Generate candidates for decomposition using standard logic factorization techniques:
  – Algebraic factorization
  – Boolean factorization (Boolean relations)
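A deliberately simplified sketch of candidate generation: count the pairs of literals that occur together in the cubes of several gates; each shared pair is a candidate divisor that could become a new internal signal. Real algebraic factorization (kernel extraction) is more general. The cube representation and the example, reading the global-acknowledgement gates as z = a·b·c and y = a·b·d, are assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cube = FrozenSet[str]  # a product term, e.g. frozenset({"a", "b", "c"})


def two_literal_divisors(functions: Dict[str, List[Cube]]) -> List[Tuple[Tuple[str, str], int]]:
    """Count in how many cubes each pair of literals occurs together.

    Pairs shared by more than one cube are candidate divisors: extracting
    such a pair as a new signal s would let every such cube use s instead."""
    counts: Counter = Counter()
    for cubes in functions.values():
        for cube in cubes:
            for pair in combinations(sorted(cube), 2):
                counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n > 1]


# ASSUMED example: two 3-input AND gates z = a.b.c and y = a.b.d.
gates = {
    "z": [frozenset({"a", "b", "c"})],
    "y": [frozenset({"a", "b", "d"})],
}
print(two_literal_divisors(gates))
```

Here the only shared pair is (a, b), which corresponds to the shared 2-input gate option; whether it is acceptable still depends on the hazard-freedom check.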

Decomposition example

Figure: an STG with events x±, y±, z± and w±, its 12-state State Graph (states encoded as 4-bit vectors) and an initial implementation built from C-elements. The states are partitioned according to the candidate function yz (yz = 1 and yz = 0). A new signal s is then inserted: its events s+ and s- are placed at the border between the two regions, so that the State Graph splits into s = 1 and s = 0 regions, and the circuit is resynthesized with s as an additional signal. The slides show more than one placement of s+ and s-.

Note: in one of these insertions, z- is delayed by the new transition s-!

Figure: the decomposition loop applied to a small C-element netlist (gates labelled FC, Sr and D), repeated until no more progress:
1. Decomposition (algebraic, Boolean relations) proposes a new internal signal.
2. Hazard-free? (checked by event insertion): NO / YES.

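Signal insertion for function F

Figure: the State Graph is partitioned into the regions F = 0 and F = 1. With insertion by input borders, the new events F+ and F- are inserted at the input borders of the F = 1 and F = 0 regions, respectively.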
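A minimal sketch of computing input borders on an explicit State Graph: the input border of the region F = v is the set of states with F = v that are entered from a state with F ≠ v. Under insertion by input borders these states are candidate excitation regions for F+ (v = 1) and F- (v = 0). The dictionary encoding and the toy graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# State Graph as adjacency list: state -> list of (event, next_state)
SG = Dict[str, List[Tuple[str, str]]]


def input_border(sg: SG, f, value: int) -> Set[str]:
    """States with f(state) == value that have a predecessor with f != value."""
    border = set()
    for src, edges in sg.items():
        for _event, dst in edges:
            if f(src) != value and f(dst) == value:
                border.add(dst)
    return border


# Hypothetical toy cycle over the binary code of signals (p, q).
sg: SG = {
    "00": [("p+", "10")],
    "10": [("q+", "11")],
    "11": [("p-", "01")],
    "01": [("q-", "00")],
}


def F(state: str) -> int:
    """Hypothetical function to insert: F = p AND q."""
    return int(state[0]) & int(state[1])


print(input_border(sg, F, 1))  # candidate ER(F+)
print(input_border(sg, F, 0))  # candidate ER(F-)
```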

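Event insertion

Figure: a new event x is inserted into a State Graph with events a, b and c. Each state of the excitation region ER(x) is split in two, connected by an x transition; the states reached right after x fires form the switching region SR(x).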
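The state splitting can be sketched as follows, assuming the common construction in which every state s in ER(x) gets a copy s' reached through x: transitions that stay inside ER(x) are duplicated before and after x, and transitions that leave ER(x) are postponed until x has fired. This is an illustration on a hypothetical toy graph, not Petrify's implementation.

```python
from typing import Dict, List, Set, Tuple

SG = Dict[str, List[Tuple[str, str]]]  # state -> [(event, next_state)]


def insert_event(sg: SG, er: Set[str], x: str) -> SG:
    """Insert event x with excitation region `er` by splitting states.

    Every s in er gets a copy s' entered through x; the copies form SR(x).
    Transitions inside er are duplicated before and after x, transitions
    leaving er are postponed until x has fired, all others are kept."""
    new: SG = {s: [] for s in sg}
    for s in sorted(er):
        new[s + "'"] = []
    for s, edges in sg.items():
        if s in er:
            new[s].append((x, s + "'"))
        for event, t in edges:
            if s in er and t in er:
                new[s].append((event, t))              # x still pending
                new[s + "'"].append((event, t + "'"))  # x already fired
            elif s in er:
                new[s + "'"].append((event, t))        # must wait for x
            else:
                new[s].append((event, t))              # unaffected
    return new


# Hypothetical toy graph: s0 -a-> s1 -b-> s2 -c-> s3, with ER(x) = {s1, s2}.
sg: SG = {"s0": [("a", "s1")], "s1": [("b", "s2")], "s2": [("c", "s3")], "s3": []}
for state, edges in insert_event(sg, {"s1", "s2"}, "x").items():
    print(state, edges)
```

In the result, c can only fire after x, while b may fire either before or after x.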

Properties to preserve

Figure: diamonds formed by events a and b before and after inserting x.

• a is persistent: a stays enabled when b fires

• a is disabled by b = hazards
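Persistence can be checked directly on a State Graph: a non-input event a enabled in a state s must still be enabled after any other event b fires from s; otherwise a is disabled by b. The sketch below exempts input events (the environment may make choices) and uses hypothetical toy graphs.

```python
from typing import Dict, List, Set, Tuple

SG = Dict[str, List[Tuple[str, str]]]  # state -> [(event, next_state)]


def enabled(sg: SG, s: str) -> Set[str]:
    return {event for event, _ in sg.get(s, [])}


def persistence_violations(sg: SG, inputs: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (state, a, b) for every non-input event a disabled by b."""
    bad = []
    for s, edges in sg.items():
        for a in sorted(enabled(sg, s) - inputs):
            for b, t in edges:
                if b != a and a not in enabled(sg, t):
                    bad.append((s, a, b))
    return bad


# Hypothetical examples: a full diamond, and a broken one where b disables a.
diamond: SG = {"s0": [("a", "s1"), ("b", "s2")], "s1": [("b", "s3")],
               "s2": [("a", "s3")], "s3": []}
broken: SG = {"s0": [("a", "s1"), ("b", "s2")], "s1": [], "s2": []}
print(persistence_violations(diamond, inputs=set()))
print(persistence_violations(broken, inputs={"b"}))
```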

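Boolean decomposition

Figure: a single function F of inputs x1, …, xn producing f, versus a decomposition in which a block H produces intermediate signals h1, …, hm that feed a gate G:

$$f = F(x_1, \dots, x_n) \qquad\qquad f = G\big(H(x_1, \dots, x_n)\big), \quad H = (h_1, \dots, h_m)$$

Our problem: given F and G, find H.

Example: G is a C-element with inputs h1, h2 and output f. The allowed values of (h1, h2) in each state depend on the current value of f and on the required next value next(f):

state   f   next(f)   allowed (h1, h2)
s1      0   0         (0,-) or (-,0)
s2      0   1         (1,1)
s3      1   0         (0,0)
s4      1   1         (-,1) or (1,-)
dc      -   -         (-,-)

Since several values of (h1, h2) are allowed in some states, this is a Boolean relation rather than a single function.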
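The table can be reproduced mechanically: for each pair (f, next(f)), collect every input vector (h1, h2) for which a C-element, next(f) = h1·h2 + f·(h1 + h2), produces the required next value. A minimal sketch:

```python
from itertools import product
from typing import Dict, Set, Tuple


def c_element(h1: int, h2: int, f: int) -> int:
    """Next value of a Muller C-element with inputs h1, h2 and output f."""
    return (h1 & h2) | (f & (h1 | h2))


def boolean_relation() -> Dict[Tuple[int, int], Set[Tuple[int, int]]]:
    """For each (f, next_f), the set of (h1, h2) that produce next_f."""
    relation = {}
    for f, next_f in product((0, 1), repeat=2):
        relation[(f, next_f)] = {
            (h1, h2)
            for h1, h2 in product((0, 1), repeat=2)
            if c_element(h1, h2, f) == next_f
        }
    return relation


for (f, nf), vectors in boolean_relation().items():
    print(f, nf, sorted(vectors))
```

The row (0, 0), for instance, yields {(0,0), (0,1), (1,0)}, which is exactly (0,-) or (-,0).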

Figure (several slides): an STG for signal y with events a±, c±, d± and y±, the next-state function of y over a, c, d and y, and its implementation with an RS latch (inputs S and R, output y). Successive slides show alternative decompositions of the S and R functions.

Technology mapping

• Merging small gates into larger gates introduces no new hazards

• Standard synchronous techniques can be applied, e.g. BDD-based Boolean matching

• Handles sequential gates and combinational feedback

• Because of hazards, finding a correct mapping is not guaranteed (some gates cannot be decomposed)

• Timing-aware decomposition can be applied in these rare cases
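Boolean matching decides whether a function can be implemented by a library cell, possibly after permuting the cell's inputs. The slides mention BDD-based matching; the sketch below swaps in plain truth-table comparison over all input permutations, which is only practical for small cells. The cell names and functions in the library are hypothetical.

```python
from itertools import permutations, product
from typing import Callable, Optional, Tuple

BoolFn = Callable[..., int]


def truth_table(fn: BoolFn, n: int) -> Tuple[int, ...]:
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n))


def match(target: BoolFn, cell: BoolFn, n: int) -> Optional[Tuple[int, ...]]:
    """Return a permutation p with target(x) == cell(x[p[0]], ..., x[p[n-1]]),
    or None if no input permutation of the cell implements the target."""
    want = truth_table(target, n)
    for perm in permutations(range(n)):
        got = truth_table(lambda *x: cell(*(x[i] for i in perm)), n)
        if got == want:
            return perm
    return None


# Hypothetical library cells and target function.
library = {
    "AO21": lambda a, b, c: (a & b) | c,
    "NAND3": lambda a, b, c: 1 - (a & b & c),
}


def target(x0: int, x1: int, x2: int) -> int:
    return x0 | (x1 & x2)


for name, cell in library.items():
    print(name, match(target, cell, 3))
```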


Timing assumptions in design flow

• Speed-independent: wire delays after a fork are smaller than the fan-out gate delays

• Burst-mode: the circuit stabilizes between two changes at the inputs

• Timed circuits: absolute bounds on gate / environment delays are known a priori (before physical design)

Relative Timing Circuits

• Assumptions: “a before b”
  – for concurrent events: reduces the reachable state space
  – for ordered events: permits early enabling
  – both increase the don’t-care space for logic synthesis => simpler logic (better area and timing)

• “Assume - if useful - guarantee” approach: the tool uses the assumptions to derive a circuit and the timing constraints that must then be met in the physical design flow

• Applied to the design of the Rotating Asynchronous Pentium Processor(TM) Instruction Decoder (K. Stevens, S. Rotem et al., Intel Corporation)

Relative Timing Asynchronous Circuits

Speed-independent C-element (inputs a, b; output c)

Timing assumption (on environment): a- before b-

RT C-element (inputs a, b; output c): faster, smaller; correct only under the timing constraint a- before b-
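The effect of the assumption a- before b- can be checked by brute force. The sketch below models an assumed 4-phase environment around a C-element (inputs rise while c = 0 and fall while c = 1), explores the reachable states with and without the assumption, and checks whether the simpler function c = b·(a + c) agrees with the C-element on every reachable state. That simplified gate is derived here for illustration only; the slides do not say which gate the RT C-element uses, and the check covers the next-state function, not hazard-freedom.

```python
from typing import Set, Tuple

State = Tuple[int, int, int]  # (a, b, c)


def successors(s: State, a_before_b: bool) -> Set[State]:
    """ASSUMED environment: inputs rise while c == 0, fall while c == 1."""
    a, b, c = s
    nxt = set()
    if c == 0 and a == 0:
        nxt.add((1, b, c))                      # a+
    if c == 0 and b == 0:
        nxt.add((a, 1, c))                      # b+
    if c == 1 and a == 1:
        nxt.add((0, b, c))                      # a-
    if c == 1 and b == 1 and not (a_before_b and a == 1):
        nxt.add((a, 0, c))                      # b- (waits for a- if assumed)
    if a == b and a != c:
        nxt.add((a, b, a))                      # c+ or c-
    return nxt


def reachable(a_before_b: bool) -> Set[State]:
    seen, todo = {(0, 0, 0)}, [(0, 0, 0)]
    while todo:
        for t in successors(todo.pop(), a_before_b):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def c_element(a: int, b: int, c: int) -> int:
    return (a & b) | (c & (a | b))


def rt_candidate(a: int, b: int, c: int) -> int:
    return b & (a | c)  # hypothetical simplified gate


for assumption in (False, True):
    states = reachable(assumption)
    ok = all(rt_candidate(*s) == c_element(*s) for s in states)
    print(f"a- before b-: {assumption}, {len(states)} states, simplified gate correct: {ok}")
```

Under the assumption, the state (a, b, c) = (1, 0, 1) becomes unreachable; it is the only state where the two functions differ, so it turns into a don't-care.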

State Graph (Read cycle)

Figure: State Graph of the read cycle, with events DSr±, LDS±, LDTACK±, D± and DTACK±. The events DSr+, DTACK-, LDS- and LDTACK- each appear on several arcs of the graph.

Lazy Transition Systems

Figure: the excitation regions ER(LDS+) and ER(LDS-) and the firing region FR(LDS-), with the neighbouring events LDS+, LDS- and DTACK-.

Event LDS- is lazy: its firing region is a subset of its enabling region.

Timing assumptions

• (a before b) for concurrent events: concurrency reduction for firing and enabling

• (a before b) for ordered events: early enabling

• (a simultaneous to b wrt c) for triples of events: combination of the above

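Speed-independent Netlist

Figure: the read-cycle STG (events LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+) and its speed-independent netlist, with signals DSr, DTACK, D, LDS, LDTACK and the internal signals csc and map.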

Adding timing assumptions (I)

Figure: the same STG and netlist, annotated with the timing assumption LDTACK- before DSr+ (one path is labelled FAST, the other SLOW).

State space domain

Figure: the State Graph around the concurrent events LDTACK- and DSr+, with the assumption LDTACK- before DSr+.

Two more unreachable states.

Boolean domain

Figure: Karnaugh maps over DTACK, DSr, D and LDTACK, split into LDS = 0 and LDS = 1, before and after the assumption. A cell marked 0/1? (a state conflict) in the first version is resolved, and one more cell becomes a don't-care (-) in the second.

One more DC vector for all signals; one state conflict is removed.

Netlist with one constraint

Figure: the STG and the netlist before and after resynthesis; the resynthesized netlist uses only DTACK, D, DSr, LDS and LDTACK, without the internal signals csc and map.

TIMING CONSTRAINT: LDTACK- before DSr+


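Ordered events: early enabling

Figure: ordered events a, b and c, before and after early enabling; the function of the gate implementing c changes from F to G.

Logic for gate c may change.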

Adding timing assumptions (II)

Figure: the read-cycle STG and netlist, annotated with the new assumption D- before LDS-.

State space domain

Figure: the State Graph around the events D-, LDS- and DSr-, with the assumption D- before LDS-.

• Reachable space is unchanged

• For LDS-, enabling can be changed in one state (potential enabling for LDS-)

Boolean domain

Figure: the Karnaugh maps over DTACK, DSr, D and LDTACK (split into LDS = 0 and LDS = 1) before and after the assumption; one more cell becomes a don't-care (-).

One more DC vector for one signal, LDS. If it is used: LDS = DSr; otherwise: LDS = DSr + D.

Before early enabling

Figure: the STG and netlist (signals DTACK, D, DSr, LDS, LDTACK) before the second assumption is applied.

Netlist with two constraints

Figure: the STG and the resynthesized netlist (signals DTACK, D, DSr, LDS, LDTACK).

TIMING CONSTRAINTS: LDTACK- before DSr+ and D- before LDS-

Both timing assumptions are used for optimization and become constraints.

Deriving automatic timing assumptions

• Rule I (out of 6): a, b are non-input events
  – Untimed ordering: (a || b) and (a is enabled before b), but not vice versa
  – Derived assumption: a fires before b
  – Justification: the delay of one gate can be made shorter than the delay of two (or more) gates: del(a) < del(c) + del(b)
  – Effect I: a state becomes DC for all signals
  – Effect II: another state becomes a local DC for the signal of event b

Figure: a State Graph with events a, b and c, in which c enables b while a is already enabled.
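Rule I can be sketched as a check on a State Graph. Here "a is enabled before b" is read as: some event fires from a state where a is enabled and b is not, leading to a state where both are enabled. Concurrency is simplified to "both enabled in some state". This reading of the rule, the function names and the example graph are assumptions, not Petrify's implementation.

```python
from typing import Dict, List, Set, Tuple

SG = Dict[str, List[Tuple[str, str]]]  # state -> [(event, next_state)]


def enabled(sg: SG, s: str) -> Set[str]:
    return {e for e, _ in sg.get(s, [])}


def enabled_before(sg: SG, a: str, b: str) -> bool:
    """Some transition keeps a enabled while it newly enables b."""
    return any(
        a in enabled(sg, s) and b not in enabled(sg, s)
        and {a, b} <= enabled(sg, t)
        for s, edges in sg.items()
        for _, t in edges
    )


def rule_one(sg: SG, a: str, b: str, inputs: Set[str]) -> bool:
    """True if the assumption 'a fires before b' can be derived."""
    if a in inputs or b in inputs:
        return False
    concurrent = any({a, b} <= enabled(sg, s) for s in sg)
    return concurrent and enabled_before(sg, a, b) and not enabled_before(sg, b, a)


# Hypothetical graph: a is enabled from the start, c enables b.
sg: SG = {
    "s0": [("a", "s1"), ("c", "s2")],
    "s1": [("c", "s3")],
    "s2": [("a", "s3"), ("b", "s4")],
    "s3": [("b", "s5")],
    "s4": [("a", "s5")],
    "s5": [],
}
print(rule_one(sg, "a", "b", inputs=set()))
print(rule_one(sg, "b", "a", inputs=set()))
```

In this example the derived assumption a before b makes s4 (reached by firing b before a) unreachable, which corresponds to Effect I.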

Backannotation of Timing Constraints

• Timed circuits require post-verification

• Can synthesis tools help?
  – Report the least stringent set of timing constraints required for the correctness of the circuit
  – Not all initial timing assumptions may be required

• Petrify reports a set of constraints on the order of firing that guarantees the correctness of the circuit

Timing constraints generation

Figure: an STG with events a, b, c, d and e and the State Graph of the synthesized circuit.

Assumptions: d before b, c before e, and a before d.

The slides compare the correct behavior (the states allowed by the assumptions) with the incorrect behavior (states numbered 1 to 5 in the figure).

Covering incorrect behavior: each candidate constraint removes a set of incorrect states:
• d before b: {1, 3}
• d before c: {1}
• c before e: {2, 4}
Other possible constraints remove states from the assumption domain => invalid.

Constraints for the minimal cost solution: d before c and c before e.
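Selecting constraints is a covering problem: each candidate constraint removes a set of incorrect states, and a minimum-cost subset must remove all of the states that have to be excluded. The brute-force sketch below is only practical for a handful of candidates; the constraint names C1 to C4, their state sets and their costs are hypothetical, and real tools use dedicated covering algorithms.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set, Tuple


def min_cost_cover(
    candidates: Dict[str, FrozenSet[int]],
    bad_states: Set[int],
    cost: Dict[str, float],
) -> Optional[Tuple[str, ...]]:
    """Cheapest set of constraints whose removed states cover bad_states."""
    best, best_cost = None, float("inf")
    names = sorted(candidates)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            covered = set().union(*(candidates[n] for n in combo))
            total = sum(cost[n] for n in combo)
            if bad_states <= covered and total < best_cost:
                best, best_cost = combo, total
    return best


# Hypothetical instance: constraints and the incorrect states each removes.
candidates = {
    "C1": frozenset({1, 3}),
    "C2": frozenset({1}),
    "C3": frozenset({2, 4}),
    "C4": frozenset({1, 2, 3, 4}),
}
cost = {"C1": 1.0, "C2": 0.5, "C3": 1.0, "C4": 3.0}
print(min_cost_cover(candidates, {1, 2, 4}, cost))
```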

Timing-aware state encoding

• Solve only state conflicts reachable in the RT assumptions domain

• Generate automatic timing assumptions for inserted state signals => state signals can be implemented as RT logic

• State variables inserted concurrently with I/O events => latency and cycle time reduction

Value of Relative Timing

• RT circuits provide up to 2-3x (1.3-2x) delay and area reduction with respect to SI circuits synthesized without (with) concurrency reduction

• Automatic generation of timing assumptions => foundation for automatic synthesis of RT circuits with area/performance comparable to or better than manual design

• Back-annotation of timing constraints => minimal required timing information for the back-end tools

• Timing-aware state encoding allows significant area/performance optimization

Design flow with timing

Specification (STG + user assumptions)
  → reachability analysis → Lazy State Graph
  → timing-aware state encoding → Lazy SG with CSC
  → Boolean minimization → Next-state functions
  → logic decomposition → Decomposed functions
  → technology mapping → Gate netlist

Additional elements of the flow: automatic timing assumptions; required timing constraints.

FIFO example

Figure: a FIFO controller with a left handshake (li, lo) and a right handshake (ro, ri), and its STG with events li±, lo±, ro± and ri±.

Speed-Independent Implementation

Without concurrency reduction, 3 state signals are required.

SI implementation with concurrency reduction

Figure: the FIFO STG with a single state signal x (events x+ and x-) and an implementation built from generalized C-elements (gC).

RT implementation

Figure: two alternative versions (marked OR) of the FIFO STG with state signal x (events li±, lo±, ro±, ri±, x±) and the corresponding RT implementation.

To satisfy the constraint: Delay(x-) < Delay(ri+) and Delay(lo+) + Delay(x-) < Delay(ro+) + Delay(ri+).

All constraints are either satisfied by default or easy to satisfy by sizing.
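Once delays have been estimated (for example after sizing), the two inequalities can be checked mechanically. The delay values below are made-up placeholders in arbitrary units, not figures from the slides.

```python
from typing import Dict


def fifo_rt_constraints_met(delay: Dict[str, float]) -> bool:
    """Check both relative-timing constraints of the RT FIFO controller."""
    first = delay["x-"] < delay["ri+"]
    second = delay["lo+"] + delay["x-"] < delay["ro+"] + delay["ri+"]
    return first and second


# Hypothetical delay estimates (arbitrary units).
fast_x = {"x-": 1.0, "ri+": 3.0, "lo+": 1.5, "ro+": 1.2}
slow_x = {"x-": 2.0, "ri+": 1.5, "lo+": 1.5, "ro+": 1.2}
print(fifo_rt_constraints_met(fast_x))
print(fifo_rt_constraints_met(slow_x))
```

A failing check points at which delay (here x-) would have to be reduced, for instance by upsizing the gate that drives x.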
