clockless computing
DESCRIPTION
Clockless Computing. Montek Singh. Thu, Aug 23, 2007. Preliminaries: how is data represented in an asynchronous system, and how is information exchanged (control signaling, handshake styles)? Data encoding: “bundled data”.
TRANSCRIPT
1
Clockless Computing
Montek Singh
Thu, Aug 23, 2007
2
Preliminaries
  How is data represented in an asynchronous system?
  How is information exchanged?: control signaling (handshake styles)
3
Data Encoding: “Bundled Data”
Single-rail “Bundled Datapath”:
  simplest approach
  widely used
Features:
  datapath: 1 wire per bit (e.g. standard sync blocks)
  matched delay: produces delayed “done” signal
    worst-case delay: longer than slowest path
+ Practical style: can reuse sync components; small area
– Fixed (worst-case) completion time
[Figure: function block with inputs bit 1 … bit n and outputs bit 1 … bit m; request passes through a matched delay to produce done; “done” indicates valid data]
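The matched-delay idea on this slide can be sketched in a few lines. This is only an illustration: the path delays, the `margin` safety factor, and all function names are assumptions, not values from the slides.

```python
# Hypothetical path delays (arbitrary time units) through a function block.
PATH_DELAYS = {"bit 1": 3.0, "bit 2": 5.5, "bit m": 4.2}


def matched_delay(path_delays, margin=0.1):
    """Delay for the request-to-done line; must exceed the slowest data path.

    `margin` is an assumed safety factor, not a number from the slides.
    """
    return max(path_delays.values()) * (1.0 + margin)


def done_time(request_time, path_delays, margin=0.1):
    """Time at which 'done' rises; fixed, independent of the actual data."""
    return request_time + matched_delay(path_delays, margin)


def data_valid_time(request_time, path_delays):
    """Time by which every output bit has settled (the slowest path)."""
    return request_time + max(path_delays.values())


t_done = done_time(0.0, PATH_DELAYS)
t_valid = data_valid_time(0.0, PATH_DELAYS)
print("data valid at", t_valid, "done at", t_done)
```

Because `done` always arrives after the worst-case path, the receiver never samples unsettled data, at the cost of a fixed completion time even when the actual computation finishes early.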
4
Bundled Data: Completion Sensing
Delay Matching:
  either single worst-case delay
  or, fine-grain delay
Speculative completion:
  choose delay “on the fly”
  start with shortest delay; increase as needed
[Figure: request passes through a bank of delays into a MUX that drives done; a delay selector controls the MUX]
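A rough sketch of the "start with shortest delay; increase as needed" rule follows. The delay values and the `required` argument (how long the current operation actually needs) are hypothetical; in hardware the delay selector would derive this from the operands rather than receive it directly.

```python
# Hypothetical bank of delays, sorted shortest first; the last entry is
# assumed to be the worst-case delay of the function block.
DELAY_BANK = [2.0, 4.0, 8.0]


def select_delay(required, bank=DELAY_BANK):
    """Pick the shortest delay in the bank that still covers `required`."""
    for delay in bank:  # start with the shortest, increase as needed
        if delay >= required:
            return delay
    # The bank is supposed to include the worst case, so this is a design error.
    raise ValueError("required time exceeds the worst-case delay in the bank")


for needed in (1.5, 3.0, 8.0):
    print(needed, "->", select_delay(needed))
```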
5
Data Encoding: Dual-Rail
Dual-rail: uses 2 wires per data bit
+ provides robust data-dependent completion
– needs completion detectors

Dual-rail code    Meaning
00                “reset” value
01                0 value
10                1 value
11                unused

Each Dual-Rail Pair: provides both data value and validity
[Figure: dual-rail inputs bit 1 … bit n and dual-rail outputs bit 1 … bit m]
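The code table on this slide translates directly into an encoder and decoder. In this sketch a dual-rail pair is a tuple of two wire values written in the same order as the table (so `(0, 1)` is code 01); the function names are illustrative only.

```python
RESET = (0, 0)  # code 00: no data present


def encode_bit(value):
    """Map a data bit to its dual-rail pair: 0 -> 01, 1 -> 10."""
    if value not in (0, 1):
        raise ValueError("value must be 0 or 1")
    return (1, 0) if value == 1 else (0, 1)


def decode_bit(pair):
    """Return 0 or 1 for a valid pair, None for reset; reject the unused code 11."""
    if pair == (0, 0):
        return None
    if pair == (0, 1):
        return 0
    if pair == (1, 0):
        return 1
    raise ValueError("code 11 is unused")


def encode_word(bits):
    """Encode a list of bits as a list of dual-rail pairs."""
    return [encode_bit(b) for b in bits]


print(encode_word([1, 0, 1]))
```

Each pair carries both the data value and its validity: a decoder that sees `None` knows the bit has not arrived yet.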
6
Dual-Rail: Completion Sensing
Dual-Rail Completion Detector:
  combines dual-rail signals
  indicates when all bits are valid (or reset)
  OR together 2 rails per bit
  Merge results using a Muller “C-element”
C-element:
  if all inputs=1, output 1
  if all inputs=0, output 0
  else, maintain output value
[Figure: one OR gate per bit (bit 0, bit 1, … bit n) feeding a C-element whose output is Done]
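A behavioral sketch of the completion detector described above: one OR per dual-rail pair, merged by a C-element that holds its output until all inputs agree. The class and function names are illustrative, and this models logic values only, not gate delays.

```python
class CElement:
    """Behavioral model of a C-element with any number of inputs."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        inputs = list(inputs)
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        # otherwise: maintain the previous output value
        return self.output


def completion_detector(c_element, pairs):
    """OR the two rails of each bit, then merge the results with the C-element."""
    return c_element.update(a | b for (a, b) in pairs)


c = CElement()
steps = [
    [(0, 0), (0, 0)],  # all bits reset
    [(1, 0), (0, 0)],  # only bit 0 valid: Done must stay low
    [(1, 0), (0, 1)],  # all bits valid
    [(0, 0), (0, 1)],  # partially reset: Done must stay high
    [(0, 0), (0, 0)],  # all bits reset again
]
done_values = [completion_detector(c, pairs) for pairs in steps]
print(done_values)
```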
7
Handshaking Styles: 4-phase
4-Phase: requires 4 events per handshake
[Figure: Request and Acknowledge waveforms annotated “start event”, “event done”, “get ready for next event”, “ready for next event”]
+ “Level-sensitive”: simpler logic implementation
– Overhead of “return-to-zero” (RTZ or resetting): extra events which do no useful computation
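The four events of one handshake can be listed as a trace of (label, req, ack) states. Pairing each figure annotation with a particular signal edge is an assumption based on the usual reading of such waveforms; the function name is illustrative.

```python
def four_phase(n_handshakes):
    """Return the event trace for n 4-phase (return-to-zero) handshakes."""
    req = ack = 0
    trace = []
    for _ in range(n_handshakes):
        req = 1
        trace.append(("req rises: start event", req, ack))
        ack = 1
        trace.append(("ack rises: event done", req, ack))
        req = 0  # return-to-zero half: no useful computation
        trace.append(("req falls: get ready for next event", req, ack))
        ack = 0
        trace.append(("ack falls: ready for next event", req, ack))
    return trace


for label, req, ack in four_phase(1):
    print(f"{label:40s} req={req} ack={ack}")
```

Both wires end each handshake at 0, which is what makes the protocol level-sensitive: a high `req` always means "request pending".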
8
Handshaking Styles: 2-phase
2-Phase: requires 2 events per handshake
  a.k.a. transition signaling
[Figure: Request and Acknowledge waveforms annotated “start event”, “event done”, “start next event”, “next event done”]
+ Elegant: no return-to-zero
– Slower logic implementation: logic primitives are inherently level-sensitive, not event-based (at least in CMOS)
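In transition signaling every edge, rising or falling, is an event. A minimal trace generator, with the same illustrative (label, req, ack) convention and an assumed function name:

```python
def two_phase(n_handshakes):
    """Return the event trace for n 2-phase (transition-signaling) handshakes."""
    req = ack = 0
    trace = []
    for _ in range(n_handshakes):
        req ^= 1  # any transition on req starts an event
        trace.append(("req toggles: start event", req, ack))
        ack ^= 1  # any transition on ack completes it
        trace.append(("ack toggles: event done", req, ack))
    return trace


for label, req, ack in two_phase(2):
    print(f"{label:28s} req={req} ack={ack}")
```

Two handshakes take four transitions in total, and the wire levels alone no longer say whether a request is pending; the receiver has to detect the change, which is why the logic is event-based.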
9
Handshaking Styles: Pulse Mode
Pulse Mode: combines benefits of 2-phase and 4-phase
  use pulses to represent events
[Figure: Request and Acknowledge pulse waveforms annotated “start event”, “event done”, “start next event”, “next event done”]
+ No return-to-zero (like 2-phase)
+ Level-based implementation (like 4-phase)
– Need a timing constraint on pulse width
10
Handshaking Styles: Single-Track
Single-Track: combines req and ack onto single wire!
  one wire used for bidirectional communication
  sender raises, receiver lowers
[Figure: separate Request and Acknowledge wires (req, ack) merged into one “req + ack” wire]
+ Efficient protocol: no return-to-zero, level-based
– Need aggressive low-level design techniques: much effort to ensure reliability, satisfy timing constraints
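The "sender raises, receiver lowers" rule can be modeled as a single shared wire with two operations. This is a behavioral sketch only: the class name and the error checks are assumptions, and the timing constraints the slide warns about are not modeled.

```python
class SingleTrackWire:
    """One wire carrying both request (rising edge) and acknowledge (falling edge)."""

    def __init__(self):
        self.level = 0
        self.transitions = 0

    def send(self):
        """Sender raises the wire: this is the request."""
        if self.level != 0:
            raise RuntimeError("previous request not yet acknowledged")
        self.level = 1
        self.transitions += 1

    def receive(self):
        """Receiver lowers the wire: this is the acknowledge."""
        if self.level != 1:
            raise RuntimeError("no pending request")
        self.level = 0
        self.transitions += 1


wire = SingleTrackWire()
for _ in range(3):
    wire.send()
    wire.receive()
print(wire.level, wire.transitions)
```

Each handshake costs two transitions on one wire, and the level is still meaningful (high means a request is pending).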
11
Handshaking + Data Representation
Several combinations possible:
  dual-rail 4-phase, single-rail 4-phase, dual-rail 2-phase, and single-rail 2-phase
Example: dual-rail 4-phase
  dual-rail data: functions as an implicit “request”
  4-phase cycle: between acknowledge and implicit request
[Figure: A sends dual-rail data bits 1 … m to B; B returns ack to A]
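One transfer in the dual-rail 4-phase example can be written out as four steps. The sketch assumes A is the sender and B the receiver, and uses the 01 = 0, 10 = 1, 00 = reset codes; the function name and the trace layout are illustrative.

```python
def dual_rail_four_phase(bits):
    """Return the four steps of one dual-rail 4-phase transfer as (label, data, ack)."""
    spacer = [(0, 0)] * len(bits)                     # every pair reset to 00
    word = [(1, 0) if b else (0, 1) for b in bits]    # valid dual-rail codes
    return [
        ("A drives valid data (implicit request)", word, 0),
        ("B raises ack", word, 1),
        ("A resets data to 00 (request withdrawn)", spacer, 1),
        ("B lowers ack", spacer, 0),
    ]


for label, data, ack in dual_rail_four_phase([1, 0]):
    print(f"{label:42s} data={data} ack={ack}")
```

No separate request wire is needed: the arrival of valid codes on every pair plays the role of request rising, and the return to 00 plays the role of request falling.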
12
Other Data Representation Styles
Level-Encoded Dual-Rail (LEDR)
  2 wires per bit: “data” and “phase”
  exactly one wire per bit changes value
  if new value is different, “data” wire changes value
  else “phase” wire changes value
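The LEDR update rule, exactly as stated on the slide, fits in one small function. The `(data, phase)` tuple layout and the function name are assumptions for illustration.

```python
def ledr_next(state, new_value):
    """Return the next (data, phase) wire pair when sending `new_value`."""
    data, phase = state
    if new_value not in (0, 1):
        raise ValueError("new_value must be 0 or 1")
    if new_value != data:
        return (new_value, phase)  # value differs: the data wire toggles
    return (data, phase ^ 1)       # same value again: the phase wire toggles


state = (0, 0)
history = [state]
for value in (1, 1, 0, 0):
    state = ledr_next(state, value)
    history.append(state)
print(history)
```

Every new item changes exactly one of the two wires, so the receiver sees one transition per bit per transfer and can always read the current value from the data wire.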
M-of-N Codes
  N wires used for a data word
  M wires (M <= N) change value
  Values of N and M have impact on information transmitted, power consumed and logic complexity
  Knuth codes, Huffman codes, …
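How N and M affect the information transmitted can be checked numerically: an M-of-N code has one codeword per way of choosing which M of the N wires change. The helper names below are illustrative, and `math.comb` needs Python 3.8 or later.

```python
import math


def codewords(n, m):
    """Number of distinct M-of-N codewords: choose which m of n wires change."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    return math.comb(n, m)


def whole_bits(n, m):
    """Largest number of whole data bits an M-of-N code can carry."""
    return codewords(n, m).bit_length() - 1  # exact floor(log2(count))


for n, m in [(2, 1), (4, 1), (4, 2)]:
    print(f"{m}-of-{n}: {codewords(n, m)} codewords, {whole_bits(n, m)} bits, "
          f"{m} wires switching")
```

Dual-rail is the 1-of-2 case: two codewords, one bit, one wire switching per bit.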
13
Which to use?
Depends on several performance parameters:
  speed
    single-rail vs. dual-rail
      – single-rail may be faster (if designed aggressively)
      – dual-rail may be faster (if completion times vary widely)
    2-phase vs. 4-phase
      – 2-phase may be faster (if logic overhead is small)
      – 4-phase may be faster (if overhead of return-to-zero is small)
  power consumption
    2-phase typically has fewer gate transitions (hence lower power)
  amount of logic used (#gates/wires/pins, hence chip area)
    single-rail needs fewer gates/wires/pins
  design and verification effort
    dual-rail, 1-of-N, M-of-N, Knuth codes…:
      – delay-insensitive: robust in the presence of arbitrary delays
    single-rail: requires greater timing verification effort
14
Homework #2 (due Thu Aug 30)
Suppose you are given N wires
  Which M-of-N encoding (i.e. what M) encodes most information?
Suppose you have to encode 4-bit values
  Which M-of-N encoding yields fewest wires?
Suppose you can switch at most 2 wires
  Which M-of-N encoding yields fewest wires for 4-bit values?