Clockless Computing

Montek Singh
Thu, Aug 23, 2007


Page 1: Clockless Computing

Clockless Computing

Montek Singh

Thu, Aug 23, 2007

Page 2: Clockless Computing

Preliminaries

How is data represented in an asynchronous system?
How is information exchanged?: control signaling (handshake styles)

Page 3: Clockless Computing

Data Encoding: “Bundled Data”

Single-rail “Bundled Datapath”:
  simplest approach
  widely used

Features:
  datapath: 1 wire per bit (e.g. standard sync blocks)
  matched delay: produces delayed “done” signal
    worst-case delay: longer than slowest path

+ Practical style: can reuse sync components; small area
– Fixed (worst-case) completion time

[Figure: function block with inputs bit 1 … bit n and outputs bit 1 … bit m; request passes through a matched delay to produce done; done indicates valid data]
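The matched-delay idea can be sketched as a small timing model in Python. This is an illustration only, not a circuit description: the path delays, the `margin` parameter, and the function names are assumptions made for the example.

```python
def matched_delay(path_delays, margin=0.1):
    """Pick a delay longer than the slowest path through the function block.

    path_delays: assumed per-path delays (arbitrary time units).
    margin: assumed safety factor; the slide only says "longer than slowest path".
    """
    return max(path_delays) * (1.0 + margin)


def done_time(request_time, path_delays, margin=0.1):
    """'done' is simply 'request' delayed by the matched delay."""
    return request_time + matched_delay(path_delays, margin)


# Hypothetical function block with three paths of differing delay.
paths = [1.2, 2.0, 3.5]
t_done = done_time(request_time=10.0, path_delays=paths)

# Every path has settled before 'done' fires, whichever path the data took.
assert all(10.0 + d < t_done for d in paths)
```

Note that `t_done` is the same for every input, which is the "fixed (worst-case) completion time" drawback listed on the slide.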

Page 4: Clockless Computing

Bundled Data: Completion Sensing

Delay Matching:
  either single worst-case delay
  or, fine-grain delay

Speculative completion:
  choose delay “on the fly”
  start with shortest delay; increase as needed

[Figure: request feeds a bank of delays; a MUX controlled by a delay selector picks the delay that produces done]
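The "start with shortest delay; increase as needed" rule can be sketched as follows. The delay values and the `needs_longer` predicate are hypothetical stand-ins for the bank of delays and the delay selector in the figure; how a real selector decides is not specified on the slide.

```python
def select_delay(delay_bank, needs_longer):
    """Start with the shortest delay; step up while more time is needed.

    delay_bank: candidate matched delays (assumed values, arbitrary units).
    needs_longer: hypothetical predicate playing the role of the delay
                  selector; returns True if the given delay is too short
                  for the current operation.
    """
    for delay in sorted(delay_bank):
        if not needs_longer(delay):
            return delay
    return max(delay_bank)  # fall back to the single worst-case delay


bank = [1.0, 2.0, 4.0]
actual = 1.7  # assumed time this particular operation really needs
chosen = select_delay(bank, lambda d: d < actual)
```

Here `chosen` is the 2.0 entry: the shortest delay in the bank that still covers the operation.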

Page 5: Clockless Computing

Data Encoding: Dual-Rail

Dual-rail: uses 2 wires per data bit

+ provides robust data-dependent completion
– needs completion detectors

Dual-rail code    Meaning
00                “reset” value
01                0 value
10                1 value
11                unused

Each Dual-Rail Pair: provides both data value and validity

[Figure: dual-rail inputs bit 1 … bit n and outputs bit 1 … bit m]
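The code table above translates directly into a small encoder and decoder. This is a minimal sketch; the function names and the choice to represent each wire pair as a two-character string are assumptions made for readability.

```python
RESET = "00"                      # neither rail asserted: no data yet
ENCODE = {0: "01", 1: "10"}       # from the dual-rail code table
DECODE = {"01": 0, "10": 1}


def encode_word(bits):
    """Encode a list of 0/1 bits as one dual-rail pair per bit."""
    return [ENCODE[b] for b in bits]


def decode_word(pairs):
    """Return the bits, or None if any pair is still in the reset state."""
    if any(p == "11" for p in pairs):
        raise ValueError("11 is an unused code")
    if any(p == RESET for p in pairs):
        return None               # word is not (yet) fully valid
    return [DECODE[p] for p in pairs]
```

For example, `decode_word(["10", "00"])` returns `None` because the second bit has not arrived, which is how each pair carries validity as well as value.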

Page 6: Clockless Computing

Dual-Rail: Completion Sensing

Dual-Rail Completion Detector:
  combines dual-rail signals
  indicates when all bits are valid (or reset)

  OR together 2 rails per bit
  Merge results using a Muller “C-element”

C-element:
  if all inputs=1, output 1
  if all inputs=0, output 0
  else, maintain output value

[Figure: one OR gate per bit (bit 0, bit 1, …, bit n) feeding a C-element whose output is Done]
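The detector described above can be modeled behaviorally in a few lines. This is a sketch under assumed names (`CElement`, `completion_detector`) and an assumed representation of each bit as a tuple of its two rail levels; it ignores gate delays.

```python
class CElement:
    """Behavioral model of a C-element: output follows the inputs only
    when they all agree, otherwise it holds its previous value."""

    def __init__(self, initial=0):
        self.output = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.output = 1
        elif all(v == 0 for v in inputs):
            self.output = 0
        return self.output        # mixed inputs: hold the old value


def completion_detector(c_element, pairs):
    """pairs: one (rail_a, rail_b) tuple per bit. OR the two rails of each
    bit, then merge the per-bit results with the C-element."""
    per_bit_valid = [a | b for a, b in pairs]
    return c_element.update(per_bit_valid)


c = CElement()
trace = [
    completion_detector(c, [(0, 1), (0, 0)]),  # one bit valid: Done stays 0
    completion_detector(c, [(0, 1), (1, 0)]),  # all bits valid: Done rises
    completion_detector(c, [(0, 0), (1, 0)]),  # partly reset: Done holds 1
    completion_detector(c, [(0, 0), (0, 0)]),  # all reset: Done falls
]
```

The hold behavior is what makes Done indicate "all valid" on the way up and "all reset" on the way down.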

Page 7: Clockless Computing

Handshaking Styles: 4-phase

4-Phase: requires 4 events per handshake

+ “Level-sensitive”: simpler logic implementation
– Overhead of “return-to-zero” (RTZ or resetting)
    extra events which do no useful computation

[Figure: Request and Acknowledge waveforms, with events labeled in order: start event, event done, get ready for next event, ready for next event]
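The four events can be listed explicitly with a short trace generator. The mapping of the figure's labels onto specific request and acknowledge edges is an assumption here (the usual return-to-zero ordering), and the function name is made up for the example.

```python
def four_phase(n_transfers):
    """List the wire events for n_transfers handshakes; req and ack start low."""
    req = ack = 0
    events = []
    for _ in range(n_transfers):
        req = 1; events.append(("req", req))  # start event
        ack = 1; events.append(("ack", ack))  # event done
        req = 0; events.append(("req", req))  # get ready for next event (RTZ)
        ack = 0; events.append(("ack", ack))  # ready for next event (RTZ)
    return events
```

Each transfer costs four events, and the last two only return the wires to zero.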

Page 8: Clockless Computing

Handshaking Styles: 2-phase

2-Phase: requires 2 events per handshake
  a.k.a. transition signaling

+ Elegant: no return-to-zero
– Slower logic implementation:
    logic primitives are inherently level-sensitive, not event-based (at least in CMOS)

[Figure: Request and Acknowledge waveforms, with events labeled in order: start event, event done, start next event, next event done]
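Transition signaling can be sketched as a trace generator in which every toggle of a wire, rising or falling, counts as an event. The function name is an assumption made for the example.

```python
def two_phase(n_transfers):
    """List the wire events for n_transfers handshakes; req and ack start low."""
    req = ack = 0
    events = []
    for _ in range(n_transfers):
        req ^= 1; events.append(("req", req))  # any transition on req: start event
        ack ^= 1; events.append(("ack", ack))  # any transition on ack: event done
    return events
```

Two transfers produce the wire trace req up, ack up, req down, ack down, so no transition is spent purely on resetting.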

Page 9: Clockless Computing

Handshaking Styles: Pulse Mode

Pulse Mode: combines benefits of 2-phase and 4-phase
  use pulses to represent events

+ No return-to-zero (like 2-phase)
+ Level-based implementation (like 4-phase)
– Need a timing constraint on pulse width

[Figure: Request and Acknowledge pulse waveforms, with events labeled in order: start event, event done, start next event, next event done]

Page 10: Clockless Computing

Handshaking Styles: Single-Track

Single-Track: combines req and ack onto single wire!
  one wire used for bidirectional communication
  sender raises, receiver lowers

+ Efficient protocol: no return-to-zero, level-based
– Need aggressive low-level design techniques
    much effort to ensure reliability, satisfy timing constraints

[Figure: separate Request and Acknowledge waveforms compared with a single combined “req + ack” wire carrying alternating req and ack events]
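The "sender raises, receiver lowers" rule can be written out as a trace on one shared wire. This is a behavioral sketch with an assumed function name; it says nothing about the timing constraints that make real single-track circuits hard to design.

```python
def single_track(n_transfers):
    """List the events on the single shared wire, which starts low."""
    wire = 0
    events = []
    for _ in range(n_transfers):
        if wire != 0:
            raise RuntimeError("sender may only raise a wire that is low")
        wire = 1; events.append(("sender raises", wire))    # acts as req
        wire = 0; events.append(("receiver lowers", wire))  # acts as ack
    return events
```

Each transfer takes two transitions, as in 2-phase, but both happen on the same wire.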

Page 11: Clockless Computing

Handshaking + Data Representation

Several combinations possible:
  dual-rail 4-phase, single-rail 4-phase, dual-rail 2-phase, and single-rail 2-phase

Example: dual-rail 4-phase
  dual-rail data: functions as an implicit “request”
  4-phase cycle: between acknowledge and implicit request

[Figure: A sends dual-rail data (bit 1 … bit m) to B; B returns ack to A]
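One dual-rail 4-phase transfer from A to B can be traced step by step. This is a sketch under assumptions: the function name is invented, each bit is a tuple of rail levels using the 01 = 0, 10 = 1 convention, and the ordering follows the usual 4-phase cycle with the data itself standing in for the request.

```python
def dual_rail_four_phase(bits):
    """Trace one transfer as a list of (dual-rail pairs, ack) snapshots."""
    code = {0: (0, 1), 1: (1, 0)}
    valid = [code[b] for b in bits]
    reset = [(0, 0)] * len(bits)
    return [
        (valid, 0),  # A drives valid data: the implicit request
        (valid, 1),  # B raises ack
        (reset, 1),  # A returns every pair to reset (request withdrawn)
        (reset, 0),  # B lowers ack; ready for the next transfer
    ]
```

There is no separate request wire in the trace: B learns that data has arrived by detecting that every pair is valid.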

Page 12: Clockless Computing

Other Data Representation Styles

Level-Encoded Dual-Rail (LEDR)
  2 wires per bit: “data” and “phase”
  exactly one wire per bit changes value
    if new value is different, “data” wire changes value
    else “phase” wire changes value

M-of-N Codes
  N wires used for a data word
  M wires (M <= N) change value
  Values of N and M: have impact on…
    information transmitted, power consumed and logic complexity
  Knuth codes, Huffman codes, …

[Figure: “data” and “phase” wires]
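Both encodings can be sketched briefly. The LEDR function follows the slide's rule literally, using the slide's wire names. The M-of-N helpers assume that a codeword is identified by which M of the N wires are used, so the count is the binomial coefficient; all function names are invented for the example.

```python
from math import comb, log2


def ledr_send(state, new_bit):
    """state: (data, phase) wire levels. Return the levels after sending new_bit.

    Exactly one of the two wires changes on every transfer.
    """
    data, phase = state
    if new_bit != data:
        return (new_bit, phase)   # value differs: "data" wire changes
    return (data, phase ^ 1)      # same value again: "phase" wire changes


def m_of_n_codewords(m, n):
    """Number of distinct ways to pick M wires out of N."""
    return comb(n, m)


def m_of_n_bits(m, n):
    """Information carried per codeword, in bits."""
    return log2(m_of_n_codewords(m, n))
```

For instance, a 1-of-4 code has 4 codewords and so carries 2 bits per word while switching only one wire.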

Page 13: Clockless Computing

Which to use?

Depends on several performance parameters:
  speed
    single-rail vs. dual-rail
      – single-rail may be faster (if designed aggressively)
      – dual-rail may be faster (if completion times vary widely)
    2-phase vs. 4-phase
      – 2-phase may be faster (if logic overhead is small)
      – 4-phase may be faster (if overhead of return-to-zero is small)
  power consumption
    2-phase typically has fewer gate transitions (=> lower power)
  amount of logic used (#gates/wires/pins => chip area)
    single-rail needs fewer gates/wires/pins
  design and verification effort
    dual-rail, 1-of-N, M-of-N, Knuth codes…:
      – delay-insensitive: robust in the presence of arbitrary delays
    single-rail: requires greater timing verification effort

Page 14: Clockless Computing

Homework #2 (due Thu Aug 30)

Suppose you are given N wires
  Which M-of-N encoding (i.e. what M) encodes most information?

Suppose you have to encode 4-bit values
  Which M-of-N encoding yields fewest wires?

Suppose you can switch at most 2 wires
  Which M-of-N encoding yields fewest wires for 4-bit values?