![Page 1: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/1.jpg)
1
Recap: Lectures 5 & 6
Classic Pipeline Styles
1. Williams and Horowitz’s PS0 pipeline
2. Sutherland’s micropipelines
![Page 2: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/2.jpg)
2
Different Points in the Design Space
Williams/Horowitz’s PS0:
- Dual-rail
- Data-dependent completion
- Dynamic logic
- No extra latches
- “Zero-overhead” latency
- 4-phase handshakes: resetting overhead
Sutherland’s micropipelines:
- Single-rail
- Worst-case matched delay
- Static logic
- Explicit latches
- Latch latencies = overhead
- Elegant transition signaling
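To make the "dual-rail, data-dependent completion" entry concrete, here is a minimal Python sketch. It assumes the common dual-rail encoding in which each bit uses two wires, with (1, 0) or (0, 1) for valid data and (0, 0) for the reset (spacer) value. The names `encode`, `SPACER` and `completion` are illustrative and do not come from the lecture.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail) for one bit

SPACER: Rail = (0, 0)  # assumed reset / precharged value of one bit


def encode(bits: List[int]) -> List[Rail]:
    """Encode plain bits as dual-rail pairs: 1 -> (1, 0), 0 -> (0, 1)."""
    return [(1, 0) if b else (0, 1) for b in bits]


def completion(word: List[Rail]) -> Optional[bool]:
    """Model a completion detector over a dual-rail word.

    Returns True when every bit is valid, False when every bit is reset,
    and None while the word is in transition (a real detector would hold
    its previous output in that case).
    """
    valid = [t != f for t, f in word]            # exactly one rail high
    empty = [(t, f) == SPACER for t, f in word]  # both rails low
    if all(valid):
        return True
    if all(empty):
        return False
    return None


print(completion(encode([1, 0, 1])))   # all bits valid
print(completion([SPACER] * 3))        # all bits reset
print(completion([(1, 0), SPACER]))    # partially arrived word
```

Because "done" is derived from the data wires themselves, the completion time tracks the actual data, whereas a single-rail design relies on a separately matched worst-case delay.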
![Page 3: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/3.jpg)
3
PS0 Protocol
PRECHARGE N: when N+1 completes evaluation
- delete data: after next stage has copied it
EVALUATE N: when N+1 completes precharging
- accept new data: after next stage is emptied
[Figure: stages N, N+1, N+2 with events numbered 1 to 6; labels: “evaluates” (three times), “indicates ‘done’”, “precharges”, “indicates ‘done’”]
Evaluate → Precharge: 3 events
Precharge → Evaluate: another 3 events
Complete cycle: 6 events
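The two PS0 rules can be sketched as a single decision driven by the next stage's completion detector. This is a simplified model, not the circuit: the function name `ps0_requests` and the boolean polarity (True = "completed evaluation", False = "completed precharging") are assumptions made for illustration.

```python
from typing import List


def ps0_requests(done: List[bool]) -> List[str]:
    """For each stage N except the last, return the action that stage N+1's
    completion detector requests under the PS0 rules.

    done[i] is True if stage i has completed evaluation (holds data) and
    False if it has completed precharging (is empty). This polarity is an
    assumption of the sketch.
    """
    return ["PRECHARGE" if done[n + 1] else "EVALUATE"
            for n in range(len(done) - 1)]


# Stages N and N+1 hold data, stage N+2 is empty.
print(ps0_requests([True, True, False]))  # ['PRECHARGE', 'EVALUATE']
```

Each stage looks only at its immediate successor, which is why a stage must wait for a full evaluate/precharge round trip of its neighbor before it can accept new data.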
![Page 4: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/4.jpg)
4
PS0 Performance
- $T_{EVAL}$: Evaluation Time
- $T_{PRECH}$: Precharge Time
- $T_{DETECT}$: Completion Detection Time
[Figure: PS0 cycle, events numbered 1 to 6]
Cycle Time = $3\,T_{EVAL} + T_{PRECH} + 2\,T_{DETECT}$
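As a rough check, the PS0 cycle time can be obtained by summing the delay of each of the six events in the cycle. The event order below (three evaluations, a completion detection, a precharge, a second completion detection) is an assumption chosen to match the cycle-time expression, and the delay values are hypothetical, in arbitrary time units.

```python
def ps0_cycle_time(t_eval: float, t_prech: float, t_detect: float) -> float:
    """Sum the delays of the six events in one PS0 cycle (assumed order)."""
    events = [
        ("evaluate", t_eval),
        ("evaluate", t_eval),
        ("evaluate", t_eval),
        ("indicate done", t_detect),
        ("precharge", t_prech),
        ("indicate done", t_detect),
    ]
    return sum(delay for _, delay in events)


# Hypothetical delays, not taken from the lecture.
cycle = ps0_cycle_time(t_eval=2.0, t_prech=2.0, t_detect=1.0)
print(cycle)        # 10.0
print(1.0 / cycle)  # throughput in tokens per time unit: 0.1
```

Throughput is the reciprocal of the cycle time, so every event removed from this loop directly raises the token rate.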
![Page 5: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/5.jpg)
5
Drawbacks of PS0 Pipelining
1. Poor throughput:
   - long cycle time: 6 events per cycle
   - data “tokens” are forced far apart in time
2. Limited storage capacity:
   - at most 50% of stages can hold distinct tokens
   - data tokens must be separated by at least one spacer
Our Research Goals:
- address both issues
- still maintain very low latency
![Page 6: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/6.jpg)
6
Lecture 7: Recent Approaches
![Page 7: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/7.jpg)
7
Recent Approaches
3 novel styles for high-speed async pipelining:
- “Lookahead Pipelines” (LP) [Singh/Nowick, Async-00]
- “High-Capacity Pipelines” (HC) [Singh/Nowick, WVLSI-00]
- MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
Goal: significantly improve throughput of PS0
Two Distinct Strategies:
- LP: introduce protocol optimizations; “shave off” components from critical cycle
- HC: fundamentally new protocol; greater concurrency: “loosely-coupled” stages
![Page 8: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/8.jpg)
8
Outline
New Asynchronous Pipelines:
Dynamic circuit style:
- Lookahead Pipelines (LP)
- High-Capacity Pipelines (HC)
Static circuit style:
- MOUSETRAP Pipelines
![Page 9: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/9.jpg)
9
Lookahead Pipelines: Strategy #1
Use non-neighbor communication:
- stage receives information from multiple later stages
- allows “early evaluation”
Benefit: stage gets head start on next cycle
![Page 10: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/10.jpg)
10
Lookahead Pipelines: Strategy #2
Use early completion detection:
- completion detector moved before stage (not after)
- stage indicates “early done” in parallel with computation
Benefit: again, stage gets head start on next cycle
[Figure: stage with early completion detector]
![Page 11: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/11.jpg)
11
Lookahead Pipelines: Overview
5 New Designs:
“Dual-Rail” Data Signaling:
- LP3/1: “early evaluation”
- LP2/2: “early done”
- LP2/1: “early evaluation” + “early done”
“Single-Rail” Bundled-Data Signaling:
- LPSR2/2: “early done”
- LPSR2/1: “early evaluation” + “early done”
![Page 12: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/12.jpg)
12
Dual-Rail Design #1: LP3/1
Optimization = “early evaluation”
- each stage has two control inputs: from stages N+1 and N+2
Idea: shorten precharge phase
- terminate precharge early: when N+2 is done evaluating
[Figure: stages N, N+1, N+2, each with a processing block and a completion detector between data in and data out; control inputs PC and Eval, with Eval coming from stage N+2]
![Page 13: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/13.jpg)
13
LP3/1 Protocol
PRECHARGE N: when N+1 completes evaluation
EVALUATE N: when N+2 completes evaluation (New! Enables “early evaluation”)
[Figure: stages N, N+1, N+2 with events numbered 1 to 4; labels: “N evaluates”, “N+1 evaluates”, “N+2 evaluates”, “N+1 indicates ‘done’”, “N+2 indicates ‘done’”]
![Page 14: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/14.jpg)
14
LP3/1: Comparison with PS0
PRECHARGE N (both styles): when N+1 completes evaluation
EVALUATE N:
- PS0: when N+1 completes precharging (6 events in cycle)
- LP3/1: when N+2 completes evaluation; enables “early evaluation” (only 4 events in cycle)
[Figure: two timing diagrams over stages N, N+1, N+2; PS0 with events numbered 1 to 6, LP3/1 with events numbered 1 to 4; labels: “evaluates”, “indicates ‘done’”]
![Page 15: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/15.jpg)
15
LP3/1 Performance
[Figure: LP3/1 cycle, events numbered 1 to 4, with the saved path marked]
Cycle Time = $3\,T_{EVAL} + T_{DETECT}$
Savings over PS0: 1 Precharge + 1 Completion Detection
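Subtracting the LP3/1 cycle time from the PS0 cycle time ($3T_{EVAL} + T_{PRECH} + 2T_{DETECT}$) term by term reproduces the stated savings. $T_{PS0}$ and $T_{LP3/1}$ are labels introduced here for the two cycle times:

$$
\begin{aligned}
T_{PS0} - T_{LP3/1} &= (3T_{EVAL} + T_{PRECH} + 2T_{DETECT}) - (3T_{EVAL} + T_{DETECT}) \\
&= T_{PRECH} + T_{DETECT}
\end{aligned}
$$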
![Page 16: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/16.jpg)
16
LP3/1: Inside a Stage
Merging 2 Control Inputs:
- A NAND gate merges 2 control inputs: PC (from stage N+1) and Eval (from stage N+2)
- Precharge when PC=1 (and Eval=0)
- Evaluate “early” when Eval=1 (or PC=0)
[Figure: NAND gate with inputs PC (from stage N+1) and Eval (from stage N+2), annotated “early Eval” and “old Eval”]
Problem: “early” Eval=1 is non-persistent!
- may be de-asserted before stage completes evaluation
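The two conditions on this slide are logical complements, which is why a single NAND gate can merge the control inputs. The sketch below checks this over all four input combinations. The wiring `NAND(PC, NOT Eval)` and the output polarity (0 = precharge, 1 = evaluate) are assumptions made for illustration; the slide does not specify signal polarities.

```python
from itertools import product


def stage_precharges(pc: int, eval_: int) -> bool:
    """Slide rule: precharge when PC=1 (and Eval=0)."""
    return pc == 1 and eval_ == 0


def stage_evaluates(pc: int, eval_: int) -> bool:
    """Slide rule: evaluate when Eval=1 (or PC=0)."""
    return eval_ == 1 or pc == 0


def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def merged_control(pc: int, eval_: int) -> int:
    """Assumed wiring: NAND(PC, NOT Eval); 0 = precharge, 1 = evaluate."""
    return nand(pc, 1 - eval_)


for pc, ev in product((0, 1), repeat=2):
    ctrl = merged_control(pc, ev)
    print(f"PC={pc} Eval={ev} ->", "evaluate" if ctrl else "precharge")
```

Only the combination PC=1, Eval=0 precharges the stage, so an early Eval=1 from stage N+2 can release precharge before PC from stage N+1 has fallen.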
![Page 17: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/17.jpg)
17
LP3/1 Timing Constraints: Example
Problem (cont.): “early” Eval=1 is non-persistent
Observation: PC=0 arrives soon after Eval=1, and is persistent
Solution: no change!
- use PC as a safe “takeover” for Eval
Timing Constraint: PC=0 must arrive before Eval is de-asserted
- simple one-sided timing requirement
- other constraints as well… all easily satisfied in practice
[Figure: NAND gate merging PC (from stage N+1) and Eval (from stage N+2)]
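The one-sided constraint can be illustrated with a small waveform model. It is a sketch under simplifying assumptions: PC is 1 until `t_pc_fall` and 0 afterwards, Eval is 1 only during the interval from `t_eval_rise` up to `t_eval_fall`, and gate delays are ignored. All names and time values are hypothetical.

```python
def evaluates_at(t: float, t_eval_rise: float, t_eval_fall: float,
                 t_pc_fall: float) -> bool:
    """True if the stage is in evaluate mode at time t (Eval=1 or PC=0)."""
    eval_high = t_eval_rise <= t < t_eval_fall
    pc_low = t >= t_pc_fall
    return eval_high or pc_low


def takeover_is_safe(t_eval_fall: float, t_pc_fall: float) -> bool:
    """Timing constraint: PC=0 must arrive before Eval is de-asserted."""
    return t_pc_fall <= t_eval_fall


# Safe case: PC falls at t=3, while Eval (high from t=0 to t=5) is still asserted.
print(takeover_is_safe(t_eval_fall=5, t_pc_fall=3))   # True
print(evaluates_at(6, 0, 5, 3))                       # True: PC=0 has taken over

# Violation: PC falls at t=7, after Eval was de-asserted at t=5.
print(takeover_is_safe(t_eval_fall=5, t_pc_fall=7))   # False
print(evaluates_at(6, 0, 5, 7))                       # False: stage drops back to precharge
```

In the violating case the stage briefly returns to precharge in the gap between the two signals, which could destroy a result that is still being computed.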
![Page 18: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/18.jpg)
18
Dual-Rail Design #2: LP2/2
Optimization = “early done”
Idea: move completion detector before processing block
- stage indicates when it is “about to” precharge/evaluate
[Figure: “early” completion detector placed on data in, ahead of the processing block, producing “early done”; the processing block drives data out]
![Page 19: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/19.jpg)
19
LP2/2 Protocol
Completion Detection:
- performed in parallel with evaluation/precharge of stage
[Figure: stages N, N+1, N+2 with events numbered 1 to 4; labels: “N evaluates”, “N+1 evaluates”, “‘early done’ of N+1 eval”, “‘early done’ of N+2 eval”, “‘early done’ of N+1 prech”]
![Page 20: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/20.jpg)
20
LP2/2 Performance
[Figure: LP2/2 cycle, events numbered 1 to 4]
Cycle Time = $2\,T_{EVAL} + 2\,T_{DETECT}$
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
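The three cycle-time expressions given so far (PS0, LP3/1, LP2/2) can be compared side by side. The delay values below are hypothetical, in arbitrary time units, chosen only to show how the formulas relate; real speedups depend on the actual circuit delays.

```python
from typing import Dict


def cycle_times(t_eval: float, t_prech: float, t_detect: float) -> Dict[str, float]:
    """Cycle time of each pipeline style, per the expressions on the slides."""
    return {
        "PS0": 3 * t_eval + t_prech + 2 * t_detect,
        "LP3/1": 3 * t_eval + t_detect,
        "LP2/2": 2 * t_eval + 2 * t_detect,
    }


# Hypothetical delays, not taken from the lecture.
times = cycle_times(t_eval=2.0, t_prech=2.0, t_detect=1.0)
for style, t in times.items():
    speedup = times["PS0"] / t
    print(f"{style}: cycle time = {t}, speedup over PS0 = {speedup:.2f}x")
```

With these example numbers the difference between PS0 and LP2/2 is exactly one evaluation plus one precharge, matching the savings stated on the slide.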
![Page 21: Recap: Lectures 5 & 6 Classic Pipeline Styles](https://reader036.vdocument.in/reader036/viewer/2022070500/5681681f550346895dddaec6/html5/thumbnails/21.jpg)
21
Dual-Rail Design #3: LP2/1
Hybrid of LP3/1 and LP2/2…
Combines:
- early evaluation of LP3/1
- early done of LP2/2
Cycle time: Best of our dual-rail lookahead pipelines…