compositional dataflow circuits
TRANSCRIPT
![Page 1: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/1.jpg)
Compositional Dataflow Circuits
Stephen A. Edwards Richard Townsend Martha A. Kim
Columbia University
MEMOCODE, Vienna, Austria, October 1, 2017
![Page 2: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/2.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 3: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/3.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 4: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/4.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 5: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/5.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 6: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/6.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 7: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/7.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
−
−
![Page 8: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/8.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
−
−
![Page 9: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/9.jpg)
gcd(a,b) =if a = b
aelse if a < b
gcd(a,b −a)else
gcd(a −b,b)
Townsend et al.
CC ’2017
1 0 1 0
a b
=
mux
1 0 1 0
1 0 1 0
fork
gcd(a,b)discard
<
1
initialtoken
1 0 1 0
demux
− −
![Page 10: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/10.jpg)
Patience Through Handshaking
Want patient blocks to handle delays from
Memory systemsData-dependent
computations
Full buffersShared resourcesBusy computational
units
up
stream
do
wn
stream
data
valid
ready
valid ready Meaning
1 1 Token transferred1 0 Token valid; held0 − No token to transfer
Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure
![Page 11: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/11.jpg)
Patience Through Handshaking
Want patient blocks to handle delays from
Memory systemsData-dependent
computations
Full buffersShared resourcesBusy computational
units
up
stream
do
wn
stream
data
valid
ready
valid ready Meaning
1 1 Token transferred1 0 Token valid; held0 − No token to transfer
Latency-insensitive Design (Carloni et al.)Elastic Circuits (Cortadella et al.)FIFOs with backpressure
![Page 12: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/12.jpg)
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Datapath
Combinational function ignores flow control
![Page 13: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/13.jpg)
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Valid network
Output valid if both inputs are valid
![Page 14: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/14.jpg)
Combinational Function Block
Strict/Unit Rate:All input tokens required to produce an output
in0
in1 out
f
Ready network
Input tokens consumed if output token is consumed(output is valid and ready)
![Page 15: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/15.jpg)
Multiplexer Block
in0 in1 in2
out
select
in0
in1
in2
select out
decoder
![Page 16: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/16.jpg)
Demultiplexer Block
out0 out1 out2
in
select
select
in
out2
out1
out0
decoder
![Page 17: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/17.jpg)
Buffering a Linear Pipeline (Point 1/4)
Combinational block
Data buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
![Page 18: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/18.jpg)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
![Page 19: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/19.jpg)
Buffering a Linear Pipeline (Point 1/4)
Combinational block
Data buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
![Page 20: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/20.jpg)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
![Page 21: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/21.jpg)
Buffering a Linear Pipeline (Point 1/4)
Combinational blockData buffer:Pipeline registerwith valid, enable
01
Control Buffer:Register diverts token whendownstream suddenly stops
01
01 0
10
Long Combinational Path (Data + Valid)
Long Combinational Path (Ready)
Cao et al. MEMOCODE 2015Inspired by Carloni’s Latency Insensitive Design (e.g., MEMOCODE 2007)
![Page 22: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/22.jpg)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
![Page 23: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/23.jpg)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
![Page 24: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/24.jpg)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
![Page 25: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/25.jpg)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
![Page 26: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/26.jpg)
The Problem with Fork
Combinational Block:inputs ready whenboth valid &output ready
Fork:outputs valid onlywhen all are ready
Oops: Combinational CycleThis is not compositional
![Page 27: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/27.jpg)
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
![Page 28: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/28.jpg)
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
![Page 29: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/29.jpg)
The Solution to Combinational Loops (Point 2/4)
valid
ready
XXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
![Page 30: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/30.jpg)
The Solution to Combinational Loops (Point 2/4)
valid
readyXXXXX
Allowed: Combinationalpaths from valid to ready
Prohibited: Combinationalpaths from ready to valid
![Page 31: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/31.jpg)
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
![Page 32: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/32.jpg)
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
![Page 33: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/33.jpg)
The Solution to Fork: A Little State (Point 3/4)
in
out2
out1
out0
Valid out ignores readyof other outputs
Flip-flop set after tokensent suppresses duplicates
Input consumed once onetoken sent on every output
![Page 34: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/34.jpg)
Nondeterministic Merge (Point 4/4)
f f fShare with
merge/demux
merge
f
demux
select
![Page 35: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/35.jpg)
Two-Way Nondeterministic Merge Block w/ Select
in0
in1
out
sel
01
Arb
iter
“Two-way fork with multiplexed outputselected by an arbiter”
![Page 36: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/36.jpg)
Experiments: Random Buffer Placement
0
2
4
6
2 4 6 8 10
(7 buffers)Com
plet
ion
Tim
e (µ
s)
Number of buffer pairs
GCD(100,2)
0
750
1500
2250
2 4 6 8 10
(80 buffers)
21-way Conveyor
0
1
2
3
2 4 6 8 10
(96 buffers)
BSN
![Page 37: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/37.jpg)
Best Buffering for GCD (Manually Obtained)
Each loop has one of each buffer
Data Buffer
Control Buffer
![Page 38: Compositional Dataflow Circuits](https://reader030.vdocument.in/reader030/viewer/2022012806/61bd45a061276e740b111f79/html5/thumbnails/38.jpg)
Summary
Compositional Dataflow Networks as an IR
Patient dataflow blocks with valid/ready handshaking
1. Break downstream, upstream paths w/ two buffer types
2. Avoid comb. cycles: prohibit ready-to-valid paths
3. Add one state bit per output so forks may “race ahead”
4. Tame nondeterministic merge with a select output
Random buffer placement experiments show it works