![Page 1: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/1.jpg)
CS294-6 Reconfigurable Computing
Day 23
November 10, 1998
Stream Processing
![Page 2: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/2.jpg)
Previously
• Computing Requirements
• SCORE
  – stream-based computing model
  – use streams for linking computations
    • instead of shared memory locations
    • expose parallelism
    • freedom of sequential/spatial implementation
![Page 3: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/3.jpg)
Today
• Streams moderately well developed for
  – sequential atoms in multithreaded/multiprocessor environment
• General DF case
• SDF
• Expression
• ...thoughts on adapting ideas for SCORE-like execution
![Page 4: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/4.jpg)
General Dataflow case
• Dataflow graph exposes parallelism
• Operators enabled as soon as data is available
• Captures partial ordering for computation
• Adaptive/tolerant to latencies in system
• => great for exposing parallelism
![Page 5: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/5.jpg)
General Dataflow
• Fine-grained
  – expose maximum parallelism
  – …but rendezvous/presence overhead for every operator
• Who runs when is unpredictable
  – variable latencies
  – variable consumption/production
  – => force runtime synchronization/scheduling
![Page 6: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/6.jpg)
General Dataflow
• What structure to exploit to reduce requirements?
![Page 7: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/7.jpg)
General Dataflow
• What structure to exploit to reduce requirements?
  – Spatial operator locality
    • most communication local (sequential)
  – Operation blocks
    • only do dataflow presence on input to region of code
    • sequential/direct computation of subgraph
      – all local/deterministic computations in subgraph
  – Cyclic/predictable dataflow?
![Page 8: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/8.jpg)
Dataflow <=> Multithreading
• Original DF:
  – synchronize per instruction
• Hybrid DF -> TAM
  – synchronize on remote memory access (msgs)
  – run scheduling quanta (several instructions)
• Multithreading
  – coarse-grain tasks
  – synchronize on input data
  – (also locking)
![Page 9: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/9.jpg)
What to watch for
• With arbitrary I/O rates
  – unbounded buffering requirements
![Page 10: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/10.jpg)
Synchronous Data Flow
• Restriction
  – number of tokens produced/consumed is constant per operator firing
  – these numbers known at compile time
  – each edge has predetermined number of initial tokens
• Consistent
  – admissible and periodic
![Page 11: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/11.jpg)
SDF: Periodic
• Periodic
  – invoke each operator at least once
  – return to initial state (# tokens on each edge)
  – can determine by balance equations
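The balance equations can be made concrete with a short sketch. For an edge whose source produces p tokens per firing and whose sink consumes c, a periodic schedule needs q[src] * p == q[dst] * c. The Python below is illustrative only: the function name `repetition_vector` and the edge format are assumptions, and the example rates for a chain A -> B -> C (2 produced, 1 consumed on each edge) are hypothetical values chosen to agree with the repetition counts 1 A, 2 B, 4 C quoted in the deck's SDF example.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(edges):
    """Solve the SDF balance equations q[src] * produce == q[dst] * consume.

    edges: list of (src, dst, produce, consume) tuples for a connected graph.
    Returns the smallest positive integer solution as a dict, or None when
    the rates are inconsistent (no periodic schedule exists).
    """
    nodes = {n for src, dst, _, _ in edges for n in (src, dst)}
    q = {edges[0][0]: Fraction(1)}
    # Propagate relative firing rates along edges until nothing new is learned.
    changed = True
    while changed:
        changed = False
        for src, dst, produce, consume in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * produce / consume
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * consume / produce
                changed = True
    if set(q) != nodes:
        raise ValueError("graph is not connected")
    # Every edge must balance, otherwise tokens pile up or run out.
    for src, dst, produce, consume in edges:
        if q[src] * produce != q[dst] * consume:
            return None
    scale = lcm(*(f.denominator for f in q.values()))
    return {n: int(f * scale) for n, f in q.items()}


# Hypothetical rates: A emits 2 per firing, B eats 1 and emits 2, C eats 1.
CHAIN = [("A", "B", 2, 1), ("B", "C", 2, 1)]
print(repetition_vector(CHAIN))
```

A graph whose edges disagree about the relative rates (for example two parallel A -> B edges with ratios 1:1 and 2:1) has no solution, which is how the sketch reports a non-periodic graph.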
![Page 12: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/12.jpg)
SDF: Admissible
• Admissible
  – firing sequence does not yield deadlock
![Page 13: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/13.jpg)
SDF: Inadmissible
![Page 14: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/14.jpg)
SDF: Admissible
![Page 15: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/15.jpg)
Benefits
• Periodic schedules
• Bounded buffer requirements
  – Acyclic graphs
    • optimal algorithm
  – Cycle
    • NP-complete
    • heuristic algorithm … close to optimal buffering
![Page 16: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/16.jpg)
SDF Example
• By Balance Equations
  – 1 A, 2 B, 4 C
• Firing Sequences:
  – ABCBCCC
  – ABCCBCC
  – ABBCCCC
• Buffer Costs (one per firing sequence, in the same order)
  – 5 (AB=2 BC=3)
  – 4 (AB=2 BC=2)
  – 6 (AB=2 BC=4)
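The buffer costs listed above can be checked by replaying each firing sequence and recording the peak occupancy of every edge. This is a sketch under assumptions: the graph figure is not reproduced in the text, so the rates used here (A produces 2 on AB, B consumes 1; B produces 2 on BC, C consumes 1; no initial tokens) are inferred values that happen to reproduce the slide's numbers, and `buffer_cost` is a hypothetical helper name.

```python
def buffer_cost(schedule, edges):
    """Replay a firing sequence and record the peak occupancy of each edge.

    edges: dict mapping (src, dst) -> (produce, consume), no initial tokens.
    Returns (total, peaks), where total is the sum of the per-edge peaks.
    """
    tokens = {e: 0 for e in edges}
    peaks = {e: 0 for e in edges}
    for actor in schedule:
        # Consume inputs first; a negative count means the firing was illegal.
        for (src, dst), (_, consume) in edges.items():
            if dst == actor:
                tokens[(src, dst)] -= consume
                if tokens[(src, dst)] < 0:
                    raise ValueError(f"{actor} fired without enough input tokens")
        # Then produce outputs and update the high-water mark.
        for (src, dst), (produce, _) in edges.items():
            if src == actor:
                tokens[(src, dst)] += produce
                peaks[(src, dst)] = max(peaks[(src, dst)], tokens[(src, dst)])
    return sum(peaks.values()), peaks


# Inferred (hypothetical) rates for the A -> B -> C example.
EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}

for seq in ("ABCBCCC", "ABCCBCC", "ABBCCCC"):
    total, peaks = buffer_cost(seq, EDGES)
    print(seq, total, peaks)
```

Under these rates the three sequences give totals of 5, 4, and 6, so interleaving C between the two firings of B is what keeps the BC buffer small.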
![Page 17: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/17.jpg)
Scheduling (min buffer)
• F = set of fireable operators
• D = deferrable(F) = operators in F with an output edge that already has enough tokens to fire its sink
• While (F ≠ ∅)
  – if ((F − D) ≠ ∅)
    • fire from F − D
  – else
    • fire operator which increases number of tokens least
![Page 18: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/18.jpg)
Buffer Minimization
• Repeat
  – 1 A
  – 2 B
  – 4 C
• F={A}, D=∅
  – A
• F={B}, D=∅
  – B
• F={B,C}, D={B}
  – C
• F={B,C}, D={B}
  – C
• F={B}, D=∅
  – B
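The greedy rule and the trace above can be sketched in a few lines of Python. This is not the lecture's code: the function name, the edge format, the alphabetical tie-break, and the example rates (2 produced and 1 consumed on each edge of A -> B -> C) are all assumptions. An operator is treated as fireable only while it still owes firings in the current period, which is why A drops out of F after its single firing.

```python
def min_buffer_schedule(edges, reps):
    """Greedy one-period schedule: fire a fireable, non-deferrable operator
    if one exists; otherwise fire the fireable operator that adds the fewest
    tokens overall.

    edges: dict (src, dst) -> (produce, consume); reps: firings per period.
    """
    tokens = {e: 0 for e in edges}
    left = dict(reps)  # firings each operator still owes in this period
    schedule = []

    def fireable(a):
        return left[a] > 0 and all(
            tokens[e] >= c for e, (_, c) in edges.items() if e[1] == a)

    def deferrable(a):
        # Some output edge already holds enough tokens to fire its sink.
        return any(tokens[e] >= c for e, (_, c) in edges.items() if e[0] == a)

    def net_tokens(a):
        gained = sum(p for e, (p, _) in edges.items() if e[0] == a)
        used = sum(c for e, (_, c) in edges.items() if e[1] == a)
        return gained - used

    while True:
        F = sorted(a for a in left if fireable(a))
        if not F:
            break
        urgent = [a for a in F if not deferrable(a)]  # the set F - D
        actor = urgent[0] if urgent else min(F, key=net_tokens)
        for e, (p, c) in edges.items():
            if e[1] == actor:
                tokens[e] -= c
            if e[0] == actor:
                tokens[e] += p
        left[actor] -= 1
        schedule.append(actor)
    return "".join(schedule)


# Hypothetical rates and the repetition counts from the slide.
EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}
REPS = {"A": 1, "B": 2, "C": 4}
print(min_buffer_schedule(EDGES, REPS))
```

With these inputs the sketch follows the same choices as the trace (A, B, C, C, B) and then finishes the period with the two remaining firings of C. If the returned schedule is shorter than the sum of the repetition counts, the graph deadlocked before completing a period.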
![Page 19: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/19.jpg)
SDF → BDF
• What is SDF missing?
  – Restricts range of expression
  – Allows static scheduling
![Page 20: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/20.jpg)
SDF → BDF
• Sufficient Addition:
![Page 21: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/21.jpg)
SDF → BDF
• BDF
  – SDF + switch and select operators
• BDF is Turing Complete
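To illustrate why switch and select fall outside SDF, here is a minimal Python sketch of the two operators over token queues. It follows the conventional description of these operators (a Boolean control token picks the output or input port); the function names, the queue representation, and the `conditional` demo are assumptions made for illustration.

```python
from collections import deque


def switch(data, control, out_true, out_false):
    """Fire once: route one data token to the output chosen by the control token."""
    if not data or not control:
        return False  # not enabled: an input token is missing
    target = out_true if control.popleft() else out_false
    target.append(data.popleft())
    return True


def select(in_true, in_false, control, out):
    """Fire once: forward one token from the input chosen by the control token."""
    if not control:
        return False
    source = in_true if control[0] else in_false
    if not source:
        return False  # the chosen input is empty, so wait
    control.popleft()
    out.append(source.popleft())
    return True


def conditional(values, flags):
    """if-then-else built from switch and select: x * 10 when the flag is
    true, -x otherwise (both branch bodies are arbitrary examples)."""
    data, c1, c2 = deque(values), deque(flags), deque(flags)
    t_in, f_in, t_out, f_out, out = deque(), deque(), deque(), deque(), deque()
    while switch(data, c1, t_in, f_in):
        pass
    t_out.extend(x * 10 for x in t_in)
    f_out.extend(-x for x in f_in)
    while select(t_out, f_out, c2, out):
        pass
    return list(out)


print(conditional([1, 2, 3, 4], [True, False, True, False]))
```

The number of tokens that appears on each switch output, and that is taken from each select input, depends on the control values at run time, so the fixed per-firing rates that SDF requires no longer hold on those ports.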
![Page 22: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/22.jpg)
Expression: Block Diagram
Ptolemy example from Buck’94
![Page 23: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/23.jpg)
Expression: Stream Language
• Function AveragePairs(D: Signal returns Signal)
  – stream integer [(D[0]+D[1])/2] || AveragePairs(stream_rest(D))
Ex: Dennis94
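The recursive stream definition above maps naturally onto a lazy generator. The Python sketch below is only an analogy, not the original stream language: it assumes that `stream_rest(D)` drops exactly one element (so consecutive outputs use overlapping pairs, as the definition is written) and that the integer stream type implies floor division.

```python
from itertools import count, islice, tee


def average_pairs(stream):
    """Yield (D[0] + D[1]) // 2 for each adjacent pair, advancing by one
    element per output, as stream_rest(D) does in the recursive definition."""
    first, second = tee(stream)
    next(second, None)  # `second` now plays the role of stream_rest(D)
    for a, b in zip(first, second):
        yield (a + b) // 2


print(list(average_pairs([2, 4, 6, 10])))
# The generator is lazy, so it also works on an unbounded input stream.
print(list(islice(average_pairs(count(0, 2)), 3)))
```

Because each output is produced as soon as two inputs are available, the operator never needs the whole stream, which is the property that lets such functions be linked into pipelines.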
![Page 24: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/24.jpg)
Convert to Static Data Flow
![Page 25: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/25.jpg)
Composition of Stream Operators
• Function Process(D: ImageStream, w: integer returns MarkStream)
  – let
    • R := for I in 1,w return array of
      – FourForThree(AveragePairs(D[I]))
    • end for
  – in
    • PeakDetect(TwoDimFilter(R,w))
  – end let
• end function
![Page 26: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/26.jpg)
Adapting
• How different?
![Page 27: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/27.jpg)
Adapting
• How different?
  – Expensive to change operators
  – Possibility of spatial pipelining of operators
    • Operator AT
    • Operator copies
  – Allow dynamic rates…
    • violate fixed firing
![Page 28: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/28.jpg)
SDF: Timeslice
• Multiples of repetition/firing schedule
  – valid for acyclic graph
  – require greater buffering
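A small sketch shows how buffering grows when the schedule is timesliced. It assumes one particular policy that the slide does not spell out: each operator runs k periods' worth of firings back to back, in topological order, before the next operator is loaded. The function name and the example rates (2 produced, 1 consumed on each edge of A -> B -> C, with repetitions 1 A, 2 B, 4 C) are likewise assumptions.

```python
def timeslice_buffers(edges, reps, k):
    """Peak tokens per edge when every operator runs k periods' worth of
    firings back to back, in topological order, on an acyclic graph with
    no initial tokens. The producer finishes before its consumer starts,
    so each edge must hold everything the producer emits in k periods.
    """
    return {e: k * reps[e[0]] * produce for e, (produce, _) in edges.items()}


# Hypothetical rates and repetition counts for the A -> B -> C chain.
EDGES = {("A", "B"): (2, 1), ("B", "C"): (2, 1)}
REPS = {"A": 1, "B": 2, "C": 4}

for k in (1, 2, 4):
    buffers = timeslice_buffers(EDGES, REPS, k)
    print(k, buffers, sum(buffers.values()))
```

Under this policy the total grows linearly with k (6, 12, 24 tokens here), which is the price paid for changing operators less often.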
![Page 29: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/29.jpg)
SDF: Spatial
• Can realize spatially
• Repetition/firing schedule
  – gives relative throughput rates
  – simple cases => suggest Area-Throughput points
![Page 30: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/30.jpg)
Dynamic
• Note that adding switch/select gives general, dynamic dataflow
• Suggests can identify:
  – static regions (obey SDF restrictions)
  – dynamic boundaries (where dynamic operators exist)
• Statically schedule the static regions
• Dynamic control at boundary/invocation of static blocks
![Page 31: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/31.jpg)
Dynamic Flow Rates
• Cannot schedule completely at compile time
• Use feedback to get expected flow rate
  – schedule like SDF
  – track data presence at dynamic boundaries
  – allow additional buffer space (overflow)
  – stall slower operator as necessary
    • carefully check for possible deadlock conditions
![Page 32: CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing](https://reader030.vdocument.in/reader030/viewer/2022032704/56649d6a5503460f94a493fe/html5/thumbnails/32.jpg)
Summary
• Stream datatype captures computational structure
  – good for spatial implementations
  – expose parallelism
• Rich experience in DF/DSP to exploit
• Static scheduling is powerful where applicable
• Can still help schedule “mostly static” cases