![Page 1: SPARK: A Parallelizing High-Level Synthesis Frameworkembedded.eecs.berkeley.edu/esd-seminar/fall03/... · Center for Embedded Computer Systems University of California, Irvine and](https://reader033.vdocument.in/reader033/viewer/2022042406/5f2070689b8d9a51e7692303/html5/thumbnails/1.jpg)
Center for Embedded Computer Systems, University of California, Irvine and San Diego
http://www.cecs.uci.edu/~spark
SPARK: A Parallelizing High-Level Synthesis Framework
Supported by Semiconductor Research Corporation & Intel Inc
Sumit Gupta
Rajesh Gupta, Nikil Dutt, Alex Nicolau
Copyright Sumit Gupta 2003
System Level Synthesis
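[Figure: system-level synthesis flow. A system-level model goes through task analysis and HW/SW partitioning. This yields a hardware behavioral description, handled by high-level synthesis, and a software behavioral description, handled by a software compiler. The target system contains an ASIC, a processor core, memory, an FPGA, and I/O.]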
High Level Synthesis
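[Figure: the code below as a CDFG (x = a + b and c = a < b, then an If node on c with branches d = e - f (T) and g = h + i (F), then j = d × g and l = e + x), mapped to an architecture with memory, ALU, control, and data path]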

    x = a + b;
    c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    j = d * g;
    l = e + x;

Transform behavioral descriptions to RTL/gate level
From C to CDFG to Architecture
Problem #1: Poor quality of HLS results beyond straight-line behavioral descriptions
Problem #2: Poor/no controllability of the HLS results
Outline
- Motivation and Background
- Our Approach to Parallelizing High-Level Synthesis
- Code Transformation Techniques for PHLS
  - Parallelizing Transformations
  - Dynamic Transformations
- The PHLS Framework and Experimental Results
  - Multimedia and Image Processing Applications
  - Case Study: Intel Instruction Length Decoder
- Conclusions and Future Work
High-level Synthesis
- Well-researched area: since the early 1980s
  - Renewed interest due to new system-level design methodologies
  - A large number of synthesis optimizations have been proposed
  - Either at the operation level: algebraic transformations on DSP codes
  - or at the logic level: Don't-Care-based control optimizations
  - In contrast, compiler transformations operate at both the operation level (fine-grain) and the source level (coarse-grain)
- Parallelizing compiler transformations
  - Different optimization objectives and cost models than HLS
- Our aim: develop synthesis and parallelizing compiler transformations that are "useful" for HLS
  - Beyond scheduling results: in circuit area and delay
  - For large designs with complex control flow (nested conditionals/loops)
Our Approach: Parallelizing HLS (PHLS)
[Figure: PHLS flow from C input to VHDL output. It passes through source-level compiler transformations, scheduling compiler & dynamic transformations, and scheduling & binding. The original and optimized CDFGs are the intermediate representations.]
- Optimizing and parallelizing compiler transformations applied at the source level (pre-synthesis) and during scheduling
  - Source-level code refinement using pre-synthesis transformations
  - Code restructuring by speculative code motions
  - Operation replication to improve concurrency
  - Dynamic transformations: exploit new opportunities during scheduling
PHLS Transformations Organized into Four Groups
1. Pre-synthesis: loop-invariant code motions, loop unrolling, CSE
2. Scheduling: speculative code motions, multi-cycling, operation chaining, loop pipelining
3. Dynamic: transformations applied dynamically during scheduling: dynamic CSE, dynamic copy propagation, dynamic branch balancing
4. Basic compiler transformations: copy propagation, dead code elimination
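Here is a minimal C sketch of two pre-synthesis transformations, loop-invariant code motion and CSE, applied by hand at the source level. The loop and the names `before_presynth` and `after_presynth` are hypothetical examples, not SPARK output. They only illustrate the before/after shape such a pass might produce.

```c
#include <stddef.h>

/* Hypothetical original loop: (k * scale) does not change across
 * iterations, and (x[i] + k) is computed twice per iteration. */
int before_presynth(const int x[], size_t n, int k, int scale)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        int y = (x[i] + k) * (k * scale);
        acc += y + (x[i] + k);
    }
    return acc;
}

/* Same computation after loop-invariant code motion (k * scale hoisted
 * out of the loop) and CSE (x[i] + k computed once per iteration). */
int after_presynth(const int x[], size_t n, int k, int scale)
{
    int acc = 0;
    int ks = k * scale;            /* hoisted loop-invariant operation */
    for (size_t i = 0; i < n; i++) {
        int t = x[i] + k;          /* shared common subexpression */
        acc += t * ks + t;
    }
    return acc;
}
```

Both functions return the same value. The second one computes `k * scale` once instead of once per iteration, and `x[i] + k` once per iteration instead of twice. That leaves fewer operations to schedule inside the loop body.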
Speculative Code Motions
[Figure: operations a, b and c moving around an If node with T/F branches. The moves illustrate speculation, reverse speculation, conditional speculation, speculation across hierarchical blocks, and early condition execution, which evaluates conditions as soon as possible.]
Operation movement to reduce the impact of programming style on the quality of HLS results
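At the C level, plain speculation can be pictured as hoisting an operation out of a branch so that it runs before the condition is known. The pair of functions below is a hypothetical illustration: the names and the computation are assumptions. A scheduler such as SPARK's makes this move on its graph representation rather than on source text.

```c
/* Hypothetical original: a + b is computed only on the true path. */
int before_speculation(int a, int b, int e, int f, int cond)
{
    int r;
    if (cond)
        r = (a + b) * 2;
    else
        r = e - f;
    return r;
}

/* After speculation: a + b is executed before the branch resolves,
 * so it can run in parallel with the evaluation of cond. */
int after_speculation(int a, int b, int e, int f, int cond)
{
    int t = a + b;                 /* speculated operation */
    int r;
    if (cond)
        r = t * 2;
    else
        r = e - f;                 /* t is computed but unused on this path */
    return r;
}
```

When `cond` is false, `t` is discarded. The transformation trades a possibly wasted operation on an otherwise idle resource for a shorter path through the true branch.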
Dynamic Transformations
- Called "dynamic" since they are applied during scheduling (versus a pass before/after scheduling)
- Dynamic branch balancing
  - Increases the scope of code motions
  - Reduces the impact of programming style on HLS results
- Dynamic CSE and dynamic copy propagation
  - Exploit the operation movement and duplication due to speculative code motions
  - Create new opportunities to apply these transformations
  - Reduce the number of operations
Dynamic Branch Balancing
[Figure: original and scheduled designs for an If node (T/F branches, basic blocks BB0 to BB4). The design contains additions a and b and subtractions c, d and e, scheduled into steps S0 to S3 under a fixed resource allocation. In the scheduled design the conditional is unbalanced, and the longest path is marked.]
Insert New Scheduling Step in Shorter Branch
[Figure: the same original and scheduled designs. The new scheduling step is inserted in the shorter branch of the unbalanced conditional.]
[Figure (continued): after a new scheduling step is inserted in the shorter branch, operation e is conditionally speculated, with a copy placed in each branch.]
Dynamic branch balancing inserts new scheduling steps
- Enables conditional speculation
- Leads to further code compaction
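The effect of branch balancing followed by conditional speculation can be sketched at the C level. An operation below the join (here `e = p - q`) is duplicated into both branches. In the shorter branch, its copy can occupy the newly inserted scheduling step. This is a hypothetical source-level view: the function names and arithmetic are assumptions, and the real transformation works on the scheduled design.

```c
/* Hypothetical original: e is computed after the branches join. */
int before_balancing(int x, int y, int z, int p, int q, int cond)
{
    int r;
    if (cond) {
        int t = x + y;      /* longer branch: two dependent additions */
        r = t + z;
    } else {
        r = x - y;          /* shorter branch: a single subtraction */
    }
    int e = p - q;          /* below the join */
    return r + e;
}

/* After conditional speculation: a copy of e is placed in each branch. */
int after_balancing(int x, int y, int z, int p, int q, int cond)
{
    int r, e;
    if (cond) {
        int t = x + y;
        e = p - q;          /* copy that can share a step with x + y */
        r = t + z;
    } else {
        r = x - y;
        e = p - q;          /* copy that can use the newly inserted step */
    }
    return r + e;
}
```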
Dynamic CSE: Going beyond Traditional CSE
C description:

    a = b + c;
    cd = b < c;
    if (cd)
        d = b + c;
    else
        e = g + h;

[Figure: HTG representation of this code (basic blocks BB0 to BB4). a = b + c comes before an If node whose branches contain d = b + c (T) and e = g + h (F). After traditional CSE, d = b + c becomes d = a.]
- We use the notion of dominance of basic blocks
  - Basic block BBi dominates BBj if all control paths from the initial basic block of the design graph leading to BBj go through BBi
  - We can eliminate an operation opj in BBj using the common expression in opi if BBi dominates BBj
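The dominance condition above can be made concrete with the classic iterative dataflow computation of dominators. The sketch below assumes a five-block CFG for the slide's example: BB0, then BB1 as the If node, then branches BB2 and BB3 joining at BB4. This block layout, the bitmask representation, and the names `compute_dominators`, `dominates` and `can_cse` are illustrative assumptions, not SPARK's implementation.

```c
#include <stdbool.h>

#define NBB 5   /* hypothetical CFG: BB0 -> BB1 -> {BB2, BB3} -> BB4 */

static const int npred[NBB] = {0, 1, 1, 1, 2};
static const int preds[NBB][2] = {{-1, -1}, {0, -1}, {1, -1}, {1, -1}, {2, 3}};

/* Bit i of dom[j] is set iff BBi dominates BBj. */
static unsigned dom[NBB];

/* Iterate dom(j) = {j} U intersection of dom(p) over predecessors p
 * until nothing changes; BB0 is the entry block. */
void compute_dominators(void)
{
    const unsigned all = (1u << NBB) - 1u;
    dom[0] = 1u;
    for (int j = 1; j < NBB; j++)
        dom[j] = all;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int j = 1; j < NBB; j++) {
            unsigned d = all;
            for (int k = 0; k < npred[j]; k++)
                d &= dom[preds[j][k]];
            d |= 1u << j;
            if (d != dom[j]) {
                dom[j] = d;
                changed = true;
            }
        }
    }
}

bool dominates(int i, int j)
{
    return ((dom[j] >> i) & 1u) != 0;
}

/* The slide's rule: an op in BB_j may reuse a matching op in BB_i if
 * BB_i dominates BB_j. This assumes the expressions match and that their
 * operands are not redefined in between, which a real pass must check. */
bool can_cse(int bb_i, int bb_j)
{
    return dominates(bb_i, bb_j);
}
```

With this CFG, BB0 dominates the true branch BB2, so `d = b + c` there can become `d = a`. Neither branch dominates the join BB4.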
New Opportunities for "Dynamic" CSE Due to Code Motions
[Figure: a design with two consecutive conditionals (basic blocks BB0 to BB8). a = b + c is in BB2 and d = b + c is in BB6. The scheduler decides to speculate a = b + c into BB0 as dcse = b + c, leaving a = dcse in BB2.]
CSE is not possible since BB2 does not dominate BB6
CSE is possible now since BB0 dominates BB6
[Figure (continued): with dcse = b + c now in BB0, d = b + c in BB6 is also replaced by d = dcse.]
If the scheduler moves or duplicates an operation op, apply CSE to the remaining operations using op
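Here is a minimal sketch of this dynamic CSE rule. After the scheduler moves or duplicates an operation, it scans the remaining operations. Any operation that computes the same expression, in a block dominated by the moved operation's new block, becomes a copy. Everything here is a hypothetical simplification: the `Op` record, string-based expression matching, the precomputed dominator bitmasks, and the name `dynamic_cse`. The sketch also ignores operand redefinitions and operation order within a block, which a real pass must check.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_BB 9

/* Hypothetical operation record "lhs = expr" located in basic block bb. */
typedef struct {
    int bb;
    char lhs[8];
    char expr[16];   /* right-hand side; holds a variable name after CSE */
} Op;

/* dom[j] has bit i set iff BB_i dominates BB_j (assumed precomputed).
 * Called after ops[moved] was moved or duplicated by the scheduler:
 * every other op with the same expression in a block dominated by
 * ops[moved].bb becomes a copy of ops[moved].lhs.
 * Returns the number of operations replaced. */
int dynamic_cse(Op ops[], int n, int moved, const unsigned dom[MAX_BB])
{
    int replaced = 0;
    const Op *src = &ops[moved];
    for (int j = 0; j < n; j++) {
        if (j == moved)
            continue;
        bool dominated = ((dom[ops[j].bb] >> src->bb) & 1u) != 0;
        if (dominated && strcmp(ops[j].expr, src->expr) == 0) {
            strcpy(ops[j].expr, src->lhs);   /* op_j: lhs_j = lhs_moved */
            replaced++;
        }
    }
    return replaced;
}
```

In the slide's scenario, the speculated `dcse = b + c` in BB0 dominates BB6, so `d = b + c` becomes `d = dcse`. The original `a = b + c` in BB2 would not qualify, because BB2 does not dominate BB6.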
Conditional Speculation & Dynamic CSE
[Figure: basic blocks BB0 to BB8, with a = b + c below the first conditional and d = b + c in BB8. The scheduler decides to conditionally speculate a = b + c into both BB1 and BB2 as a' = b + c, leaving a = a' at the original location.]
[Figure (continued): dynamic CSE then replaces d = b + c in BB8 with d = a'.]
- Use the notion of dominance by groups of basic blocks
  - All control paths leading up to BB8 come from either BB1 or BB2, so BB1 and BB2 together dominate BB8
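Group dominance can be checked with a reachability test. A set of blocks together dominates a target block exactly when the target becomes unreachable from the entry once those blocks are removed. The CFG below is one reading of the slide's layout and is an assumption. `group_dominates` is a hypothetical helper, not part of SPARK.

```c
#include <stdbool.h>

#define NBB 9

/* Assumed CFG: BB0 branches to BB1/BB2, which join at BB3; BB4 branches
 * to BB5/BB6, which join at BB7; BB7 flows into BB8. -1 = no successor. */
static const int succ[NBB][2] = {
    {1, 2}, {3, -1}, {3, -1}, {4, -1}, {5, 6},
    {7, -1}, {7, -1}, {8, -1}, {-1, -1}
};

/* `group` is a bitmask of blocks. Returns true iff every path from the
 * entry BB0 to `target` passes through at least one block of the group,
 * i.e. target cannot be reached from BB0 while avoiding the group. */
bool group_dominates(unsigned group, int target)
{
    if (group & 1u)
        return true;                     /* the entry block is in the group */
    bool seen[NBB] = {false};
    int stack[NBB], top = 0;             /* each block is pushed at most once */
    stack[top++] = 0;
    seen[0] = true;
    while (top > 0) {
        int b = stack[--top];
        if (b == target)
            return false;                /* reached target avoiding the group */
        for (int k = 0; k < 2; k++) {
            int s = succ[b][k];
            if (s < 0 || seen[s] || ((group >> s) & 1u))
                continue;
            seen[s] = true;
            stack[top++] = s;
        }
    }
    return true;
}
```

Here neither BB1 nor BB2 alone dominates BB8, but the pair {BB1, BB2} does. That is why `d = b + c` in BB8 can reuse the value a' computed in both branches.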
Loop Shifting: An Incremental Loop Pipelining Technique
[Figure: a loop node (BB0 to BB4, with a loop exit) whose body computes a (+) and c (-) followed by b (+) and d (-). Loop shifting moves a and c from the start of the loop body to its end, with a copy of these operations placed before the loop.]
[Figure (continued): after loop shifting, compaction schedules the shifted operations a and c in parallel with b and d.]
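Loop shifting can be illustrated on a small C loop. The operation at the top of the body moves to the bottom, where it computes the value for the next iteration, and a copy is placed before the loop. The loop and the function names are hypothetical. In C, the shifted operation must be guarded so that it does not read past the end of the array; the transformation on the slide operates on the HTG instead.

```c
#include <stddef.h>

/* Hypothetical original loop: t is produced and consumed in the same
 * iteration, so the two statements must execute one after the other. */
int loop_original(const int x[], size_t n)
{
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        int t = x[i] + 3;          /* top of the body: produces t */
        acc = acc + t - 1;         /* bottom of the body: consumes t */
    }
    return acc;
}

/* After one loop shift: the producer runs once before the loop and at
 * the end of each iteration for iteration i + 1. */
int loop_shifted(const int x[], size_t n)
{
    int acc = 0;
    if (n == 0)
        return acc;
    int t = x[0] + 3;              /* copy of the shifted op, before the loop */
    for (size_t i = 0; i < n; i++) {
        acc = acc + t - 1;         /* uses t from the previous step */
        if (i + 1 < n)
            t = x[i + 1] + 3;      /* shifted op, guarded against overrun */
    }
    return acc;
}
```

In `loop_shifted`, the accumulation and the shifted operation have no read-after-write dependence within an iteration. This is what lets a compaction step schedule them in parallel.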
SPARK High Level Synthesis Framework
SPARK Parallelizing HLS Framework
- C input and synthesizable RTL VHDL output
- Tool-box of transformations and heuristics
  - Each of these can be developed independently of the others
- Script-based control over transformations & heuristics
- Hierarchical intermediate representation (HTGs)
  - Retains structural information about the design (conditional blocks, loops)
  - Enables efficient and structured application of transformations
- Complete HLS tool: does binding, control synthesis and back-end VHDL generation
  - Interconnect-minimizing resource binding
- Enables graphical visualization of the design description and intermediate results
- 100,000+ lines of C++ code
Synthesizable C
- ANSI-C front end from Edison Design Group (EDG)
- Features of C not supported for synthesis
  - Pointers (however, arrays and passing by reference are supported)
  - Recursive function calls
  - Gotos
- Features for which support has not been implemented
  - Multi-dimensional arrays
  - Structs
  - Continue, breaks
- A hardware component is generated for each function
  - A called function is instantiated as a hardware component in the calling function
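As an illustration of the supported subset, the following function uses only array parameters and counted loops. It avoids recursion, goto, break, and continue. It stores a 2-D image in a flattened 1-D array, because multi-dimensional arrays are listed as not yet implemented. Whether a given SPARK release accepts this exact code is an assumption, and the function and the `ROWS`/`COLS` constants are hypothetical.

```c
#define ROWS 4
#define COLS 4

/* Sums each row of a ROWS x COLS image stored row-major in a 1-D array.
 * The result is returned through the array parameter `sums` (pass by
 * reference) instead of through pointer arithmetic or a struct. */
void row_sums(const int img[ROWS * COLS], int sums[ROWS])
{
    for (int r = 0; r < ROWS; r++) {
        int s = 0;
        for (int c = 0; c < COLS; c++)
            s += img[r * COLS + c];   /* equivalent of img[r][c] */
        sums[r] = s;
    }
}
```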
Graph Visualization: HTG and DFG
Resource Utilization Graph: Scheduling
Example of Complex HTG
- Example of a real design: MPEG-1 pred2 function
  - Just for demonstration; you are not expected to read the text
- Multiple nested loops and conditionals
Experiments
- Results presented here for
  - Pre-synthesis transformations
  - Speculative code motions
  - Dynamic CSE
- We used SPARK to synthesize designs derived from several industrial designs
  - MPEG-1, MPEG-2, GIMP image processing software
  - Case study of the Intel Instruction Length Decoder
- Scheduling results
  - Number of states in FSM
  - Cycles on longest path through design
- VHDL: logic synthesis
  - Critical path length (ns)
  - Unit area
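The delay metrics in the results combine as total delay = c × l. Here l is the number of cycles on the longest path, and c is the critical path length (clock period) from logic synthesis. The helpers below are a hypothetical sketch of that arithmetic and of a percentage improvement over a baseline. The numbers in the usage note are made up, not results from the paper.

```c
/* Total delay (ns) = cycles on the longest path * critical path (ns). */
double total_delay_ns(int longest_path_cycles, double critical_path_ns)
{
    return longest_path_cycles * critical_path_ns;
}

/* Percentage reduction of the optimized delay relative to a baseline. */
double delay_improvement_pct(double baseline_ns, double optimized_ns)
{
    return 100.0 * (baseline_ns - optimized_ns) / baseline_ns;
}
```

Consider a hypothetical baseline of 20 cycles at 10 ns (200 ns) and an optimized design of 12 cycles at 11 ns (132 ns). The optimized design shows a 34% delay improvement, even though its clock period got longer. This is why the longest path and the critical path are reported separately.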
Target Applications

| Design | # of Ifs | # of Loops | # Non-Empty Basic Blocks | # of Operations |
|---|---|---|---|---|
| MPEG-1 pred1 | 4 | 2 | 17 | 123 |
| MPEG-1 pred2 | 11 | 6 | 45 | 287 |
| MPEG-2 dp_frame | 18 | 4 | 61 | 260 |
| GIMP tiler | 11 | 2 | 35 | 150 |

Scheduling & Logic Synthesis Results
[Figure: normalized results for the MPEG-1 pred1 and MPEG-1 pred2 functions. Metrics: longest path (l, cycles), critical path (c, ns), total delay (c × l), and unit area. Configurations: non-speculative code motions (within BBs and across hierarchical blocks), + speculative code motions, + pre-synthesis transforms, + dynamic CSE. Annotated reductions: 42%, 10%, 36%, 36%, 8%, 39%.]
Overall: 63-66% improvement in delay
Almost constant area
Scheduling & Logic Synthesis Results
[Figure: normalized results for the MPEG-2 dp_frame and GIMP tiler functions, with the same metrics (longest path, critical path, total delay, unit area). Configurations: non-speculative code motions (within BBs and across hierarchical blocks), + speculative code motions, + pre-synthesis transforms, + dynamic CSE. Annotated reductions: 14%, 20%, 1%, 33%, 41%, 52%.]
Overall: 48-76% improvement in delay
Almost constant area
Case Study: Intel Instruction Length Decoder
[Figure: a stream of instructions in the instruction buffer enters the instruction length decoder, which delimits the first, second, and third instructions]
Example Design: ILD Block from Intel
- Case study: a design derived from the Instruction Length Decoder of the Intel Pentium® class of processors
  - Decodes the length of instructions streaming from memory
  - Has to look at up to 4 bytes at a time
  - Has to execute in one cycle and decode about 64 bytes of instructions
- Characteristics of microprocessor functional blocks
  - Low latency: single- or dual-cycle implementation
  - Consist of several small computations
  - Intermix of control and data logic
Basic Instruction Length Decoder: Initial Description
[Figure: bytes 1 to 4, each with a length contribution (1 to 4). The decisions "Need byte 2?", "Need byte 3?", and "Need byte 4?" determine which contributions are added to form the total length of the instruction.]
Instruction Length Decoder: Decoding 2nd Instruction
[Figure: the same structure applied to bytes 3 to 6, after the first instruction]
After decoding the length of an instruction:
- Start looking from the next byte
- Again examine up to 4 bytes to determine the length of the next instruction
Instruction Length Decoder: Parallelized Description

[Figure: Length Contributions 1 to 4 are computed for Bytes 1 to 4 side by side, next to the "Need Byte 2/3/4?" checks, and summed (+) into the Total Length Of Instruction.]
- Speculatively calculate the length contributions of all 4 bytes at a time
- Determine the actual total length of the instruction based on this data
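The speculative form can be sketched as follows, again with a hypothetical toy encoding (bit 7 = need next byte, bits 1..0 = extra bytes) and illustrative names. All four length contributions are computed whether or not their bytes turn out to belong to the instruction; the "Need Byte" flags only select which of them enter the sum.

```c
#include <stdint.h>

/* Hypothetical toy encoding (NOT x86). */
static int need_next_byte(uint8_t b) { return (b & 0x80) != 0; }
static int length_contribution(uint8_t b) { return 1 + (b & 0x03); }

/* Parallelized description over a fixed 4-byte window. */
int insn_length_speculative(const uint8_t b[4]) {
    /* Speculative stage: none of these depends on another result,
     * so in hardware they can all be evaluated side by side. */
    int c1 = length_contribution(b[0]);
    int c2 = length_contribution(b[1]);
    int c3 = length_contribution(b[2]);
    int c4 = length_contribution(b[3]);

    /* "Need Byte k?" flags: byte k is needed only if every earlier byte
     * asked for its successor. */
    int need2 = need_next_byte(b[0]);
    int need3 = need2 && need_next_byte(b[1]);
    int need4 = need3 && need_next_byte(b[2]);

    /* Selection stage: discard contributions of bytes that are not needed. */
    return c1 + (need2 ? c2 : 0) + (need3 ? c3 : 0) + (need4 ? c4 : 0);
}
```

Because the contributions no longer wait on the flags, the path to the result is one contribution computation plus a short select-and-add tree, instead of a chain of four dependent steps.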
ILD: Extracting Further Parallelism

[Figure: Bytes 1 to 5, each feeding its own instruction-length calculator (Byte 1 to Byte 5 Insn.Len Calc).]
- Speculatively calculate the length of an instruction assuming a new instruction starts at each byte
- Do this calculation for all bytes in parallel
- Traverse from the 1st byte to the last
- Determine the lengths of the instructions starting from the 1st till the last
- Discard unused calculations
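A sketch of the two phases, assuming a hypothetical 16-byte decode window (`WINDOW`) and the same kind of toy encoding as a stand-in for the real instruction format (bit 7 = need next byte, bits 1..0 = extra bytes); all names are illustrative. The first loop has no dependences between iterations, which is what permits one length calculator per byte in hardware; the second loop is the cheap traversal that keeps only the results at true instruction starts.

```c
#include <stddef.h>
#include <stdint.h>

#define WINDOW 16 /* hypothetical number of bytes handled at once */

/* Hypothetical toy encoding (NOT x86). */
static int need_next_byte(uint8_t b) { return (b & 0x80) != 0; }
static int length_contribution(uint8_t b) { return 1 + (b & 0x03); }

/* Length of an instruction assumed to start at buf[s] (at most 4 bytes). */
static int len_at(const uint8_t *buf, size_t n, size_t s) {
    int total = 0;
    for (size_t i = 0; i < 4 && s + i < n; i++) {
        total += length_contribution(buf[s + i]);
        if (!need_next_byte(buf[s + i])) break;
    }
    return total;
}

/* Marks is_start[s] = 1 for every byte that begins an instruction and
 * returns the number of instructions. is_start must hold n entries. */
size_t mark_insn_starts(const uint8_t *buf, size_t n, int *is_start) {
    int spec_len[WINDOW];
    size_t count = 0;
    if (n > WINDOW) n = WINDOW;

    /* Phase 1: speculate that an instruction starts at every byte.
     * Iterations are independent (parallel in hardware). */
    for (size_t s = 0; s < n; s++)
        spec_len[s] = len_at(buf, n, s);

    /* Phase 2: traverse from the 1st byte, following the lengths.
     * Results at positions that are skipped over are discarded. */
    for (size_t s = 0; s < n; s++) is_start[s] = 0;
    for (size_t s = 0; s < n; s += (size_t)spec_len[s]) {
        is_start[s] = 1;
        count++;
    }
    return count;
}
```

Every speculative length is at least 1 in this encoding, so the traversal always advances.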
Initial: Multi-Cycle Sequential Architecture

[Figure: datapath over Bytes 1 to 4 with Length Contributions 1 to 4 and the "Need Byte 2/3/4?" checks.]
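To make the cost of the initial architecture concrete, here is a cycle-level model in C. It assumes, for illustration only, that one byte is examined per clock cycle; the slide does not state the actual cycle count, and the struct, functions and toy encoding (bit 7 = need next byte, bits 1..0 = extra bytes) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical toy encoding (NOT x86). */
static int need_next_byte(uint8_t b) { return (b & 0x80) != 0; }
static int length_contribution(uint8_t b) { return 1 + (b & 0x03); }

/* Registers of a hypothetical multi-cycle decoder. */
typedef struct {
    int total; /* accumulated length */
    int idx;   /* byte (0..3) examined in the current cycle */
    int done;  /* set once the length is final */
} ild_fsm;

/* One clock cycle: examine one byte and update the registers. */
static void ild_step(ild_fsm *s, const uint8_t b[4]) {
    if (s->done) return;
    uint8_t cur = b[s->idx];
    s->total += length_contribution(cur);
    if (!need_next_byte(cur) || s->idx == 3)
        s->done = 1;
    else
        s->idx++;
}

/* Runs the FSM to completion; returns the cycles used, length in *len. */
int ild_run(const uint8_t b[4], int *len) {
    ild_fsm s = {0, 0, 0};
    int cycles = 0;
    while (!s.done) {
        ild_step(&s, b);
        cycles++;
    }
    *len = s.total;
    return cycles;
}
```

Under this assumption a 4-byte length determination costs 4 cycles, which is the latency the transformations on the following slides aim to collapse into a single cycle.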
ILD Synthesis: Resulting Architecture

Speculate Operations, Fully Unroll Loop, Eliminate Loop Index Variable

[Figure: the Multi-cycle Sequential Architecture beside the resulting Single cycle Parallel Architecture.]
- Our toolbox approach enables us to develop a script to synthesize applications from different domains
- Final design looks close to the actual implementation done by Intel
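The three transformations named on this slide can be illustrated on a small C loop. This is a hand-written sketch with a hypothetical toy encoding (bit 7 = need next byte, bits 1..0 = extra bytes) and illustrative function names, not SPARK's actual input or output: (a) is a loop-based description, (b) is the result of full unrolling, which removes the loop index variable, and (c) is the result of speculating the length computations out of the conditionals.

```c
#include <stdint.h>

/* Hypothetical toy encoding (NOT x86). */
static int need_next_byte(uint8_t b) { return (b & 0x80) != 0; }
static int length_contribution(uint8_t b) { return 1 + (b & 0x03); }

/* (a) Input description: loop with an index variable and an early exit. */
int len_loop(const uint8_t b[4]) {
    int total = 0;
    for (int i = 0; i < 4; i++) {
        total += length_contribution(b[i]);
        if (!need_next_byte(b[i])) break;
    }
    return total;
}

/* (b) Loop fully unrolled: the index i is gone, every access uses a
 * constant position, and the early exit becomes nested conditionals. */
int len_unrolled(const uint8_t b[4]) {
    int total = length_contribution(b[0]);
    if (need_next_byte(b[0])) {
        total += length_contribution(b[1]);
        if (need_next_byte(b[1])) {
            total += length_contribution(b[2]);
            if (need_next_byte(b[2]))
                total += length_contribution(b[3]);
        }
    }
    return total;
}

/* (c) Operations speculated: contributions are hoisted out of the
 * conditionals and computed unconditionally; conditions only select. */
int len_speculated(const uint8_t b[4]) {
    int c0 = length_contribution(b[0]), c1 = length_contribution(b[1]);
    int c2 = length_contribution(b[2]), c3 = length_contribution(b[3]);
    int n0 = need_next_byte(b[0]), n1 = need_next_byte(b[1]);
    int n2 = need_next_byte(b[2]);
    int t = c0;
    t += n0 ? c1 : 0;
    t += (n0 && n1) ? c2 : 0;
    t += (n0 && n1 && n2) ? c3 : 0;
    return t;
}
```

All three versions return the same length for every input; only (c) exposes all four contribution computations as independent operations that can be scheduled in the same cycle.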
Conclusions
- Parallelizing code transformations enable a new range of HLS transformations
  - Provide the needed improvement in quality of HLS results
  - Possible to be competitive against manually designed circuits
  - Can enable productivity improvements in microelectronic design
- Built a synthesis system with a range of code transformations
  - Platform for applying coarse- and fine-grain optimizations
  - Tool-box approach where transformations and heuristics can be developed
    - Enables the designer to find the right synthesis script for different application domains
  - Performance improvements of 60-70% across a number of designs
  - We have shown its effectiveness on an Intel design
Acknowledgements
- Advisors
  - Professors Rajesh Gupta, Nikil Dutt, Alex Nicolau
- Contributors to SPARK framework
  - Nick Savoiu, Mehrdad Reshadi, Sunwoo Kim
- Intel Strategic CAD Labs (SCL)
  - Timothy Kam, Mike Kishinevsky
- Supported by Semiconductor Research Corporation and Intel SCL
Thank You