Center for Embedded Computer Systems, University of California, Irvine and San Diego
http://www.cecs.uci.edu/~spark
SPARK: A Parallelizing High-Level Synthesis Framework
Supported by Semiconductor Research Corporation & Intel Inc
Sumit Gupta
Rajesh Gupta, Nikil Dutt, Alex Nicolau
Copyright Sumit Gupta 2003
System-Level Synthesis

[Figure: system-level design flow. A system-level model undergoes task analysis and HW/SW partitioning; the software behavioral description goes to a software compiler, while the hardware behavioral description goes through high-level synthesis, targeting ASIC, processor core, memory, FPGA, and I/O blocks.]
High-Level Synthesis
[Figure: target architecture with memory, control, ALU, and datapath, alongside the CDFG for the fragment below.]

    x = a + b;
    c = a < b;
    if (c)
        d = e - f;
    else
        g = h + i;
    j = d * g;
    l = e + x;
Transform behavioral descriptions to RTL/gate level
From C to CDFG to Architecture
Problem #1: Poor quality of HLS results beyond straight-line behavioral descriptions.
Problem #2: Poor or no controllability of the HLS results.
Outline
- Motivation and Background
- Our Approach to Parallelizing High-Level Synthesis
- Code Transformation Techniques for PHLS
  - Parallelizing Transformations
  - Dynamic Transformations
- The PHLS Framework and Experimental Results
  - Multimedia and Image Processing Applications
  - Case Study: Intel Instruction Length Decoder
- Conclusions and Future Work
High-Level Synthesis
- Well-researched area, dating from the early 1980s
- Renewed interest due to new system-level design methodologies
- A large number of synthesis optimizations have been proposed
  - Either operation level: algebraic transformations on DSP codes
  - Or logic level: don't-care based control optimizations
- In contrast, compiler transformations operate at both the operation level (fine-grain) and the source level (coarse-grain)
- Parallelizing compiler transformations have different optimization objectives and cost models than HLS
- Our aim: develop synthesis and parallelizing compiler transformations that are "useful" for HLS
  - Beyond scheduling results: in circuit area and delay
  - For large designs with complex control flow (nested conditionals/loops)
Our Approach: Parallelizing HLS (PHLS)

[Flow: C input -> original CDFG -> source-level compiler transformations -> optimized CDFG -> scheduling and binding (with scheduling-time compiler and dynamic transformations) -> VHDL output]

- Optimizing compiler and parallelizing compiler transformations applied at the source level (pre-synthesis) and during scheduling
- Source-level code refinement using pre-synthesis transformations
- Code restructuring by speculative code motions
- Operation replication to improve concurrency
- Dynamic transformations: exploit new opportunities during scheduling
PHLS Transformations Organized into Four Groups
1. Pre-synthesis: loop-invariant code motions, loop unrolling, CSE
2. Scheduling: speculative code motions, multi-cycling, operation chaining, loop pipelining
3. Dynamic: transformations applied dynamically during scheduling (dynamic CSE, dynamic copy propagation, dynamic branch balancing)
4. Basic compiler transformations: copy propagation, dead code elimination
Speculative Code Motions

[Figure: code motions around an if node: speculation (moving an operation above the condition), reverse speculation (moving it down into the branches), conditional speculation (duplicating it into both branches), and motion across hierarchical blocks.]

- Operation movement reduces the impact of programming style on the quality of HLS results
- Early condition execution: evaluates conditions as soon as possible
Dynamic Transformations
- Called "dynamic" since they are applied during scheduling (versus in a pass before or after scheduling)
- Dynamic branch balancing
  - Increases the scope of code motions
  - Reduces the impact of programming style on HLS results
- Dynamic CSE and dynamic copy propagation
  - Exploit the operation movement and duplication caused by speculative code motions
  - Create new opportunities to apply these transformations
  - Reduce the number of operations
Dynamic Branch Balancing

[Figure: original design with scheduling steps S0-S3, a resource allocation of two adders, and operations a, b (additions) and c, d, e (subtractions) around an if node spanning basic blocks BB0-BB4. The scheduled design has an unbalanced conditional: one branch is longer than the other, and the subtraction e lies on the longest path after the join.]
Insert New Scheduling Step in Shorter Branch

[Figure: the same design; dynamic branch balancing inserts a new scheduling step in the shorter branch of the conditional.]
[Figure: with the new step inserted, the subtraction e is conditionally speculated as a duplicate into both branches, removing it from after the join.]

- Dynamic branch balancing inserts new scheduling steps
- This enables conditional speculation
- It leads to further code compaction
Dynamic CSE: Going beyond Traditional CSE

C description:

    a = b + c;
    cd = b < c;
    if (cd)
        d = b + c;
    else
        e = g + h;

[Figure: HTG representation. BB1 computes a = b + c and the condition cd; the if node's true branch BB2 computes d = b + c, the false branch BB3 computes e = g + h, and the branches join in BB4. After traditional CSE, the operation in BB2 becomes d = a.]
- We use the notion of dominance of basic blocks
- Basic block BBi dominates BBj if all control paths from the initial basic block of the design graph leading to BBj go through BBi
- We can eliminate an operation opj in BBj using the common expression in opi if BBi dominates BBj
New Opportunities for "Dynamic" CSE Due to Code Motions

[Figure: a = b + c sits in BB2, a branch of the first conditional; d = b + c sits in BB6, a branch of a later conditional. The scheduler decides to speculate the operation as dcse = b + c into BB0, and BB2's operation becomes a = dcse.]

- CSE was not possible before, since BB2 does not dominate BB6
- CSE is possible now, since BB0 dominates BB6
[Figure: after the speculation, dynamic CSE also rewrites d = b + c in BB6 as d = dcse.]

If the scheduler moves or duplicates an operation op, apply CSE on the remaining operations using op.
Conditional Speculation and Dynamic CSE

[Figure: a = b + c sits in BB4, between two conditionals; d = b + c sits in BB8, after the second. The scheduler decides to conditionally speculate: a' = b + c is duplicated into both branches BB1 and BB2 of the first conditional, and BB4's operation becomes a = a'.]
[Figure: dynamic CSE then rewrites d = b + c in BB8 as d = a'.]
- Use the notion of dominance by groups of basic blocks
- All control paths leading up to BB8 pass through either BB1 or BB2, so BB1 and BB2 together dominate BB8
Loop Shifting: An Incremental Loop Pipelining Technique

[Figure: a loop node with basic blocks BB1-BB3 and a loop exit to BB4. Loop shifting moves the operation a + c from the beginning of the loop body to its end, and places a copy in the loop head BB0 for the first iteration.]
[Figure: after shifting, the shifted operation and the remaining body operations are independent, so compaction schedules them concurrently, shortening the loop body.]
SPARK High-Level Synthesis Framework
SPARK Parallelizing HLS Framework
- C input and synthesizable RTL VHDL output
- Toolbox of transformations and heuristics
  - Each can be developed independently of the others
  - Script-based control over transformations and heuristics
- Hierarchical intermediate representation (HTGs)
  - Retains structural information about the design (conditional blocks, loops)
  - Enables efficient and structured application of transformations
- Complete HLS tool: does binding, control synthesis, and backend VHDL generation
  - Interconnect-minimizing resource binding
- Enables graphical visualization of the design description and intermediate results
- 100,000+ lines of C++ code
Synthesizable C
- ANSI-C front end from Edison Design Group (EDG)
- Features of C not supported for synthesis:
  - Pointers (however, arrays and passing by reference are supported)
  - Recursive function calls
  - Gotos
- Features for which support has not been implemented:
  - Multi-dimensional arrays
  - Structs
  - Continue and break statements
- A hardware component is generated for each function
- A called function is instantiated as a hardware component in the calling function
[Screenshots: graph visualization of the HTG and DFG; resource utilization graph during scheduling.]
Example of a Complex HTG
- A real design: the MPEG-1 pred2 function
- Shown for demonstration only; the text is not meant to be read
- Multiple nested loops and conditionals
Experiments
- Results presented here for:
  - Pre-synthesis transformations
  - Speculative code motions
  - Dynamic CSE
- We used SPARK to synthesize designs derived from several industrial applications: MPEG-1, MPEG-2, and the GIMP image processing software
- Case study of the Intel Instruction Length Decoder
- Scheduling results: number of states in the FSM, cycles on the longest path through the design
- VHDL logic synthesis results: critical path length (ns), unit area
Target Applications

Design            # of Ifs   # of Loops   # Non-Empty Basic Blocks   # of Operations
MPEG-1 pred1          4          2                  17                     123
MPEG-1 pred2         11          6                  45                     287
MPEG-2 dp_frame      18          4                  61                     260
GIMP tiler           11          2                  35                     150
Scheduling & Logic Synthesis Results

[Bar charts: MPEG-1 pred1 and pred2 functions. Longest path (cycles), critical path (ns), total delay (cycles x ns), and unit area, normalized to a baseline of non-speculative code motions (within basic blocks and across hierarchical blocks), then successively adding speculative code motions, pre-synthesis transforms, and dynamic CSE. Annotated improvements: 42%, 10%, and 36% for pred1; 36%, 8%, and 39% for pred2.]
Overall: 63-66% improvement in delay, with almost constant area.
Scheduling & Logic Synthesis Results

[Bar charts: MPEG-2 dp_frame and GIMP tiler functions, same metrics and configurations. Annotated improvements: 14%, 20%, and 1% for dp_frame; 33%, 41%, and 52% for tiler.]
Overall: 48-76% improvement in delay, with almost constant area.
Case Study: Intel Instruction Length Decoder

[Figure: a stream of instructions from the instruction buffer feeds the instruction length decoder, which marks where the first, second, and third instructions begin.]
Example Design: ILD Block from Intel
- Case study: a design derived from the Instruction Length Decoder of the Intel Pentium class of processors
- Decodes the length of instructions streaming from memory
  - Has to look at up to 4 bytes at a time
- Has to execute in one cycle and decode about 64 bytes of instructions
- Characteristics of microprocessor functional blocks:
  - Low latency: single- or dual-cycle implementation
  - Consist of several small computations
  - Intermix of control and data logic
Basic Instruction Length Decoder: Initial Description

[Figure: bytes 1-4 are examined in sequence; each stage checks "need byte 2/3/4?" and adds that byte's length contribution to the total length of the instruction.]
Instruction Length Decoder: Decoding the 2nd Instruction

[Figure: the same sequential chain applied after the first instruction, now examining bytes 3-6.]

After decoding the length of an instruction:
- Start looking from the next byte
- Again examine up to 4 bytes to determine the length of the next instruction
Instruction Length Decoder: Parallelized Description

[Figure: the length contributions of all 4 bytes are computed concurrently and then combined into the total instruction length.]

- Speculatively calculate the length contribution of all 4 bytes at a time
- Determine the actual total length of the instruction based on this data
ILD: Extracting Further Parallelism

[Figure: an instruction-length calculation block for each byte, all operating in parallel.]

- Speculatively calculate the length of instructions assuming a new instruction starts at each byte
- Do this calculation for all bytes in parallel
- Traverse from the first byte to the last, determining the length of each instruction in turn
- Discard the unused calculations
Initial: Multi-Cycle Sequential Architecture

[Figure: the sequential chain of length-contribution stages from the initial description, scheduled over multiple cycles.]
ILD Synthesis: Resulting Architecture

Speculate operations, fully unroll the loop, and eliminate the loop index variable.
- These transformations turn the initial multi-cycle sequential architecture into a single-cycle parallel architecture
- Our toolbox approach enables us to develop a script to synthesize applications from different domains
- The final design looks close to the actual implementation done by Intel
Conclusions
- Parallelizing code transformations enable a new range of HLS transformations
  - They provide the needed improvement in the quality of HLS results
  - They make it possible to be competitive with manually designed circuits
  - They can enable productivity improvements in microelectronic design
- We built a synthesis system with a range of code transformations
  - A platform for applying coarse-grain and fine-grain optimizations
  - A toolbox approach in which transformations and heuristics can be developed independently
  - Enables the designer to find the right synthesis script for different application domains
- Performance improvements of 60-70% across a number of designs
- We have shown its effectiveness on an Intel design
Acknowledgements
- Advisors: Professors Rajesh Gupta, Nikil Dutt, Alex Nicolau
- Contributors to the SPARK framework: Nick Savoiu, Mehrdad Reshadi, Sunwoo Kim
- Intel Strategic CAD Labs (SCL): Timothy Kam, Mike Kishinevsky
- Supported by Semiconductor Research Corporation and Intel SCL

Thank You