Coordinated Coarse Grain and Fine Grain Optimizations for High-Level Synthesis
Sumit Gupta
Center for Embedded Computer Systems, University of California, Irvine
http://www.cecs.uci.edu/~spark
Supported by Semiconductor Research Corporation

High Level Synthesis
- Transform behavioral descriptions to RTL/gate level
- From C to CDFG to Architecture

x = a + b;
c = a < b;
if (c) then d = e - f;
else g = h + i;
j = d * g;
l = e + x;

[Figure: CDFG of the code above, with an If node and its T/F branches, next to the target architecture: memory, control, ALU, data path]

Our Approach to HLS
- Optimizing compiler and parallelizing compiler transformations applied at source level (pre-synthesis) and during scheduling
  - Source-level code refinement using pre-synthesis transformations
  - Code restructuring by speculative code motions
  - Operation replication to improve concurrency
  - Transformations applied dynamically during scheduling to exploit new opportunities due to code motions
- Extract a high degree of parallelization using extensive code transformations
  - Improve resource utilization and increase code compaction
  - Reduce impact of programming style and control constructs on HLS results
- Our approach is particularly suited to descriptions with nested conditionals and loops
[Figure: synthesis flow from C input to VHDL output, showing the original CDFG, the optimized CDFG, source-level compiler transformations, scheduling compiler transformations, and scheduling & binding]

Hierarchical Intermediate Representation
- We use Hierarchical Task Graphs (HTGs)
  - Maintain structured view of design description
  - Consists of hierarchy of basic blocks and HTG nodes
- 3 types of HTG nodes:
  - Single: no sub-nodes
  - Compound: sub-nodes
  - Loop: encapsulate loops
- Augmented by data dependency graphs
- Enable coarse-grain transformations
- Trailblazing: hierarchical code motion technique
  - Can move operations across large pieces of code without visiting each node in between
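
The three HTG node types can be pictured with a small data structure. This is a minimal sketch, not Spark's actual representation: the class names, fields, and the helper `top_level_path` are assumptions made for illustration. The helper only conveys the intuition of crossing a compound node as one unit; it is not the Trailblazing algorithm itself.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class BasicBlock:
    """A straight-line sequence of operations."""
    name: str
    ops: List[str] = field(default_factory=list)


@dataclass
class SingleNode:
    """HTG node with no sub-nodes; here it wraps one basic block (assumption)."""
    block: BasicBlock


@dataclass
class CompoundNode:
    """HTG node that contains sub-nodes, for example an if-then-else."""
    name: str
    children: List["HTGNode"] = field(default_factory=list)


@dataclass
class LoopNode:
    """HTG node that encapsulates a loop body."""
    name: str
    body: List["HTGNode"] = field(default_factory=list)


HTGNode = Union[SingleNode, CompoundNode, LoopNode]


def basic_blocks(node: HTGNode) -> List[str]:
    """Collect basic-block names by descending the hierarchy."""
    if isinstance(node, SingleNode):
        return [node.block.name]
    subs = node.children if isinstance(node, CompoundNode) else node.body
    names: List[str] = []
    for sub in subs:
        names.extend(basic_blocks(sub))
    return names


def top_level_path(design: CompoundNode, src: int, dst: int) -> List[str]:
    """Name the top-level nodes lying between child index dst and child index src.
    A compound or loop node is listed once, without visiting its sub-nodes."""
    crossed = []
    for node in design.children[dst + 1:src]:
        crossed.append(node.block.name if isinstance(node, SingleNode) else node.name)
    return crossed


# Hypothetical HTG for the if-then-else example from the earlier slide.
if_node = CompoundNode("if_c", [
    SingleNode(BasicBlock("BB1", ["d = e - f"])),
    SingleNode(BasicBlock("BB2", ["g = h + i"])),
])
design = CompoundNode("design", [
    SingleNode(BasicBlock("BB0", ["x = a + b", "c = a < b"])),
    if_node,
    SingleNode(BasicBlock("BB3", ["j = d * g", "l = e + x"])),
])
```

Moving an operation from BB3 up to BB0 crosses the whole `if_c` node in one step here, while `basic_blocks(design)` still exposes the flat list of blocks when a fine-grain view is needed.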

Speculative Code Motions
Operation movement to reduce impact of programming style on quality of HLS results
- Speculation
- Reverse speculation
- Conditional speculation
- Movement across hierarchical blocks
- Early condition execution: evaluates conditions as soon as possible

[Figure: operations a, b, and c moved around an If node (T/F branches) by the code motions listed above]
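
Speculation can be illustrated on the if-then-else from the earlier slide. The sketch below is an assumption-laden model in Python, not Spark output: the function names and the temporaries `t_true` and `t_false` are hypothetical. It shows that executing the branch operations before the condition is used, and committing a result only on the matching branch, preserves the original behavior.

```python
def original(a, b, e, f, h, i):
    """Behavior as written: each branch computes its own result."""
    c = a < b
    d = g = None
    if c:
        d = e - f
    else:
        g = h + i
    return d, g


def speculated(a, b, e, f, h, i):
    """Both branch operations run speculatively, before the condition is used.
    Their results are committed only on the branch that is actually taken."""
    t_true = e - f    # speculated out of the true branch
    t_false = h + i   # speculated out of the false branch
    c = a < b
    d = g = None
    if c:
        d = t_true
    else:
        g = t_false
    return d, g
```

In a synthesized design the speculated operations could share a scheduling step with the comparison when functional units are idle, which is where the shorter schedule would come from.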

Scheduling Heuristic
- Get available ops: a, b, c, d
- Determine code motions required
- Assign cost to each operation
  - Cost is based on data dependency chain
- Schedule op with lowest cost

[Figure: HTG with basic blocks BB 0 to BB 9; the additions a, b, c, and d are candidates to be speculated or moved across HTG nodes]
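
The heuristic on this slide resembles priority-based list scheduling, which can be sketched as follows. This is a simplified illustration under stated assumptions: the cost function (operations heading longer data dependency chains get a lower cost), the function names, and the `n_units` resource limit are all hypothetical, and the step that determines the required code motions is left out entirely.

```python
from typing import Dict, List, Set


def chain_length(op: str, succs: Dict[str, List[str]]) -> int:
    """Length of the longest data dependency chain starting at op."""
    return 1 + max((chain_length(s, succs) for s in succs.get(op, [])), default=0)


def list_schedule(ops: List[str], preds: Dict[str, List[str]],
                  succs: Dict[str, List[str]], n_units: int) -> List[List[str]]:
    """Fill each scheduling step with the lowest-cost available operations."""
    # Assumed cost: ops heading longer dependency chains get a lower cost.
    cost = {op: -chain_length(op, succs) for op in ops}
    done: Set[str] = set()
    steps: List[List[str]] = []
    while len(done) < len(ops):
        # Available ops: unscheduled, with all producers scheduled in earlier steps.
        available = [op for op in ops if op not in done
                     and all(p in done for p in preds.get(op, []))]
        if not available:
            raise ValueError("cyclic data dependencies")
        # Lowest cost first; ties are broken by name to keep the result stable.
        chosen = sorted(available, key=lambda op: (cost[op], op))[:n_units]
        steps.append(chosen)
        done.update(chosen)
    return steps


# Hypothetical dependencies: a feeds b, b feeds c, and d is independent.
succs = {"a": ["b"], "b": ["c"]}
preds = {"b": ["a"], "c": ["b"]}
one_unit = list_schedule(["a", "b", "c", "d"], preds, succs, n_units=1)
two_units = list_schedule(["a", "b", "c", "d"], preds, succs, n_units=2)
```

With one functional unit the chain a, b, c is scheduled before d; with two units d fills the idle unit in the first step.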

Increasing the Scope of Code Motions
[Figure: original design and scheduled design (states S0 to S3) for an If node with an unbalanced conditional; basic blocks BB 0 to BB 4 hold operations a and b (+) and c, d, and e (-), with a resource allocation panel]

Insert New Scheduling Step in Shorter Branch
[Figure: the same design (states S0 to S2, basic blocks BB 0 to BB 4) before and after a new scheduling step is inserted in the shorter branch of the If node; afterwards operation e appears in both branches]
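
Inserting a scheduling step in the shorter branch can be modelled with a toy schedule. Everything in this sketch is an assumption made for illustration: the representation of a branch as a list of steps, the one-unit-per-resource-kind limit, the helper names, and the placement of operations a, b, c, and e, which does not reproduce the figure. Data dependencies are ignored.

```python
from typing import Dict, List, Optional

Step = List[str]  # operations placed in one scheduling step


def balance_branches(true_branch: List[Step], false_branch: List[Step]) -> None:
    """Pad the shorter branch with new scheduling steps, in place."""
    longest = max(len(true_branch), len(false_branch))
    for branch in (true_branch, false_branch):
        while len(branch) < longest:
            branch.append([])  # new, initially empty scheduling step


def free_step(branch: List[Step], kind: str, kinds: Dict[str, str]) -> Optional[Step]:
    """First step in the branch whose unit of the given kind is idle."""
    return next((s for s in branch if all(kinds[o] != kind for o in s)), None)


def place_in_both(op: str, kinds: Dict[str, str],
                  true_branch: List[Step], false_branch: List[Step]) -> bool:
    """Put a copy of op into each branch, or change nothing if either lacks room."""
    slots = [free_step(b, kinds[op], kinds) for b in (true_branch, false_branch)]
    if any(s is None for s in slots):
        return False
    for s in slots:
        s.append(op)
    return True


# Hypothetical schedule: one adder and one subtractor are available per step.
kinds = {"a": "add", "b": "add", "c": "sub", "e": "sub"}
true_branch = [["a"], ["b"]]
false_branch = [["c"]]
before = place_in_both("e", kinds, true_branch, false_branch)  # no idle subtractor
balance_branches(true_branch, false_branch)
after = place_in_both("e", kinds, true_branch, false_branch)   # fits after padding
```

Before balancing, the false branch has no idle subtractor for e; the added step creates one without lengthening the longest path through the conditional.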

Common Sub-Expression Elimination

C Description
a = b + c;
cond = b < c;
if (cond) d = b + c;
else e = g + h;

HTG Representation
[Figure: HTG with basic blocks BB 0 to BB 4 and an If node; a = b + c precedes the If node, d = b + c is on the true branch, and e = g + h is on the false branch]

After CSE
[Figure: the same HTG with d = b + c replaced by d = a]
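
The CSE step on this slide can be sketched as a pass over basic blocks that tracks available expressions. This is a minimal illustration, not Spark's implementation: statements are modelled as hypothetical (target, right-hand side) pairs, and the condition is stored in a separate, hypothetically named variable `cond` so that c is not overwritten between the two additions. An assignment to an operand invalidates the expression, which is why that separation matters.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, str]  # (target, right-hand side), e.g. ("a", "b + c")


def cse_block(stmts: List[Stmt], available: Dict[str, str]) -> List[Stmt]:
    """Rewrite one basic block. `available` maps an expression to the variable
    that already holds its value; it is updated in place."""
    out: List[Stmt] = []
    for target, rhs in stmts:
        if rhs in available:
            out.append((target, available[rhs]))  # reuse the earlier result
        else:
            out.append((target, rhs))
        # Assigning to target kills expressions that read it or are held in it.
        stale = [e for e, v in available.items()
                 if target in e.split() or v == target]
        for expr in stale:
            del available[expr]
        operands = rhs.split()
        # Record binary expressions whose operands are untouched by this statement.
        if rhs not in available and len(operands) == 3 and target not in operands:
            available[rhs] = target
    return out


# Block before the If node, then each branch with a copy of the available set.
available: Dict[str, str] = {}
pre_out = cse_block([("a", "b + c"), ("cond", "b < c")], available)
true_out = cse_block([("d", "b + c")], dict(available))
false_out = cse_block([("e", "g + h")], dict(available))

# If the condition overwrote c instead, b + c would no longer be available.
clobbered: Dict[str, str] = {}
cse_block([("a", "b + c"), ("c", "b < c")], clobbered)
not_reused = cse_block([("d", "b + c")], dict(clobbered))
```

Each branch receives a copy of the available set because the block before the If node executes on every path that reaches either branch.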

New Opportunities for "Dynamic" CSE Due to Speculative Code Motions
[Figure: HTG with basic blocks BB 0 to BB 8. Before: a = b + c and d = b + c sit in different basic blocks. After speculation: dcse = b + c is computed in BB 0, and the two operations become a = dcse and d = dcse]
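
The rewrite in the figure can be sketched as a single step taken when the scheduler speculates an expression. This is an illustrative model only: the function name, the block contents, and the assignment of statements to BB 1, BB 6, and BB 7 are hypothetical. It also assumes the caller supplies the list of blocks dominated by the destination and that the operands of the expression are not redefined along the way.

```python
from typing import Dict, List, Tuple

Stmt = Tuple[str, str]  # (target, right-hand side)


def speculate_with_dynamic_cse(blocks: Dict[str, List[Stmt]], rhs: str,
                               dest: str, dominated: List[str],
                               temp: str = "dcse") -> int:
    """Compute rhs once in block dest and turn matching operations in the
    dominated blocks into copies of temp. Returns the number of rewrites."""
    rewrites = 0
    for name in dominated:
        for idx, (target, expr) in enumerate(blocks[name]):
            if expr == rhs:
                blocks[name][idx] = (target, temp)
                rewrites += 1
    if rewrites:
        # The speculated operation is placed at the end of the destination block.
        blocks[dest].append((temp, rhs))
    return rewrites


# Hypothetical placement of the two additions from the figure.
blocks: Dict[str, List[Stmt]] = {
    "BB0": [],
    "BB1": [("a", "b + c")],
    "BB6": [("d", "b + c")],
    "BB7": [("e", "g + h")],
}
count = speculate_with_dynamic_cse(blocks, "b + c", "BB0", ["BB1", "BB6", "BB7"])
```

A CSE pass run before scheduling would not make this rewrite, since neither addition is available at the other until one of them has been moved up to a block that executes before both.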

SPARK High Level Synthesis Framework

Experimentation
- Experiments for several transformations
  - Pre-synthesis transformations: loop invariant code motions, CSE
  - Speculative code motions
  - Dynamic CSE
- We have used Spark to synthesize designs derived from several industrial designs
  - MPEG-1, MPEG-2, GIMP image processing software
- Scheduling results
  - Number of states in FSM
  - Cycles on longest path through design
- VHDL: logic synthesis
  - Critical path length (ns)
  - Unit area

Target Applications

Design            # of Ifs   # of Loops   # Non-Empty Basic Blocks   # of Operations
MPEG-1 pred1      4          2            17                         123
MPEG-1 pred2      11         6            45                         287
MPEG-2 dp_frame   18         4            61                         260
GIMP tiler        11         2            35                         150

Code Motions: Logic Synthesis Results
[Figure: bar charts for the MPEG Pred1 and MPEG Pred2 functions showing normalized values (axis 0 to 1.2) of critical path (c ns), total delay (c*l ns), and unit area. Series: within basic blocks & across hierarchical blocks; + speculation; + reverse speculation & early condition execution; + conditional speculation]

CSE/Dynamic CSE Results
[Figure: bar charts for the MPEG Pred1 and MPEG Pred2 functions showing normalized values (axis 0 to 1) of critical path (c ns), total delay (c*l ns), and unit area. Series: all code motions enabled; + only CSE; + only dynamic CSE; + CSE & dynamic CSE]

Conclusions
- Parallelizing code transformations enable a new range of HLS transformations
  - Can provide the needed improvement in quality of HLS results for them to be competitive against manually designed circuits
  - Synthesis approach can dominate SOC embedded systems design
  - Can enable productivity improvements in microelectronic design
- Built a synthesis system with a range of code transformations
  - Platform for applying coarse and fine-grain optimizations
  - Code transformations address complex control flow
  - Tool-box approach where transformations and heuristics can be developed
  - Enables finding the right synthesis script for different application domains
- Performance improvements of 60-70% across a number of designs
- We have also shown its effectiveness on an Intel design

Publications
- Dynamic Conditional Branch Balancing during the High-Level Synthesis of Control-Intensive Designs. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. To appear in DATE, March 2003.
- SPARK: A High-Level Synthesis Framework For Applying Parallelizing Compiler Transformations. S. Gupta, N.D. Dutt, R.K. Gupta, A. Nicolau. VLSI Design 2003. Best Paper Award.
- Dynamic Common Sub-Expression Elimination during Scheduling in High-Level Synthesis. S. Gupta, M. Reshadi, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2002.
- Coordinated Transformations for High-Level Synthesis of High Performance Microprocessor Blocks. S. Gupta, T. Kam, M. Kishinevsky, S. Rotem, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2002.
- Conditional Speculation and its Effects on Performance and Area for High-Level Synthesis. S. Gupta, N. Savoiu, N.D. Dutt, R.K. Gupta, A. Nicolau. ISSS 2001.
- Speculation Techniques for High Level Synthesis of Control Intensive Designs. S. Gupta, N. Savoiu, S. Kim, N.D. Dutt, R.K. Gupta, A. Nicolau. DAC 2001.
- Analysis of High-level Address Code Transformations for Programmable Processors. S. Gupta, M. Miranda, F. Catthoor, R.K. Gupta. DATE 2000.
- Synthesis of Testable RTL Designs using Adaptive Simulated Annealing Algorithm. C.P. Ravikumar, S. Gupta, A. Jajoo. Intl. Conf. on VLSI Design, 1998. Best Student Paper Award.

Book Chapter
- ASIC Design. S. Gupta, R.K. Gupta. Chapter 64, The VLSI Handbook, edited by Wai-Kai Chen.