ECE 667 - Synthesis & Verification - Lecture 5
ECE 697B (667), Spring 2006
Synthesis and Verification of Digital Circuits
Scheduling: Constructive Algorithms
Scheduling – a Combinatorial Optimization Problem
• NP-complete problem
• Optimal solutions for special cases and ILP
• Heuristics – iterative improvement
• Heuristics – constructive
• Various versions of the problem
• Unconstrained minimum latency
• Resource-constrained minimum latency
• Timing-constrained minimum latency
• Latency-constrained minimum
• If all resources are identical, the problem reduces to multiprocessor scheduling (Hu's algorithm)
• Minimum-latency multiprocessor problem is intractable
Scheduling - Iterative Improvement
• Kernighan-Lin (deterministic)
• Simulated Annealing
• Lottery Iterative Improvement
• Neural Networks
• Genetic Algorithms
• Taboo Search
Scheduling - Constructive Techniques
• Most Constrained
• Least Constraining
Force Directed Scheduling
• Goal is to reduce hardware by balancing concurrency
• Iterative algorithm, one operation scheduled per iteration
• Information (i.e. speed & area) fed back into scheduler
The Force Directed Scheduling Algorithm
Step 1
• Determine ASAP and ALAP schedules
[Figure: ASAP and ALAP schedules of the example graph]
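Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every operation takes one C-step, the latency bound for ALAP is 4, and the edge list `EDGES` is an assumed encoding of the differential-equation example whose node numbering may differ from the slides' figure. The names `asap`, `alap`, `EDGES`, `NODES` and `LATENCY` are hypothetical.

```python
from collections import defaultdict

# Assumed precedence graph (edge list); numbering is illustrative only.
EDGES = [(1, 3), (2, 3), (3, 4), (4, 5), (6, 7), (7, 5), (8, 9), (10, 11)]
NODES = range(1, 12)
LATENCY = 4  # assumed latency bound (in C-steps) used by ALAP


def asap(nodes, edges):
    """Earliest C-step of each node, assuming unit-delay operations."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    start = {}

    def visit(v):
        if v not in start:
            # One step after the latest predecessor; sources start in step 1.
            start[v] = 1 + max((visit(u) for u in preds[v]), default=0)
        return start[v]

    for v in nodes:
        visit(v)
    return start


def alap(nodes, edges, latency):
    """Latest C-step of each node that still meets the latency bound."""
    succs = defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
    start = {}

    def visit(v):
        if v not in start:
            # One step before the earliest successor; sinks end at `latency`.
            start[v] = min((visit(w) for w in succs[v]), default=latency + 1) - 1
        return start[v]

    for v in nodes:
        visit(v)
    return start


asap_step = asap(NODES, EDGES)
alap_step = alap(NODES, EDGES, LATENCY)
# Time frame of each node: (ASAP step, ALAP step)
frames = {v: (asap_step[v], alap_step[v]) for v in NODES}
```

The pair `(asap_step[v], alap_step[v])` is the range of C-steps in which node `v` may be placed; nodes on the critical path get a range of width one.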
Step 2
• Determine Time Frame of each op
– Length of box ~ possible execution cycles
– Width of box ~ probability of assignment
– Uniform distribution, area assigned = 1
[Figure: Time Frames over C-steps 1 to 4, with example box widths 1/2 and 1/3]
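The uniform-probability rule of Step 2 can be illustrated as follows. The operation names and their ASAP/ALAP steps are hypothetical values chosen to reproduce the widths 1/2 and 1/3 shown in the figure; `time_frame_probabilities` is an assumed helper name.

```python
from fractions import Fraction


def time_frame_probabilities(asap_step, alap_step):
    """Map each op to {c_step: probability}, uniform over [ASAP, ALAP]."""
    probs = {}
    for op, first in asap_step.items():
        last = alap_step[op]
        width = last - first + 1  # number of possible C-steps
        probs[op] = {step: Fraction(1, width) for step in range(first, last + 1)}
    return probs


# Hypothetical ASAP/ALAP steps for three operations
asap_step = {"a": 1, "b": 1, "c": 1}
alap_step = {"a": 1, "b": 2, "c": 3}
probs = time_frame_probabilities(asap_step, alap_step)
```

Each operation's probabilities sum to 1, which corresponds to "area assigned = 1" on the slide.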
Step 3
• Create Distribution Graphs
– Sum of probabilities of each Op type
– Indicates concurrency of similar Ops
$DG(i) = \sum_{Op} \mathrm{Prob}(Op, i)$
[Figure: DG for Multiply; DG for Add, Sub, Comp]
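A distribution graph is just a per-C-step sum of these probabilities over all operations of one type. The sketch below assumes six multiplications with the time frames listed in `MULT_FRAMES`; these frames are an assumption, chosen so that DG(1) and DG(2) come out as the 2.833 and 2.333 used in the lecture's self-force example.

```python
from fractions import Fraction

# Assumed time frames (first, last C-step) of the six multiplications
MULT_FRAMES = [(1, 1), (1, 1), (2, 2), (1, 2), (2, 3), (1, 3)]


def distribution_graph(frames, n_steps):
    """Return [DG(1), ..., DG(n_steps)] for operations of a single type."""
    dg = [Fraction(0)] * n_steps
    for first, last in frames:
        p = Fraction(1, last - first + 1)  # uniform probability per C-step
        for step in range(first, last + 1):
            dg[step - 1] += p
    return dg


dg_mult = distribution_graph(MULT_FRAMES, 4)
print([round(float(v), 3) for v in dg_mult])  # [2.833, 2.333, 0.833, 0.0]
```

A large DG value in one C-step signals that many operations of that type compete for it, which is what the forces in the following slides try to even out.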
Diff Eq Example: Precedence Graph Recalled
Diff Eq Example: Time Frame & Probability Calculation
Diff Eq Example: DG Calculation
Conditional Statements
• Operations in different branches are mutually exclusive
• Operations of same type can be overlapped onto DG
• Probability of most likely operation is added to DG
[Figure: fork/join example with add and subtract operations in mutually exclusive branches; DG for Add]
Self Forces
• Scheduling an operation will affect overall concurrency
• Every operation has a 'self force' for every C-step of its time frame
• Analogous to the effect of a spring: f = Kx
• A desirable scheduling will have a negative self force
– It will achieve better concurrency (lower potential energy)
Force(i) = DG(i) * x(i)
DG(i) ~ current Distribution Graph value
x(i) ~ change in operation's probability
$\mathrm{Self\ Force}(j) = \sum_{i=t}^{b} \mathrm{Force}(i)$, where t and b are the first and last C-steps of the operation's time frame
Example
Attempt to schedule multiply in C-step 1
Self Force(1) = Force(1) + Force(2)
= ( DG(1) * x(1) ) + ( DG(2) * x(2) )
= [2.833 * (0.5) + 2.333 * (-0.5)] = +0.25
This is positive, so scheduling the multiply in the first C-step would be bad
[Figure: DG for Multiply and time frames over C-steps 1 to 4]
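The arithmetic of this example can be checked with a short sketch. It assumes the multiply has a time frame of C-steps 1 to 2 (probability 1/2 each) and uses the DG values quoted above; `self_force` is a hypothetical helper name.

```python
def self_force(dg, frame, target):
    """Self force of fixing an op with time frame `frame` into C-step `target`.

    dg maps C-step -> current distribution graph value.
    """
    first, last = frame
    width = last - first + 1
    total = 0.0
    for step in range(first, last + 1):
        new_p = 1.0 if step == target else 0.0
        x = new_p - 1.0 / width  # change in the op's probability at this step
        total += dg[step] * x
    return total


dg_mult = {1: 2.833, 2: 2.333}
f1 = self_force(dg_mult, (1, 2), 1)  # about +0.25
f2 = self_force(dg_mult, (1, 2), 2)  # about -0.25
```

The same call with `target=2` gives the opposite sign, so on self force alone C-step 2 would be the preferred placement.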
Diff Eq Example: Self Force for Node 4
Predecessor & Successor Forces
• Scheduling an operation may affect the time frames of other linked operations
• This may negate the benefits of the desired assignment
• Predecessor/Successor Forces = Sum of Self Forces of any implicitly scheduled operations
[Figure: example precedence graph]
Diff Eq Example: Successor Force on Node 4
• If node 4 scheduled in step 1
– no effect on time frame for successor node 8
• Total force = Force4(1) = +0.25
• If node 4 scheduled in step 2
– causes node 8 to be scheduled into step 3
– must calculate successor force
Diff Eq Example: Final Time Frame and Schedule
Diff Eq Example: Final DG
Lookahead
• Temporarily modify the constant DG(i) to include the effect of the iteration being considered
Force(i) = temp_DG(i) * x(i)
temp_DG(i) = DG(i) + x(i)/3
• Consider previous example:
Self Force(1) = (DG(1) + x(1)/3) x(1) + (DG(2) + x(2)/3) x(2)
= .5(2.833 + .5/3) - .5(2.333 - .5/3)
= +.41667
• This is even worse than before
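The lookahead variant changes only the DG term. A minimal sketch, reusing the numbers of the previous example (`lookahead_self_force` is a hypothetical name):

```python
def lookahead_self_force(dg, x):
    """Sum of (DG(i) + x(i)/3) * x(i) over the C-steps of the time frame."""
    return sum((dg[i] + x[i] / 3.0) * x[i] for i in x)


dg = {1: 2.833, 2: 2.333}   # DG for Multiply in C-steps 1 and 2
x = {1: 0.5, 2: -0.5}       # probability change when fixing the op in C-step 1
force = lookahead_self_force(dg, x)  # about +0.41667
```

The result is larger than the +0.25 obtained without lookahead, consistent with the slide's conclusion.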
Minimization of Bus Costs
• Basic algorithm suitable for narrow class of problems
• Algorithm can be refined to consider "cost" factors
• Number of buses ~ number of concurrent data transfers
• Number of buses = maximum # transfers in any C-step
• Create modified DG to include transfers: Transfer DG
$\mathrm{Trans\ DG}(i) = \sum_{op} [\mathrm{Prob}(op, i) \cdot \mathrm{Opn\_No\_InOuts}]$
Opn_No_InOuts ~ combined distinct in/outputs for Op
• Calculate Force with this DG and add to Self Force
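The Transfer DG can be sketched the same way as an ordinary DG, with each probability weighted by the operation's number of distinct inputs and outputs. The operations, their time frames and their in/out counts below are hypothetical, and `transfer_dg` is an assumed helper name.

```python
def transfer_dg(ops, n_steps):
    """ops: list of ((first, last), n_inouts); returns [Trans DG(1), ...]."""
    dg = [0.0] * n_steps
    for (first, last), n_inouts in ops:
        p = 1.0 / (last - first + 1)  # uniform probability over the time frame
        for step in range(first, last + 1):
            dg[step - 1] += p * n_inouts
    return dg


# Hypothetical operations: time frame and combined distinct inputs/outputs
ops = [((1, 1), 3), ((1, 2), 3), ((2, 3), 2)]
trans = transfer_dg(ops, 3)
print(trans)  # [4.5, 2.5, 1.0]
```

In this toy data the expected transfer count peaks in C-step 1, so forces computed from this DG would push the movable operations toward later steps.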
Minimization of Register Costs
• Minimum no. of registers required is given by the largest number of data arcs crossing a C-step boundary
• Create Storage Operations, at output of any operation that transfers a value to a destination in a later C-step
• Generate Storage DG for these "operations"
• Length of storage operation depends on final schedule
[Figure: Storage distribution for S, showing ASAP Lifetime, MAX Lifetime and ALAP Lifetime between source s and destination d]
Minimization of Register Costs (contd.)
• $[\text{avg life}] = \dfrac{[\text{ASAP life}] + [\text{ALAP life}] + [\text{MAX life}]}{3}$
• $\text{storage DG}(i) = \dfrac{[\text{avg life}]}{[\text{max life}]}$ (no overlap between ASAP & ALAP)
• $\text{storage DG}(i) = \dfrac{[\text{avg life}] - [\text{overlap}]}{[\text{max life}] - [\text{overlap}]}$ (if overlap)
• Calculate and add "Storage" Force to Self Force
[Figure: ASAP vs. Force Directed schedules; labels: 7 registers minimum, 5 registers minimum]
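The storage DG formulas above translate directly into a small function. This is a sketch, not the lecture's implementation: the lifetimes are hypothetical numbers of C-steps, `storage_dg_value` is an assumed name, and the overlap case assumes the maximum lifetime is strictly longer than the overlap.

```python
def storage_dg_value(asap_life, alap_life, max_life, overlap=0):
    """Storage DG value of one storage operation, per the slide's formulas.

    Assumes max_life > overlap when overlap is nonzero.
    """
    avg_life = (asap_life + alap_life + max_life) / 3
    if overlap == 0:
        return avg_life / max_life
    return (avg_life - overlap) / (max_life - overlap)


# Hypothetical lifetimes, in C-steps
no_overlap = storage_dg_value(asap_life=1, alap_life=1, max_life=3)
with_overlap = storage_dg_value(asap_life=2, alap_life=2, max_life=3, overlap=1)
```

With these inputs the first call evaluates (5/3)/3 and the second (7/3 - 1)/(3 - 1).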
Pipelining
• Functional Pipelining
– Pipelining across multiple operations
– Must balance distribution across groups of concurrent C-steps
– Cut DG horizontally and superimpose
– Finally perform regular Force Directed Scheduling
• Structural Pipelining
– Pipelining within an operation
– For non data-dependent operations, only the first C-step need be considered
[Figure: Functional Pipelining with two overlapped instances (Instance and Instance'), DG for Multiply with C-step labels 1, 2, 3/1', 4/2', 3', 4'; Structural Pipelining]
Other Optimizations
• Local timing constraints
– Insert dummy timing operations -> Restricted time frames
• Multiclass FU's
– Create multiclass DG by summing probabilities of relevant ops
• Multistep/Chained operations
– Carry propagation delay information with operation
– Extend time frames into other C-steps as required
• Hardware constraints
– Use Force as priority function in list scheduling algorithms
Scheduling using Simulated Annealing
Reference:
Devadas, S.; Newton, A.R. Algorithms for hardware allocation in data path synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, July 1989, Vol. 8, no. 7, pp. 768-81.
Simulated Annealing
[Figure: Local search; cost function plotted over the solution space]
Statistical Mechanics Combinatorial Optimization
State $\{r_i\}$ (configuration: a set of atomic positions)
Weight $e^{-E(\{r_i\})/K_B T}$ (Boltzmann distribution)
$E(\{r_i\})$: energy of configuration
$K_B$: Boltzmann constant
T: temperature
Low temperature limit ??
Analogy
Physical System | Optimization Problem
State (configuration) | Solution
Energy | Cost Function
Ground State | Optimal Solution
Rapid Quenching | Iterative Improvement
Careful Annealing | Simulated Annealing
Generic Simulated Annealing Algorithm
1. Get an initial solution S
2. Get an initial temperature T > 0
3. While not yet 'frozen' do the following:
  3.1 For 1 ≤ i ≤ L, do the following:
    3.1.1 Pick a random neighbor S' of S
    3.1.2 Let Δ = cost(S') - cost(S)
    3.1.3 If Δ ≤ 0 (downhill move) set S = S'
    3.1.4 If Δ > 0 (uphill move) set S = S' with probability e^(-Δ/T)
  3.2 Set T = rT (reduce temperature)
4. Return S
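The generic algorithm maps almost line by line onto Python. The sketch below makes several assumptions that the slide leaves open: "frozen" is taken to mean that T has dropped below a threshold `t_min`, the default parameter values are arbitrary, and the toy cost function and neighbor move are illustrative only, not a scheduling formulation.

```python
import math
import random


def simulated_annealing(initial, cost, neighbor,
                        t0=10.0, r=0.9, L=100, t_min=1e-3, seed=0):
    """Generic annealing loop; 'frozen' is assumed to mean T <= t_min."""
    rng = random.Random(seed)
    s, t = initial, t0
    while t > t_min:                      # step 3: not yet frozen
        for _ in range(L):                # step 3.1: L trials per temperature
            s_new = neighbor(s, rng)      # 3.1.1: random neighbor
            delta = cost(s_new) - cost(s) # 3.1.2
            # 3.1.3 / 3.1.4: always accept downhill moves,
            # accept uphill moves with probability exp(-delta / T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                s = s_new
        t *= r                            # step 3.2: reduce temperature
    return s                              # step 4


# Toy problem (illustrative): minimize (x - 3)^2 over the integers,
# where a neighbor is x - 1 or x + 1.
best = simulated_annealing(
    initial=20,
    cost=lambda x: (x - 3) ** 2,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
)
```

Like the slide's step 4, the function returns the current solution rather than the best one seen; tracking the best solution separately is a common variation.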
Basic Ingredients for S.A.
• Solution Space
• Neighborhood Structure
• Cost Function
• Annealing Schedule
Observation
• All scheduling algorithms we have discussed so far are critical path schedulers
• They can only generate schedules for an iteration period larger than or equal to the critical path
• They only exploit concurrency within a single iteration, and only utilize the intra-iteration precedence constraints
Example
• Can one do better than iteration period of 4?
– Pipelining + retiming can reduce critical path to 3, and also the # of functional units
• Approaches
– Transformations followed by scheduling
– Transformations integrated with scheduling