ilp optimum scheduling, module register allocation...

17
VLSI Design 1995, Vol. 3, No. 1, pp. 21-36 Reprints available directly from the publisher Photocopying permitted by license only (C) 1995 OPA (Overseas Publishers Association) Amsterdam B.V. Published under license by Gordon and Breach Science Publishers SA Printed in Malaysia An ILP Solution for Optimum Scheduling, Module and Register Allocation, and Operation Binding in Datapath Synthesis T.C. WILSON VLSI-CAD Group, Department of Computing and Information Science, University of Guelph, Guelph, Ontario, CANADA N1G 2Wl N. MUKHERJEE Department of Electrical Engineering, McGill University, Montreal, Quebec, CANADA H3A 2A7 M.K. GARG IBM Canada Laboratories, Don Mills, Ontario, CANADA M3C 1H7 D.K. BANERJVf VLSI-CAD Group, Department of Computing and Information Science, University of Guelph, Guelph, Ontario, CANADA N1G 2Wl (Received June 18, 1993, Revised November 9, 1993) We present an integrated and optimal solution to the problems of operator scheduling, module and register allocation, and operator binding in datapath synthesis. The solution is based on an integer linear programming (ILP) model that minimizes a weighted sum of module area and total execution time under very general assumptions of module capabilities. In particular, a module may execute an arbitrary combination of operations, possibly using different numbers of control steps for different operations. Furthermore, operations may be implemented by a variety of modules, possibly requiring different numbers of control steps depending on the modules chosen. This generality in the complexity and mixture of modules is unqiue to our system and leads to an optimum selection of modules to meet specified design constraints. Significant extensions include the ability to incorporate pipelined functional units and operator chaining in an integrated manner. Straightforward extension to multi-block synthesis is discussed briefly but the details are omitted due to space considerations. Key Words: Datapath; Synthesis; Scheduling; Allocation; Binding; Registers 1. INTRODUCTION module allocation, and binding of op- erators are three major tasks in behavioral-level data’path synthesis. Scheduling involves the assign- ment of operations to control steps; module alloca- tion selects the set of functional units that will participate in the design; and operation binding as- sociates these operations with particular instances of the functional units. This association consists of two sub-tasks: (i) type binding, which concerns the choice *Please send all correspondence to the corresponding author. of module type for each operation; and (ii) instance binding, which identifies the specific module of the designated type. In this paper, we present an optimum and inte- grated solution to the problems of scheduling, allo- cation, and binding. The solution utilizes an integer linear program (ILP) that minimizes a weighted sum of module area and total execution time under very general assumptions of module capability. In partic- ular, a module may execute an arbitrary combina- tion of operations, possibly using different numbers of control steps for different operations. Further- more, the same operation may be implemented by a variety of modules, possibly involving different num- 21

Upload: others

Post on 13-Oct-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

VLSI Design1995, Vol. 3, No. 1, pp. 21-36Reprints available directly from the publisherPhotocopying permitted by license only

(C) 1995 OPA (Overseas Publishers Association)Amsterdam B.V. Published under license byGordon and Breach Science Publishers SA

Printed in Malaysia

An ILP Solution for Optimum Scheduling, Moduleand Register Allocation, and Operation

Binding in Datapath Synthesis

T.C. WILSONVLSI-CAD Group, Department of Computing and Information Science, University of Guelph,

Guelph, Ontario, CANADA N1G 2Wl

N. MUKHERJEEDepartment of Electrical Engineering, McGill University, Montreal, Quebec, CANADA H3A 2A7

M.K. GARGIBM Canada Laboratories, Don Mills, Ontario, CANADA M3C 1H7

D.K. BANERJVfVLSI-CAD Group, Department of Computing and Information Science, University of Guelph,

Guelph, Ontario, CANADA N1G 2Wl

(Received June 18, 1993, Revised November 9, 1993)

We present an integrated and optimal solution to the problems of operator scheduling, module and registerallocation, and operator binding in datapath synthesis. The solution is based on an integer linear programming(ILP) model that minimizes a weighted sum of module area and total execution time under very generalassumptions of module capabilities. In particular, a module may execute an arbitrary combination of operations,possibly using different numbers of control steps for different operations. Furthermore, operations may beimplemented by a variety of modules, possibly requiring different numbers of control steps depending on themodules chosen. This generality in the complexity and mixture of modules is unqiue to our system and leads to anoptimum selection of modules to meet specified design constraints. Significant extensions include the ability toincorporate pipelined functional units and operator chaining in an integrated manner. Straightforward extensionto multi-block synthesis is discussed briefly but the details are omitted due to space considerations.

Key Words: Datapath; Synthesis; Scheduling; Allocation; Binding; Registers

1. INTRODUCTION

module allocation, and binding of op-erators are three major tasks in behavioral-level

data’path synthesis. Scheduling involves the assign-ment of operations to control steps; module alloca-tion selects the set of functional units that willparticipate in the design; and operation binding as-sociates these operations with particular instances ofthe functional units. This association consists of twosub-tasks: (i) type binding, which concerns the choice

*Please send all correspondence to the corresponding author.

of module type for each operation; and (ii) instancebinding, which identifies the specific module of thedesignated type.

In this paper, we present an optimum and inte-grated solution to the problems of scheduling, allo-cation, and binding. The solution utilizes an integerlinear program (ILP) that minimizes a weighted sumof module area and total execution time under verygeneral assumptions of module capability. In partic-ular, a module may execute an arbitrary combina-tion of operations, possibly using different numbersof control steps for different operations. Further-more, the same operation may be implemented by avariety of modules, possibly involving different num-

21

Page 2: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

22 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

bers of control steps. Important extensions to thiswork include the ability to handle multiple-blockdesigns, the use of pipelined functional units, andthe possibility of operator chaining. Chaining refersto executing a sequence of operations on the samecontrol step using different functional units withoutintervening storage. We will show how to incorpo-rate these possibilities within the ILP, not as spe-cially handled cases but as alternative module op-tions and strategic opportunities.

Like functional units, registers can consume sig-nificant amounts of space on a chip. An importantextension to our basic model incorporates registerallocation in an integrated manner. This allows aschedule and module allocation to be found thatdoes not require more than a predetermined num-ber of registers. Alternatively, the number of regis-ters can be determined by the ILP. In this case,register area and module area can be traded off, sothat an optimum configuration is found within anoverall area limit.

2. BACKGROUND

The problems of scheduling, allocation, and modulebinding have received considerable attention, andmost recent works have stressed the need for anintegrated approach. There is also considerable ef-fort being applied to handling a more general mix ofmodule types. Most early efforts, however, assumelittle or no flexibility in module capability and treatscheduling and module allocation as independentproblems, where type binding, module allocation,scheduling, and instance binding are performed insome linear order, often the one just listed. Amongmany papers in this category, we note the ILPmodel presented in [1], which considers only theproblem of scheduling, but includes pipelined unitsand operation chaining.

Recently, several papers have presented an "in-tegrated" approach to the problems of scheduling,allocation, and binding. They are integrated in thesense that they consider scheduling, allocation, andbinding issues together at each decision point. Butthe decisions are sequential and are based onknowledge of the past and perhaps on estimates ofwhat may follow. They are constructive heuristicswhich decide on one operation (or step) at a timeand may not yield optimal designs in the end. Forexample, Cloutier and Thomas [2] extend the force-directed scheduling approach to also include typeand instance binding. The binding allows an opera-

tion to be associated with any of a variety of moduletypes. Ramachandran and Gajski [3] use a similarapproach, but permit a more general module library.They consider different implementations of simplemodules but do not allow arbitrary multi-functionmodules. Balakrishnan and Marwedel [4] develop adesign one step at a time and use a linear programto design each step.

Papachristou and Konuk [5] suggest an iterativerefinement procedure which alternates between al-location and scheduling. They use force-directedscheduling, followed by an ILP to allocate a mini-mum area set of modules. The allocation may beused to constrain another round of scheduling, fol-lowed by a new allocation, etc. They consider onlyoperations that execute in one control step.Some approaches are global in the sense of at-

tacking the scheduling and allocation problems forall operations at the same time. Devadas andNewton [6] use simulated-annealing-based algo-rithms to explore the solution space. Papachristouand Nourani [7] use "moveframe scheduling" andLiapunov’s stability function to solve the combinedproblem. Although their algorithm can handlemulti-function units, they cannot accomodate dif-ferent physical implementations of a functional unitor complex multi-function units taking differenttimes for different operations.

Gebotys and Elmasry [8] is the only other ILPformulation that performs scheduling, allocation,and binding globally and optimally. However, theyrequire the execution time of each operation to be aconstant. This restricts their module library, andcurtails possibililities for dynamically selecting dif-ferent implementations of an operation. Their algo-rithm is "valid for straight line code" [8], and hasnot been extended to handle multiple block designs.An interesting innovation is to exploit the feature of"facets" which reduces the time required to find aninteger solution. They also consider register area,but do not cope directly with values which are usedby multiple unrelated operations. They have re-cently [9] included the notion of control step lengthin their model. The CATHEDRAL silicon compiler[13] considers many issues in synthesis: operatorscheduling, resource allocation, interconnect mini-mization, memory management, address generation,and instruction word minimization. In contrrast, thiswork only focuses on a subset of these issues, namelyoperator scheduling, module and register allocation,and module binding. Interconnect optimization, in-cluding module re-binding is handled separately inour synthesis system.

Page 3: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 23

3. BASIC CONCEPTS

Given a library of available functional unit types andthe data flow graph (DFG), the problem is to deter-mine a schedule and choose a suitable combinationof units, so that:

the operations are performed by the unitshaving the required capability,

a module can execute at most one operation ata time,

the operations are executed in the order speci-fied by the DFG,the total area is limited (or minimized),the execution occurs within a number of stepsthat is limited (or minimized).

Two kinds of constraints play a crucial role duringscheduling and allocation. First and most familiarare the data dependency constraints (orconstraints). These ensure that no operation beginsexecution until all of its operand values have beencomputed. Each DD-constraint corresponds to anedge of the data flow graph (DFG) and bounds thestarting time of an operation by the completiontimes of its predecessors on the DFG. Completiontimes, in turn, depend on the particular units allo-cated. Thus, DD-constraints enforce the orderingimposed by the DFG, with consideration given tothe type of unit allocated to each operation.The second important set of constraints focuses

on each individual unit- its type and instance. Ifany unit is used by more than one operation, thenthose operations must be implemented in some se-quential order. No operation can begin until itspredecessor (if any) on the same unit has completedits execution. Although this constraint also boundsstarting times by the completion times of the prede-cessors, this ordering is unrelated to data depen-dency. The ordering here is imposed by the mappingof operations to units and is not predetermined. Itmust be ascertained dynamically, if (multi-step)modules are to be reused. We refer to the orderingamong operations that are executed by the sameunit as their unit-use ordering, enforced by unit-useconstraints (UU-constraints for short).

In the ILP solution to appear shortly, we intro-duce 0-1 variables Pij whenever it is feasible foroperation to precede operation j in their use ofthe same functional unit (whatever unit that mayturn out to be). When the ILP sets Pij-- 1, opera-tion is presumed to precede operation j. If we

restrict attention to those Pij that do become 1 inthe final solution, and further restrict attention tothose p’s referring to immediate predecessors, thesep’s describe disjoint paths that traverse all the oper-ation nodes of the DFG. Each such "UU" pathrefers to one allocated funcitonal unit and identifiesthe sequence of operations that employ that unit.

4. THE GENERAL ILP SOLUTION

The combined scheduling, allocation, and bindingproblem is formulated and solved as an integerlinear programming problem. The following symbolsare used to identify object types and to index impor-tant objects in the ILP solution:

object indexed byoperation i, j

unit rn

The unit index, rn identifies individual instancesof each module (not the module type) being consid-ered in the solution. An upper limit on the numberof instances for each type is determined by a prelim-inary appraisal of the DFG, module library, andglobal time and resource constraints.

Certain constants pertain to the module librarybeing considered:

Am is the chip area consumed by including unit rnin the design;

M is the set of modules that are capable of exe-cuting operation (an index set on m);

Dim where rn Mi, is the number of control stepsthat unit rn Would require to complete execu-tion of operation (not defined if rn q Mi).

In addition, the ILP solution makes use of thefollowing predefined values and sets of objects thatdepend on the particular DFG:

F

E[i, j]

Q[i,j]

is the final operation set, containing thoseoperations that have no successors in theDFG;is a 0-1 matrix rep.resenting the edge set ofthe DFG (DD-constraints); if 1 then opimmediately precedes opj in the DFG;is a 0-1 matrix indicating whether operationcould possibly precede operation j in their

Page 4: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

24 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

use of some common module (potentialUU-constraints); if 1, then M and M inter-sect;

N is some large constant, larger than the maxi-mum number of control steps and the num-ber of operations.

The ILP manipulates the following variables:

Yi: integer; denotes the starting control step for OPi

Xim 0, 1 Xim

Um O, 1 Um

Piy O, 1 Pij

1 oPi is executed by unit rn(undefined if rn Mi)

1 = unit rn is required in thedesign

1 opi precedes opy in theiruse of the same unit(undefined if Q[ i, j O)

Since both DD and UU-constraints delay the startof operations until their predecessors have finished,the completion step of an operation becomes animportant quantity within the solution. The first freestep, following execution of operation i, when usingmodule m, is given by the expression:

Yi -+- E XimOimrnM

Note that exactly one of the Xim will be non-zero,thereby causing the summation to equal the corre-sponding Dim.The two major parameters of interest are time

and area. The symbol T represents the total numberof steps required by the schedule, including thecompletion of all final operations in F. T may beconstrained in advance, may be a value to minimizeas part of the objective function, or both. Likewisethe total module area required by the design may beconstrained, may be minimized by the objectivefunction, or both. Its value is given by the expres-sion:

We can now consider the fundamental ILP formu-lation. This particular version seeks to minimize aweighted sum of total time, T, and total modulearea. Their relative weights are denoted Wtime andWarea respectively. Extensions to the ILP are pre-sented later.

OBJECTIVE

min Wtime , T + Warea * EumAmrn

CONSTRAINTS

(1) All operations must begin on some controlstep:

li" Yi > 0

(2) Each operation must be assigned to somefunctional unit:

ti: Xim= 1mM

(3) If an operation is mapped to a unit, that unitmust be included in the final design:

lm: EXim umN

(4) The overall completion time, T, is bounded bythe completion time of each operation in thefinal set, F:

fi F: Yi + E ximDim <- T + 1mM

(5) The starting time of each operation is boundedby the completion times of its predecessors onthe DFG (DD-precedence):

i, j where E[i, j] l’y + ximDim <__ yjmM

(6) If two operations are assigned to the sameunit, then one must precede the other in itsunit use (UU-precedence):

/i, j, m, where rn M t My:

Xim q-Xjm (Pij + Pji) <- 1

(7) If one operation precedes another in its use ofthe same unit, then the starting time of thesuccessor is bounded by the completion timeof the predecessor (UU-precedence):

/i, j where < j:

(a) IfQ[i,j]= land E[i,j]=0:

Yi + - ximDim <- Yj + (1 -pij)NmM

(b) If Q[j, i] 1 and E[j, i] 0:

Yj q- E XjmDjm Yi -[- (1 --pji)NmMj

Page 5: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 25

Constraint 6 forces either Piy or Pyi to be 1, incase operations and j are ever both assigned to thesame unit, m. Constraint 7 will guarantee that atmost one of these p’s can ever be 1. If eitherQ[i, j] 0 or Q[ j, i] 0, then at most one direc-tion of precedence is possible, at least one of the p’sis undefined, and both constraints become simpler.If Q[ i, j] O, Q[ j, i] 0, and M N My O, thenoperations and j must overlap in time, and con-straint 6 reduces to

Xim + Xjm 1

which enforces mutually exclusive use of unit m.Constraint 7 is meaningful only if one of the Piy

or Pyi does, in fact, become 1. In that case, therighthand side of the constraint will contain simplyy, the successor’s starting time which is being con-strainted. However, when p 0, the correspondingrighthand side includes a large constant, N, whichmakes the constraint satisfied automatically, therebydeactivating it.

5. SIMPLE ENHANCEMENTS.TO THEGENERAL FORMULATION

The integer linear program presented in Sec. 4 isthe basis for many possible variations and exten-sions. This section discusses some of the simplerfunctional enhancements, and subsequent sectionsdiscuss more substantial enchancements.

5.1 Other Objectives and Constraints

The total time and area available on a chip are oftenrestricted to lie within some prespecified limits.These are easily incorporated by adding the follow-ing constraints:

(8) T < StepLimit

(9) umAm <_ AreaLimitm

In practice, the general objective function whichminimizes a combination of time and area mayeither cause the ILP to execute too slowly or simply’be inappropriate for the situation. We usually havea specific number of control steps in mind (requiringconstraint 8) and seek to minimize the resourcesthat will meet this scheduling deadline. The objec-tive then becomes:

min EumAmm

By progressing through a series of plausible timelimits using this formulation, the sequence of solu-tions clearly displays the area/time tradeoffs to thedesigner.

Likewise, one can find the shortest schedule ob-tainable with a given set of resources. After specify-ing either total area (requiring constraint 9), or byspecifying which particular functional units are to beused, the objective becomes simply: min T. Thefastest option, and often a very useful application, isto verify the existence of some feasible solution thatmeets both time and total area constraints; thisrequires no objective function at all.Of course, additional constraints can be added to

limit the instances of certain unit classes or toexclude certain combinations of units from appear-ing in the design. The designer can always controlthe module types from which the ILP is allowed toselect. For example, to specify that some set ofmodules, X, is mutually exclusive, we include theconstraint: ErnXUrn 1. The constraint Xim--" 1requires that operation be done using module m;Xim 0 (or failing to define xira for this particularvalue of rn) does not allow module rn to be em-ployed for operation i. If a module, m, is known tobe in the design anyway, u should be set to 1, andits cost, Am, can be set to 0 to promote its reusewithin the ILP.An interesting possibility involves imposing con-

straints on the interval between the steps on whichtwo operations are scheduled. For example, if oper-ation must occur exactly k control steps beforeoperation j, then we set yy- Yi k. Many othersimple constraints can specify other requirements ofthe design.

5.2 Pipelined Functional Units

A simple but powerful extension allows the inclu-sion of pipelined functional units in the design.These differ from ordinary multi-step units in theirability to begin a new computation before comple-tion an earlier one. The minimum interval betweenaccepting successive inputs is called the latency ofthe unit, denoted L. For non-pipelined units, la-tency equals execution time, and Lira-" Dim. Ourmodel can considers both pipelined and non-pipe-lined implementation of any operation.For DD-constrained operations, pipelined units

make no difference. Operations cannot begin untiltheir operands have been computed. However, fordata independent operations that reuse the same(pipelined) unit, the successor need wait only for the

Page 6: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

26 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

latency period of the device. Constraint (7) thusbecomes:

Vi, j where < j

(7a)

(7b)

Yi +

IfQ[i,j] landE[i,j] =0:

E ximZim <- Yj + (1 -pij)UmM

If Q[j,i] 1 and E[j,i] 0:

E Xjmgjm < Yi + (1 -pji)NmMj

This simple modification to Constraint (7) is allthat is needed to include pipelined units in thedesign.

5.3 Multiple Block Designs

We use the preceding ILP formulation as the basisfor multi-block schedules, allocations, andbindings-not block-by-block or just along criticalpaths, but for all blocks of a design simultaneously.In fact, the identity of critical paths cannot alwaysbe known until module binding has determined thetime for each operation.The general idea is suggested in Fig. 1. Opera-

tions are confined to remain in their original control

blocks, and the blocks themselves always execute insome order, never concurrently. These assumptions,respectively, remove the need for either DD or UUconstraints between operation pairs appearing indifferent control blocks. In other words, schedulesand allocations in two separate blockscannot inter-fere with each other. Thus, the respective data flowgraphs for each block remain disconnected as theyare submitted to the ILP. While this may seem likeindependent scheduling of each block, the modulesare all being drawn from a common pool, and theirtotal area is counted in the objective function. Thus,each block is being scheduled concurrently with theothers, so as to maximize possibilities for reusingmodules among all blocks in the design. In addition,total execution time is being measured by the objec-tive function. The figure suggests how each controlpath provides a bound on this time. Therefore, inpaths that turn out to be critical, there may be atradeoff in the times allocated to their individualblocks. Note that step numbers within each blockbegin at 1, with an appropriate offset added afterthe ILP. Before this linearizing offset is added, theILP may appear to map several operations to thesame module on the same control step, but opera-tions from the same control block can never overlapin this way.

In practice we usually seek a feasible solution tomultiblock designs, in which the total area is con-strained, as is the longest execution time through

step

number

eachpath < T

FIGURE Multiple Block Design

Page 7: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 27

any control path. One such design includes both aconvolution algorithm and a bandpass filter. Thisparticular design involves 34 control blocks, 58 oper-ations in total, and from 1 to 8 operations per block.Finding a feasible solution requires only 4 secondsof execution time on a SPARCstation 2. The com-plexity of all the issues involved in multi-blockscheduling, allocation, and binding cannot be ade-quately covered here; these details are provided in aseparate manuscript under preparation.

5.4 Additional Enhancements

Some other extensions that have been incorporatedin the ILP model will be explained later in separatesections of their own. One of these is to allowchaining of funcitonal units within single controlsteps. Another is to account for register allocation,along with module allocation. This can be done intwo ways. The schedule can be constrained, so thatno more than a predefined number of registers willbe required or, alternatively, the registers and mod-ules can both be allocated by the same ILP, so thattotal area is either minimized or kept below a limit-ing value. These extensions will be described inSections 7 and 8, respectively.

functionality are essentially equivalent. Type bindingalone determines the number of units required fromeach module class and, hence, the area cost of thedesign. Type binding by itself is also sufficient toconstrain scheduling for single-step operations.On the other hand, properly scheduling multi-step

operations does require explicit instance binding, inorder to ensure continuity across control stepboundaries and non-overlapping use of multi-stepunits. Even operations on these modules that re-quire only one step have to be identified, in order toavoid conflict with other operations that requiremultiple steps on the same module. Thus, the needfor instance binding depends on the characteristicsof the module, as well as the operation.A functional unit type is characterized by its area,

operator capabilities, and the time required to per-form each operation. As a result, the functionalunits are categorized into two classes: 1) A single-stepfunctional unit, which takes one control step forexecuting all the operations involved in the DFGthat it is capable of performing, and 2) A multi-stepfunctional unit, which takes multiple steps for exe-cuting at least one of its operations that also appearsin the DFG. For the integrated ILP, the operationsare classifed into three categories, according to thetypes of functional units present in the modulelibrary. The categories are"

6. OPERATIONAL ENHANCEMENTS

Integer linear programs have the potential for longrunning times on large or inappropriate problems.Thus, it is important to incorporate features whichreduce their running times and to use them inappropriate ways. The following two subsections ex-plain specific techniques which greatly acceleratethe ILP running times for the problem consideredin this paper. The third subsection discusses theappropriate use of these ILPs in the context of alarge synthesis system.

6.1 Limited Instance Binding

A dramatic speedup in ILP running time resultsfrom exploiting the following observation: For oper-ations using modules which always execute in onecontrol step, instance binding does not affect theschedule or the number of units allocated. As longas some appropriate module is available, the mod-ule’s exact identity need not be decided until later.All available single-step modules with appropriate

(i)

(ii)

(iii)

Class A-ops: Operations that always need asingle control step and, in addition, can beexectued only by single-step functional units;

Class C-ops: Operations for which all appli-cable functional units are multi-step units(even if the operation in question may takeonly one step).Class B-ops: All other operations that do notfall in either of the above two classes.

The categorization of operations depends on thetype of functional units available for a design. This,in turn, affects the ILP formulation. Class C is themost general and corresponds to the general ILPsolution of Sec. 4. Class A operations will always beassociated with single-step modules which may re-main anonymous. By themselves, they could be han-dled by an ILP that is much simpler than the onepresented in Sec. 4. Operations in classes A and Cwill have only those variables and constraints de-fined that are appropriate to their respective classes.B-ops may wind up being assigned to either moduleclass and will have to participate in both kinds ofconstraints. If a class B operation is, in fact, mappedto a single-step unit type, the particular unit in-

Page 8: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

28 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

stance will not matter and will not be decided. If,however, the same class B operation is mapped to amulti-step unit, the specific unit must be deter-mined. It is useful to visualize the accelerated for-mulation as two tandem linear programs, with classA and C operations belonging to their respectiveILPs, and class B operations as straddling bothILPs.The following symbols are used to identify object

types and to index important objects in the ILPsolution:

object indexed byoperation i, j

unit m, nmodule type k, hcontrol step s, t

We now present the complete accelerated ILPthat performs instance binding only for operationsthat require it. For this purpose, a "special" moduletype is created and labeled as type k 0. Opera-tions in class B can be done by both single-step andmulti-step modules. Depending on the type of func-tional unit allocated, type binding or instance bind-ing is performed. This allocation is not known apriori and, therefore, provision should be made forboth the possibilities; the decision is taken dynami-cally. An operation is assigned to class 0 only whenit is not assigned to any regular single-step moduletype and, therefore, must be bound to some specificinstance of a multi-step module. Module type 0 isthus an "escape" type, implying participation in theother stream of constraints.

OBJECTIVE

The unit index, m, identifies individual unit in-stances of each multi-step module type, whereas kidentifies only the single-step module types beingconsidered in the solution. We introduce constantAk to denote the chip area consumed by includingan instance of a unit from type k in the design. Werestrict the earlier definition of Q[i,j] to indicatewhether operation could possibly precede opera-tion j in their use of some common multi-stepmodule.The ILP requires a few new variables and some

old ones to be appropraitely reinterpreted:

ng: integer; number of instances from single-stepmodule class k

Xim 0, 1 Xim 1 =

Um O, 1 U 1

Pij O, 1 Pij 1

Wik 0, 1 Wik 1 =:

opi is executed by multi-stepunit m

multi-step unit m is requiredin design

opi precedes opj in using thesame multi-step unit.

operation is executed onstep s by a unit from single-step type k

min

CONSTRAINTS

(1) Every operation in class A and operations inclass B that are done by single-step modules,must be assigned to some time step andmodule type.

Vi A B: EWiks-" 1k>O

(2) Every operation in classes B and C must beassigned to some control step.

li Ct2B: yi>O

(3) The two different step notations for B-opshave to be equated.

Vi B: SWik Yik>_O

(4) Every operation in class C must be assignedto an instance of a functional unit.

The total module area is now given by the expres-sion:

ViC: Xim-’lmM

EumAm -+- E nkAkm k>O

This expression may appear in the objective func-tion, in an area-limiting constraint, or both.

(5) Class B operations which are not assigned toany regular (k > 0) module type must bemapped to some module instance.

Vi eB, k=0: EWiks E XimmeM

Page 9: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 29

(6) If an operation is assigned to a unit, the unitmust be included in the design.

/m: Xim <_ umNiBuC

(7) Enough instances of each module type mustbe available to satisfy the demand on eachcontrol step.

/s, k > 0: Wik <_ nkiA uB

(8) The overall completion time T is bounded bythe completion times of the operations in thefinal set F.

(b) If Q[j,i] 1 and E[j,i] 0:

Yj / E XjmDjm < Yi + (1 -pji)NmMj

Constraint 3 equates the scheduling step nota-tions for B-ops appropriate to each method, i.e., itcauses Yi to equal the value of s corresponding towhich wik 1. Constraint 1 is responsible for allo-cating operations to module types, and if any B-op isallocated to type 0, constraint 5 ensures that it getsmapped to a module instance. Constraints 10 and 11are valid only for pairs of operations that can poten-tially share a multi-step functional unit.

6.2 Utilizing the Span of Scheduling Possibilities

(a) Ifi A tB: E -swiks < Tk>0

(b) IfiCUB:Yi/ XimDim<- T/ImM

(9) The starting time for any operation isbounded by the completion times of its pre-decessors on the DFG.

i, j where E[i, j] 1:

(a) IfiAUB:

(b) IfiCtdB:

E EtWjkt__< k>0

E s SWiks / 1k>O

Yi / E ximDimmM

ifj A

ifjBuC

(10) If two operations are assigned to the sameunit, one must precede the other in its unituse.

/i,j, rn where i,jCB and rn MiN

Xim / Xjm ( Pij / Pji) <- 1

(11) If one operation precedes another in its useof the same unit, then the starting time ofthe successor is bounded by the completiontime of the predecessor.

/i,j where < j and i,j C t B:

(a) If Q[i,j] 1 and E[i,j] 0:

Yi ./ E XimOim < Yj / (1 -piy)NmM

If the maximum number of control steps is specified,it is viable to determine the last (ALAP), as well asthe first (ASAP), control steps on which an opera-tion could possibly be scheduled. This informationcan be used to explicitly constrain the feasible start-ing step (yi) for each operation.

In addition, we also compute the span for everyoperation. The span of an operation i, introduced in[3], is defined to be the number of steps in which itcan possibly be using some functional unit. Mathe-matically, this is represented as:

SPAN (ALAP @ D/min ) ZSAP

where, D/man minm Mi{Dim}. Note that the fastestmodule is considered for calculating the span of opi.This is because the ALAP partitions have alreadybeen determined on the basis of fastest moduletypes available in the library and, therefore, (ALAP/ D/man 1) gives the step number by which execu-tion of must complete. In other words, if anoperation gets scheduled in its ALAP step, it can-not use any module other than the fastest oneavailable for that operation. Note that for multi-stepoperations the span can extend beyond their ALAPstep.We extend this concept to calculate the mutual or

joint spans for pairs of operations that can possiblyshare some functional unit. This information is uti-lized in removing redundant Pij and Pji variablesfrom the ILP. The span for a pair of operationsand j, in case precedes j, is represented as:

SPANij (ALAPy’+ Djmin) asaP

where, Ojmin is the delay of the fastest module thatcan perform opy. SPANiy is the maxirnun time boththese operations may be active, assuming beginsfirst. SPANij and SPAN. are often unequal. If there

Page 10: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

30 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

is any module that can complete both the operationswithin the joint span, then these operations canpossibly share that module. On the other hand, if atime constraint restricts their joint span, they cannotuse the same module. Potential opportunities forsharing resources between operation pairs deter-mine which Piy and Pyi variables are selectivelyintroduced in the ILP model. A Piy variable be-comes necessary for sequencing operations and j ifand only if:

=lm (M f) My), where,

SPANy > Dim + Ojm

i.e., there is some functional unit that can do opera-tions and j in this order, and can finish theirexecution within the span. Similarly, a Pji variable isnecessary if and only if:

:lm (M My), where,

SPANy > Dim + Djm

In a design, there are cases when both Pij and Pjivariables appear. Or, it may be the case that there isno need for defining the Piy and Pyi variables. Thislatter situation may arise under two circumstances.One occurs when the spans for a pair of operationsdo not overlap at all, and the second is when theirspans overlap but leave no room for their sequentialexecution.

6.3 Using the ILP in Context

Although our accelerated ILP solution typically re-quires only a few seconds, such an ILP runs the riskof slowing down on an ill-suited problem. In mostcases, the cause of poor performance is examinationof a large number of alternative optimum solutions,in order to determine that an earlier "provisional"solution is, in fact, optimum. Besides employinglocal ILP "tricks," there are two basic ways ofavoiding such problems. One of these is to incorpo-rate features into the ILP that will eliminate certainsymmetries and certain hopeless cases. Postpone-ment of instance bindings for single-step operations,for example, eliminates a large number of equiva-lent solutions, and the use of spans constricts thesearch space to an area where a realistic solutionmust be found.The more one can constrain the formulation (in

meaningful ways), the faster is the ILP likely to run.For instance, it is often possible to determine inadvance and communicate to the ILP the minimumfunctionality required to achieve a schedule with

acceptable time, or the bounds on control steps fora given module allocation. One good source of addi-tional constraints is a preliminary analysis of theDFG and module library. A synthesis system canpreselect modules and determine bounds on theirnumbers, prior to running the ILP.Another good source of constraining information

is from a previous solution attempt, using a simplerversion of the ILP or an altogether differentscheduling/allocation heuristic. This brings us tothe second major way of speeding up an ILP" not torun it as one monolithic process on a wide-openproblem, but rather to progress in steps, saving thefull ILP for the final assault on the optimum solu-tion. It is possible to use the ILP formulation itselfto find intermediate solutibns very quickly. Onesimply takes the previous best solution, reduces ei-ther the time or total area, and submits the newlyrefined constraints to the ILP solver without anyobjective function. If any feasible solution exists, onewill be found rapidly without requiring the ILPsolver to pursue even better (or possibly equivalent)solutions in that application. In the end, an ILP mayonly be asked to find the optimum when it is fine-tuning an already good solution.

7. CHAINING

Chaining refers to performing several data-depen-dent operations sequentially in a control step. Thenumber of operations that can be serialized in acontrol step depends on both the propagation delaysof the functional units used for executing thoseoperations and the duration of a step.

In our model, the potential operations that can bechained are selected by searching the module libraryfor sequences of single-step functional units capableof performing their respective operations, all withina single control step. During the optimization pro-cess, the ILP chains operations if there is any advan-tage in doing so. Constraint 7 (Sec. 6.1) still holds. Itensures separate instances of units for operationswhich are chained in a clock cycle. This constraintalso restricts the total number of single-step func-tional units available for any clock cycle and, hence,their area contribution. Therefore, it indirectly con-trols the use of chaining as well.

If we consider a pair q.f data-dependent opera-tions and j to be successfully chained, then wewant:

Wik "1- Wjh > 1, only if, WiksOik -’[- WjhsOjh 1.

(Note that the step number is s for both opi and

Page 11: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 31

Opj.) Dik and Dh denote the time taken by modulesfrom classes k and h to execute opi and opj, respec-tively. These inequalities state that two operationscan be scheduled in a step only if their combinedexecution time is less than or equal to the durationof one step. The precedence relation (Constraint 9a,Sec. 6.1) for any two operations belonging to eitherClass A or Class B should be interpreted as follows:

Vi, jwhereE[i,j] landi, jA t2B"

E EtWjht E sSWik > (0 ifOjh+Oik<_l

h>0 k>0 1 if Djh + Dik > 1

This means that if there are module classes whichcan do op and opj within a clock cycle, then theycan be chained and their step difference (t s), willbe zero. Otherwise, the constraint remains as be-fore, i.e., operations and j get scheduled in dif-ferent steps. Constraint 9a can be expressed in asingle inequality as:

(9a)E E tWjht- E ESWiksh>0 k>0

>-- E E OjhWjht + E EOikWiks 1h>0 k>0

Notice carefully the righthand side. If Dih + Dit< 1 and opi and opi use modules from classes kand h in the same step (when s), then therighthand side becomes < 0, and chaining can beactivated, if necessary. On the other hand, if Dh +Dik > 1, the righthand side becomes > 0, whichforces a difference of at least one step between theexecution of operations and j.Our formulation can be extended to cover three

or more operations of a data flow graph that canpotentially be chained together. This is done (in thecase of 3 operations) by including constraints ’for thetwo consecutive pairs of operations, as well as con-straints for all three together.

8. INCLUDING REGISTER ALLOCATION

Section 5.4 introduced the notion of counting therequired number of registers in the design. Thissection explains how it is actually done. Our modelcan dYnamically assess or control the number ofvariables that are live after each control step. Themaximum of these numbers equals the number ofdata storage registers required in the design. Note

that the formulation presented here does not actu-ally assign variables to specific registers (a processcalled register binding), but it does accurately con-sider their number.A portion of this problem is solved by Gebotys [8].

We begin in similar fashion but supply the addi-tional analysis to permit register allocation to befully integrated into our model and solved in oneapplication of the extended ILP.The entire preparatory process can be seen in

Fig. 2. Figure 2(a) shows the original DFG. Fiveseparate values, requiring five different registers, areshown from five earlier computations or parameters(1 through 5). Each interior operation (6 through 11)produces a single value which can be stored in asingle register, even if it is required at more thanone place later on. We use u(i) to represent eachdistinct value in terms of i, the operation or sourcethat produces it. Values that have more than oneultimate destination may require special attention;we designate such values as multi-use values and callthe others single-use values. Four output values arealso shown. Since their destinations need not bedistinguished for our purposes, they are shown lead-ing to a generic "end node", numbered 12.

Multi-use values complicate the problem of regis-ter allocation. The value in the register must beretained until its final use. If the final use is notimplied by the DFG, it must be determined by thescheduler. Dynamic determination of the final use isan important contribution of our solution.The following rules simplify the DFG for the

purposes of register allocation prior to adding con-straints to the ILP. Recall that E[i, j] 1 iff theDFG contains an edge from node to node j’.Following the usual convention, E*[i, j] 1 iff thereis a path from node to node j. Let I represent theindex set of prior operations or parameters thatsupply input operands, and let e identify the single"end node". (The original DD constraints remainfor scheduling purposes, of course):

1. Any input operand, which must be retainedunchanged throughout the design, accounts forone register; all of its outgoing edges may beremoved, since their destinations can all obtainthe value from the dedicated register. This canbe stated formally as: if E[i, e] 1 and I,then delete every edge corresponding toE[i, j] 1.

2. If a multi-use value has one destination that isnecessarily a predecessor of one of its otherdestinations, the edge to the predecessor nodecan be removed from the DFG. Thus, if

Page 12: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

32 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

2 3 4 5 2 3 4 5

(a) Original DFG (b) DFG partially reduced

3

(c) Fully reduced DFG

FIGURE 2

(d) Values whose final uses aredecided by the ILP

DFG Reduction for Regi.ster Allocation

E[i, J’l] 1, E[i, J2] 1, and E*[j1, J2] 1,then delete edge E[i, Jl]. Note that if rule 2removes all but one edge originating at node i,then v(i) becomes a single-use value.

Any directed path containing only single-usevalues can be replaced by a single edge con-necting the first node and the last node of thepath. That is, if E*[i,j] 1, and for everynode i, 4: j" in the path, v(i,) is a single-usevalue, then every edge in the path can beremoved, provided a new edge E[i, j] is addedto replace them.

These rules, except the last, essentially follow theanalysis of Gebotys [8]. Their purpose is to eliminateedges of the DFG that definitely do not imply anadditional live value; all eliminated edges representvalues that must be retained for later use anyway.

The final rule acknowledges that a register is re-quired for each value entering and each value leav-ing an operation. Even if different registers are usedfor th,e input and output values, their numbers willbe the same-which is all that matters here.

Figure 2(b) shows the result of applying the firsttwo rules. The values v(2) and v(5) are global andconsume registers throught. The value produced byoperation 7 will clearly have its final use by opera-tion 10, allowing edge (7, 8) to be removed and v(7)to become a single-use value.

Figure 2(c) shows the result of applying the finalrule. Two paths containing only single-use values,{(1, 9), (9, 11), (11, 12)} and {(4, 7), (7, 10), (10, 12)},extend throughout the DFG. After replacement bysingle edges, these edges can be entirely eliminatedby rule 1. Of course, two more registers will beconsumed. The interior path, {(8, 10), (10, 11)}, can

Page 13: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 33

also be replaced by a single edge, which remains inthe DFG to depict the equivalent register commit-ment.

Figure 2(d)shows the three register-related valuesthat must be considered by the ILP. Note that thefinal schedule could require either one or two regis-ters, in addition to the four already committed.(Consider the sequential schedules: {7, 6, 9, 8, 10, 11}and {6,7,8,9,10,11}, which require one and twoextra registers, respectively.)At this point, additional variables and constraints

must be added to the ILP. Define variables:

r (integer); the number of registers in the design

gis 0, 1; gis 1 iff Yi S, i. e., v(i)originates on step s

fis 0, 1; fis 1 iff v(i) has its final useon step s

The following constraints are added to the versionof the ILP given in Sec. 4. Their effects are de-scribed below. Minor changes will adapt these con-straints for the accelerated version in Sec. 6.1.

Constraint 15 bounds the final use time by thecompletion times of each of its separate uses.Therefore, the unqiue value of s for which fi 1will be at least as large as the time of its final use,the one that corresponds to the release of its regis-ter.At each control step, the number of previously

consumed values is subtracted from the number ofpreviously generated values. This gives the numberof live values at that step. These numbers are lowerbounds on the number of registers in the design.Constraint 16 provides these bounds for r.The number of registers can be limited by a

constraint at the outset. Alternatively (or addition-ally), r can be included in the objective function toconsider register area and functional unit area si-multaneously. The objective function then becomes:

min

where Areg denotes the area of a single register.

remaining v(i):

(12) -gis- 1

(13) gis * s Yi

(14) Ef/, 1

remaining v(i), and j such that E[i, j] 1:

(15) yj -]- XjmOjm 1 < Efis * srn

steps t"

(16) E E ( gis fis) <- rs<t

9. RESULTS

We have used our system (SYMPHONY), to synthe-size some of the standard benchmark examplesavailable in the literature. In this section, we presentthe results obtained. As expected, our integratedapproach often achieves better results than itsheuristic counterparts. It also demonstrates the ad-vantage of integrating the scheduling, allocation,and module binding tasks.The first example is the standard high-level syn-

thesis benchmark: elliptical wave filter [10]. Table Ishows the results for 17, 18, and 19 step schedulesby using the same module library as used by HAL[10]. It shows the CPU times consumed by ouroriginal model (Sec. 4) and the modified model (aspresented in Sec. 6) for determining the optimalsolutions. Although the (well-known) optimal solu-

Constraints 12 and 13 ensure that gis becomes 1only on the unique step, s, where the value isgenerated. Constraint 14 ensures that only one finaluse for this value will be counted (for one value ofs). Input source values have &0 1 to suggest avalue generated before step 1, and require neitherconstraints (12) or (13). Output values have fie 1to suggest a final use on step e, the end, past the laststep considered by the ILP; constraints (14) and (15)are not required in this case.

TABLEPerformance of Our Original and Modified Models

C-steps Model Adders Muls CPUtime*

17 Original 3 3 159.62 mins.Modified "3 3 2.2 sec.

18 Original 2 2 183.7 mins.Modified 2 2 4.69 mins.

19 Original 2 2 731.4 mins.Modified 2 2 5.97 mins.

* On a Sun SPARCstation 2.

Page 14: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

34 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

TABLE IIModule Library

Component Types Area

TABLE VBandpass Filter

Delay ’CPU(C-steps) Csteps Model Modules Area time

Adder + 50 1+ 30 2

Multiplier *2 400 2*3 250 3

TABLE IIIElliptical Wave Filter

Wtlme Warea Modules C-steps Area

0.9 0.1 + 1, + 1, + 1, *2, *2, *2 17 13500.8 0.2 + 1, + 1’ *2, *2 18 9000.6 0.2 + 1, + 1, + *2, *2 18 9000.7 0.3 + 1, + 1, *2, *3 19 7500.6 0.4 + 1, + 1, *2 21 5000.4 0.3 + 1, + 1, "2 21 500

tions are obtained for either case, there is a tremen-dous improvement in the running times for themodified model.We also synthesize the elliptical wave filter exam-

ple by assigning different weights to the area andtime costs of the piecewise linear objective function.The module library used is presented in Table II.The results depict the sensitivity of our model tovarying weights. By changing the weights randomly,SYMPHONY obtains optimum results correspond-.ing to 17 through 21 step schedules (Table III).These results also show the ability of our model touse different physical implementations for the sameoperation type. For the 19-step schedule, it utilizes a2-step and a 3-step mutliplier, rather than using two2-step multipliers.The second example is the differential equation

benchmark that originally appeared in [10]. Thisexample is synthesized under various assumptions:without chaining and pipeline units in the library;only using pipelined units (pipelined multipliers hav-ing a latency of 1); premitting only chaining; andpermitting both chaining and the usage of pipelinedunits. The results are shown in Table IV. AlthoughHAL [10] and ALPS [11] produce the same results

8 SYMPHONY +*, + -, + -, t175 4.78secsADPS + -, *, *, +, 685 9secs

9 SYMPHONY + -, *, *, + 625 16.87secsADPS + -, + -, *, 650 64secs

10 SYMPHONY + -, *, *, + 625 32.34secsADPS +, + -, *, + 650 132secs

11 SYMPHONY +*, + -, II00 41.9secsADPS + -*, + -, 630 197secs

for the first three cases depicted in Table IV, neither

of them has shown results when both chaining andusage ofpipelined units are allowed.

Next, we use the bandpass filter example from [5].Table V compares our results with ADPS [5]. ADPSuses force-directed scheduling [10] together with anILP to allocate functional units to the pre-determined schedule and to bind the operations toindividual instances of those allocated units. Forevery case, SYMPHONY yields better results thanADPS. The times shown are for determining thefeasible solutions. Instead of finding the optimalsolutions, the time and area are suitably constrainedto desired values, and the ILP is asked to determinefeasible solutions within those specified limits. Thisapproach eliminates many equivalent results andspeeds up the solution process.

Finally, we use the fifth order elliptical wave filterbenchmark once again to compare the performanceof SYMPHONY with other systems. Table VI com-pares our results with both HAL [10] and CHASSIS[3]. The module library used is the same as that usedby CHASSIS, in order to make a fair comparison. Inthe table, +i represents an adder that takes con-trol steps for executing an addition; *i is to beinterpreted similarly for multiplication. AlthoughHAL assumes a single physical implementation forany given functional unit, we include their figuresjust to illustrate the improvement possible by usingdifferent physical implementations of a functionalunit (17, 19, and 20 step cases). CHASSIS is anintegrated heuristic approach that can incorporate

C-steps

TABLE IVDifferential Equation Example

Neither Pipelined Units Chaining

ALUs Muls ALUs Muls ALUs Muls

Both

ALUs Muls

not applicable not applicable 3 32 3 2 2 32 2 2 2 2

2 2

Page 15: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

DATAPATH SYNTHESIS 35

Csteps

TABLE VIFifth Order Elliptical Wave Filter Comparison

Model Modules used

17 SYMPHONY + 1, + 1, + 1,’1 21500CHASSIS n/a n/aHAL + 1, + 1, / 1, *2, *2, *2 28500

18 SYMPHONY + 1, / 1, *2, "2 19000CHASSIS + 1, + 1, / 3, + 3, "1 20600HAL / 1, / 1, "2, "2 19000

19 SYMPHONY + 1, / 1, *2, "3 14000CHASSIS + 1, + .t, *2, *3, *3 17000HAL + 1, + 1, *2, *2 19000

20 SYMPHONY + 1, + 1, *2, *3 14000CHASSIS + 1, + 1, + 2, *2, *3 14500HAL + 1, + 1, *2, *2 19000

21 SYMPHONY + 1, + 1, *2 11000CHASSIS + 1, + 1, + 2, + 2, *3, *3, *3, *3 16000HAL + 1, + 1, *2 11000

modules of fairly general type in a design. In all thecases, SYMPHONY produces results that are betterthan CHASSIS-specially in the 19 and 21 stepschedules.SYMPHONY is implemented in the UNIX/C

environment on a Sun SPARCstation 2. The ILPmodel is solved using the commercial linear pro-gramming package LINDO [12].

I0. CONCLUSION

In this paper, we have presented an integrated andglobal solution to the problems of scheduling, mod-ule and register allocation, and module binding.This method handles all the interactions amongthese subproblems. The solution permits the use ofcomplex functional units and allows operations to beimplemented by a variety of functional units, possi-bly requiring different execution times. We have alsoshown how the model can be extended to effectively

use pipelined units or chain operations wheneverthere is an opportunity for doing so. In a straightfor-ward way, our model can also find optimal schedulesand module allocations for multiple-block designs,not block-by-block or just along critical paths, butfor all the blocks of a design simultaneously; thedetails of multi-block synthesis will be available in a

separate manuscript. It has also been extended to

totally incorporate register allocation at the sametime. We believe SYMPHONY is the most compre-hensive and integrated model developed so far forthe tasks of scheduling, module and register alloca-tion, and operation binding. Moreover, the capabil-ity of using a complex and generalized module li-brary is unique to our system.

References

[1] C-T. Hwang, J-H. Lee, and Y-C. Hsu. "A Formal Ap-proach to the Scheduling Problem in High Level Synthesis".IEEE Transactions on Computer-Aided Design, 10(4):464-475, April 1991.

[2] R.J. Cloutier and D.E. Thomas. "The Combination ofScheduling, Allocation, and Mapping in a SingleAlgorithm". In Proc. of the 27th Design Automation Confer-ence, pages 71-76. ACM/IEEE, 1990.

[3] L. Ramachandran and D.D. Gajski. "An Algorithm forComponent Selection in Performance Optimizied Schedul-ing". In Proc. of International Conference on Computer-Aided Design, pages 92-95. IEEE, 1991.

[4] M. Balakrishnan and P. Marwedel. "Integrated Schedulingand Binding: A Synthesis Approach for Design Space Ex-ploration". In Proc. of the 26th Design Automation Confer-ence, pages 68-74. ACM/ IEEE, 1989.

[5] C.A. Papachristou and H. Konuk. "A Linear ProgramDriven Scheduling and Allocation Method Followed by anInterconnect Optimization Algorithm". In Proc. of the 27thDesign Automation Conference, pages 77-83. ACM/ IEEE,1990.

[6] S. Devadas and A. R. Newton. "Algorithms for HardwareAllocation in Data Path Synthesis". IEEE Transactions onComputer-Aided Design, 8(7): 768-781, July 1989.

[7] M. Nourani and C.A. Papachristou. "Move Frame Schedul-ing and Mixed Scheduling-Allocation for the AutomatedSynthesis of Digital Systems". In Proc. of the 29th DesignAutomation Conference, pages 99-105. ACM/IEEE, 1992.

[8] C.H. Gebotys and M.I. Elmasry. "Simultaneous Schedulingand Allocation for Cost Constrained Optimal ArchitecturalSynthesis". In Proc. of the 28th Design Automation Confer-ence, pages 2-7. ACM/IEEE, 1991.

[9] C. H. Gebotys. "Optimal Scheduling and Allocation ofEmbedded VLSI Chips". In Proc. of the 29th Design Au-tomation Conference, pages 116-119. ACM/ IEEE, 1992.

[10] Pierre G. Paulin. High-level Synthesis of Digital CircuitsUsing Global Scheduling and Binding Algorithms. Ph.D the-sis, Carleton University, Ottawa, January 1988.

[11] J-H. Lee, Y-C. Hsu, and Y-L. Lin. "A New Integer LinearProgramming Formulation for the Scheduling Problem inData Path Synthesis". In Proc. of International Conferenceon Computer Aided Design, pages 20-23. IEEE, 1989.

[12] Linus Schrage. LINDO: User’s Manual. The Scientific Press,San Franciso, CA, 1991.

[13] J. Vanhoof et al. High-Level Synthesis for Real-Time DigitalSignal Processing. Kluwer Academic Publ’!shers, Boston, Ma.1993.

Biographies

THOMAS C. WILSON is an Associate Professor in the Depart-ment of Computing & Information Science at the University ofGuelph, Guelph, Ontario. He holds a BA degree from theUniversity of Iowa, M.Sc from the University of Chicago, andPh.D. from the University of Waterloo. His research interests lie

in the areas of distributed systems and VLSI-CAD, particularlyhigh-level synthesis and retargetable code generation.

NILANJAN MUKHERJEE received his B.Tech degee with hon-ours in Electronics and Electrical Communication Engineeringfrom Indian Institute of Technology, Kharagpur, India, in 1989,and the M.S. degree in Computer Science from University ofGuelph, Canada, in 1992. Presently, he is pursuing a Ph.D. inElectrical Engineering at McGill University, Montreal, Canada.His research interests include high-level synthesis, hardware-software codesign, testing, and synthesis of testable designs.

Page 16: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

36 T.C. WILSON, N. MUKHERJEE, M.K. GARG, and D.K. BANERJI

MAI-IESI-I K. GARG holds a B.Sc degree in Chemical Engineer-ing from the University of Roorkee, India, M.A.Sc in ChemicalEngineering from the University of Waterloo, and M.Sc in Com-puter Science from the University of Guelph, Guelph, Ontario.Presently, he is working as a System Administrator in the CASEgroup at the IBM Laboratory in Don Mills, Ontario.

DILIP K. BANERJI is a Professor in the Department of Comput-ing & Information Science at the University of Guelph, Guelph,Ontario. He holds a B.Tech degree in Electrical Engineeringfrom the Indian Institute of Technology, Kanpur, M.Sc. in Elec-trical Engineering from the University of Ottawa, Canada, andPh.D. in Computer Science from the University of Waterloo. Hisprimary research interest is VLSI design automation, particularlyhigh-level synthesis.

Page 17: ILP Optimum Scheduling, Module Register Allocation ...downloads.hindawi.com/archive/1995/023249.pdf · Wepresent an integrated and optimal solution to the problems of operator scheduling,

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Active and Passive Electronic Components

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

Navigation and Observation

International Journal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014

DistributedSensor Networks

International Journal of