design & co-design of embedded systems distributed system co-synthesis (1) maziar goudarzi

Design & Co-design of Embedded Systems

Distributed System Co-synthesis (1)

Maziar Goudarzi

Fall 2005 Design & Co-design of Embedded Systems


Today's Program

Introduction
Preliminaries
Hardware/Software Partitioning
Distributed System Co-Synthesis (part 1)

References:

Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.

S. Prakash, A. Parker, “Synthesis of Application-Specific Multiprocessor Architectures,” ACM/IEEE Design Automation Conference, 1991.



Topics

Introduction
An Integer Linear Programming Model
A Heuristic Algorithm (next session)
  On ordinary task graphs
  On an object-oriented model



Introduction to Distributed System Co-Synthesis

Does not use an architectural template
Instead, creates a multiprocessor architecture during co-synthesis
Usually a multiprocessor that is heterogeneous in:
  Processing Elements
  Communication Channels
  Topologies
Less emphasis on the design of ASICs
More emphasis on the design of the multiprocessor topology



Introduction to Distributed System Co-Synthesis (cont’d)

Very common in practice
  A decade ago: a large CPU + small microcontrollers + small ASICs
  Today in particular: MPSoC (Multiprocessor System-on-Chip)

Co-Synthesis Algorithms: Distributed System Co-Synthesis

Integer Linear Programming Model



ILP Model

Introduction
  Linear Programming (LP):
    Minimizing/maximizing a linear target function
    • Subject to a set of linear constraints
    Current algorithms: either find the optimal solution or show that the problem is infeasible
    Example: Knapsack problem
  Integer Linear Programming (ILP)
    Integer-solution counterpart of LP
    Example: Knapsack problem with integer-solution constraint
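The difference between LP and its integer counterpart can be seen on a toy knapsack. The sketch below is only an illustration: the item values, weights and capacity are made-up assumptions, the LP version is solved with the greedy value-density rule (which is optimal for the fractional knapsack), and the 0-1 version is solved by plain enumeration rather than by a real ILP solver.

```python
from itertools import product

# Hypothetical data: (value, weight) per item, plus an assumed capacity.
ITEMS = [(60, 10), (100, 20), (120, 30)]
CAPACITY = 50

def knapsack_lp(items, capacity):
    """LP version: each item may be taken fractionally (0 <= x <= 1)."""
    total = 0.0
    remaining = capacity
    # Taking items by decreasing value density is optimal for the fractional case.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        take = min(1.0, remaining / weight)
        total += take * value
        remaining -= take * weight
        if remaining <= 0:
            break
    return total

def knapsack_01(items, capacity):
    """0-1 ILP version: each x is 0 or 1; solved by exhaustive enumeration."""
    best = 0
    for choice in product((0, 1), repeat=len(items)):
        weight = sum(x * w for x, (_, w) in zip(choice, items))
        if weight <= capacity:
            best = max(best, sum(x * v for x, (v, _) in zip(choice, items)))
    return best

lp_value = knapsack_lp(ITEMS, CAPACITY)
ilp_value = knapsack_01(ITEMS, CAPACITY)
print(lp_value, ilp_value)
```

On this data the fractional optimum is 240 while the best 0-1 selection is worth 220: the integer restriction can only lower (or keep) the optimum, and the enumeration already hints at why integer problems cost much more CPU time.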



ILP Model (cont’d)

Introduction (cont’d)
  Mixed Integer Linear Programming (MILP)
    One (or more) non-integer variables included
  0-1 Integer Linear Programming (0-1 ILP)
    Only binary variables (can only be 0 or 1)
  Current algorithms:
    The exact optimal solution is found
    • Takes much CPU time
    • Only feasible for fairly small problems



Prakash-Parker ILP Model

By Prakash and Parker, 1991
  Developed an ILP formulation
  Used general ILP solvers to solve it
Inputs to the algorithm
  Single-rate task graph
  Technology model for the PEs, communication channels, and processes’ execution characteristics on them
Target function
  Minimize system implementation cost
Constraints
  Describe the requirements of the system



Prakash-Parker ILP Model (cont’d)

Algorithm classification criteria
  Input Model
    Single-rate task graph
  Target Architecture
    Distributed multiprocessor
  Quantum
    Processes of the task graph
  Cost Estimation
    Based on technology models provided to the algorithm
    Represented as the target function of the ILP



Prakash-Parker ILP Model (cont’d)

Algorithm classification criteria (cont’d)
  Performance Estimation
    Based on technology models provided to the algorithm
  Scheduling, Allocation
    Embedded in the ILP formulation constraints
Algorithm details
  Target Function
    • Minimize cost (or maximize performance)
  Sets of Constraints
    • Allocation (PE and communication links)
    • Scheduling (processes on PEs, and communications on links)



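Prakash-Parker MILP Model: Input Task Graph

Nodes
  $S_a$: process a
Edges
  Data communication
  $i_{a,b}$: input b to process a
  $o_{a,c}$: output c from process a
Differs from DFG
  $S_a$ may start before all its inputs are ready
  $S_a$ may produce outputs before finishing
  Introduces:
    $f_R(i_{a,b})$: fraction of $S_a$ that can execute before input $i_{a,b}$ is required
    $f_A(o_{a,c})$: fraction of $S_a$ that must execute before output $o_{a,c}$ is available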
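One possible in-memory representation of such a task graph is sketched below. It is an illustrative assumption, not the data structure used by Prakash and Parker: the class names `Subtask` and `TaskGraph`, the port names, and the reading of f_R and f_A as fractions in [0, 1] attached to each input and output are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """A node S_a of the task graph (hypothetical representation)."""
    name: str
    f_r: dict = field(default_factory=dict)  # input name -> f_R fraction
    f_a: dict = field(default_factory=dict)  # output name -> f_A fraction

@dataclass
class TaskGraph:
    subtasks: dict = field(default_factory=dict)
    # Each edge is (source subtask, output name, sink subtask, input name).
    edges: list = field(default_factory=list)

    def add_subtask(self, subtask):
        for frac in list(subtask.f_r.values()) + list(subtask.f_a.values()):
            if not 0.0 <= frac <= 1.0:
                raise ValueError("fractions must lie in [0, 1]")
        self.subtasks[subtask.name] = subtask

    def connect(self, src, out, dst, inp):
        if out not in self.subtasks[src].f_a or inp not in self.subtasks[dst].f_r:
            raise KeyError("unknown input or output")
        self.edges.append((src, out, dst, inp))

    def predecessors(self, name):
        """Names of subtasks that feed data into the given subtask."""
        return sorted({src for src, _, dst, _ in self.edges if dst == name})

graph = TaskGraph()
# S1 emits o1 halfway through; S2 needs i1 only after a quarter of its run.
graph.add_subtask(Subtask("S1", f_a={"o1": 0.5}))
graph.add_subtask(Subtask("S2", f_r={"i1": 0.25}, f_a={"o1": 1.0}))
graph.connect("S1", "o1", "S2", "i1")
print(graph.predecessors("S2"))
```

Setting every f_R to 0 and every f_A to 1 recovers the ordinary DFG behaviour, where a node needs all inputs at its start and produces outputs only at its end.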



Prakash-Parker MILP Model: Output Multiprocessor Architecture

Nodes
  $P_i$: processor i
Edges
  $l_{i,j}$: link between processors i and j
Point-to-point communication, but extendable to bus, multi-hop and other styles



Prakash-Parker MILP Model: Given Inputs

$P_a$: set of all processors capable of running $S_a$
$D_{PS}(P_t, S_a)$: execution time of $S_a$ if run on a processor of type $P_t$
$V_{a_1,a_2}$: volume of data transferred from $S_{a_1}$ to $S_{a_2}$
  Remote transfer: if $S_{a_1}$ and $S_{a_2}$ are mapped to different processors
  Local transfer: if both are mapped to the same processor
Communication cost depends on the transfer being local or remote
  $D_{CL}$: cost of local transfer of a unit volume of data
  $D_{CR}$: cost of remote transfer of a unit volume of data
  Waiting time for the channel is not included in $D_{CR}$



Prakash-Parker MILP Model: Given Inputs (cont’d)

Set $P$: set of all available processors, $P = \bigcup_a P_a$
$C_d$: cost of processor $p_d \in P$
$C_L$: cost of a communication link between two processors



Prakash-Parker MILP Model: Variables

Timing variables + binary variables
Timing variables
  Real-valued
  Data availability timing variables
    $T_{IA}(i_{a,b})$: input availability time
    $T_{OA}(o_{a,c})$: output availability time
  Subtask execution timing variables
    $T_{SS}(S_a)$: start time of subtask $S_a$
    $T_{SE}(S_a)$: end time of subtask $S_a$



Prakash-Parker MILP Model: Variables (cont’d)

Timing variables (cont’d)
  Data transfer timing variables
    $T_{CS}(i_{a,b})$: start time of communication $i_{a,b}$
    $T_{CE}(i_{a,b})$: end time of communication $i_{a,b}$
Binary variables
  Subtask-to-processor mapping variables
    $\sigma_{d,a} = 1$: $S_a$ is mapped to $P_d$
  Data-transfer-type variables
    $\gamma_{a_1,a_2} = 1$: communication between $S_{a_1}$ and $S_{a_2}$ is a remote transfer



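Prakash-Parker MILP Model: Constraints

Allocation
  Processor-selection constraints
    Each process must be assigned to one and only one (not more, not less) processor

$$\sum_{p_d \in P_a} \sigma_{d,a} = 1$$

  Data-transfer-type constraints
    Each communication must be either local or remote, but not both and not neither

$$\gamma_{a_1,a_2} = 1 - \sum_{p_d \in P_{a_1},\; p_d \in P_{a_2}} \sigma_{d,a_1} \, \sigma_{d,a_2}$$

Here $\sigma_{d,a} = 1$ means $S_a$ is mapped to processor $p_d$, and $\gamma_{a_1,a_2} = 1$ means the communication between $S_{a_1}$ and $S_{a_2}$ is a remote transfer.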
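The two allocation constraints can be checked on a candidate mapping with a few lines of code. This is a minimal sketch under stated assumptions: the mapping variable is stored as a dictionary `sigma[(d, a)]`, the capability sets and processor names are invented for the example, and the transfer type is computed as one minus the sum of products of mapping variables over processors that can run both subtasks.

```python
# Hypothetical capability sets P_a: which processors can run each subtask.
CAPABLE = {"S1": {"p1", "p2"}, "S2": {"p2", "p3"}}

def check_processor_selection(sigma, capable):
    """True if every subtask has exactly one mapping variable set to 1
    among the processors capable of running it."""
    for a, procs in capable.items():
        if sum(sigma.get((d, a), 0) for d in procs) != 1:
            return False
    return True

def transfer_type(sigma, capable, a1, a2):
    """Return 1 for a remote transfer, 0 for a local one."""
    shared = capable[a1] & capable[a2]
    same = sum(sigma.get((d, a1), 0) * sigma.get((d, a2), 0) for d in shared)
    return 1 - same

# Both subtasks on p2: local transfer.
sigma_same = {("p2", "S1"): 1, ("p2", "S2"): 1}
# S1 on p1, S2 on p3: remote transfer.
sigma_split = {("p1", "S1"): 1, ("p3", "S2"): 1}
# S1 assigned twice: violates processor selection.
sigma_bad = {("p1", "S1"): 1, ("p2", "S1"): 1, ("p3", "S2"): 1}

print(transfer_type(sigma_same, CAPABLE, "S1", "S2"),
      transfer_type(sigma_split, CAPABLE, "S1", "S2"))
```

Because each subtask sits on exactly one processor, the sum of products is either 0 or 1, so the transfer type is automatically binary: exactly one of "local" or "remote" holds.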



Prakash-Parker MILP Model: Constraints (cont’d)

Scheduling
  Input-availability constraints
    Data cannot be used by the sink process until after it is produced by the source process

$$T_{IA}(i_{a,b}) \ge T_{CE}(i_{a,b})$$

  Output-availability constraints
    Data must obey the fractional output generation parameters

$$T_{OA}(o_{a,c}) = T_{SS}(S_a) + f_A(o_{a,c}) \, \bigl(T_{SE}(S_a) - T_{SS}(S_a)\bigr)$$
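A small worked example may help with the output-availability relation. The numbers are illustrative assumptions, not taken from the paper, and the relation is read here as an equality: suppose a subtask $S_a$ starts at time 10, ends at time 30, and its output $o_{a,c}$ becomes available once half of the subtask has executed, so $f_A(o_{a,c}) = 0.5$.

```latex
T_{OA}(o_{a,c}) = T_{SS}(S_a) + f_A(o_{a,c})\,\bigl(T_{SE}(S_a) - T_{SS}(S_a)\bigr)
               = 10 + 0.5 \times (30 - 10) = 20
```

With $f_A(o_{a,c}) = 1$ the same output would only be available at time 30, the end of the subtask, which is the behaviour of an ordinary DFG node.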




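Scheduling (cont’d)
  Subtask-execution-start constraints
    The relation between the availability of inputs and the start time of the process (subtask) must be satisfied

$$T_{IA}(i_{a,b}) \le T_{SS}(S_a) + f_R(i_{a,b}) \, \bigl(T_{SE}(S_a) - T_{SS}(S_a)\bigr)$$

  Subtask-execution-end constraints
    Process finish time depends on its start time and the PE on which it executes

$$T_{SE}(S_a) = T_{SS}(S_a) + \sum_{p_d \in P_a} \sigma_{d,a} \, D_{PS}(\mathrm{Type}(p_d), S_a)$$

where $\sigma_{d,a} = 1$ if $S_a$ is mapped to processor $p_d$ and 0 otherwise.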
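The start and end constraints can be evaluated directly once a mapping is fixed. The sketch below is illustrative only: the processor names, processor types and execution times are made-up values, the mapping variable is stored as a dictionary `sigma[(d, a)]`, and the start constraint is read as "the input must be available before the subtask has run past the fraction f_R that does not need it".

```python
# Hypothetical technology model: D_PS[(processor type, subtask)] = execution time.
D_PS = {("risc", "S1"): 4.0, ("dsp", "S1"): 2.5}
# Hypothetical processors and their types.
PROC_TYPE = {"p1": "risc", "p2": "dsp"}

def end_time(t_ss, subtask, sigma):
    """End time = start time + execution time on the selected processor.
    Only the processor with sigma == 1 contributes to the sum."""
    return t_ss + sum(
        sigma.get((d, subtask), 0) * D_PS[(PROC_TYPE[d], subtask)]
        for d in PROC_TYPE
    )

def start_ok(t_ia, t_ss, t_se, f_r):
    """True if the input is available no later than the point where the
    subtask has executed the fraction f_r of its run."""
    return t_ia <= t_ss + f_r * (t_se - t_ss)

sigma = {("p2", "S1"): 1}           # S1 is mapped to the DSP
t_se = end_time(10.0, "S1", sigma)  # start at 10.0, run for 2.5
print(t_se, start_ok(11.0, 10.0, t_se, 0.5), start_ok(12.0, 10.0, t_se, 0.5))
```

With f_R = 0 the check degenerates to "the input must be available at the start time", which is the usual DFG firing rule.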




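Scheduling (cont’d)
  Data-transfer-start constraints
    A data transfer cannot start until the corresponding output has been produced

$$T_{CS}(i_{a_2,b_2}) \ge T_{OA}(o_{a_1,c_1})$$

  Data-transfer-end constraints
    Local/remote transfer latencies must be considered

$$T_{CE}(i_{a_2,b_2}) = T_{CS}(i_{a_2,b_2}) + \gamma_{a_1,a_2} \, D_{CR} \, V_{a_1,a_2} + (1 - \gamma_{a_1,a_2}) \, D_{CL} \, V_{a_1,a_2}$$

where $\gamma_{a_1,a_2} = 1$ for a remote transfer and 0 for a local one.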
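The data-transfer timing can be sketched as two small functions. This is a hedged illustration: the function and parameter names are hypothetical, the transfer type is passed as a 0/1 flag (1 for remote), and the default delays use the unit values that appear in the first experimental example of these slides (remote cost 1, local cost 0, volume 1).

```python
def transfer_end(t_cs, volume, remote, d_cr=1.0, d_cl=0.0):
    """End time of a transfer: start time plus the remote or local delay,
    selected by the 0/1 flag `remote` and scaled by the data volume."""
    return t_cs + remote * d_cr * volume + (1 - remote) * d_cl * volume

def transfer_start_ok(t_cs, t_oa):
    """A transfer may not start before the source output is available."""
    return t_cs >= t_oa

# Output produced at time 5.0, transfer started immediately, unit volume.
remote_end = transfer_end(5.0, 1.0, remote=1)
local_end = transfer_end(5.0, 1.0, remote=0)
print(remote_end, local_end, transfer_start_ok(5.0, 5.0))
```

Because the flag is binary, exactly one of the two delay terms is active, which is how a single linear expression covers both the local and the remote case.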




Scheduling: proper sharing of resources
  Define an overlap function
  Processor-usage-exclusion
    Processes on a single PE must not execute simultaneously
  Communication-usage-exclusion
    Multiple communications must not be scheduled on the same link simultaneously
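The slide does not give the overlap function itself, so the following is only one plausible sketch of the idea: an interval-overlap test on half-open intervals, applied to every pair of activities that share a resource (a PE for subtasks, a link for communications). The function names and the schedule data are hypothetical.

```python
from itertools import combinations

def overlap(s1, e1, s2, e2):
    """1 if the half-open intervals [s1, e1) and [s2, e2) intersect, else 0."""
    return 1 if s1 < e2 and s2 < e1 else 0

def exclusion_ok(schedule, mapping):
    """schedule: activity -> (start, end); mapping: activity -> resource.
    True if no two activities on the same resource overlap in time."""
    for a, b in combinations(schedule, 2):
        if mapping[a] == mapping[b] and overlap(*schedule[a], *schedule[b]):
            return False
    return True

schedule = {"S1": (0.0, 4.0), "S2": (4.0, 6.0), "S3": (3.0, 5.0)}
shared_pe = {"S1": "p1", "S2": "p1", "S3": "p1"}    # S3 collides with S1 and S2
separate_pe = {"S1": "p1", "S2": "p1", "S3": "p2"}  # S3 moved to its own PE
print(exclusion_ok(schedule, shared_pe), exclusion_ok(schedule, separate_pe))
```

The same check applies unchanged to communications by using transfer start and end times as the intervals and links as the resources.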



Prakash-Parker MILP Model: Objective (Target) Function

Alternative 1: Maximize performance
  Minimize $T_F$
  A constraint is added to ensure $T_F$ is the end time of the entire task
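The constraint that ties $T_F$ to the end of the whole task is not reproduced in the text. One common way to express it, given here as an assumption rather than as the exact form used by Prakash and Parker, is to bound $T_F$ from below by the end time $T_{SE}(S_a)$ of every subtask and then minimize it, since a shorter completion time means higher performance:

```latex
\min \; T_F
\quad \text{subject to} \quad
T_F \ge T_{SE}(S_a) \quad \text{for every subtask } S_a
```

At the optimum $T_F$ equals the largest subtask end time, because nothing prevents the solver from pushing it down to that bound.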



Prakash-Parker MILP Model: Objective Function (cont’d)

Alternative 2: Minimize cost
  Two new binary variables:
  • Processor-selection variable ($\beta_d$)
  • Comm.-link-selection variable ($\chi_{d_1,d_2}$)
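The cost objective itself is not reproduced in the text. A plausible form, offered as a sketch under assumptions, sums the cost $C_d$ of every selected processor and the link cost $C_L$ of every selected link. The symbol names $\beta_d$ (1 if processor $p_d$ is used) and $\chi_{d_1,d_2}$ (1 if a link between $p_{d_1}$ and $p_{d_2}$ is used) are assumed names for the two selection variables:

```latex
\min \;\; \sum_{p_d \in P} \beta_d \, C_d \;+\; C_L \sum_{d_1 < d_2} \chi_{d_1,d_2}
```

For the objective to be meaningful, the selection variables must be forced to 1 whenever a subtask is mapped to the processor or a remote transfer uses the link; those coupling constraints are part of the full formulation and are not shown here.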



Prakash-Parker MILP Model (cont’d)

Further constraints can be added as needed

Non-linear constraints were linearized

The MILP formulation was solved using Bozo (a branch-and-bound MILP solver built on the XMP linear-programming package)
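The slides do not say which linearization was used. As an illustration of the general idea, a product of two binary variables (for example, two mapping variables multiplied to test whether two subtasks share a processor) can be replaced by a new variable $z$ and three linear inequalities. This is a standard technique and not necessarily the authors' exact one:

```latex
z = x\,y, \quad x, y \in \{0,1\}
\qquad \Longleftrightarrow \qquad
z \le x, \quad z \le y, \quad z \ge x + y - 1, \quad z \ge 0
```

If either $x$ or $y$ is 0, the first two inequalities force $z = 0$; if both are 1, the third forces $z = 1$. The price is one extra variable and several extra constraints per product, which contributes to the large constraint counts reported for even small task graphs.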



Prakash-Parker MILP Model: Experimental Results

Example 1: 4 nodes in task graph
Other assumptions
  $V_{a_1,a_2} = 1$
  $D_{CL} = 0$
  $D_{CR} = 1$
  $C_L = 1$



Prakash-Parker MILP Model: Experimental Results (cont’d)

Example 1 (cont’d)
  93 variables (21 timing, 72 binary)
  174 constraints
  CPU: Solbourne Series5e/900 (similar to Sun SPARCsystem 4/490) + 128 MB memory
  Cost is treated as a constraint; performance is optimized




Example 2: 9 nodes
  272 variables (47 timing, 225 binary)
  1081 constraints




Example 2 (cont’d)




Experimental Results
  Applied only to relatively small problems
    Reason: use of general ILP solvers
    Their largest task graph: 9 processes
    • Took 6000 CPU minutes on an unspecified processor
Significance of the work
  Achieved precisely optimal solutions on the examples they could solve
  Used as benchmarks for heuristic co-synthesis algorithms



What we learned today

Distributed System Co-Synthesis:
  The other broad category of co-synthesis algorithms
  General terms
  One famous instance: Integer Linear Programming [Prakash-Parker 91]
