1 simulated evolution algorithm for multiobjective vlsi netlist bi-partitioning by dr sadiq m. sait...

Post on 20-Dec-2015


Simulated Evolution Algorithm for Multiobjective VLSI Netlist Bi-Partitioning

By

Dr Sadiq M. Sait

Dr Aiman El-Maleh

Raslan Al Abaji

King Fahd University

Computer Engineering Department

MS Thesis Presentation

2

• Introduction

• Problem Formulation

• Cost Functions

• Proposed Approaches

• Experimental results

• Conclusion

Outline ….

3

Design Characteristics | Key CAD Capabilities
0.13M, 12 MHz, 1.5 um | CAE systems, silicon compilation
7.5M, 333 MHz, 0.25 um | Cycle-based simulation, formal verification
3.3M, 200 MHz, 0.6 um | Top-down design, emulation
1.2M, 50 MHz, 0.8 um | HDLs, synthesis
0.06M, 2 MHz, 6 um | SPICE simulation

The challenges of sustaining such exponential growth to achieve gigascale integration have shifted, to a large degree, from process and manufacturing technology to design technology.

VLSI Technology Trend

4

Technology: 0.1 um
Transistors: 200 M
Logic gates: 40 M
Size: 520 mm2
Clock: 2 - 3.5 GHz
Chip I/Os: 4,000
Wiring levels: 7 - 8
Voltage: 0.9 - 1.2
Power: 160 Watts
Supply current: ~160 Amps

Performance, power consumption, noise immunity, area, cost, time-to-market

Tradeoffs!!!

The VLSI Chip in 2006

5

1. System Specification

2. Functional Design

3. Logic Design

4. Circuit Design

5. Physical Design

6. Design Verification

7. Fabrication

8. Packaging, Testing and Debugging

• The VLSI design process is carried out at a number of levels.

VLSI Design Cycle

6

Physical Design converts a circuit description into a geometric description. This description is used to manufacture a chip.

The physical design cycle consists of:

1. Partitioning

2. Floorplanning and Placement

3. Routing

4. Compaction

Physical Design

7

• Decomposition of a complex system into smaller subsystems.

• Each subsystem can be designed independently, speeding up the design process (divide-and-conquer approach).

• Decompose a complex IC into a number of functional blocks, each of them designed by one engineer or a team of engineers.

• The decomposition scheme has to minimize the interconnections between subsystems.

Why do we need partitioning?

8

System → PCBs: system-level partitioning
PCBs → Chips: board-level partitioning
Chips → Subcircuits/Blocks: chip-level partitioning

Levels of Partitioning

9

Partitioning Algorithms

Group Migration
1. Kernighan-Lin
2. Fiduccia-Mattheyses (FM)
3. Multilevel K-way Partitioning

Simulation Based Iterative
1. Simulated annealing
2. Simulated evolution
3. Tabu Search
4. Genetic

Performance Driven
1. Lawler et al.
2. Vaishnav
3. Choi et al.
4. Jun'ichiro et al.

Others
1. Spectral
2. Multilevel Spectral

Classification of Partitioning Algorithms

10

Related Previous Work

• 1999: Two low-power-oriented techniques based on the simulated annealing (SA) algorithm were proposed by Choi et al.

• 1969: A bottom-up approach for delay optimization (clustering) was proposed by Lawler et al.

• 1998: A circuit partitioning algorithm under a path delay constraint was proposed by Jun'ichiro et al. The proposed algorithm consists of clustering and iterative improvement phases.

• 1999: An enumerative partitioning algorithm targeting low power was proposed by Vaishnav et al. It enumerates alternative partitionings and selects one that has the same delay but less power dissipation (not feasible for huge circuits).

11

Need for Power optimization

• Portable devices.

• Power consumption is a hindrance in further integration.

• Increasing clock frequency.

Need for delay optimization

• In current submicron designs, wire delay tends to dominate gate delay. Larger die sizes imply long on-chip global routes, which affect performance.

• Optimizing delay due to off-chip capacitance.

Motivation

12

Objective

• Design a class of iterative algorithms for VLSI multiobjective partitioning.

• Explore partitioning from a wider angle and consider circuit delay, power dissipation and interconnect at the same time, under a balance constraint.

13

Objectives :

• Power cost is optimized AND

• Delay cost is optimized AND

• Cutset cost is optimized

Constraint

• Partitions balanced to within a certain tolerance (10%).

Problem formulation

14

Problem formulation

• The circuit is modeled as a hypergraph H(V, E),

• where V = {v1, v2, v3, …, vn} is a set of modules (cells),

• and E = {e1, e2, e3, …, ek} is a set of hyperedges, i.e. the set of signal nets. Each net is a subset of V containing the modules that the net connects.

• A two-way partitioning of a set of nodes V determines two subsets $V_A$ and $V_B$ such that $V_A \cup V_B = V$ and $V_A \cap V_B = \emptyset$.

15

• Based on hypergraph model H = (V, E)

• Cost 1: c(e) = 1 if e spans more than 1 block

• Cutset = sum of hyperedge costs

• Efficient gain computation and update

cutset = 3

Cutset
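As a rough illustration of the cutset cost, the following Python sketch counts the nets that span both blocks. The netlist, the cell names and the function name `cutset_size` are hypothetical and are not taken from the presentation.

```python
from typing import Dict, List, Set


def cutset_size(nets: List[Set[str]], block_of: Dict[str, int]) -> int:
    """Count the hyperedges (nets) whose cells lie in more than one block."""
    cut = 0
    for net in nets:
        blocks = {block_of[cell] for cell in net}
        if len(blocks) > 1:  # c(e) = 1 if e spans more than one block
            cut += 1
    return cut


# Hypothetical 5-cell netlist: each net is the set of cells it connects.
nets = [{"v1", "v2"}, {"v2", "v3", "v4"}, {"v4", "v5"}, {"v1", "v5"}]
# Hypothetical two-way partition: block 0 = V_A, block 1 = V_B.
block_of = {"v1": 0, "v2": 0, "v3": 1, "v4": 1, "v5": 1}

print(cutset_size(nets, block_of))  # nets {v2,v3,v4} and {v1,v5} are cut
```

Storing each net as a set of cells mirrors the hypergraph model H = (V, E), where a hyperedge is simply a subset of V.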

16

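[Figure: example circuit with cells C1 to C7 between SE1 and SE2, a cut line, the off-chip capacitance CoffChip, and Metal 1 / Metal 2 wiring.]

Path: SE1 → C1 → C4 → C5 → SE2.

\[ \mathrm{Delay} = CD_{SE1} + CD_{C1} + CD_{C4} + CD_{C5} + CD_{SE2} \]

\[ CD_{C1} = BD_{C1} + LF_{C1} \times (C_{offchip} + CINP_{C2} + CINP_{C3} + CINP_{C4}) \]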

Delay Model
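The delay model can be sketched in a few lines of Python. This is only an illustration: reading CD as the cell delay, BD as a base delay, LF as a load factor and CINP as an input capacitance is an assumption, and all numeric values and function names are hypothetical.

```python
from typing import Dict, List


def cell_delay(base_delay: float, load_factor: float, load_caps: List[float]) -> float:
    """CD = BD + LF * (sum of the capacitances the cell drives)."""
    return base_delay + load_factor * sum(load_caps)


def path_delay(path: List[str], delay_of: Dict[str, float]) -> float:
    """Delay of one path: the sum of the delays of the cells on it."""
    return sum(delay_of[cell] for cell in path)


def critical_delay(paths: List[List[str]], delay_of: Dict[str, float]) -> float:
    """Delay of the longest path among the given paths."""
    return max(path_delay(p, delay_of) for p in paths)


# Hypothetical values in arbitrary units, chosen to keep the arithmetic simple.
c_offchip = 2.0  # extra load because the net driven by C1 is cut
cd_c1 = cell_delay(base_delay=1.0, load_factor=0.5,
                   load_caps=[c_offchip, 0.25, 0.25, 0.5])
delay_of = {"SE1": 0.5, "C1": cd_c1, "C4": 1.0, "C5": 1.0, "SE2": 0.5}
paths = [["SE1", "C1", "C4", "C5", "SE2"]]

print(cd_c1, critical_delay(paths, delay_of))
```

The off-chip capacitance enters only the delay of the cell that drives a cut net, which is how a partitioning decision changes the path delay in this model.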

17

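\[ \mathrm{Delay}(P_i) = \sum_{cell \in P_i} \mathrm{Delay}(cell) + \sum_{net \in P_i} \mathrm{Delay}(net) \]

\[ \mathrm{Delay}(P_i) = \sum_{cell \in P_i} \mathrm{Delay}(cell) \]

$P_i$: any path between two cells or nodes.

$P$: the set of all paths of the circuit.

Objective: $\max_{P_i \in P} \mathrm{Delay}(P_i)$, i.e. the delay of the longest path, which is to be minimized.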

Delay

18

The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:

\[ P_i^{average} = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times C_i^{Load} \times N_i \]

$N_i$: the number of output gate transitions per cycle (switching probability).

$C_i^{Load}$: the load capacitance.

Power

19

\[ C_i^{Load} = C_i^{basic} + C_i^{extra} \]

$C_i^{basic}$: load capacitances driven by a cell before partitioning.

$C_i^{extra}$: additional load due to off-chip capacitance (cut net).

Total power dissipation of a circuit:

\[ P = \sum_i 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times (C_i^{basic} + C_i^{extra}) \times N_i \]

Power

20

\[ C_i^{extra} \gg C_i^{basic} \]

$C_i^{extra}$: can be assumed identical for all nets.

\[ \text{Objective: } \sum_{i \in v} N_i \]

$v$: the set of visible gates driving a load outside the partition.

Power
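A small Python sketch may help relate the full power expression to the simplified objective. It assumes that the extra capacitance applies only to gates driving a cut net and is the same constant for all of them; the gate names, values and function names are hypothetical.

```python
from typing import Dict, Set, Tuple


def gate_power(vdd: float, t_cycle: float, c_load: float, n_switch: float) -> float:
    """Average dynamic power of one gate: 0.5 * Vdd^2 / Tcycle * C_load * N."""
    return 0.5 * vdd ** 2 / t_cycle * c_load * n_switch


def total_power(vdd: float, t_cycle: float,
                gates: Dict[str, Tuple[float, float]],
                visible: Set[str], c_extra: float) -> float:
    """Sum the gate powers; visible gates see C_basic + C_extra as load."""
    total = 0.0
    for name, (c_basic, n_switch) in gates.items():
        extra = c_extra if name in visible else 0.0
        total += gate_power(vdd, t_cycle, c_basic + extra, n_switch)
    return total


def power_objective(gates: Dict[str, Tuple[float, float]], visible: Set[str]) -> float:
    """Simplified objective: sum of switching probabilities of visible gates."""
    return sum(gates[name][1] for name in visible)


# Hypothetical gates: name -> (C_basic, N), arbitrary units.
gates = {"g1": (1.0, 0.5), "g2": (2.0, 0.25)}
visible = {"g2"}  # g2 drives a load outside its partition

print(total_power(vdd=2.0, t_cycle=1.0, gates=gates, visible=visible, c_extra=2.0))
print(power_objective(gates, visible))
```

When the extra capacitance dominates and is constant, the only term a partitioner can influence is which gates become visible, which is why the sum of their switching probabilities can stand in for the total power.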

21

The balance as a constraint is expressed as follows:

\[ \frac{Cells}{Blocks} - Tol \le Block_i \le \frac{Cells}{Blocks} + Tol \]

\[ Tol = PercentTol \times \frac{Cells}{Blocks} \]

However, balance as a constraint is not appealing because it may prohibit many good moves.

Objective: |Cells(block1) – Cells(block2)|

Balance
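The two views of balance, as a hard constraint and as an objective, can be compared with a short Python sketch. The function names and the block sizes are hypothetical.

```python
from typing import List


def is_balanced(block_sizes: List[int], percent_tol: float) -> bool:
    """Constraint view: every block size lies within Cells/Blocks +/- Tol."""
    target = sum(block_sizes) / len(block_sizes)  # Cells / Blocks
    tol = percent_tol * target                    # Tol = PercentTol * Cells / Blocks
    return all(target - tol <= size <= target + tol for size in block_sizes)


def imbalance(cells_block1: int, cells_block2: int) -> int:
    """Objective view: |Cells(block1) - Cells(block2)|, to be kept small."""
    return abs(cells_block1 - cells_block2)


# Hypothetical bipartitions of a 100-cell circuit with a 10% tolerance.
print(is_balanced([52, 48], 0.10), imbalance(52, 48))
print(is_balanced([60, 40], 0.10), imbalance(60, 40))
```

The constraint view rejects the 60/40 split outright, while the objective view only reports a larger imbalance and leaves the trade-off to the cost function.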

22

• Weighted Sum Approach

\[ Cost = W_c \times \frac{Cutset}{MaxCutset} + W_p \times \frac{Power}{MaxPower} + W_d \times \frac{Delay}{MaxDelay} \]

1. Problems in choosing Weights.

2. Need to tune for every circuit.

Unifying Objectives: How?
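The weight-selection problem of the weighted sum approach can be seen in a short Python sketch. The two candidate solutions, the normalizing maxima and the weights below are all hypothetical.

```python
def weighted_sum_cost(cutset, power, delay,
                      max_cutset, max_power, max_delay,
                      w_c, w_p, w_d):
    """Weighted sum of the three normalized objectives."""
    return (w_c * cutset / max_cutset
            + w_p * power / max_power
            + w_d * delay / max_delay)


# Two hypothetical solutions: A has the smaller cutset, B the lower power.
sol_a = dict(cutset=10, power=80.0, delay=20.0)
sol_b = dict(cutset=30, power=40.0, delay=20.0)
maxima = dict(max_cutset=100, max_power=100.0, max_delay=40.0)

for w_c, w_p, w_d in ((0.6, 0.2, 0.2), (0.2, 0.6, 0.2)):
    cost_a = weighted_sum_cost(**sol_a, **maxima, w_c=w_c, w_p=w_p, w_d=w_d)
    cost_b = weighted_sum_cost(**sol_b, **maxima, w_c=w_c, w_p=w_p, w_d=w_d)
    better = "A" if cost_a < cost_b else "B"
    print((w_c, w_p, w_d), round(cost_a, 3), round(cost_b, 3), better)
```

With these made-up numbers the preferred solution flips from A to B when weight is shifted from cutset to power, which illustrates why the weights would need tuning for every circuit.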

23

• Imprecise values of the objectives
  – best represented by linguistic terms that are the basis of fuzzy algebra
• Conflicting objectives
• Operators for aggregating function

Fuzzy logic for cost function

24

1. The cost to membership mapping.

2. Linguistic fuzzy rule for combining the membership values in an aggregating function.

3. Translation of the linguistic rule in form of appropriate fuzzy operators.

Use of fuzzy logic for Multi-objective cost function

25

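• And-like operators
  – Min operator: $\mu = \min(\mu_1, \mu_2)$
  – And-like OWA: $\mu = \beta \times \min(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$

• Or-like operators
  – Max operator: $\mu = \max(\mu_1, \mu_2)$
  – Or-like OWA: $\mu = \beta \times \max(\mu_1, \mu_2) + \frac{1}{2}(1 - \beta)(\mu_1 + \mu_2)$

where $\beta$ is a constant in the range [0, 1].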

Some fuzzy operators
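The And-like and Or-like OWA operators can be written as two short Python functions. The sketch below generalizes the two-input form to any number of memberships by averaging them, which is an assumption beyond the two-input formulas on the slide; the function names are hypothetical.

```python
from typing import Sequence


def and_like_owa(memberships: Sequence[float], beta: float) -> float:
    """beta * min(mu) + (1 - beta) * average(mu)."""
    average = sum(memberships) / len(memberships)
    return beta * min(memberships) + (1 - beta) * average


def or_like_owa(memberships: Sequence[float], beta: float) -> float:
    """beta * max(mu) + (1 - beta) * average(mu)."""
    average = sum(memberships) / len(memberships)
    return beta * max(memberships) + (1 - beta) * average


mu = [0.5, 1.0]
print(and_like_owa(mu, beta=0.5))  # between min(mu) and the average
print(or_like_owa(mu, beta=0.5))   # between the average and max(mu)
print(and_like_owa(mu, beta=1.0))  # beta = 1 reduces to the pure min operator
```

The constant beta controls how strict the operator is: at 1 it behaves like a pure min or max, and at 0 it is the plain average of the memberships.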

26

where $O_i$ and $C_i$ are the lower bound and actual cost of objective "i",

$\mu_i(x)$ is the membership of solution x in the set "good 'i'", and

$g_i$ is the relative acceptance limit for each objective.

Membership functions

27

• A good partitioning can be described by the following fuzzy rule

IF solution has

small cutset AND

low power AND

short delay AND

good Balance.

THEN it is a good solution

Fuzzy linguistic rule

28

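The above rule is translated to an AND-like OWA:

\[ \mu(x) = \beta \times \min(\mu_C, \mu_P, \mu_D, \mu_B) + (1 - \beta) \times \frac{1}{4}(\mu_C + \mu_P + \mu_D + \mu_B) \]

$\mu(x)$ represents the total fuzzy fitness of the solution; the aim is to maximize this fitness.

$\mu_C, \mu_P, \mu_D, \mu_B$: respectively the cutset, power, delay and balance fitness.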

Fuzzy cost function
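The following Python sketch shows how a fuzzy fitness of this kind could be evaluated. The shape of the membership function is an assumption: the transcript does not preserve it, so a common piecewise-linear form is used here (membership 1 at or below the lower bound, 0 at or beyond g times the lower bound, linear in between). All names and values are hypothetical.

```python
def membership(cost: float, lower_bound: float, g: float) -> float:
    """Assumed piecewise-linear membership in the fuzzy set 'good'.

    Requires lower_bound > 0 and g > 1 (g is the relative acceptance limit).
    """
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0
    if ratio >= g:
        return 0.0
    return (g - ratio) / (g - 1.0)


def fuzzy_fitness(mu_c: float, mu_p: float, mu_d: float, mu_b: float,
                  beta: float) -> float:
    """AND-like OWA over the cutset, power, delay and balance memberships."""
    values = (mu_c, mu_p, mu_d, mu_b)
    return beta * min(values) + (1 - beta) * sum(values) / 4


# Hypothetical costs and bounds for one candidate partition.
mu_c = membership(cost=150.0, lower_bound=100.0, g=2.0)  # halfway to the limit
mu_p = membership(cost=90.0, lower_bound=100.0, g=2.0)   # at or below the bound
mu_d = membership(cost=150.0, lower_bound=100.0, g=2.0)
mu_b = membership(cost=1.0, lower_bound=2.0, g=3.0)

print(fuzzy_fitness(mu_c, mu_p, mu_d, mu_b, beta=0.5))
```

Because the min term is driven by the worst objective, a solution cannot score well by excelling in three objectives while neglecting the fourth.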

29

Simulated Evolution

Algorithm Simulated_Evolution
Begin
  Start with an initial feasible partition S
  Repeat
    Evaluation: evaluate the goodness Gi of all modules
    Selection:
      For each cell Vi Do
        if Random Rm > Gi then select the cell
      End For
    Allocation:
      For each selected cell Vi Do
        move the cell to its destination block
      End For
  Until stopping criteria are satisfied
  Return best solution
End
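The evaluation, selection and allocation loop can be sketched in Python as follows. This is a deliberately simplified illustration, not the implementation behind the presentation: it uses only a cut-based goodness, draws the selection random number uniformly from [0, 1), moves every selected cell to the other block, stops after a fixed number of iterations, and ignores power, delay and balance. All names and the sample netlist are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple


def cut_count(nets: List[Set[str]], block_of: Dict[str, int]) -> int:
    """Number of nets that span both blocks."""
    return sum(1 for net in nets if len({block_of[c] for c in net}) > 1)


def cut_goodness(cell: str, nets: List[Set[str]], block_of: Dict[str, int]) -> float:
    """Fraction of the cell's nets that are not cut (1.0 if it has no nets)."""
    connected = [net for net in nets if cell in net]
    if not connected:
        return 1.0
    return (len(connected) - cut_count(connected, block_of)) / len(connected)


def simulated_evolution(nets: List[Set[str]], initial: Dict[str, int],
                        iterations: int = 30, seed: int = 0) -> Tuple[Dict[str, int], int]:
    rng = random.Random(seed)
    current = dict(initial)
    best, best_cost = dict(current), cut_count(nets, current)
    for _ in range(iterations):
        # Evaluation: goodness of every cell in the current partition.
        goodness = {c: cut_goodness(c, nets, current) for c in current}
        # Selection: cells with low goodness are more likely to be picked.
        selected = [c for c in current if rng.random() > goodness[c]]
        # Allocation: in a bipartition the destination is the other block.
        for c in selected:
            current[c] = 1 - current[c]
        cost = cut_count(nets, current)
        if cost < best_cost:
            best, best_cost = dict(current), cost
    return best, best_cost


# Hypothetical ring of six cells, starting from an alternating partition.
nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}, {"e", "f"}, {"a", "f"}]
initial = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}
best, best_cost = simulated_evolution(nets, initial)
print(cut_count(nets, initial), best_cost)
```

Because this sketch has no balance term, it is free to drift toward placing all cells in one block; a fuller version would fold balance, power and delay into the goodness and cost evaluation.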

Simulated evolution Implementation.

• Cut goodness

• Power goodness

• Delay goodness

• The overall goodness is a fuzzy goodness.

31

Cut goodness

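[Figure: cells 1 to 7 split between Partition 1 and Partition 2.]

\[ gc_i = \frac{d_i - w_i}{d_i} \]

Example: $gc_5 = \frac{3 - 2}{3} = 0.33$

$d_i$: the number of nets connected to cell i.

$w_i$: the number of nets connected to cell i that are cut.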
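The cut goodness can be checked numerically with a short Python sketch, reading it as (d_i − w_i) / d_i with d_i the number of nets on cell i and w_i the number of those nets that are cut. The netlist below is hypothetical (the original figure is not recoverable) and is built so that cell 5 has three nets, two of them cut.

```python
from typing import Dict, List, Set


def cut_goodness(cell: int, nets: List[Set[int]], block_of: Dict[int, int]) -> float:
    """(d - w) / d, with d = nets on the cell and w = those nets that are cut."""
    connected = [net for net in nets if cell in net]
    d = len(connected)
    w = sum(1 for net in connected if len({block_of[c] for c in net}) > 1)
    return (d - w) / d


# Hypothetical layout: cells 1-4 in partition 1 (block 0), cells 5-7 in partition 2.
block_of = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}
nets = [{5, 6}, {4, 5}, {3, 5, 7}, {1, 2}]

print(round(cut_goodness(5, nets, block_of), 2))  # (3 - 2) / 3
print(cut_goodness(1, nets, block_of))            # its only net is uncut
```

A goodness near 1 means the cell is already well placed with respect to cutset, so it is unlikely to be selected for relocation.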

32

Power Goodness

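[Figure: cells 1 to 7 split between Partition 1 and Partition 2, annotated with the values 0.2, 0.1, 0.2, 0.3, 0.4 and 0.1.]

$V_i$ is the set of all nets connected to cell i, and $U_i$ is the set of all nets connected to cell i that are cut.

\[ gp_i = \frac{\sum_{j=1}^{k} S_j(V_i) - \sum_{j=1}^{k} S_j(U_i)}{\sum_{j=1}^{k} S_j(V_i)} \]

Example: $gp_5 = \frac{0.7 - 0.4}{0.7} = 0.428$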
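A possible reading of the power goodness is sketched below in Python: the switching probabilities of all nets on a cell are summed, the share belonging to cut nets is subtracted, and the result is normalized by the total. Treating the S terms as per-net switching probabilities is an assumption, and the nets and values are hypothetical, chosen to give sums of 0.7 and 0.4.

```python
from typing import Dict, List, Set, Tuple

Net = Tuple[Set[int], float]  # (cells on the net, switching probability of the net)


def power_goodness(cell: int, nets: List[Net], block_of: Dict[int, int]) -> float:
    """(sum S over V_i - sum S over U_i) / (sum S over V_i)."""
    connected = [(cells, s) for cells, s in nets if cell in cells]          # V_i
    cut = [(cells, s) for cells, s in connected
           if len({block_of[c] for c in cells}) > 1]                        # U_i
    total = sum(s for _, s in connected)
    return (total - sum(s for _, s in cut)) / total


# Hypothetical layout: cell 4 in block 0, cells 5 and 6 in block 1.
block_of = {4: 0, 5: 1, 6: 1}
nets = [({5, 6}, 0.3),   # uncut net on cell 5
        ({4, 5}, 0.4)]   # cut net on cell 5

print(round(power_goodness(5, nets, block_of), 3))  # (0.7 - 0.4) / 0.7
```

Compared with the cut goodness, a cut net with high switching activity lowers the goodness more than a quiet one, so the cells that cause the most off-chip power are the first candidates for relocation.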

33

Delay Goodness

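[Figure: cells 1 to 7 split between Partition 1 and Partition 2, with flip-flops at the ends of the paths.]

$K_i$: the set of cells in all paths passing through cell i.

$L_i$: the set of cells in all paths passing through cell i that are not in the same block as i.

\[ gd_i = \frac{|K_i| - |L_i|}{|K_i|} \]

Examples: $gd_5 = \frac{5 - 2}{5} = 0.6$ and $gd_4 = \frac{5 - 3}{5} = 0.4$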
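The delay goodness can be reproduced with a small Python sketch, reading it as (|K_i| − |L_i|) / |K_i|. Whether K_i includes cell i itself is not stated in the transcript; the sketch assumes that it does. The single path and the block assignment are hypothetical.

```python
from typing import Dict, List


def delay_goodness(cell: int, paths: List[List[int]], block_of: Dict[int, int]) -> float:
    """(|K| - |L|) / |K| over the cells on all paths through the given cell."""
    k = set()
    for path in paths:
        if cell in path:
            k.update(path)                                   # K_i (includes the cell)
    l = {c for c in k if block_of[c] != block_of[cell]}      # L_i
    return (len(k) - len(l)) / len(k)


# Hypothetical five-cell path crossing the cut once: 1, 3, 5 | 4, 6.
paths = [[1, 3, 5, 4, 6]]
block_of = {1: 0, 3: 0, 5: 0, 4: 1, 6: 1}

print(delay_goodness(5, paths, block_of))  # (5 - 2) / 5
print(delay_goodness(4, paths, block_of))  # (5 - 3) / 5
```

A cell whose path neighbours mostly sit in the other block gets a low goodness, since moving it would keep more of the path on one side of the cut.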

34

Final selection fuzzy rule:

IF cell i is near its optimal cut-set goodness as compared to other cells

AND near its optimal power goodness compared to other cells

AND (near its optimal net delay goodness as compared to other cells OR T(max)(i) is much smaller than Tmax)

THEN it has a high goodness.

35

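Tmax: delay of the most critical path in the current iteration.

T(max)(i): delay of the longest path traversing cell i.

Xpath = Tmax / T(max)(i)

Fuzzy goodness:

\[ g_i = \mu_i(x) = \beta \times \min(\mu_{iC}, \mu_{iP}, \mu_{iD}) + (1 - \beta) \times \frac{1}{3}(\mu_{iC} + \mu_{iP} + \mu_{iD}) \]

$\mu_{iC}, \mu_{iP}, \mu_{iD}$: respectively the cutset, power and delay goodness.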

36

Selection implementation

• Biasless selection scheme.

• The goodness distribution among the cells is Gaussian, with mean Gm and standard deviation $\sigma_G$.

• A Gaussian random number is generated with mean Rm and standard deviation $\sigma_R$.

• This eliminates cells with zero selection probability.

37

Selection implementation

• Rm = Gm – $\sigma_G$

• $\sigma_R$ = $\sigma_G$

Selection rule:

if Random Rm > Goodness(i) then select the cell.
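The biasless selection scheme can be sketched with Python's standard library. The sketch assumes that a fresh Gaussian random number is drawn for every cell, with mean Gm minus the standard deviation of the goodness values and with the same standard deviation; whether the original draws one number per cell or one per iteration is not stated. The function name and the goodness values are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Optional


def select_cells(goodness: Dict[str, float], seed: Optional[int] = None) -> List[str]:
    """Select every cell whose goodness is below a Gaussian random threshold."""
    rng = random.Random(seed)
    values = list(goodness.values())
    g_mean = statistics.mean(values)    # Gm
    g_std = statistics.pstdev(values)   # sigma_G (population standard deviation)
    r_mean = g_mean - g_std             # Rm = Gm - sigma_G
    r_std = g_std                       # sigma_R = sigma_G
    selected = []
    for cell, g in goodness.items():
        if rng.gauss(r_mean, r_std) > g:  # selection rule
            selected.append(cell)
    return selected


# Hypothetical goodness values for five cells.
goodness = {"c1": 0.9, "c2": 0.2, "c3": 0.6, "c4": 0.4, "c5": 0.75}
print(select_cells(goodness, seed=0))
```

Tying the random threshold to the observed goodness distribution removes the need for a hand-tuned selection bias, and because a Gaussian has unbounded tails, even a cell with high goodness keeps a small chance of being selected as long as the goodness values are not all identical.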

38

Experimental Results

ISCAS 85-89 Benchmark Circuits

39

SimE vs. TS vs. GA against time, circuit S13207

40

Experimental Results: SimE vs. TS vs. GA

SimE results were better than TS and GA, with faster execution time.

41

Thank you.

Questions?
