

Page 1: Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout Stephen Coe MSc Engineering Candidate Advisors: Dr. Shawki Areibi Dr. Medhat Moussa

Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout

Stephen Coe
MSc Engineering Candidate

Advisors: Dr. Shawki Areibi, Dr. Medhat Moussa

Page 2: Topic Overview

• Introduction
• Background
– Circuit Partitioning (CP)
– Handel-C vs. VHDL
– Memetic Algorithm
• Research Challenges
• Hardware Approach
• Current Status and Future Work

Page 3: Introduction

• Today's technology allows billions of transistors to be integrated into a single circuit
• As these transistors become smaller, interconnect delay becomes the limiting factor in computer execution speed
• These factors place increasing importance on CAD tools that minimize interconnect length
• As FPGAs become larger and faster, new methods for improving algorithm performance become available

[Figure: delay in ns (axis ticks 0.1, 1.0, 10) versus minimum feature size (2.0 µ, 1.5 µ, 1.0 µ, 0.8 µ, 0.5 µ, 0.35 µ), with curves for typical gate delay and interconnect delay]

Page 4: Circuit Partitioning

• Method of splitting complex designs into smaller subsystems
• Attempts to minimize the connections between subsystems
• The objective is to maximize the number of uncut nets
– The longer the interconnects between modules, the longer the delay within the circuit

[Figure: modules M0 to M5 connected by Net 1 to Net 5]
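As a minimal sketch of the objective described on this slide, the following counts the uncut nets of a two-block partition. The net connectivity and the block assignment are hypothetical (the slide's figure does not survive in this transcript), and `count_uncut_nets` is an illustrative name, not part of the presented design.

```python
# Hypothetical netlist: the real connectivity of M0..M5 is not given in the transcript.
NETS = {
    "Net 1": ["M0", "M1"],
    "Net 2": ["M1", "M2", "M3"],
    "Net 3": ["M3", "M4"],
    "Net 4": ["M4", "M5"],
    "Net 5": ["M0", "M5"],
}

# Hypothetical partition: block index (0 or 1) for each module.
BLOCK_OF = {"M0": 0, "M1": 0, "M2": 0, "M3": 1, "M4": 1, "M5": 1}


def count_uncut_nets(nets, block_of):
    """Return how many nets have all of their modules in a single block."""
    return sum(
        1
        for modules in nets.values()
        if len({block_of[m] for m in modules}) == 1
    )


uncut = count_uncut_nets(NETS, BLOCK_OF)
print(uncut, "of", len(NETS), "nets are uncut")
```

In this toy partition, Net 2 and Net 5 span both blocks, so they are the nets a partitioner would try to pull into one block.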

Page 5: Development Tools

Celoxica DK Design Suite
• High-level language (Handel-C) based on ISO/ANSI-C for implementing algorithms in hardware
• Allows software engineers to design hardware without retraining
• Can generate VHDL code or an EDIF file
• Supports many Actel, Altera and Xilinx devices
• Uses third-party place-and-route programs to generate bit files

[Figure: Design Flow. Handel-C source files; compile; generate EDIF (netlist) or generate VHDL/Verilog; simulate and netlist; place and route tools; bitstream generation]

Page 6: Similarities of Handel-C & ISO C

Similarities
– #define, #ifdef, etc.
– Casting between different variable types
– Function declarations are the same
– Registers stored as variables (e.g. int, unsigned, etc.)
– for, while and do loops

Differences
– No float, double in Handel-C
– Variables in Handel-C are of undefined widths
– No recursive function calls
– Inline functions generate totally new hardware
– No malloc, free (hardware cannot allocate memory dynamically)
– Data can be read in for simulation only
– Parallelism exists

Page 7: Memory of Handel-C

Memory Access Advantage
Memory is accessed as an array
• MemoryData[1024] = WriteData;
• Allows multi-dimensional memory access
Type of memory is easily distinguishable
• Block RAM
• External RAM
• Logic RAM
Memory data is accessed within 1 clock
• No specific timing required

Memory Access Disadvantage
Divides operating clock frequency by 4

[Figure: timing diagram with External Clock, Handel-C Clock, Write Enable and Data signals]

Page 8: Parallel Execution in Handel-C

Parallel Execution
• par{ } command

[Figure: two branches of a par{ } block stepping through Clock 1 to Clock 4; the branch that finishes first waits for the right-hand branch to finish]

Channel Communication
• Allows parallel components to talk to each other

[Figure: Channel]
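The timing behaviour of a par{ } block can be modelled with a few lines of Python. This is a sketch under a simplifying assumption, namely that every statement takes exactly one clock cycle and that a parallel block finishes when its longest branch does; the branch contents and the function names are hypothetical, and the model does not cover channels.

```python
def seq_cycles(statements):
    """Cycles for statements executed one after another (assumed: one clock each)."""
    return len(statements)


def par_cycles(branches):
    """Cycles for branches started together: the block ends with its longest branch."""
    return max(seq_cycles(branch) for branch in branches)


# Hypothetical branches: two statements on the left, four on the right.
left = ["a = 1", "b = 2"]
right = ["c = 3", "d = 4", "e = 5", "f = 6"]

parallel_total = par_cycles([left, right])        # both branches run side by side
left_wait = parallel_total - seq_cycles(left)     # idle cycles of the shorter branch
sequential_total = seq_cycles(left + right)       # same statements without par

print(parallel_total, left_wait, sequential_total)
```

Under these assumptions the shorter branch idles for two cycles, which corresponds to the "Wait" box in the slide's figure.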

Page 9: Memetic Algorithm

A genetic/evolutionary algorithm that includes a non-genetic local search to improve solutions

Genetic Algorithm
– Population-based heuristic technique based on the biological reproductive system
– Operates on the theory of “survival of the fittest”
– Good at exploring the solution space

Local Search
– Iterative improvement algorithms
– Often get trapped in sub-optimum solutions
– Good at exploiting the solution space
– Success is dependent on good starting solutions
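The combination described on this slide can be sketched as a small steady-state genetic algorithm whose offspring are refined by a hill-climbing local search. Everything specific here is an assumption made for illustration: the toy netlist, the balance constraint (three modules per block), the one-point crossover, the swap mutation and the replace-worst policy are not taken from the presented design.

```python
import random

# Hypothetical netlist: each net lists the module indices it connects.
NETS = [(0, 1), (1, 2, 3), (2, 4), (3, 5), (4, 5), (0, 5)]
N = 6  # number of modules; bits[i] is the block (0 or 1) of module i


def fitness(bits):
    """Number of uncut nets; unbalanced partitions are rejected with -1."""
    if sum(bits) != N // 2:
        return -1
    return sum(len({bits[m] for m in net}) == 1 for net in NETS)


def repair(bits, rng):
    """Flip randomly chosen bits until both blocks hold N // 2 modules."""
    bits = list(bits)
    while sum(bits) != N // 2:
        want = 0 if sum(bits) > N // 2 else 1
        idx = rng.choice([i for i, b in enumerate(bits) if b != want])
        bits[idx] = want
    return bits


def local_search(bits):
    """Hill climb: apply improving module swaps until a full pass finds none."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(N):
            for j in range(i + 1, N):
                if bits[i] != bits[j]:
                    cand = bits[:]
                    cand[i], cand[j] = cand[j], cand[i]
                    if fitness(cand) > fitness(bits):
                        bits, improved = cand, True
    return bits


def memetic(generations=20, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [local_search(repair([rng.randint(0, 1) for _ in range(N)], rng))
           for _ in range(pop_size)]
    for _ in range(generations):
        a, b = rng.sample(pop, 2)                    # selection (uniform here)
        cut = rng.randrange(1, N)
        child = repair(a[:cut] + b[cut:], rng)       # crossover, then repair
        i, j = rng.sample(range(N), 2)
        child[i], child[j] = child[j], child[i]      # mutation keeps the balance
        child = local_search(child)                  # the non-genetic step
        worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child                       # replacement
    return max(pop, key=fitness)


best = memetic()
print(best, fitness(best))
```

The genetic operators supply varied starting points, and the local search drives each one to a nearby local optimum, which is the division of labour the two bullet lists describe.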

Page 10: Genetic Algorithm and Local Search

[Figure: annotation "Not Global Minimum"]

Page 11: Research Challenges

Memetic Algorithms
– Increase computational performance of the algorithm (CPU time)
– Exploit the inherent parallel nature of genetic algorithms

Hardware Development Languages
– Determine the impact of high-level languages vs. low-level languages

Page 12: Approach

• Explore the most efficient design to implement memetic algorithms on a single FPGA chip
• Achieve increased performance through pipelining and parallelization
– Divide the tasks into separate but concurrent components

[Figure: FPGA chip containing the different tasks of the algorithm]

Page 13: Genetic Algorithm in Hardware

[Figure: Selection Module and Crossover Module feeding two parallel chains (Offspring 1 and Offspring 2), each with a Mutation Module, Repair Module and Fitness Module, followed by Replacement]

(Pipelined Approach)

[Figure: Selection, Crossover, Mutation, Repair, Fitness and Replacement modules arranged as a pipeline, with Offspring 1, Offspring 2 and Offspring 3 occupying successive stages]
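The benefit of the pipelined arrangement can be estimated with a simple step count. This sketch assumes that every module takes one equal time step and that the pipeline never stalls, neither of which is stated on the slide; the function names are illustrative.

```python
# Stage names as they appear in the slide's figure.
STAGES = ["Selection", "Crossover", "Mutation", "Repair", "Fitness", "Replacement"]


def sequential_steps(n_offspring, n_stages):
    """Each offspring passes through every stage before the next one starts."""
    return n_offspring * n_stages


def pipelined_steps(n_offspring, n_stages):
    """A new offspring enters the first stage at every step once the pipe is filling."""
    if n_offspring == 0:
        return 0
    return n_stages + n_offspring - 1


def occupancy(step, n_offspring, n_stages):
    """Map stage index -> offspring number (1-based) active at a given step (0-based)."""
    return {
        stage: step - stage + 1
        for stage in range(n_stages)
        if 0 <= step - stage < n_offspring
    }


n = 3
print(sequential_steps(n, len(STAGES)), pipelined_steps(n, len(STAGES)))
print(occupancy(2, n, len(STAGES)))
```

For three offspring and six stages the model gives 18 steps sequentially against 8 pipelined; at step 2 the first three stages hold Offspring 3, 2 and 1 respectively.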

Page 14: Local Search Algorithm

[Figure: modules M0 to M5 connected by Net 1 to Net 5 and divided between Block 0 and Block 1; module data stored as one block bit per module (indices 0 to 5); objective value = uncut nets]

(forcing specific nets within one block)
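One hardware-friendly way to hold the module data is a bit vector with one block bit per module, so that a net can be tested and forced into a block with bitwise operations. The sketch below illustrates that idea only; the netlist is hypothetical, the balance between blocks is ignored, and the move rule is an assumption rather than the rule used in the presented design.

```python
def net_mask(modules):
    """Bit mask with a 1 at the index of every module on the net."""
    mask = 0
    for m in modules:
        mask |= 1 << m
    return mask


# Hypothetical nets over module indices 0..5.
NET_MASKS = [net_mask(n) for n in [(0, 1), (1, 2), (2, 4), (3, 5), (1, 4)]]


def uncut_nets(assignment):
    """A net is uncut when its modules are all 0 (Block 0) or all 1 (Block 1)."""
    return sum((assignment & mask) in (0, mask) for mask in NET_MASKS)


def force_net(assignment, mask, block):
    """Move every module of one net into the given block."""
    return assignment | mask if block == 1 else assignment & ~mask


start = net_mask([1, 2, 4])                 # modules 1, 2 and 4 start in Block 1
moved = force_net(start, NET_MASKS[0], 1)   # pull the first net into Block 1

print(bin(start), uncut_nets(start))
print(bin(moved), uncut_nets(moved))
```

Here the first net is cut in the starting assignment; forcing it into Block 1 moves module 0 and raises the objective from 4 to 5 uncut nets.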

Page 15: Sequential Issues

[Figure: Select Next Move, Copy Solution and Update Net Info steps; repeated Loop1, Loop2, Loop3 sequences; two Block RAM units]

Page 16: Preliminary Results of GA

Software Results (Sun Blade 1000)

Benchmark   Modules  Nets  Best    Worst   Mean    Std Dev  Time
prim1.dat   833      902   795.4   767.2   786.4   5.642    30.6
prim2.dat   3014     3029  2580.6  2504.4  2546.6  14.539   122.1
struct.dat  1952     1920  1713.2  1671.2  1694.6  8.252    73.1
ind1.dat    2271     2192  1947.6  1887.8  1919.6  12.134   87.9
pcb1.dat    24       32    25      19.2    24.7    1.073    0.8
chip1.dat   300      294   253.2   241.2   251.1   2.703    8.4
chip4.dat   224      221   186.6   175.4   184.6   2.361    6.6
fract.dat   149      147   107.6   96.2    107.4   2.480    4.3

Hardware Results (@ 59MHz / 4)

Benchmark   Modules  Nets  Best    Worst   Mean    Std Dev  Time  Speedup  Quality
prim1.dat   833      902   661.4   645.2   657.2   3.775    10.3  290%     -16.8%
prim2.dat   3014     3029  1732.0  1703.0  1723.8  7.041    33.0  370%     -32.8%
struct.dat  1952     1920  1275.4  1246.8  1266.2  6.705    21.4  342%     -25.5%
ind1.dat    2271     2192  1415.0  1390.0  1407.8  6.138    23.8  369%     -27.3%
pcb1.dat    24       32    25.2    22.0    25.2    0.333    0.3   266%     0.8%
chip1.dat   300      294   230.8   221.2   229.8   1.883    3.4   247%     -8.8%
chip4.dat   224      221   188.8   182.0   188.2   1.316    2.5   264%     1.1%
fract.dat   149      147   116.6   112.0   116.3   0.661    1.7   253%     8.4%
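The Speedup and Quality figures appear to follow from the Time and Best columns of the two result sets. The formulas below are inferred from the numbers rather than stated on the slide: speedup as software time divided by hardware time, and quality as the relative change in the Best value. Most rows agree with this reading to within rounding. The effective clock is simply the stated 59 MHz divided by 4.

```python
def speedup_percent(software_time, hardware_time):
    """Assumed definition: software run time relative to hardware run time."""
    return 100.0 * software_time / hardware_time


def quality_percent(software_best, hardware_best):
    """Assumed definition: relative change of the hardware Best versus software Best."""
    return 100.0 * (hardware_best - software_best) / software_best


effective_clock_mhz = 59 / 4                     # Handel-C clock after division by 4

prim2_speedup = speedup_percent(122.1, 33.0)     # prim2.dat Time values
fract_quality = quality_percent(107.6, 116.6)    # fract.dat Best values

print(effective_clock_mhz, round(prim2_speedup), round(fract_quality, 1))
```

A positive quality value means the hardware run left more nets uncut than the software run; the larger benchmarks show the opposite.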

Page 17: Handel-C vs VHDL for Local Search Designs

(xcv1000-4bg560)

Usage Summary                             Handel-C            VHDL Prototype
Number of Slices                          2,204/12,288 (17%)  2,573/12,288 (20%)
Number of Slice Registers                 2,193/24,576 (8%)   1,709/24,576 (6%)
Number of 4 input LUTs                    3,349/24,576 (13%)  3,333/24,576 (13%)
Number of GCLKs                           1/4 (25%)           2/4 (50%)
Total equivalent gate                     42,192              42,898

Speed                                     Handel-C            VHDL Prototype
Average Connection Delay for this design  2.921 ns            2.775 ns
Maximum Delay                             15.768 ns           11.979 ns
Average Delay on the 10 Worst Nets        11.612 ns           11.309 ns

Page 18: Current Status and Future Work

Current Status
– Completed VHDL Local Search Prototype
  • Verified through simulation
– Completed Handel-C Local Search Design
  • Verified and implemented on RC1000
– Completed Handel-C Genetic Algorithm Design
  • Currently in testing stages

Future Work
– Complete VHDL Local Search Design and Implementation
– Analyze the performance difference between the hardware-based memetic algorithm and the software algorithm
