Generating Hardware Designs by Source Code Transformation

Ashley Brown, Wayne Luk, Paul Kelly. STS ’06. The Queen’s Tower, Imperial College London, South Kensington, SW7.

Page 1: Generating Hardware Designs by Source Code Transformation

The Queen’s Tower, Imperial College London, South Kensington, SW7

Generating Hardware Designs by Source Code Transformation

Ashley Brown, Wayne Luk, Paul Kelly
STS ’06

Page 2: Generating Hardware Designs by Source Code Transformation

21st June 2005 | Ashley Brown # 2

What would we like to do?

• Take an algorithm written in C.

• Generate an efficient hardware design, run it on an FPGA.

• Fast design cycle, easy to maintain code.

• C programmers should be able to create fast hardware!

Page 3: Generating Hardware Designs by Source Code Transformation

Background: Handel-C

• C-based programming language for digital system design.

• One clock-cycle per statement.

• Explicit parallelism.

• Compiler generates hardware design from Handel-C source.

while (j != 3) {
    par {
        t0 = aa[0] * bb[0];
        t1 = aa[1] * bb[1];
    }
    par {
        cc[i][j] = t0 + t1;
        j++;
    }
}

Handel-C code example.
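The one-cycle-per-statement rule makes cycle counts easy to estimate. The C sketch below is a rough cost model of the loop above, not Handel-C itself. It assumes that each assignment costs one clock cycle, that a par block costs as many cycles as its longest branch, and that evaluating the loop condition is free. All function names are hypothetical.

```c
/* Rough cycle-cost model for the Handel-C loop above (an illustration,
 * not output from any tool). Assumptions: one clock cycle per
 * assignment, a par block takes as long as its slowest branch, and the
 * loop condition is evaluated for free. */

/* Cost of statements executed one after another: the sum of their costs. */
static int seq_cycles(const int *costs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += costs[i];
    return total;
}

/* Cost of statements executed in a par block: the largest single cost. */
static int par_cycles(const int *costs, int n)
{
    int longest = 0;
    for (int i = 0; i < n; i++)
        if (costs[i] > longest)
            longest = costs[i];
    return longest;
}

/* Loop body written as four sequential assignments. */
int loop_cycles_sequential(int iterations)
{
    const int body[4] = {1, 1, 1, 1};
    return iterations * seq_cycles(body, 4);
}

/* Loop body written as two par blocks of two assignments each. */
int loop_cycles_parallel(int iterations)
{
    const int first[2] = {1, 1};   /* t0 = ...; t1 = ...;      */
    const int second[2] = {1, 1};  /* cc[i][j] = t0 + t1; j++; */
    return iterations * (par_cycles(first, 2) + par_cycles(second, 2));
}
```

Under this model, three iterations of the loop cost 12 cycles when written sequentially and 6 cycles with the two par blocks.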

Page 4: Generating Hardware Designs by Source Code Transformation

Problems

• Software programmers: bad Handel-C, poor hardware.

– No exploitation of statement-level parallelism.

– Long expressions.

– Lots of for loops!

• Experienced Handel-C designers: good hardware, hard-to-read code.

– Trickery to reduce clock cycles, increase clock rate.

• Finding the “optimal” solution is not easy.

– Optimisation effectiveness depends on the target architecture (see the results later!)

Page 5: Generating Hardware Designs by Source Code Transformation

Solutions

• Restructure Handel-C code to optimise.

– Can parallelise if desired.

– Duplicate hardware if necessary.

• Apply transformations to the original source, leaving it intact.

– The original readable description is still available.

– A more efficient version is used for hardware generation.

• Allow the user to define custom transformations with a transformation language.

• Generate a whole design-space of solutions, with different optimisations.

Page 6: Generating Hardware Designs by Source Code Transformation

What’s New?

• Previous work with user-specified transformations has been:

– For software-based C.

– Aimed at parallelising/optimising for microprocessors.

• Can’t duplicate microprocessor hardware on the fly – it’s either there or not. We can duplicate hardware and pipeline – FASTER DESIGN!

• Previous work on hardware language transformations does not allow the user to describe transformations (Haydn-C). We do – the user can target their code explicitly.

• Exploring an entire design-space is usually done at the hardware level, not at the high-level language level (although not always, e.g. ASC). We generate a full design-space – find *the* best solution.

Page 7: Generating Hardware Designs by Source Code Transformation

Basic Components

// 1 * x = x
always transform std_times1_elim {
    pattern {
        1 * cmlexpr(operand)
    }
    generate {
        cmlexpr(operand)
    }
}

Wildcards, such as cmlexpr, allow part of a pattern to be matched and substituted into the new tree.

The generate section describes the code that should replace the pattern.

The pattern section describes the format of the code to match for this transformation.

The optional always keyword indicates that this transformation should always be applied where it can.

Each transformation can have a name to identify it for reporting.

CML transformations are defined within transform blocks.

Wildcard matching:

• cmlexpr - matches any expression

• cmlstmt - matches any statement

• cmlstmtlist - matches a list of statements
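To make the pattern/generate idea concrete, here is a minimal C sketch of what applying the 1 * x = x rule to an expression tree could look like. It is not the CML engine described in the slides: the Expr type, the node kinds and the function names are all hypothetical, and the cmlexpr wildcard is modelled simply as "whatever subtree sits on the right of the multiply".

```c
#include <stddef.h>

/* Hypothetical expression tree; the real engine's AST is not shown
 * in the slides. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_MUL, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                /* used by EXPR_CONST          */
    char name;                /* used by EXPR_VAR            */
    struct Expr *lhs, *rhs;   /* used by EXPR_MUL / EXPR_ADD */
} Expr;

/* pattern { 1 * cmlexpr(operand) }: a multiply whose left operand is
 * the constant 1. The right operand plays the role of the wildcard. */
static int matches_times_one(const Expr *e)
{
    return e->kind == EXPR_MUL
        && e->lhs->kind == EXPR_CONST
        && e->lhs->value == 1;
}

/* Rewrites the tree bottom-up and returns the (possibly new) root.
 * generate { cmlexpr(operand) }: the match is replaced by the subtree
 * bound to the wildcard. */
Expr *apply_times1_elim(Expr *e)
{
    if (e == NULL)
        return NULL;
    if (e->kind == EXPR_MUL || e->kind == EXPR_ADD) {
        e->lhs = apply_times1_elim(e->lhs);
        e->rhs = apply_times1_elim(e->rhs);
    }
    if (matches_times_one(e))
        return e->rhs;
    return e;
}
```

Like the pattern on the slide, this sketch only matches 1 * x; the mirrored form x * 1 would need its own rule.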

Page 8: Generating Hardware Designs by Source Code Transformation

Ensuring Data Integrity

• Three types of condition are defined to ensure data integrity:

– Data-flow sets.

– Expression evaluation.

– Constant validation.

• Transformations have a conditions section to define these.
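The slides do not spell out how data-flow sets are checked, so the following C sketch only illustrates the general idea with a conservative, Bernstein-style test: two statements may be placed in parallel if neither writes a variable that the other reads or writes. The DataFlow type, the variable numbering and the function name are assumptions made for this example.

```c
/* Hypothetical variable numbering: one bit per variable. */
enum {
    VAR_A  = 1u << 0,
    VAR_B  = 1u << 1,
    VAR_C  = 1u << 2,
    VAR_T0 = 1u << 3,
    VAR_T1 = 1u << 4,
    VAR_CC = 1u << 5
};

/* Read and write sets of one statement, stored as bitmasks. */
typedef struct {
    unsigned reads;   /* variables the statement reads  */
    unsigned writes;  /* variables the statement writes */
} DataFlow;

/* Conservative check: returns 1 only when the two statements share no
 * variable that at least one of them writes. */
int can_run_in_parallel(DataFlow a, DataFlow b)
{
    unsigned conflicts = (a.writes & b.reads)    /* b reads what a writes    */
                       | (a.reads & b.writes)    /* b overwrites a's input   */
                       | (a.writes & b.writes);  /* both write one variable  */
    return conflicts == 0;
}
```

For example, A -= B (reads A and B, writes A) and C = (C << 1) | 1 (reads and writes C) pass the check, while a statement that writes t0 and one that reads t0 do not. A real tool may accept more cases than this sketch does.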

Page 9: Generating Hardware Designs by Source Code Transformation

Hand-coded vs Automated

Sequential Automated

do {
    if (A >= B) {
        A -= B;
        C = (C << 1) | 1;
    } else {
        C <<= 1;
    }
    B >>= 1;
    Bits--;
} while (Bits != 0);

do {
    par {
        if (A >= B) {
            par {
                A -= B;
                C = (C << 1) | 1;
            }
        } else {
            C <<= 1;
        }
        B >>= 1;
        Bits--;
    }
} while (Bits != 0);

do {
    par {
        if (A >= B) {
            par {
                A -= B;
                C = (C << 1) | 1;
            }
        } else {
            C = (C << 1);
        }
        B = (B >> 1);
        Bits = (Bits - 1);
    }
} while (Bits != 0);

Hand-coded
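The loop above is a shift-and-subtract divider, and it can be checked in plain C. The sketch below wraps the sequential loop in a function and adds a second function that models the par versions, on the assumption that every statement in a par block reads the values held at the start of the clock cycle and that all updates take effect together. The set-up of A, B, C and Bits is not shown on the slide and is assumed here, as are the function names and the DivResult type. The sketch expects 1 <= bits <= 16 and operands below 2^bits.

```c
typedef struct {
    unsigned quotient;
    unsigned remainder;
} DivResult;

/* Sequential version: statements execute one after another. */
DivResult divide_sequential(unsigned dividend, unsigned divisor, unsigned bits)
{
    unsigned A = dividend;
    unsigned B = divisor << (bits - 1);  /* assumed set-up, not on the slide */
    unsigned C = 0;
    unsigned Bits = bits;

    do {
        if (A >= B) {
            A -= B;
            C = (C << 1) | 1;
        } else {
            C <<= 1;
        }
        B >>= 1;
        Bits--;
    } while (Bits != 0);

    DivResult r = {C, A};
    return r;
}

/* Model of the par versions: all next values are computed from the
 * current values, then every variable is updated at once. */
DivResult divide_parallel(unsigned dividend, unsigned divisor, unsigned bits)
{
    unsigned A = dividend;
    unsigned B = divisor << (bits - 1);  /* assumed set-up, not on the slide */
    unsigned C = 0;
    unsigned Bits = bits;

    do {
        int take = (A >= B);
        unsigned nextA = take ? A - B : A;
        unsigned nextC = take ? ((C << 1) | 1) : (C << 1);
        unsigned nextB = B >> 1;
        unsigned nextBits = Bits - 1;
        A = nextA;
        C = nextC;
        B = nextB;
        Bits = nextBits;
    } while (Bits != 0);

    DivResult r = {C, A};
    return r;
}
```

Both functions give the same quotient and remainder because no statement in the loop body needs a value produced earlier in the same iteration, which is what allows the whole body to be placed in one par block.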

Page 10: Generating Hardware Designs by Source Code Transformation

Test Transformations

• Generic – applicable to all programs:

– autopar – parallelise sequential statements with no dependencies.

– fortowhile – convert for loops into corresponding while loops.

– lttoeq – convert for loops with < in the loop condition to ==.

• Application specific – targeted at the test programs:

– matrixpar – parallelisation of an inner loop.
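As an illustration of the two loop rewrites, the C sketch below shows one loop in three forms: the original for loop, a while loop in the style of fortowhile, and a loop with an equality-based exit test in the style of lttoeq. The exact output of the tool is not shown in the slides, so the third form, which uses != as in the earlier Handel-C example, is an assumption. The function names are hypothetical.

```c
/* Original form: a for loop with < in the condition. */
int sum_for(const int *v, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* After a fortowhile-style rewrite: same loop as a while loop. */
int sum_while(const int *v, int n)
{
    int total = 0;
    int i = 0;
    while (i < n) {
        total += v[i];
        i++;
    }
    return total;
}

/* After an lttoeq-style rewrite: the magnitude comparison becomes an
 * equality test. This is only safe because i starts at 0, n >= 0 and
 * i advances by exactly 1, so i is guaranteed to hit n. */
int sum_while_eq(const int *v, int n)
{
    int total = 0;
    int i = 0;
    while (i != n) {
        total += v[i];
        i++;
    }
    return total;
}
```

The equality form is only a valid replacement under the stated conditions on the start value, bound and step, which is the kind of restriction a transformation's conditions section would have to capture.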

Page 11: Generating Hardware Designs by Source Code Transformation

More Transformations

• Various mathematical rearrangements:

– Factorise to reduce multiplies.

– Remove *1, *0, +0 etc.

• More interesting:

– Dead-code elimination (remember data conditions!)

– Variable replacement: remove dependencies in code by replacing variables with the expressions last assigned to them (again, remember data conditions!)
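A small worked case of "factorise to reduce multiplies", written in C for illustration; the function names are hypothetical. The first form needs two multipliers, the second only one.

```c
/* Before: two multiplies and one add. */
unsigned expanded(unsigned a, unsigned b, unsigned c)
{
    return a * b + a * c;
}

/* After factorising out a: one add and one multiply. */
unsigned factorised(unsigned a, unsigned b, unsigned c)
{
    return a * (b + c);
}
```

With C unsigned arithmetic the two forms agree for all inputs, because every operation wraps modulo the same power of two. In a hardware design where each variable can have its own bit width, that equivalence would need to be checked before the rewrite is applied.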

Page 12: Generating Hardware Designs by Source Code Transformation

Execution Time Improvement

[Figure: Power/Execution Time Comparison at 50MHz. Horizontal axis: Code Version (base, autopar, fortowhile, lttoeq, matrixpar-noshift, matrixpar-shift). Left axis: Execution Time (ns), 0 to 4000. Right axis: Dynamic Power Estimate (mW), 0 to 300. Series: Execution Time, Dynamic Power Estimate.]

lttoeq increases fmax on Altera, but decreases it on Xilinx.

[Figure: Execution Time (s) against Optimisation Applied (optimisations are cumulative).]

Page 13: Generating Hardware Designs by Source Code Transformation

Design-Space Exploration

• Difficult to decide which transformation is best.

• Don’t guess, produce several solutions.

• Branch the AST whenever a transformation is applied.

– In-place branches: small AST.

– Propagate branches when no more transformations can be applied.

– Repeat transformation process on each new solution.
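The branching step can be pictured with a short C sketch. It assumes, purely for illustration, that there are a fixed number of independent sites where a transformation may or may not be applied; each complete solution is then a bitmask recording which sites were transformed. In the real engine, applying one transformation can enable or disable others, so this is a simplification, and the function name is hypothetical.

```c
/* Recursively branch at each transformation site: one branch leaves the
 * site untouched, the other applies the transformation there.
 * `applied` records the choices made so far (bit k = site k transformed).
 * Completed solutions are appended to `solutions`; the new count is
 * returned. The caller must provide room for 2^sites entries, and
 * `sites` must be smaller than the bit width of unsigned. */
unsigned explore(unsigned site, unsigned sites, unsigned applied,
                 unsigned *solutions, unsigned count)
{
    if (site == sites) {
        solutions[count] = applied;   /* no more branch points */
        return count + 1;
    }
    /* Branch 1: do not transform at this site. */
    count = explore(site + 1, sites, applied, solutions, count);
    /* Branch 2: transform at this site. */
    return explore(site + 1, sites, applied | (1u << site), solutions, count);
}
```

Three independent sites yield eight distinct solutions, which shows how quickly the design space grows and why heuristics for choosing transformations become attractive.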

Page 14: Generating Hardware Designs by Source Code Transformation

Design Space Exploration

[Figure: fmax (vertical axis, 97 to 105) against Code Version (horizontal axis, 0 to 300). Solutions 139, 232 and 239 are labelled.]

Page 15: Generating Hardware Designs by Source Code Transformation

Design Space Exploration

• Assume a design with an fmax of 104MHz; the chosen solution must match that.

• Many solutions match.

– We should consider other factors such as area, power or number of cycles.

• Being brief: look at solutions 139 and 232.

• Both are only partially parallelised. The solution with the most parallelism (239) does not meet the fmax requirement.
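The selection step described above can be sketched in C: keep only the solutions that meet the fmax requirement, then rank the survivors on a secondary factor. The DesignPoint type, its fields and the ranking order (fewest cycles, then smallest area) are assumptions made for this example; the slides do not say how the factors are weighed, and no measured data is used here.

```c
#include <stddef.h>

/* Hypothetical record of one generated solution. */
typedef struct {
    int id;            /* code version number       */
    double fmax_mhz;   /* maximum clock rate in MHz */
    unsigned cycles;   /* cycle count               */
    unsigned area;     /* area in device units      */
} DesignPoint;

/* Returns the solution that meets the fmax requirement with the fewest
 * cycles, breaking ties on smaller area, or NULL if none qualifies. */
const DesignPoint *select_design(const DesignPoint *points, size_t count,
                                 double required_fmax_mhz)
{
    const DesignPoint *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const DesignPoint *p = &points[i];
        if (p->fmax_mhz < required_fmax_mhz)
            continue;   /* fails the hard constraint */
        if (best == NULL
            || p->cycles < best->cycles
            || (p->cycles == best->cycles && p->area < best->area))
            best = p;
    }
    return best;
}
```

Treating fmax as a hard constraint mirrors the example on the slide, where the most parallel solution is rejected because it misses the required clock rate.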

Page 16: Generating Hardware Designs by Source Code Transformation

Future Work

• Extensions to the language to allow additional matching.

• expr replicator, complex expression matching.

• Preservation of structure – e.g. a++; does not become a = a + 1;

• Heuristics for selecting transformations to apply.

• Genetic algorithms for transformation selection? “Breed” good transformation solutions.

Page 17: Generating Hardware Designs by Source Code Transformation

Future Applications

• Aspect-oriented concepts: automatically inserting debugging signals.

• Power-signature-masking code to avoid attacks in cryptographic applications.

Page 18: Generating Hardware Designs by Source Code Transformation

Conclusion

• Matching method can achieve good results on naïve C code.

• Targeting domain- or application-specific constructs can provide large performance gains at the expense of resources.

• Scope to produce a much more powerful system with changes to the transformation language, heuristics and more efficient algorithms.

Page 19: Generating Hardware Designs by Source Code Transformation

Contributions

• The first transformation language for parallelising hardware languages with data integrity conditions.

• A prototype transformation engine for implementing the language.

• Automatic transformations capable of achieving a 35-70% reduction in execution time.

• An insight into the interaction of transformations, both with each other and with the platform their output runs on.

Page 20: Generating Hardware Designs by Source Code Transformation

Questions?

Page 21: Generating Hardware Designs by Source Code Transformation

Cycle Count Improvements

Program       matmultinf   2dedge   aes    hist
base          189          4686     1948   3333
lttoeq        106          2818     1022   1794
% Decrease    44%          40%      48%    46%
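The percentage row follows from the two cycle counts as (base - optimised) / base, rounded to the nearest whole percent. A small C helper (the name is hypothetical) reproduces the figures in the table:

```c
/* Percentage decrease from `base` to `optimised`, rounded to the
 * nearest whole percent. Expects 0 < optimised <= base. */
int percent_decrease(unsigned base, unsigned optimised)
{
    double decrease = 100.0 * (double)(base - optimised) / (double)base;
    return (int)(decrease + 0.5);
}
```

For matmultinf, (189 - 106) / 189 is about 43.9%, which rounds to the 44% shown.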

Page 22: Generating Hardware Designs by Source Code Transformation

Design-Space Exploration

Transform, creating a branch point.

Page 23: Generating Hardware Designs by Source Code Transformation

Design-Space Exploration

Propagate branches to root – create several distinct solutions.

Page 24: Generating Hardware Designs by Source Code Transformation

Conventional Array Access

[Figure: array elements [0] [1] [2] [3] [4] [5] and an in/out port. Label: Congestion.]

Page 25: Generating Hardware Designs by Source Code Transformation

Rotational Shift Array Access

[Figure: array elements [0] [1] [2] [3] [4] [5] and an in/out port. Label: Distributed Accesses.]

Page 26: Generating Hardware Designs by Source Code Transformation

[Figure: Power/Execution Time Comparison at 50MHz. Horizontal axis: Code Version (base, autopar, fortowhile, lttoeq, matrixpar-noshift, matrixpar-shift). Left axis: Execution Time (ns), 0 to 4000. Right axis: Dynamic Power Estimate (mW), 0 to 300. Series: Execution Time, Dynamic Power Estimate.]

Page 27: Generating Hardware Designs by Source Code Transformation

[Figure: Comparison of Area with Execution Time. Horizontal axis: Area, 3000 to 6000. Vertical axis: Execution Time, 1.00 to 5.00. Points: base, autopar, fortowhile, lttoeq, matrixpar.]