Search Space Properties for Pipelined FPGA Applications

University of Southern California, Information Sciences Institute

Heidi Ziegler, Mary Hall, Byoungro So

Oct 2, 2003

2

Mapping Assignment

Machine Vision Kernel (application requirements)

1. Edge detection
2. Feature extraction
3. Distance computation

Figure: the kernel is mapped onto an FPGA.

3

Mapping an Application to Hardware

Machine Vision Kernel (MVIS)

1. Edge detection
2. Feature extraction
3. Distance computation

Figure labels: configurable logic element, datapath, on-chip storage, interconnect, configurable logic, off-chip memory.

Mapping tasks:

Compute Data Layout
Partition Chip Capacity
Manage Communication

4

Build on Prior Work in DEFACTO

Automatic design space exploration for individual loop nests (DAC03, PLDI02)

Analyses and transformations to exploit ILP (PLDI02) and maximize memory bandwidth (LPCP02)

Communication and pipeline analysis to exploit data and task parallelism (FCCM02, DAC03)

Figure: design flow. C -> Analyses and Transformations -> SUIF to VHDL -> Behavioral Synthesis and Estimation -> Good Design? (No: iterate; Yes: Logic Synthesis and Place & Route)

5

This Research

Integrates communication and pipelining analysis with the single loop design space exploration

Defines and illustrates search space properties for the global optimization problem

Describes a search algorithm and presents a case study

6

Sequential MVIS Kernel

Figure: read/write execution order over time. The Edge, Feature, and Distance loop nests form Pipeline Stages S1, S2, and S3. The 2-D arrays are accessed row-wise. Read-after-write (RAW) data dependences on arrays B and F link consecutive stages. Arrays shown: A, B, F, D.

7

Reaching Definition Data Access Descriptor

$RDAD_{\{r,w\},s}(A)$

A set that describes basic data access information for array A, where s is a program point and r, w mark a read or write array access. Each tuple holds:

accessed array section, as integer linear inequalities
traversal order, a vector of dimensions from slowest to fastest
vector of dominant induction variables for each dimension
set of statements this tuple describes (def or use)
set of reaching definitions

8

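Communication Requirements

Stage S1 writes B (statement 3) and Stage S2 reads B (statement 4). The RAW dependence on B requires communication, which is a function of the two descriptors:

$f(RDAD_{w,s_i}(B),\ RDAD_{r,s_j}(B))$

$RDAD_{w,s_1}(B) = \langle\, 0 \le d_1 \le 29,\ 0 \le d_2 \le 29 \mid \langle 1,2 \rangle \mid \langle x,y \rangle \mid 3 \mid \emptyset \,\rangle$

$RDAD_{r,s_2}(B) = \langle\, 0 \le d_1 \le 29,\ 0 \le d_2 \le 29 \mid \langle 1,2 \rangle \mid \langle x,y \rangle \mid 4 \mid 3 \,\rangle$

Solve directly for data, granularity, placement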
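As a rough illustration of how such descriptors could be held in a compiler, here is a minimal C sketch. The struct layout, the field names, and the helper functions are all hypothetical and are not DEFACTO's data structures. It assumes the example descriptor for B covers a 30 x 30 section (0 to 29 in each dimension), written at statement 3 and read at statement 4, and it ignores section overlap when testing for communication.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 2   /* 2-D arrays only, as in the MVIS kernel */
#define MAX_DEFS 4   /* hypothetical cap on reaching definitions */

/* One RDAD tuple; field names are illustrative only. */
typedef struct {
    char   array;               /* array name, e.g. 'B' */
    int    is_write;            /* 1 = write (def), 0 = read (use) */
    int    lo[MAX_DIMS];        /* section: lo[k] <= d_k <= hi[k] */
    int    hi[MAX_DIMS];
    int    order[MAX_DIMS];     /* traversal order, slowest to fastest */
    char   ivar[MAX_DIMS];      /* dominant induction variable per dim */
    int    stmt;                /* statement this tuple describes */
    int    reaching[MAX_DEFS];  /* reaching definitions (statement ids) */
    size_t n_reaching;          /* number of valid entries in reaching[] */
} Rdad;

/* Number of array elements covered by the rectangular section. */
long rdad_section_elems(const Rdad *d) {
    long n = 1;
    for (int k = 0; k < MAX_DIMS; k++) {
        assert(d->hi[k] >= d->lo[k]);
        n *= (long)(d->hi[k] - d->lo[k] + 1);
    }
    return n;
}

/* A write must be communicated to a read when both name the same
   array and the write's statement is among the read's reaching
   definitions (a RAW dependence). Section overlap is not checked. */
int rdad_needs_communication(const Rdad *w, const Rdad *r) {
    if (!w->is_write || r->is_write || w->array != r->array)
        return 0;
    for (size_t i = 0; i < r->n_reaching; i++)
        if (r->reaching[i] == w->stmt)
            return 1;
    return 0;
}

/* Assumed example values: write of B at statement 3, no reaching defs. */
Rdad rdad_write_b(void) {
    Rdad d = { 'B', 1, {0, 0}, {29, 29}, {1, 2}, {'x', 'y'}, 3, {0}, 0 };
    return d;
}

/* Assumed example values: read of B at statement 4, reached by statement 3. */
Rdad rdad_read_b(void) {
    Rdad d = { 'B', 0, {0, 0}, {29, 29}, {1, 2}, {'x', 'y'}, 4, {3}, 1 };
    return d;
}
```

Calling rdad_needs_communication on the write and read descriptors for B reports that the two stages must communicate, and rdad_section_elems gives the amount of data involved.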

9

Task Graph

Nodes are pipeline stages
Communication edge descriptors (CEDs) computed from RDADs

Each CED holds:

array section, per communication instance
send point
receive point

$CED_{s_i \to s_j}(A)$

Figure: task graph with nodes S1 to S5, each annotated with its {RDADs}. Edges and their producer and consumer rates:

CED s1 -> s2 (a): rate_prod s1(a), rate_cons s2(a)
CED s1 -> s4 (x): rate_prod s1(x), rate_cons s4(x)
CED s1 -> s5 (a): rate_prod s1(a), rate_cons s5(a)
CED s2 -> s3 (a): rate_prod s2(a), rate_cons s3(a)
CED s2 -> s3 (b): rate_prod s2(b), rate_cons s3(b)
CED s4 -> s5 (y): rate_prod s4(y), rate_cons s5(y)
CED s5 -> s3 (y): rate_prod s5(y), rate_cons s3(y)

10

Global Optimization Strategy

2 Criteria

Design's execution time should be minimized
Design's space utilization, for a given level of performance, should be minimized

Estimates

Behavioral synthesis area (all loops)
Behavioral synthesis timing (all loops)
Communication rates
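The two criteria can be read as an ordering on candidate designs: faster is better, and among equally fast designs the smaller one is better. The C sketch below is one way to write that comparison. The type name, the field names, and the units (cycles, slices) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Estimates for one candidate design; names and units are illustrative. */
typedef struct {
    long cycles;  /* estimated execution time, in clock cycles */
    long area;    /* estimated space utilization, e.g. in slices */
} DesignEstimate;

/* Returns 1 if design a is preferred to design b:
   first by lower execution time, then, at equal time, by lower area. */
int design_is_better(const DesignEstimate *a, const DesignEstimate *b) {
    assert(a != NULL && b != NULL);
    if (a->cycles != b->cycles)
        return a->cycles < b->cycles;
    return a->area < b->area;
}
```

A search loop could keep the best design seen so far and replace it whenever design_is_better returns 1 for a new candidate.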

11

Transformations

Local

Unroll and jam
Scalar replacement
Custom data layout

Global

Communication granularity and placement
Producer-Consumer Rate Matching
Data reorganization on-chip
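To make the first two local transformations concrete, the following C sketch applies unroll and jam with factor 2 to a small loop nest, then uses scalar replacement so the shared element is loaded once. The loop body, the array names, and the size N are hypothetical and chosen only to keep the example short. N must be even for the factor-2 version.

```c
#include <assert.h>

#define N 8  /* hypothetical image size; must be even for factor 2 */

/* Reference loop nest: each output sums two vertically adjacent inputs. */
void sum_rows_ref(int u[N + 1][N], int out[N][N]) {
    for (int x = 0; x < N; x++)
        for (int y = 0; y < N; y++)
            out[x][y] = u[x][y] + u[x + 1][y];
}

/* Unroll and jam: the outer loop is unrolled by 2 and the two copies of
   the inner loop are fused into one body. Scalar replacement: u[x+1][y]
   is needed by both copies, so it is loaded once into a scalar. */
void sum_rows_uj2(int u[N + 1][N], int out[N][N]) {
    assert(N % 2 == 0);
    for (int x = 0; x < N; x += 2) {
        for (int y = 0; y < N; y++) {
            int shared = u[x + 1][y];
            out[x][y]     = u[x][y] + shared;
            out[x + 1][y] = shared + u[x + 2][y];
        }
    }
}
```

The transformed body performs three array reads per two outputs instead of four, and exposes two independent additions per iteration, which is the kind of parallelism a hardware datapath can exploit.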

12

High-Level Design Flow

Figure: design flow from a C program to a configuration bit stream. Boxes shown: Communication and Pipeline Analysis; Custom Data Layout; SUIF to VHDL; Behavioral Synthesis and Estimation; Basic Compiler Optimizations; Scalar Replacement; Unroll and Jam; Producer-Consumer Rate Matching; Communication Granularity Analysis; Logic Synthesis / Place & Route; Good Design? (No / Yes); Configuration Bit Stream.

13

Observation 1: Non-increasing Memory Accesses

Choose to place communication on-chip

Figure: Single Loop Solution versus Global Solution for Stage 1 (S1) and Stage 2 (S2) on a configurable logic device with off-chip memory. Arrays shown: A, B, D, E.

14

Observation 2: Non-increasing Unroll Factor

Local solution assumed to be best-case performance, worst-case space estimate

Figure: Single Loop Solution versus Global Solution for Stage 1 (S1) and Stage 2 (S2). The global solution reduces unroll factors.

15

Observation 3: Matching Rates without Affecting Performance

Avoid creating longer critical paths

If $rate_{prod}(d) < rate_{cons}(d)$, we can safely reduce the unroll factor for S3 until the rates match

Figure: task graph with stages S1, S2, S3 and two edges: CED(d) with $rate_{prod}(d)$ and $rate_{cons}(d)$, and CED(a) with $rate_{prod}(a)$ and $rate_{cons}(a)$.
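The rate-matching observation can be sketched in C under a simplifying assumption that is not stated on the slide: the consumer's rate grows linearly with its unroll factor. The function name, the parameters, and the integer rate units are hypothetical. The sketch lowers the consumer's unroll factor one step at a time and stops before the consumer would become slower than the producer, so the critical path is not lengthened.

```c
#include <assert.h>

/* Hypothetical model: consumer rate = unroll * cons_rate_per_copy.
   Returns the smallest unroll factor (at least 1, at most the starting
   value) whose consumer rate is still >= the producer rate. If the
   consumer is already slower than the producer, the factor is unchanged. */
int match_consumer_unroll(int unroll, int cons_rate_per_copy, int prod_rate) {
    assert(unroll >= 1 && cons_rate_per_copy > 0 && prod_rate > 0);
    while (unroll > 1 && (unroll - 1) * cons_rate_per_copy >= prod_rate)
        unroll--;
    return unroll;
}
```

In practice unroll factors are usually restricted to divisors of the loop bounds, so a real search would step through that restricted set rather than decrement by one.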

16

Optimization Algorithm: Step 1

Apply Pipeline and Communication Analysis

Figure: task graph S1 -> S2 -> S3 with CED s1,s2 (peak) and CED s2,s3 (feature_x).

RDADs per stage:

S1: RDAD r,s1 (u), RDAD w,s1 (peak)
S2: RDAD r,s2 (peak), RDAD w,s2 (feature_x), RDAD w,s2 (feature_y)
S3: RDAD r,s3 (feature_x), RDAD r,s3 (u), RDAD r,s3 (v), RDAD w,s3 (ssd)

/* S1: reads u, writes peak */
for (x = 0; x < image-2; x++) {
  for (y = 0; y < image-2; y++) {
    uh1 = -3*u[x][y] - 3*u[x+1][y] ...;
    uh2 = -3*u[x][y] + 3*u[x+1][y] ...;
    peak[x][y] = uh1 + uh2;
  }
}

/* S2: reads peak, writes feature_x */
for (x = 0; x < image-2; x++) {
  for (y = 0; y < image-2; y++) {
    if (peak[x][y] > threshold)
      feature_x[x][y] = x;
    else
      feature_x[x][y] = 0;
  }
}

/* S3: reads feature_x, u, v; writes ssd (squared difference) */
for (x = 0; x < image-2; x++) {
  for (y = 0; y < image-2; y++) {
    if (feature_x[x][y] != 0)
      ssd[x][y] = (u[x][y] - v[x][y+1]) * (u[x][y] - v[x][y+1]) ...;
  }
}

17

Optimization Algorithm: Step 2

Find Single Loop Solutions in Isolation

Figure: Stage 1, Stage 2, and Stage 3 each receive their own set of unroll factors; the stages communicate through peak and feature_x. The slide repeats the three loop nests shown in Step 1.

18

Optimization Algorithm: Step 3

Match Producer and Consumer Rates

Figure: task graph with stages S1, S2, S3 and two edges: CED(peak) with $rate_{prod}(peak)$ and $rate_{cons}(peak)$, and CED(feature_x) with $rate_{prod}(feature\_x)$ and $rate_{cons}(feature\_x)$.

$rate_{prod}(peak) = rate_{cons}(peak)$

$rate_{prod}(feature\_x) = rate_{cons}(feature\_x)$

19

Optimization Algorithm: Step 4

Apply Greedy Strategy to Meet Chip Constraint

Figure: Stage 1, Stage 2, and Stage 3 placed on the chip.

$\sum_{i=1}^{n} area_i \le capacity$

If not, apply greedy strategy and then repeat steps 3 and 4.

Final Solution
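The slides do not spell out the greedy strategy, so the C sketch below substitutes a simple stand-in: while the summed stage areas exceed the chip capacity, halve the unroll factor of the stage that currently occupies the most area. The linear area model (area = unroll factor times area per copy), the halving rule, and all names are assumptions. The sketch also omits the repeat of rate matching (step 3) after each reduction.

```c
#include <assert.h>

/* One pipeline stage; the linear area model is an assumption. */
typedef struct {
    int  unroll;         /* current unroll factor, >= 1 */
    long area_per_copy;  /* estimated area of one copy of the loop body */
} Stage;

/* Estimated area of one stage under the linear model. */
long stage_area(const Stage *s) {
    return (long)s->unroll * s->area_per_copy;
}

/* Sum of the estimated areas of all n stages. */
long total_area(const Stage s[], int n) {
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += stage_area(&s[i]);
    return sum;
}

/* Hypothetical greedy step: repeatedly halve the unroll factor of the
   largest stage that can still be reduced. Returns 1 once the design
   fits within capacity, or 0 if every stage is already at factor 1
   and the design still does not fit. */
int fit_to_capacity(Stage s[], int n, long capacity) {
    assert(n > 0 && capacity >= 0);
    while (total_area(s, n) > capacity) {
        int victim = -1;
        long worst = 0;
        for (int i = 0; i < n; i++) {
            long a = stage_area(&s[i]);
            if (s[i].unroll > 1 && a > worst) {
                worst = a;
                victim = i;
            }
        }
        if (victim < 0)
            return 0;
        s[victim].unroll /= 2;
    }
    return 1;
}
```

Because the factors only ever decrease from the single-loop solutions, this kind of search stays within the space bounded by the non-increasing unroll factor observation.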

20

Related Work

Synthesizing high-level constructs: Handel-C, RaPiD, PipeRench, Babb et al.

Design space exploration: Derrien/Rajopadhye, Cameron, PICO

Program analysis on arrays: Hall et al., Amarasinghe, Balasundaram & Kennedy

Pipeline analysis: Splash 2, Weinhardt & Luk, Du et al., Goldstein et al.

21

Conclusion

System-level compiler automatically derives a pipelined implementation with explicit communication, while partitioning the chip capacity among pipeline stages

Global optimization strategy

Built upon local solution with communication

Constrain the search space

Non-increasing memory accesses
Non-increasing unroll factors

22

Contact Information

Project Web Site

www.isi.edu/asd/defacto

Authors' email addresses

{ziegler, mhall, bso}@isi.edu
