
Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples

Wu, Jinyuan

Fermilab, PPD/EED

April 2007

Introduction

• Short Course (1/2 day):
– “How to Design Compact FPGA Functions: Resource awareness design practices.”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/CompactFPGAdesign.pdf

• Refresher Course (45 min):
– “Resource Saving in Micro-Computer Software & FPGA Firmware Designs”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/ResourceSaving.ppt

• This Document:
– Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples

What can be done with an FPGA?

Example: ADC Using FPGA

[Figure: block diagrams with labels AMP & Shaper, ADC, FPGA, TDC, R1, R2, C and VREF.]

• Analog signals from the AMP & Shapers are fed directly to FPGA pins.

• FPGA outputs and a passive RC network generate a ramping reference voltage, VREF.

• The input voltages are compared with VREF using the FPGA differential input receivers.

• The transition times, which represent the input voltage values, are digitized by TDC blocks in the FPGA.
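The conversion from a measured transition time back to a voltage can be sketched as follows. This is only an illustration: the ramp shape (a simple RC charging curve) and the values `V_LO`, `V_HI` and `TAU_NS` are assumptions made for the example, not parameters taken from the design. Only the 0.69 ns bin width is a figure quoted in this document.

```python
import math

# Hypothetical ramp parameters, chosen only for illustration.
V_LO, V_HI = 1.0, 2.5   # ramp start and asymptote, volts (assumed)
TAU_NS = 100.0          # RC time constant in ns (assumed)
LSB_NS = 0.69           # TDC bin width in ns

def vref(t_ns):
    """Assumed RC-charging reference ramp, rising from V_LO toward V_HI."""
    return V_HI - (V_HI - V_LO) * math.exp(-t_ns / TAU_NS)

def crossing_time_ns(v_in):
    """Time at which the ramp crosses v_in; the comparator flips here."""
    return -TAU_NS * math.log((V_HI - v_in) / (V_HI - V_LO))

def tdc_count(t_ns):
    """Digitize a transition time into an integer number of TDC bins."""
    return int(t_ns // LSB_NS)

def count_to_voltage(count):
    """Convert a TDC count back to a voltage, using the bin center."""
    return vref((count + 0.5) * LSB_NS)

if __name__ == "__main__":
    count = tdc_count(crossing_time_ns(1.8))
    print(count, count_to_voltage(count))
```

In this picture the voltage resolution is set by the ramp slope times the TDC bin width, so a slower ramp gives finer voltage steps at the cost of a longer conversion time.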

[Figure: timing diagrams relating transition times T1 to T4 to input voltages V1 to V4.]

TDC Inside FPGA

[Figure: TDC block diagram. Multiple sampling on clock phases c0, c90, c180 and c270 (registers Q0 to Q3, QD, QE, QF); clock domain changing; transition detection and encode; coarse time counter; outputs DV, T0, T1 and TS.]

• Sampling rate: 360 MHz x4 phases = 1.44 GHz.

• LSB = 0.69 ns.

• Logic elements with critical timing are assigned as shown (4 channels).

• Logic elements with non-critical timing are freely placed by the fitter of the compiler.
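The multiple-sampling idea can be modeled in a few lines. The sketch below is a behavioral model only, not the firmware: it assumes the input has already been sampled once per clock phase into a flat list of 0/1 values, and the function names are hypothetical.

```python
CLOCK_MHZ = 360.0
PHASES = 4                               # c0, c90, c180, c270
LSB_NS = 1e3 / (CLOCK_MHZ * PHASES)      # 1 / 1.44 GHz, about 0.69 ns

def find_leading_edges(samples):
    """Return (coarse, fine) pairs for every 0 -> 1 transition.

    coarse counts full clock periods, fine is the phase index (0..3)
    of the first sample that saw the input high.
    """
    edges = []
    previous = 0
    for index, sample in enumerate(samples):
        if sample and not previous:
            edges.append(divmod(index, PHASES))
        previous = sample
    return edges

def to_ns(coarse, fine):
    """Combine coarse count and fine phase into a time in ns."""
    return (coarse * PHASES + fine) * LSB_NS

if __name__ == "__main__":
    samples = [0, 0, 0, 0,  0, 0, 1, 1,  1, 1, 0, 0]
    for coarse, fine in find_leading_edges(samples):
        print(coarse, fine, to_ns(coarse, fine))
```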

ADC Test: Waveform Digitization on BD3_19

[Figure: three panels. Raw data (leading ramp and trailing ramp); input waveform, overlap trigger and reference voltage; converted waveform, V vs. t (ns), with leading ramp and trailing ramp. Test circuit: FPGA with TDC blocks, components labeled 50, 50, 100 and 1000pF, and VREF.]

A lot can be done with an FPGA if one can imagine.

Micro-computing vs. Reconfigurable Computing

• In a microprocessor, users specify a program that runs on fixed logic circuits.

• In an FPGA, users specify the logic circuits (as well as the program).

• FPGA computing need not follow microprocessor architectures, although useful experience can be borrowed from them.

• The usefulness of FPGA reconfigurable computing has yet to be fully appreciated.

(100+3-4)*5+7 = ?

[Figure: the same calculation on a CPU and on an FPGA. Control: LD, (+), (-), (*), (+). Data: 100, 3, 4, 5, 7. CPU inputs: data and program. FPGA inputs: data, program and configuration.]
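The contrast can be illustrated with a small software model. This is only an analogy under stated assumptions: `run_program` mimics a fixed processor stepping through a control sequence, while `configured_circuit` stands in for logic that has been wired for this one expression. Both names are hypothetical.

```python
import operator

# Fixed "instruction set" of the model processor.
OPS = {
    "LD": lambda acc, x: x,      # load the operand into the accumulator
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def run_program(program, data):
    """CPU style: fixed logic, one operation per step, driven by a program."""
    acc = 0
    for op, operand in zip(program, data):
        acc = OPS[op](acc, operand)
    return acc

def configured_circuit(a, b, c, d, e):
    """FPGA style: the structure itself encodes (a + b - c) * d + e."""
    return (a + b - c) * d + e

if __name__ == "__main__":
    print(run_program(["LD", "+", "-", "*", "+"], [100, 3, 4, 5, 7]))
    print(configured_circuit(100, 3, 4, 5, 7))
```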

Example: Track Fitting

[Figure: a track crossing detector planes at z = z0, (z-z0) = ±2 and (z-z0) = ±4, with labels 4h and y0-4.]

$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$

Relative Errors of Several Track Fitter Schemes

[Figure: relative errors vs. track half length for four schemes: 3-point, next planes; 3-point, full length; FPGA fitter (multiplier-less FPGA LS fitter); least square fitter.]

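Least Square Fitter

$$y = y_0 + h\,(z - z_0) + \eta\,(z - z_0)^2$$

$$y_0 = \sum_i c_i y_i, \qquad h = \sum_i d_i y_i, \qquad \eta = \sum_i e_i y_i$$

[Figure: hits y1 to y7 and coefficients c1 to c7, d1 to d7 and e1 to e7 feeding three multipliers (X).]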

• The parameters can be described as inner-products.

• Hit coordinates and coefficients are fed simultaneously.

• The inner-products can be calculated with multiplier-accumulator structures.
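The inner-product form can be sketched as below. The plane positions are an assumption made for the example (seven equally spaced planes, with z - z0 running from -3 to 3); the coefficients are the standard least-squares solution for a parabola on such a symmetric grid. Exact fractions are used so the result can be checked without rounding.

```python
from fractions import Fraction

# Assumed geometry: seven equally spaced planes, z - z0 = -3 .. 3.
Z = [-3, -2, -1, 0, 1, 2, 3]
S0 = len(Z)
S2 = sum(z * z for z in Z)
S4 = sum(z ** 4 for z in Z)
DET = S0 * S4 - S2 * S2

# Pre-computed coefficients, one per plane.
C = [Fraction(S4 - S2 * z * z, DET) for z in Z]   # for y0
D = [Fraction(z, S2) for z in Z]                  # for h
E = [Fraction(S0 * z * z - S2, DET) for z in Z]   # for eta

def fit(hits):
    """Three multiply-accumulate chains running in parallel over the hits."""
    y0 = h = eta = Fraction(0)
    for y, c, d, e in zip(hits, C, D, E):
        y0 += c * y
        h += d * y
        eta += e * y
    return y0, h, eta

if __name__ == "__main__":
    hits = [2 + 3 * z + Fraction(1, 2) * z * z for z in Z]
    print(fit(hits))
```

Each hit is used once by all three accumulators, which is why the hit coordinates and coefficients can be fed simultaneously.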

Multiplier-less (ML) Quasi-Least Square Fitter

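$$y_0 = \sum_i c_i y_i, \qquad h = \sum_i d_i y_i, \qquad \eta = \sum_i e_i y_i$$

[Figure: hits y1 to y7 and coefficients x1 to x7 feeding three chains of shifters (<<) and adder/subtractors (+/-).]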

• The coefficients are described as “two-bit” numbers, i.e., sums or differences of two powers of two, e.g.:
– 5 = 4+1; 7 = 8-1; 112 = 128-16.

• Each multiplication is replaced with two shift & add/sub operations.

• There are two clock cycles to fetch a measurement point (i.e., y1, y2, etc.), which allows two shift & add/sub operations per point.
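A minimal sketch of the “two-bit” idea follows. The search routine `two_bit_terms` is a hypothetical helper written for this example (in practice the coefficients would be chosen offline); `shift_multiply` shows the two shift & add/sub operations that replace the multiplier.

```python
def two_bit_terms(coeff):
    """Closest value of the form 2**a + s * 2**b (s = +1 or -1, b <= a).

    Returns (a, s, b). Intended for positive integer coefficients.
    """
    best = None
    for a in range(coeff.bit_length() + 1):
        for b in range(a + 1):
            for s in (1, -1):
                value = (1 << a) + s * (1 << b)
                error = abs(value - coeff)
                if best is None or error < best[0]:
                    best = (error, a, s, b)
    _, a, s, b = best
    return a, s, b

def shift_multiply(y, a, s, b):
    """Multiply y by (2**a + s * 2**b) using shifts and one add or subtract."""
    if s > 0:
        return (y << a) + (y << b)
    return (y << a) - (y << b)

if __name__ == "__main__":
    for coeff in (5, 7, 112):
        a, s, b = two_bit_terms(coeff)
        print(coeff, (a, s, b), shift_multiply(10, a, s, b))
```

A coefficient that is not exactly of this form is replaced by the nearest one, which is why the resulting fitter is only a quasi-least-square fit.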

Inaccuracy Doesn’t Matter, a Lot of the Time

[Figure: relative error vs. half-length of the track for eta4096, hh512 and yy32, each shown for the least square fitter and for the multiplier-less quasi-least square FPGA fitter.]

Fitting is easy. Matching hits is harder.

Software | FPGA, typical | FPGA, resource saving approaches
O(n^2): for(){ for(){…} } | Comparator Array: O(n)*O(N) | Hash Sorter: O(n)*O(N), in RAM
O(n^3): for(){ for(){ for(){…} } } | CAM, Hough Trans.: O(n)*O(N^2) | Tiny Triplet Finder: O(n)*O(N*logN)
O(n^4): for(){ for(){ for(){ for(){…} } } } | |
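The first row of the comparison can be sketched in software. The hit format is an assumption for this example: each hit is a `(key, label)` pair, where `key` is an integer bin number in the range 0 to `num_bins - 1` and two hits match when their keys are equal. The list of bins plays the role of the RAM.

```python
def match_nested(plane_a, plane_b):
    """Software style: compare every pair of hits, O(n^2) comparisons."""
    return [(a, b) for a in plane_a for b in plane_b if a[0] == b[0]]

def match_hash_sorter(plane_a, plane_b, num_bins):
    """Hash sorter style: one write pass and one read pass over RAM bins."""
    bins = [[] for _ in range(num_bins)]   # models the RAM, one bin per key
    for hit in plane_a:                    # pass 1: store hits by key
        bins[hit[0]].append(hit)
    pairs = []
    for hit in plane_b:                    # pass 2: look up partners by key
        for partner in bins[hit[0]]:
            pairs.append((partner, hit))
    return pairs

if __name__ == "__main__":
    plane_a = [(3, "a0"), (1, "a1"), (3, "a2")]
    plane_b = [(3, "b0"), (2, "b1"), (1, "b2")]
    print(sorted(match_nested(plane_a, plane_b)))
    print(sorted(match_hash_sorter(plane_a, plane_b, num_bins=4)))
```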

Resource Saving Tricks

Loop Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops and/or (2) using fewer operations in each iteration.

Non-Loop Reduction Tricks: The number of computations in a given task is unchanged. FPGA resources are saved by (1) reusing the resources multiple times via sequencing and/or (2) using transistor-saving resources such as RAM.

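Resource Saving Tricks: Loop-Reduction

• Multiplier-less (ML) Approaches

• Recursive Implementation of FIR Filter

• FFT: O(n)*O(log(N))

• Tiny Triplet Finder: O(n)*O(N*log(N))

[Figure: four diagrams. A multiplier (X) replaced by shift (<<) and add/sub (+/-); a FIR filter with taps *h1, *h2, ..., *h[K] from x[n] to y[n], and its recursive form with signals x[n], -x[n-K], s[n], -s[n-K] and y[n]; the Tiny Triplet Finder with *R1/R3, *R2/R3, two bit arrays with shifters, and bit-wise coincident logic.]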
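The recursive FIR idea can be illustrated with the simplest case, a K-tap moving sum with all taps equal to 1. This choice of filter is an assumption made for the example; the slide's diagram suggests a more elaborate structure. The direct form needs K additions per sample, while the recursive form needs one addition and one subtraction regardless of K.

```python
def boxcar_direct(x, k):
    """Direct form: sum the last k samples for every output, O(k) per sample."""
    return [sum(x[max(0, n - k + 1): n + 1]) for n in range(len(x))]

def boxcar_recursive(x, k):
    """Recursive form: y[n] = y[n-1] + x[n] - x[n-k], O(1) per sample."""
    y = []
    acc = 0
    for n, sample in enumerate(x):
        acc += sample            # add the newest sample
        if n >= k:
            acc -= x[n - k]      # drop the sample that left the window
        y.append(acc)
    return y

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6]
    print(boxcar_direct(data, 3))
    print(boxcar_recursive(data, 3))
```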

Resource Saving Tricks: Non-Loop-Reduction

• Sequencing

• Using RAM: Hash Sorter/Histogram

[Figure: sequencing diagrams showing operations OP1 to OP4 repeated after a single initialization, and after Initialization 1, 2 and 3; hash sorter/histogram block diagram with labels Input Ctrl, De-serial., BCO, Hit(s), D, W/R, WA, RA, 16 and 32.]

An Example of Inexplicit Computing & Hidden Resource

• Data with random time stamps are re-ordered according to beam crossing (BCO).

• Data with the same BCO are output together, so the required bandwidth becomes smaller.

• Inexplicit computing (sorting) is performed with a hidden resource: RAM (static RAM, not dynamic RAM).
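A behavioral sketch of this re-ordering is given below. It assumes each hit arrives as a `(bco, data)` pair with `0 <= bco < num_bco`; the function name and the data format are hypothetical. Note that no comparison between hits is ever made: the BCO is used directly as the RAM address, so the sorting happens implicitly.

```python
def hash_sort_by_bco(hits, num_bco):
    """Group hits by beam crossing number using the BCO as a RAM address."""
    ram = [[] for _ in range(num_bco)]     # models the RAM, one bin per BCO
    for bco, data in hits:                 # write pass: address = BCO
        ram[bco].append(data)
    # read pass: scan addresses in order, skipping empty bins
    return [(bco, ram[bco]) for bco in range(num_bco) if ram[bco]]

if __name__ == "__main__":
    hits = [(2, "a"), (0, "b"), (2, "c"), (1, "d")]
    print(hash_sort_by_bco(hits, num_bco=4))
```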


Why Save Resources?

Why not?

The Fever of Moore’s Law vs. Maxwell’s Equations

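$$\nabla \times \mathbf{H} = \mathbf{J} + \frac{\partial \mathbf{D}}{\partial t}, \qquad \nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t}$$

$$\nabla \cdot \mathbf{D} = \rho, \qquad \nabla \cdot \mathbf{B} = 0$$

[Figure: Op/sec vs. year, 1998 to 2010 (MIT, 2002).]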

• During the hot days of Moore’s Law, the rules of thumb are:
– BRB – Buy Rather than Build
– URU – Use Rather than Understand
– WRW – Wait Rather than Work

• Fundamental principles such as Maxwell’s Equations tell us that Moore’s Law has limits. Technological advances should instead come from:
– The I3 Law: Imagination, Innovation & Implementation.

Total Useful Work = (Clock Frequency) x (Silicon Size) x (Efficiency)

• There is much room for improvement in computation efficiency, in both micro-computer software and FPGA firmware.

• Resource awareness saves not only direct costs but also indirect costs such as power consumption, PC board layout and cooling.

• Unnecessary artificial complexity confuses people, often including the designer.

• Resource saving helps today, when technology stalls.

• Resource saving helps in the future, as technology progresses.

[Figure: factors labeled E, F and S, with the annotation “Primarily Users’ Responsibility”.]
