Resource Awareness FPGA Design Practices for
Reconfigurable Computing: Principles and Examples
Wu, Jinyuan
Fermilab, PPD/EED
April 2007
Introduction
• Short Course (1/2 day):
– “How to Design Compact FPGA Functions: Resource awareness design practices.”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/CompactFPGAdesign.pdf
• Refresher Course (45 min):
– “Resource Saving in Micro-Computer Software & FPGA Firmware Designs”
– http://www-ppd.fnal.gov/EEDOffice-W/Projects/ckm/comadc/ResourceSaving.ppt
• This Document:
– Resource Awareness FPGA Design Practices for Reconfigurable Computing: Principles and Examples
What can be done with an FPGA?
Example: ADC Using FPGA
[Figure: block diagrams. In one, AMP & Shaper channels feed ADC chips, which feed the FPGA. In the other, AMP & Shaper channels feed TDC blocks inside the FPGA directly, and an RC network (R1, R1, R2, C) produces VREF.]
• Analog signals from AMP & Shapers are directly fed to FPGA pins.
• FPGA outputs and passive RC network are used to generate ramping reference voltage VREF.
• The input voltages and VREF are compared using FPGA differential input receivers.
• The times of transitions representing input voltage values are digitized by TDC blocks in FPGA.
[Figure: timing diagram. Input voltages V1 to V4 correspond to transition times T1 to T4.]
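The time-to-voltage conversion described in the bullets above can be sketched in software. This is only an illustration: it assumes an idealized linear ramp, and the names and values `v_start`, `v_end`, and `ramp_ns` are hypothetical, not taken from the slides. The reference produced by a real RC network is not exactly linear.

```python
def ramp_voltage(t_ns, v_start=1.0, v_end=2.5, ramp_ns=1000.0):
    """Reference voltage at time t_ns for an idealized linear ramp.

    All default values are hypothetical placeholders.
    """
    t = min(max(t_ns, 0.0), ramp_ns)  # clamp to the ramp interval
    return v_start + (v_end - v_start) * t / ramp_ns


def crossing_times_to_voltages(times_ns):
    """Each input voltage equals the reference voltage at its crossing time."""
    return [ramp_voltage(t) for t in times_ns]


# Crossing times as a TDC might report them (made-up numbers).
voltages = crossing_times_to_voltages([100.0, 400.0, 550.0, 900.0])
print(voltages)
```

The only measurement is the crossing time; the voltage is recovered from the known shape of the reference ramp.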
TDC Inside FPGA
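[Figure: TDC block diagram. The input is sampled on four clock phases (c0, c90, c180, c270) in the multiple sampling stage, passes through clock domain changing, and then through transition detection & encoding; a coarse time counter is also shown. Signal labels: Q0 to Q3, QD, QE, QF, DV, T0, T1, TS.]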
• Sampling rate: 360 MHz x4 phases = 1.44 GHz.
• LSB = 0.69 ns.
• Logic elements with critical timing are assigned as shown in the figure.
• Logic elements with non-critical timing are placed freely by the fitter of the compiler.
[Figure: placement of the logic elements for 4 channels (4Ch).]
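The sampling numbers above follow from simple arithmetic, sketched below. The clock frequency and phase count come from the slide; the function `hit_time_ns` and its encoding (a coarse counter of clock periods plus a phase index from 0 to 3) are assumptions made for illustration.

```python
CLOCK_MHZ = 360.0  # from the slide
PHASES = 4         # c0, c90, c180, c270


def lsb_ns(clock_mhz=CLOCK_MHZ, phases=PHASES):
    """Time bin width in ns: one clock period divided by the number of phases."""
    return 1000.0 / (clock_mhz * phases)


def hit_time_ns(coarse_count, phase_index, clock_mhz=CLOCK_MHZ, phases=PHASES):
    """Assumed encoding: coarse counter counts clock periods,
    phase_index (0..phases-1) selects the fine bin within the period."""
    return (coarse_count * phases + phase_index) * lsb_ns(clock_mhz, phases)


print(round(lsb_ns(), 2))        # bin width in ns
print(hit_time_ns(10, 3))        # hit in the last phase bin of clock period 10
```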
ADC Test: Waveform Digitization on BD3_19
[Figure: input waveform, overlap trigger & reference voltage, plotted as V versus t (ns), with the leading ramp and trailing ramp marked.]
[Figure: raw data for the leading ramp and trailing ramp, and the converted waveform.]
[Figure: test circuit with the FPGA TDC inputs, VREF, and passive components labeled 50, 50, 100, and 1000pF.]
A lot can be done with an FPGA if one can imagine it.
Micro-computing vs. Reconfigurable Computing
• In a microprocessor, users specify a program that runs on fixed logic circuits.
• In an FPGA, users specify the logic circuits (as well as the program).
• FPGA computing need not follow microprocessor architectures. (But useful experience can be borrowed.)
• The usefulness of FPGA reconfigurable computing is still to be fully appreciated.
[Figure: evaluating (100+3-4)*5+7 = ? with data 100, 3, 4, 5, 7 and control steps LD, (-), (+), (*), (+). Labels: CPU with Data and Program; FPGA with Data, Program, and Configuration.]
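The contrast can be illustrated with a toy model of the same arithmetic, (100+3-4)*5+7. This is a sketch, not a description of any real CPU or FPGA: `run_on_cpu` models fixed hardware stepping through a program, while `build_datapath` models operators that are wired together once and then fed with data. Both function names are hypothetical, and the operation order is chosen to match the expression.

```python
import operator


def run_on_cpu(program, data):
    """Fixed-logic model: one accumulator executes one instruction per step."""
    ops = {
        "LD": lambda acc, x: x,  # load replaces the accumulator
        "+": operator.add,
        "-": operator.sub,
        "*": operator.mul,
    }
    acc = 0
    for opcode, operand in zip(program, data):
        acc = ops[opcode](acc, operand)
    return acc


def build_datapath():
    """Configured-logic model: the chain of operators is fixed at build time."""
    stages = [operator.add, operator.sub, operator.mul, operator.add]

    def datapath(a, b, c, d, e):
        result = a
        for stage, operand in zip(stages, (b, c, d, e)):
            result = stage(result, operand)
        return result

    return datapath


data = [100, 3, 4, 5, 7]
print(run_on_cpu(["LD", "+", "-", "*", "+"], data))
print(build_datapath()(*data))
```

In the first model the operations arrive as a program at run time; in the second they are part of the structure itself, which is the sense in which the FPGA user specifies the logic circuit.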
Example: Track Fitting
[Figure: a track crossing detector planes at (z-z0) = -4, -2, 0, +2, +4, with the quantities y0 and 4h marked.]
$y = y_0 + \eta (z - z_0) + h (z - z_0)^2$
Relative Errors of Several Track Fitter Schemes
[Figure: relative errors versus track half length for four schemes: 3-point, next planes; 3-point, full length; FPGA fitter (multiplier-less FPGA LS fitter); Least Square (least square fitter).]
Least Square Fitter
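$y = y_0 + \eta (z - z_0) + h (z - z_0)^2$
$y_0 = \sum_i c_i y_i$
$h = \sum_i d_i y_i$
$\eta = \sum_i e_i y_i$
[Figure: hit coordinates y1 to y7 are multiplied (X) by the coefficients c1 to c7, d1 to d7, and e1 to e7 and accumulated.]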
• The parameters can be described as inner-products.
• Hit coordinates and coefficients are fed simultaneously.
• The inner-products can be calculated with multiplier-accumulator structures.
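The inner-product form can be made concrete with a small sketch. It assumes, for illustration only, seven planes at offsets (z - z0) = -3 to +3, symmetric about zero so that the linear term decouples from the other two; the slides do not state the geometry. The function names are hypothetical, and exact fractions are used so the result can be checked without rounding.

```python
from fractions import Fraction


def quadratic_ls_coefficients(offsets):
    """Return (c, d, e) with y0 = sum(c*y), h = sum(d*y), eta = sum(e*y).

    Valid only for offsets symmetric about zero (an assumption of this sketch).
    """
    if sum(offsets) != 0 or sum(z ** 3 for z in offsets) != 0:
        raise ValueError("offsets must be symmetric about zero")
    n = len(offsets)
    s2 = Fraction(sum(z ** 2 for z in offsets))
    s4 = Fraction(sum(z ** 4 for z in offsets))
    det = n * s4 - s2 * s2
    c = [(s4 - s2 * z * z) / det for z in offsets]  # weights for y0
    d = [(n * z * z - s2) / det for z in offsets]   # weights for h
    e = [Fraction(z) / s2 for z in offsets]         # weights for eta
    return c, d, e


def inner_product(coeffs, hits):
    """Multiply-accumulate over one set of coefficients."""
    return sum(k * y for k, y in zip(coeffs, hits))


offsets = [-3, -2, -1, 0, 1, 2, 3]
c, d, e = quadratic_ls_coefficients(offsets)
hits = [5 + 2 * z + 3 * z * z for z in offsets]  # exact track: y0=5, eta=2, h=3
print(inner_product(c, hits), inner_product(d, hits), inner_product(e, hits))
```

The coefficients depend only on the plane positions, so they can be computed once and stored; each fit is then three multiply-accumulate passes over the hits.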
Multiplier-less (ML) Quasi-Least Square Fitter
[Figure: hit coordinates y1 to y7 and coefficients x1 to x7 feed three shift (<<) and add/subtract (+/-) units that accumulate the inner products for y0, h, and η.]
• The coefficients are described as “two-bit” numbers, i.e. sums or differences of two powers of two, e.g.:
– 5 = 4 + 1; 7 = 8 - 1; 112 = 128 - 16.
• Each multiplication is replaced with two shift & add/sub operations.
• Two clock cycles are available to fetch each measurement point (i.e., y1, y2, etc.), which allows the two shift & add/sub operations.
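The shift & add/sub replacement can be sketched as follows. The helper names `two_term_decompose` and `ml_multiply` are hypothetical; the sketch only shows that a coefficient of the form 2**k1 ± 2**k2 needs no multiplier, and says nothing about how the coefficients in the actual fitter were chosen.

```python
def two_term_decompose(coeff, max_shift=15):
    """Return (k1, sign, k2) with coeff == 2**k1 + sign * 2**k2, or None."""
    for k1 in range(max_shift + 1):
        for sign in (+1, -1):
            for k2 in range(max_shift + 1):
                if (1 << k1) + sign * (1 << k2) == coeff:
                    return k1, sign, k2
    return None


def ml_multiply(y, k1, sign, k2):
    """Multiply y by 2**k1 + sign * 2**k2 using two shifts and one add/sub."""
    first = y << k1
    second = y << k2
    return first + second if sign > 0 else first - second


for coeff in (5, 7, 112):
    terms = two_term_decompose(coeff)
    print(coeff, terms, ml_multiply(13, *terms))
```

Coefficients that cannot be written with two terms (11, for example) would have to be rounded to a nearby two-term value, which is why the resulting fit is only quasi-least-square.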
Inaccuracy Doesn’t Matter, a Lot of the Time
[Figure: relative error versus half-length of the track, comparing the least square fitter with the multiplier-less quasi-least square FPGA fitter. Series: eta4096, hh512, and yy32, each for Least Square and FPGA fitter.]
Fitting is easy. Matching hits is harder.

Software | Typical FPGA | FPGA Resource Saving Approaches
O(n^2): for(){ for(){…} } | O(n)*O(N): Comparator Array | Hash Sorter, O(n)*O(N): in RAM
O(n^3): for(){ for(){ for(){…} } } | O(n)*O(N^2): CAM, Hough Trans. | Tiny Triplet Finder, O(n)*O(N*logN)
O(n^4): for(){ for(){ for(){ for(){…} } } } | |
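The first row of the comparison can be illustrated in software. This is an analogy rather than the hardware design: the dictionary of bins stands in for a RAM addressed by a hit's key, and the hit format `(name, key)` and both function names are made up for this sketch.

```python
from collections import defaultdict


def match_pairs_nested(hits_a, hits_b, key):
    """Software-style matching: compare every pair, O(n^2) comparisons."""
    return [(a, b) for a in hits_a for b in hits_b if key(a) == key(b)]


def match_pairs_hash(hits_a, hits_b, key):
    """Hash-sorter-style matching: one pass to fill bins, one pass to read them."""
    bins = defaultdict(list)  # stands in for RAM addressed by the key
    for a in hits_a:
        bins[key(a)].append(a)
    pairs = []
    for b in hits_b:
        for a in bins.get(key(b), []):
            pairs.append((a, b))
    return pairs


hits_a = [("a1", 3), ("a2", 7), ("a3", 3)]
hits_b = [("b1", 7), ("b2", 3), ("b3", 9)]
by_bin = lambda hit: hit[1]

print(sorted(match_pairs_nested(hits_a, hits_b, by_bin)))
print(sorted(match_pairs_hash(hits_a, hits_b, by_bin)))
```

Both functions return the same pairs; the second never compares hits that fall in different bins, which is where the loop reduction comes from.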
Resource Saving Tricks
Loop Reduction Tricks: The number of computations in a given task is reduced by (1) using fewer iterations in loops and/or (2) using fewer operations in each iteration.
Non-Loop Reduction Tricks: The number of computations in a given task is unchanged. FPGA resources are saved by (1) reusing the resources multiple times via sequencing and/or (2) using transistor-saving resources such as RAM.
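Resource Saving Tricks: Loop-Reduction
Multiplier-less (ML) Approaches
[Figure: a multiplier (X) replaced by shift (<<) and add/subtract (+/-) units.]
Recursive Implementation of FIR Filter
[Figure: a direct form with taps *h1, *h2, ..., *h[K] between x[n] and y[n], and a recursive form built from adders with the terms x[n], -x[n-K], s[n], -s[n-K], and y[n].]
FFT: O(n)*O(log(N))
Tiny Triplet Finder: O(n)*O(N*log(N))
[Figure: two bit arrays with shifters, scale factors *R1/R3 and *R2/R3, and bit-wise coincident logic.]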
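The recursive FIR idea can be illustrated with the simplest case, a moving sum over K samples, where all taps equal 1. This is a sketch under that assumption; the filter in the slides may use a different structure. The recursive form uses s[n] = s[n-1] + x[n] - x[n-K], so each output costs one addition and one subtraction regardless of K.

```python
def moving_sum_direct(x, k):
    """Direct form: K additions per output sample."""
    return [sum(x[max(0, n - k + 1): n + 1]) for n in range(len(x))]


def moving_sum_recursive(x, k):
    """Recursive form: s[n] = s[n-1] + x[n] - x[n-K]."""
    out = []
    s = 0
    for n, sample in enumerate(x):
        s += sample
        if n >= k:
            s -= x[n - k]  # drop the sample that left the window
        out.append(s)
    return out


samples = [3, 1, 4, 1, 5, 9, 2, 6]
print(moving_sum_direct(samples, 3))
print(moving_sum_recursive(samples, 3))
```

With integer samples the two forms agree exactly; with finite-precision fractional arithmetic the recursive form can accumulate rounding error, which is worth keeping in mind before using it.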
Resource Saving Tricks: Non-Loop-Reduction
Sequencing
[Figure: operations OP1 to OP4 with their initialization steps, drawn both as parallel chains and as a sequence.]
Using RAM: Hash Sorter/Histogram
[Figure: block diagram with input control, de-serializer, BCO and hit(s) inputs, and a RAM with ports D, W/R, WA, and RA; bus widths 16 and 32.]
An Example of Inexplicit Computing & Hidden Resource
• Data with random time stamps are re-ordered according to beam crossing (BCO).
• Data with the same BCO are output together, so the required bandwidth becomes smaller.
• Inexplicit computing (sorting) is performed with a hidden resource (RAM; it should be static RAM, not dynamic RAM).
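The re-ordering can be sketched in a few lines. The list of buckets stands in for the RAM, with the BCO number used as the write address; the function name, the hit format `(bco, data)`, and the assumption that BCO numbers lie in the range 0 to n_bco - 1 are all illustrative choices, not details from the slides.

```python
def reorder_by_bco(hits, n_bco):
    """Re-order (bco, data) hits by beam crossing without any comparisons.

    Assumes 0 <= bco < n_bco for every hit.
    """
    ram = [[] for _ in range(n_bco)]  # one bucket per BCO address
    for bco, data in hits:
        ram[bco].append(data)         # write address = time stamp
    # Sequential readout: hits with the same BCO come out together.
    return [(bco, bucket) for bco, bucket in enumerate(ram) if bucket]


hits = [(2, "a"), (0, "b"), (2, "c"), (1, "d")]
print(reorder_by_bco(hits, n_bco=4))
```

No explicit sorting step appears anywhere: the ordering falls out of writing by address and reading sequentially.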
Why Save Resources?
Why not?
The Fever of Moore’s Law vs. Maxwell’s Equations
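$\nabla \times \mathbf{H} = \mathbf{J} + \partial \mathbf{D} / \partial t$
$\nabla \times \mathbf{E} = -\partial \mathbf{B} / \partial t$
$\nabla \cdot \mathbf{B} = 0$
$\nabla \cdot \mathbf{D} = \rho$
[Figure: Op/sec versus year, 1998 to 2010. Label: MIT, 2002.]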
• During the hot days of Moore’s Law, the rules of thumb were:
– BRB – Buy Rather than Build
– URU – Use Rather than Understand
– WRW – Wait Rather than Work
• Fundamental principles such as Maxwell’s Equations imply that Moore’s Law has limits. Technology advances should come from:
– The I3 Law: Imagination, Innovation & Implementation.
Total Useful Work = (Clock Frequency) x (Silicon Size) x (Efficiency)
• There is considerable room for improvement in computation efficiency in both micro-computer software and FPGA firmware.
• Resource awareness saves not only direct cost but also indirect costs such as power consumption, PC board layout, and cooling.
• Unnecessary artificial complexities confuse people, often including the designer.
• Resource saving helps today, when technology stalls.
• Resource saving helps in the future, as technology progresses.
[Figure: the three factors labeled E, F, and S, with the annotation “Primarily Users’ Responsibility”.]