1
Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP
by
John K. Antonio
Department of Computer Science
College of Engineering
Texas Tech University
Fall ACS Meeting
November 4-6, 1997
2
Outline
• Program Objectives and Schedule of Milestones
• Representative Examples of Current Work
• Competing STAP Weight Solvers
• Power Prediction Model for FPGAs
• Optimal Configuration for SAR Processing
• Questions/Answers
3
Program Objectives
• Demonstrate advantages of combined use of GPP, DSP, and FPGA technologies for SAR and STAP applications
• Demonstrate advantages/disadvantages of different FPGA designs and implementations in terms of power consumption and real-estate requirements
• Develop and evaluate power prediction models for a GPP/DSP/FPGA prototype system
• Develop formal optimizations for configuring GPP/DSP/FPGA systems
4
Program Objectives
• Incorporation of data characteristics and requirements in optimizing system configuration
– Dynamic range
– Numerical accuracy
• Incorporation of multiple GPP/DSP algorithms, FPGA designs and implementations, and data representations in optimizing system configuration
– Time-domain vs. frequency-domain convolutions
– QR vs. conjugate gradient STAP weight solver
– Fixed-point vs. block floating point vs. floating point
5
Schedule of Milestones
[Figure: Gantt chart. Time axis: June 1997, Dec. 1997, June 1998, Dec. 1998, June 1999, Dec. 1999]
Tasks:
• Design STAP Iterative Weight Solver for FPGA
• Inter-GPP/DSP Comm. Simulator for STAP
• Optimal GPP/DSP Config. for SAR
• GPP/DSP/FPGA Platform Construction and Independent Testing of GPP/DSP and FPGA Subsystems
• Implement STAP Iterative Weight Solver on FPGA
• Optimal GPP/DSP Config. for STAP
• Implement SAR Linear Filtering on FPGA
• Optimal GPP/DSP/FPGA Config. for SAR/STAP
• GPP/DSP and FPGA Subsystem Integration and Testing
• Optimal GPP/DSP/FPGA Config. for SAR
• Demonstrate Combined SAR/STAP on GPP/DSP/FPGA Platform
• Implement SAR on GPP/DSP
• Design SAR Linear Filtering for FPGA
• Implement STAP on GPP/DSP
• Implement SAR on GPP/DSP/FPGA Platform
• Optimal GPP/DSP/FPGA Config. for STAP
• Implement STAP on GPP/DSP/FPGA Platform
• Develop FPGA Power Consumption Simulator
Key: GPP/DSP Sub-System, FPGA Sub-System, and GPP/DSP/FPGA System, each split into Research/Design and Implement/Test
7
References for STAP
J. Ward, “Space-Time Adaptive Processing for Airborne Radar,” Technical Report 1015, MIT Lincoln Laboratory, Lexington, MA, 1994.
K. C. Cain, J. A. Torres, and R. T. Williams, (R. A. Games, Project Leader), “RT_STAP: Real-Time Space-Time Adaptive Processing Benchmark,” MITRE Technical Report MTR 96B0000021, Feb. 1997.
MCARM Data Files, Rome Laboratory, (http://sunrise.oc.rl.af.mil).
D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, MA, 1984.
8
Formulation of STAP Weight Equation
[Figure: data cube with axes L Channels, Doppler, and range. The mth Range Segment (with $N_R$ cells) and Doppler bins (k - 1), k, (k + 1) are highlighted. Caption: Data Matrix Needed for Calculating Weights for kth Doppler Bin and mth Range Segment Using 3rd Order Doppler-Factored STAP]
$$\hat{x}(k, r):\ 3L \times 1$$
$$\psi(k, m) = \frac{1}{N_R} \sum_{r=1}^{N_R} \hat{x}(k, r)\, \hat{x}^H(k, r)$$
9
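STAP Weight Calculation Using QR Decomposition
$$\hat{X}(k, m):\ 3L \times N_R$$
$$\psi(k, m) = \frac{1}{N_R} \sum_{r=1}^{N_R} \hat{x}(k, r)\, \hat{x}^H(k, r) = \frac{1}{N_R} \hat{X}(k, m)\, \hat{X}^H(k, m)$$
The Weight Equation:
$$\psi(k, m)\, w(k, m) = \gamma s$$
QR-Decomposition:
$$\hat{X}^T(k, m) = QR$$
$$\frac{1}{N_R} R^T Q^T Q^* R^*\, w(k, m) = \gamma s$$
$$\frac{1}{N_R} R^T R^*\, w(k, m) = \gamma s$$
$$R_1^T R_1^*\, w(k, m) = \gamma N_R s$$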
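The QR route above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: random complex data stands in for $\hat{X}(k, m)$ and for the steering vector $s$, $R_1$ is taken to be the square upper-triangular factor of a reduced QR decomposition, and the function name `stap_weights_qr` and the problem sizes are hypothetical.

```python
import numpy as np

def stap_weights_qr(X, s, gamma=1.0):
    """Solve psi w = gamma s with psi = X X^H / N_R, without forming psi."""
    n_r = X.shape[1]                       # X is 3L x N_R
    # Reduced QR of X^T (N_R x 3L); R1 is the 3L x 3L upper-triangular factor.
    _, R1 = np.linalg.qr(X.T, mode="reduced")
    # R1^T R1^* w = gamma N_R s, solved as two triangular systems:
    y = np.linalg.solve(R1.T, gamma * n_r * s)     # R1^T y = gamma N_R s
    return np.linalg.solve(R1.conj(), y)           # R1^* w = y

# Hypothetical sizes and random stand-in data (not MCARM data).
rng = np.random.default_rng(0)
L, n_r = 4, 60
X = rng.standard_normal((3 * L, n_r)) + 1j * rng.standard_normal((3 * L, n_r))
s = rng.standard_normal(3 * L) + 1j * rng.standard_normal(3 * L)

w = stap_weights_qr(X, s)

# Check against the explicitly formed covariance estimate.
psi = X @ X.conj().T / n_r
residual = np.linalg.norm(psi @ w - s) / np.linalg.norm(s)
print(residual)
```

A general solver is used for the two triangular systems to keep the sketch short; a dedicated forward/back substitution would exploit the triangular structure.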
10
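Using Conjugate Gradient Approach to Solve the Weight Equation
CG for Solving: $\Psi w = s$
Initialization: Choose $w^{(0)}$, set $d^{(0)} = s - \Psi w^{(0)}$, $g^{(0)} = -d^{(0)}$
Iteration:
$$w^{(k+1)} = w^{(k)} - \frac{g^{(k)T} d^{(k)}}{d^{(k)T} \Psi d^{(k)}}\, d^{(k)}$$
$$g^{(k+1)} = \Psi w^{(k+1)} - s$$
$$d^{(k+1)} = -g^{(k+1)} + \frac{g^{(k+1)T} \Psi d^{(k)}}{d^{(k)T} \Psi d^{(k)}}\, d^{(k)}$$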
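The iteration can be written almost line for line in plain Python. The sketch below assumes a real symmetric positive-definite Ψ (matching the transposes used on the slide) and a small made-up 3 × 3 system. The stopping tolerance on the gradient norm, the iteration cap, and all names are illustrative assumptions.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def conjugate_gradient(psi, s, tol=1e-10, max_iter=None):
    """Solve psi w = s for a real symmetric positive-definite psi."""
    n = len(s)
    max_iter = 2 * n if max_iter is None else max_iter
    w = [0.0] * n                                        # choose w(0) = 0
    d = [si - pi for si, pi in zip(s, matvec(psi, w))]   # d(0) = s - psi w(0)
    g = [-di for di in d]                                # g(0) = -d(0)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 <= tol:                      # gradient small enough
            break
        psi_d = matvec(psi, d)
        d_psi_d = dot(d, psi_d)
        alpha = -dot(g, d) / d_psi_d
        w = [wi + alpha * di for wi, di in zip(w, d)]    # w(k+1)
        g = [pi - si for pi, si in zip(matvec(psi, w), s)]   # g(k+1)
        beta = dot(g, psi_d) / d_psi_d
        d = [-gi + beta * di for gi, di in zip(g, d)]    # d(k+1)
    return w

# Hypothetical test system.
psi = [[4.0, 1.0, 0.0],
       [1.0, 3.0, 1.0],
       [0.0, 1.0, 2.0]]
s = [1.0, 2.0, 3.0]
w = conjugate_gradient(psi, s)
print(w)
```

Loosening `tol` ends the loop earlier, which is the precision versus effort trade-off that the tolerance axis in the numerical studies explores.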
11
Preliminary Numerical Studies
Relative Error and FLOP Count vs. Tolerance for Nr = 125
Data File: re050068 (32 pulses, 28 Weight Vectors Computed)
[Figure: two log-scale plots comparing QR and CG. Left: Relative Error (10^-9 to 10^0) vs. Tolerance (10^-8 to 10^-1). Right: FLOP Count (10^8 to 10^10) vs. Tolerance (10^-8 to 10^-1)]
12
Preliminary Numerical Studies
Relative Error and FLOP Count vs. Tolerance for Nr = 250
Data File: re050068 (32 pulses, 28 Weight Vectors Computed)
[Figure: two log-scale plots comparing QR and CG. Left: Relative Error (10^-7 to 10^-1) vs. Tolerance (10^-8 to 10^-1). Right: FLOP Count (10^8 to 10^10) vs. Tolerance (10^-8 to 10^-1)]
13
Implementation of Conjugate Gradient on FPGAs
• Easier and more efficient to implement on FPGA hardware than the QR decomposition approach
[Figure: State Machine Design containing Numerical Operations and Registers, with inputs ψ and w(j) and output w(j + 1)]
Numerical Operations:
• Floating point
• Block floating point
• Fixed point
Registers:
• No. of variables
• No. of bits/variable
• Dynamic range
• Accuracy
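To make the block floating point option concrete, here is a small sketch in which a block of values shares one exponent and each value keeps only a signed integer mantissa. The 12-bit mantissa width, the sample values, and the function names are assumptions for illustration and do not describe the FPGA design itself.

```python
import math

def to_block_floating_point(values, mantissa_bits):
    """Quantize a block to signed integer mantissas sharing one exponent."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return [0] * len(values), 0
    # Shared exponent e chosen so that peak < 2**e.
    _, exponent = math.frexp(peak)
    scale = 2 ** (mantissa_bits - 1)          # one bit is reserved for the sign
    limit = scale - 1
    mantissas = [max(-limit, min(limit, round(v / 2 ** exponent * scale)))
                 for v in values]
    return mantissas, exponent

def from_block_floating_point(mantissas, exponent, mantissa_bits):
    scale = 2 ** (mantissa_bits - 1)
    return [m / scale * 2 ** exponent for m in mantissas]

# Hypothetical block with a wide spread of magnitudes.
block = [0.013, -0.4, 3.2, -7.9]
mantissas, exponent = to_block_floating_point(block, mantissa_bits=12)
restored = from_block_floating_point(mantissas, exponent, mantissa_bits=12)
worst = max(abs(a - b) for a, b in zip(block, restored))
print(exponent, mantissas, worst)
```

The absolute error is set by the largest value in the block, so small entries lose relative accuracy. This is the dynamic range versus bits-per-variable trade-off listed above.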
14
QR Decomposition versus Conjugate Gradient
• QR Decomposition:
– Suitable for GPP/DSP implementation
– Good performance for small values of NR
• Conjugate Gradient:
– Suitable for either GPP/DSP or FPGA implementations
– Good performance for large values of NR
– Provides a way to balance desired precision and computational effort
– FPGA implementations offer many design parameters (e.g., data representation, no. of bits per variable, etc.)
15
Conceptual Illustration of Trade-Offs (graphs shown are hypothetical)
[Figure: two hypothetical plots, Computational Complexity and Power Requirements, each drawn over the multidimensional parameter space {Precision, Accuracy, Dyn. Range, L, Nr}. Curves: CG on GPP/DSP; QR on GPP/DSP; CG on FPGA - Floating Point; CG on FPGA - Block Floating Point; CG on FPGA - Fixed Point]
17
References for FPGA Power Prediction
K. P. Parker and E. J. McCluskey, “Probabilistic Treatment of General Combinational Networks,” IEEE Trans. Computers, Vol. C-24, June 1975, pp. 668-670.
Kaushik Roy and Sharat Prasad, “Circuit Activity Based Logic Synthesis for Low Power Reliable Operations,” IEEE Trans. VLSI Systems, Vol. 1, No. 4, Dec. 1993, pp.
Kaushik Roy, “Power Dissipation Driven FPGA Place and Route under Timing Constraints,” School of Electrical and Computer Engineering, Purdue University.
“XC4000 Series Field Programmable Gate Arrays,” Xilinx, Inc., September 18, 1996.
18
FPGA Power Consumption
[Figure: FPGA layout showing logic blocks and the interconnection fabric]
Most of the logic/area in the FPGA is used to route signals.
As signals traverse this network of transistors, power consumption can be significant.
19
Power Dissipation in CMOS
• Leakage Current
• Transient Current
• Dynamic Capacitance Charging Current
– Most important for CMOS
– Dependent on clock frequency
– Dependent on signal activity
20
Time-Domain Modeling
[Figure: logic circuit with inputs x1, x2, x3 and output y, with waveforms x1(t), x2(t), x3(t), x1x2(t), and x1x2x3(t)]
• Very precise results
• Computationally expensive
Calculation of instantaneous power: p(t)
21
Probabilistic Modeling
p(s): the probability that signal s attains a logical value of true at any given clock cycle.
A(s): the probability that signal s transitions at any given clock cycle.
p(clock) = 0.50, A(clock) = 1.0
p(x1) = 0.88, A(x1) = 0.10
p(x2) = 0.29, A(x2) = 0.17
p(x3) = 0.69, A(x3) = 0.27
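As a small illustration of the two definitions, p(s) and A(s) can be estimated from a recorded sequence of logic values, one sample per clock cycle. The sample sequences and the function name below are made up for the sketch.

```python
def signal_statistics(samples):
    """Return (p, A) for a sequence of 0/1 values, one per clock cycle."""
    if len(samples) < 2:
        raise ValueError("need at least two clock cycles")
    # p: fraction of cycles in which the signal is true.
    p = sum(samples) / len(samples)
    # A: fraction of cycle boundaries at which the signal changes value.
    transitions = sum(a != b for a, b in zip(samples, samples[1:]))
    activity = transitions / (len(samples) - 1)
    return p, activity

# Hypothetical data signal and an ideal clock sampled at each half-period.
x = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
clock = [0, 1] * 8

p, activity = signal_statistics(x)
print(p, activity)
print(signal_statistics(clock))
```

The alternating clock sequence gives p = 0.5 and A = 1.0, which agrees with the clock values quoted on the slide.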
22
Probabilistic Modeling
[Figure: logic circuit with inputs x1, x2, x3 and output y, with waveforms x1(t), x2(t), x3(t), x1x2(t), and x1x2x3(t), annotated with per-signal values p=0.88, A=0.10; p=0.29, A=0.17; p=0.69, A=0.27; p=0.83, A=0.17; p=0.10, A=0.13]
• Acceptable results
• Computationally inexpensive
Calculation of average power:
$$P_{avg} = \frac{1}{2} V^2 \sum_{g \in \text{all gates}} C_g A_g$$
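The average-power sum is simple to evaluate once each gate has a capacitance and an activity. In the sketch below the five activities are the A values annotated on the slide, while the 10 fF capacitances and the 5 V supply are placeholder assumptions. The optional `clock_hz` factor is also an addition: since A is a per-cycle probability, multiplying by the clock frequency converts the slide's expression into watts.

```python
import math

def average_power(gates, supply_voltage, clock_hz=1.0):
    """Evaluate 0.5 * V^2 * sum(C_g * A_g), optionally scaled by clock rate.

    gates: iterable of (capacitance_farads, activity) pairs.
    With clock_hz=1.0 this is exactly the expression on the slide.
    """
    switched = sum(c * a for c, a in gates)
    return 0.5 * supply_voltage ** 2 * switched * clock_hz

# Activities from the slide; capacitances (10 fF each) are hypothetical.
activities = [0.10, 0.17, 0.27, 0.17, 0.13]
gates = [(10e-15, a) for a in activities]

p_avg = average_power(gates, supply_voltage=5.0)
print(p_avg)
print(average_power(gates, supply_voltage=5.0, clock_hz=50e6))
```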
23
Probabilistic Model Implementation
p(s1), A(s1)
p(s2), A(s2)
p(s3), A(s3)
Step 1: Probabilistic information is distilled from the input data and presented to the model.
Step 2: Probabilistic data “propagates” throughout the model, depositing activity information as it does so.
Step 3: Power is estimated using activity measures and known CMOS gate capacitances.
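Step 2 can be illustrated with the simplest possible propagation rules. The sketch assumes that gate inputs are statistically independent of each other and that each signal is independent from one clock cycle to the next, so that A = 2p(1 - p). These are simplifying assumptions of this example, not necessarily the rules used in the authors' model, and the two cascaded AND gates are a hypothetical circuit. For that reason the numbers it produces are not expected to match the values annotated on the slides.

```python
import math

def p_and(*inputs):
    """Probability that an AND gate output is true, inputs independent."""
    result = 1.0
    for p in inputs:
        result *= p
    return result

def p_or(*inputs):
    """Probability that an OR gate output is true, inputs independent."""
    none_true = 1.0
    for p in inputs:
        none_true *= 1.0 - p
    return 1.0 - none_true

def p_not(p):
    return 1.0 - p

def activity(p):
    """Transition probability under cycle-to-cycle independence."""
    return 2.0 * p * (1.0 - p)

# Input probabilities from the slide; the circuit topology is assumed.
inputs = {"x1": 0.88, "x2": 0.29, "x3": 0.69}
n1 = p_and(inputs["x1"], inputs["x2"])      # internal node x1 AND x2
y = p_and(n1, inputs["x3"])                 # output

for name, p in [("n1", n1), ("y", y)]:
    print(name, p, activity(p))
```

The resulting activities would then feed the capacitance-weighted sum of Step 3.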
25
References for Optimal Configuration for SAR Processing
J. T. Muehring and J. K. Antonio, “Optimal Configuration of an Embedded Parallel System for Synthetic Aperture Radar Processing,” Proc. Int’l Conf. on Signal Processing Applications & Technology, Boston, MA, Oct. 1996, pp. 1489-1494. (http://hpcl.cs.ttu.edu/~antonio/pubs/conf033.pdf)
T. Einstein, “Realtime Synthetic Aperture Radar Processing on the RACE Multicomputer,” Application Note 203.0, Mercury Computing Systems, Inc., Chelmsford, MA, 1995.
J. C. Curlander and R. N. McDonough, Synthetic Aperture Radar: Systems and Signal Processing, John Wiley & Sons, New York, NY, 1991.
“SHARC DSP Compute Nodes (3.3-Volt),” Mercury Computing Systems, Inc., Chelmsford, MA, 1995.
26
GPP/DSP Approach for SAR Processing
[Figure: Range Processing (shown across 3 range processors), followed by a Distributed Corner-Turn, followed by Azimuth Processing (shown across 4 azimuth processors). Data axes: Range Samples and Pulse No.; block dimensions are annotated with Kr and Sa]
where Sa is the azimuth section length and Kr is the range reference kernel size
27
The Sectioned Convolution for the Azimuth Processing
[Figure: one FFT-size block of data divided into Overlap and Section, with the Kernel and the Discard portion marked]
Large Overlap/Section ratio ⇒ Small azimuth memory, large number of azimuth processors
Small Overlap/Section ratio ⇒ Large azimuth memory, small number of azimuth processors
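A sectioned convolution of this kind can be sketched with the overlap-save method: each FFT-size block overlaps the previous one by the kernel length minus one, and the wrapped-around outputs at the start of each block are discarded. The code below is a plain-Python illustration with a small recursive FFT. The test signal, kernel, FFT size, and function names are assumptions, and the actual azimuth processing may section the data differently.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (inverse unscaled)."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i], out[i + n // 2] = even[i] + t, even[i] - t
    return out

def sectioned_convolution(signal, kernel, fft_size):
    """Overlap-save linear convolution using FFTs of length fft_size."""
    k = len(kernel)
    section = fft_size - (k - 1)            # new samples consumed per block
    if section <= 0:
        raise ValueError("fft_size must exceed len(kernel) - 1")
    kernel_f = fft(list(kernel) + [0.0] * (fft_size - k))
    out_len = len(signal) + k - 1
    # Leading zeros supply the first overlap; trailing zeros fill the last block.
    padded = [0.0] * (k - 1) + list(signal) + [0.0] * (fft_size + k)
    out = []
    for start in range(0, out_len, section):
        block = padded[start:start + fft_size]
        product = [a * b for a, b in zip(fft(block), kernel_f)]
        y = fft(product, inverse=True)
        # The first k - 1 outputs are circularly wrapped, so discard them.
        out.extend(v.real / fft_size for v in y[k - 1:])
    return out[:out_len]

# Hypothetical data: 20 samples, 3-tap kernel, FFT size 8 (section of 6).
signal = [float(i % 5) for i in range(20)]
kernel = [1.0, -2.0, 3.0]
result = sectioned_convolution(signal, kernel, fft_size=8)
print(result[:6])
```

For a fixed kernel, a larger FFT size lowers the overlap-to-section ratio, at the cost of holding more data per block.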
28
Derivations for Memory and Processors for GPP/DSP Systems
[Equations for Pr, Pa, Mr, and Ma in terms of v, δ, α, γ, λ, R, Rs, Fr, Fa, and Sa; not legible in this transcript]
where Pr and Pa are the number of required processors and Mr and Ma are the memory requirements in Mbytes for range and azimuth processing, respectively
29
Determining Optimal Configurations for GPP/DSP Systems
• Determine configurations for the compute nodes (CNs), number of CNs of each configuration, and section size, to satisfy processor and memory requirements and minimize power consumption
• Notation and Definitions:
– CN Configuration: Specifies the daughtercard type and number of range and azimuth processors (per configured CN)
– X, Y: The two possible CN configurations
– XT, YT: Daughtercard type for each CN configuration
30
Determining Optimal Configurations for GPP/DSP Systems
• Notation and Definitions (continued):
– Xr, Yr: Number of range processors per CN (for each configuration)
– Xa, Ya: Number of azimuth processors per CN (for each configuration)
– NX, NY: Number of CNs of configurations X and Y
– ΠCN(•): Power per CN as a function of daughtercard type
– MCN(•): Memory per CN as a function of daughtercard type
– PCN(•): Processors per CN as a function of daughtercard type
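31
Optimization Formulation for GPP/DSP Systems
Minimize:
$$Z = N_X \Pi_{CN}(X_T) + N_Y \Pi_{CN}(Y_T)$$
Subject to:
$$P_r \le N_X X_r + N_Y Y_r$$
$$P_a(S_a) \le N_X X_a + N_Y Y_a$$
$$M_{CN}(X_T) \ge X_r \frac{M_r}{P_r} + X_a \frac{M_a(S_a)}{P_a(S_a)}$$
$$M_{CN}(Y_T) \ge Y_r \frac{M_r}{P_r} + Y_a \frac{M_a(S_a)}{P_a(S_a)}$$
$$X_r + X_a \le P_{CN}(X_T)$$
$$Y_r + Y_a \le P_{CN}(Y_T)$$
$$F_a = 2^k \ge S_a + K_a, \quad k = 1, 2, \ldots$$
$$N_X, N_Y, X_r, X_a, Y_r, Y_a \ge 0, \quad S_a \ge 1$$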
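For small problem sizes a formulation of this shape can be explored by exhaustive search. The sketch below is only an illustration: it fixes the section size (so Pr, Pa, Mr, and Ma are plain numbers rather than functions of Sa), and the daughtercard catalogue, the requirement values, and the cap on the number of CNs are all hypothetical.

```python
from itertools import product

# Hypothetical daughtercard catalogue: type -> (power W, memory MB, processors).
CARDS = {1: (10.0, 16.0, 2), 2: (18.0, 64.0, 4)}

def splits(card_type):
    """All (range, azimuth) processor counts that fit on one CN of this type."""
    procs = CARDS[card_type][2]
    return [(r, a) for r in range(procs + 1) for a in range(procs + 1 - r)]

def fits_memory(card_type, r, a, mem_r, mem_a):
    """Memory constraint for one CN, with per-processor memory needs."""
    return r * mem_r + a * mem_a <= CARDS[card_type][1]

def optimal_configuration(P_r, P_a, M_r, M_a, max_cns=12):
    """Minimize Z = N_X * power(X_T) + N_Y * power(Y_T) by enumeration."""
    mem_r, mem_a = M_r / P_r, M_a / P_a      # memory per processor
    best = None
    for xt, yt in product(CARDS, repeat=2):
        for (xr, xa), (yr, ya) in product(splits(xt), splits(yt)):
            if not (fits_memory(xt, xr, xa, mem_r, mem_a)
                    and fits_memory(yt, yr, ya, mem_r, mem_a)):
                continue
            for nx, ny in product(range(max_cns + 1), repeat=2):
                # Enough range and azimuth processors in total?
                if nx * xr + ny * yr < P_r or nx * xa + ny * ya < P_a:
                    continue
                z = nx * CARDS[xt][0] + ny * CARDS[yt][0]
                if best is None or z < best[0]:
                    best = (z, nx, (xt, xr, xa), ny, (yt, yr, ya))
    return best

# Hypothetical requirements for one fixed section size.
best = optimal_configuration(P_r=6, P_a=10, M_r=48.0, M_a=320.0)
print(best)
```

Treating the section size as a decision variable would add an outer loop that recomputes the requirements for each candidate Sa.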
32
Power Consumption in Optimal Configuration
33
% Power Increase of Nominal Over Optimal Configuration
34
Optimal CN Configurations
[Figure: optimal CN configuration plotted over δ (0.5 to 2) and v (50 to 400). Legend columns: XT Xr Xa YT Yr Ya. Legend entries as extracted: 112211112 121112 201112 202112 211121 202130 202130 211202 211211 220]
35
Determining Optimal Configurations for GPP/DSP/FPGA Systems
• Assume FPGAs are used for range processing
• Additional Notation and Definitions:
– Dr: Dynamic range required for range processing
– Ar: Accuracy required for range processing
– Ir: Incoming data rate (depends on δ, v, R, Rs)
– Tu: Data type used (floating, block, or fixed)
– Bu: Number of bits used for data representation (depends on Dr, Ar, Tu)
– Clu: Clock rate used (depends on Ir, Tu, Bu)
36
Determining Optimal Configurations for GPP/DSP/FPGA Systems
• Additional Notation and Definitions (continued):
– Gu: Number of FPGA chips used for range processing (depends on Ir, Tu, Bu)
– ΠG: Power consumption of FPGAs (depends on Clu, Tu, Bu, Gu)
• Ongoing Work
– Deriving precise relationships among above terms
– Extending current GPP/DSP optimization formulation to include FPGA utilization