optimal configuration of combined gpp/dsp/fpga systems …antonio/pubs/p-fall97acs.pdfgpp/dsp/fpga...

1

Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP

by

John K. Antonio
Department of Computer Science
College of Engineering
Texas Tech University

[email protected]

Fall ACS Meeting
November 4-6, 1997

2

Outline

• Program Objectives and Schedule of Milestones

• Representative Examples of Current Work

• Competing STAP Weight Solvers

• Power Prediction Model for FPGAs

• Optimal Configuration for SAR Processing

• Questions/Answers

3

Program Objectives

• Demonstrate advantages of combined use of GPP, DSP, and FPGA technologies for SAR and STAP applications

• Demonstrate advantages/disadvantages of different FPGA designs and implementations in terms of power consumption and real-estate requirements

• Develop and evaluate power prediction models for a GPP/DSP/FPGA prototype system

• Develop formal optimizations for configuring GPP/DSP/FPGA systems

4

Program Objectives

• Incorporation of data characteristics and requirements in optimizing system configuration
  – Dynamic range
  – Numerical accuracy

• Incorporation of multiple GPP/DSP algorithms, FPGA designs and implementations, and data representations in optimizing system configuration
  – Time-domain vs. frequency-domain convolutions
  – QR vs. conjugate gradient STAP weight solver
  – Fixed-point vs. block floating point vs. floating point

5

Schedule of Milestones

Timeline: June 1997, Dec. 1997, June 1998, Dec. 1998, June 1999, Dec. 1999

• Design STAP Iterative Weight Solver for FPGA
• Inter-GPP/DSP Comm. Simulator for STAP
• Optimal GPP/DSP Config. for SAR
• GPP/DSP/FPGA Platform Construction and Independent Testing of GPP/DSP and FPGA Subsystems
• Implement STAP Iterative Weight Solver on FPGA
• Optimal GPP/DSP Config. for STAP
• Implement SAR Linear Filtering on FPGA
• Optimal GPP/DSP/FPGA Config. for SAR/STAP
• GPP/DSP and FPGA Subsystem Integration and Testing
• Optimal GPP/DSP/FPGA Config. for SAR
• Demonstrate Combined SAR/STAP on GPP/DSP/FPGA Platform
• Implement SAR on GPP/DSP
• Design SAR Linear Filtering for FPGA
• Implement STAP on GPP/DSP
• Implement SAR on GPP/DSP/FPGA Platform
• Optimal GPP/DSP/FPGA Config. for STAP
• Implement STAP on GPP/DSP/FPGA Platform
• Develop FPGA Power Consumption Simulator

Key:
• GPP/DSP Sub-System: Research/Design, Implement/Test
• FPGA Sub-System: Research/Design, Implement/Test
• GPP/DSP/FPGA System: Research/Design, Implement/Test

6








7

References for STAP

J. Ward, “Space-Time Adaptive Processing for Airborne Radar,” Technical Report 1015, MIT Lincoln Laboratory, Lexington, MA, 1994.

K. C. Cain, J. A. Torres, and R. T. Williams, (R. A. Games, Project Leader), “RT_STAP: Real-Time Space-Time Adaptive Processing Benchmark,” MITRE Technical Report MTR 96B0000021, Feb. 1997.

MCARM Data Files, Rome Laboratory, (http://sunrise.oc.rl.af.mil).

D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Reading, MA, 1984.

8

Formulation of STAP Weight Equation

Figure: Data cube with axes Channels ($L$), Doppler ($k-1$, $k$, $k+1$), and the $m$th range segment (with $N_R$ cells), highlighting the data matrix needed for calculating weights for the $k$th Doppler bin and $m$th range segment using 3rd-order Doppler-factored STAP.

$x(k,r)$ has dimension $3L \times 1$.

$$\psi(k,m) = \frac{1}{N_R}\sum_{r=1}^{N_R} x(k,r)\,x^{H}(k,r)$$

9

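STAP Weight Calculation Using QR Decomposition

$X(k,m)$ has dimension $3L \times N_R$.

$$\psi(k,m) = \frac{1}{N_R}\sum_{r=1}^{N_R} x(k,r)\,x^{H}(k,r) = \frac{1}{N_R}\,X(k,m)\,X^{H}(k,m)$$

The Weight Equation:

$$\psi(k,m)\,w(k,m) = \gamma s$$

QR-Decomposition:

$$X^{T}(k,m) = QR$$

$$\frac{1}{N_R}\,R^{T}Q^{T}Q^{*}R^{*}\,w(k,m) = \gamma s$$

$$\frac{1}{N_R}\,R^{T}R^{*}\,w(k,m) = \gamma s$$

$$R_1^{T}R_1^{*}\,w(k,m) = N_R\,\gamma s$$

where $R_1$ denotes the upper-triangular block of $R$.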
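The step that removes $Q$ from the weight equation can be written out explicitly. The block below is a worked sketch under the assumption that $Q$ has orthonormal columns ($Q^{H}Q = I$) and that $R_1$ is the square upper-triangular part of $R$; the intermediate vector $y$ is a name introduced here for illustration only.

```latex
\begin{aligned}
X^{T} = QR \;&\Rightarrow\; X = R^{T}Q^{T}, \qquad X^{H} = (X^{T})^{*} = Q^{*}R^{*} \\
X X^{H} &= R^{T}\,(Q^{T}Q^{*})\,R^{*} = R^{T}\,(Q^{H}Q)^{*}\,R^{*} = R^{T}R^{*} = R_1^{T}R_1^{*} \\
\text{Solve } R_1^{T}\,y &= N_R\,\gamma s \quad \text{(forward substitution, } R_1^{T} \text{ is lower triangular)} \\
\text{Solve } R_1^{*}\,w(k,m) &= y \quad \text{(back substitution, } R_1^{*} \text{ is upper triangular)}
\end{aligned}
```

Only the triangular factor is needed to obtain the weights, so $Q$ never has to be formed or stored.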

10

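Using Conjugate Gradient Approach to Solve the Weight Equation

CG for Solving: $\psi w = s$

Initialization:

Choose $w^{(0)}$, set $g^{(0)} = \psi w^{(0)} - s$, $d^{(0)} = -g^{(0)}$

Iteration:

$$w^{(k+1)} = w^{(k)} - \frac{g^{(k)T} d^{(k)}}{d^{(k)T}\,\psi\, d^{(k)}}\, d^{(k)}$$

$$g^{(k+1)} = \psi\, w^{(k+1)} - s$$

$$d^{(k+1)} = -g^{(k+1)} + \frac{g^{(k+1)T}\,\psi\, d^{(k)}}{d^{(k)T}\,\psi\, d^{(k)}}\, d^{(k)}$$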
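The initialization and iteration on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's implementation: the function names, the zero starting vector, the relative-residual stopping rule (`tol`), and the use of conjugate transposes for complex data are all assumptions made here. For real data the conjugate transposes reduce to the plain transposes shown on the slide.

```python
def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def inner(u, v):
    """u^H v: conjugates the first argument, so real inputs reduce to u^T v."""
    return sum(complex(ui).conjugate() * vi for ui, vi in zip(u, v))


def conjugate_gradient(psi, s, tol=1e-10, max_iter=None):
    """Solve psi w = s for Hermitian positive-definite psi (assumed)."""
    n = len(s)
    max_iter = 10 * n if max_iter is None else max_iter
    w = [0j] * n                                          # w(0): any start is allowed
    g = [gi - si for gi, si in zip(matvec(psi, w), s)]    # g(0) = psi w(0) - s
    d = [-gi for gi in g]                                 # d(0) = -g(0)
    s_norm = abs(inner(s, s)) ** 0.5
    iterations = 0
    # Hypothetical stopping rule: relative residual norm below tol.
    while iterations < max_iter and abs(inner(g, g)) ** 0.5 > tol * s_norm:
        psi_d = matvec(psi, d)
        denom = inner(d, psi_d)                           # d^H psi d
        alpha = -inner(d, g) / denom                      # step length along d
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g = [gi - si for gi, si in zip(matvec(psi, w), s)]
        beta = inner(psi_d, g) / denom                    # keeps new d psi-conjugate to old d
        d = [-gi + beta * di for gi, di in zip(g, d)]
        iterations += 1
    return w, iterations


# Small hypothetical system, not data from the slides.
w_real, n_iter = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Loosening `tol` stops the loop earlier, which is the precision-versus-effort knob that distinguishes this solver from a direct QR solve.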

11

Preliminary Numerical Studies

Figure: Relative error and FLOP count vs. tolerance for $N_R = 125$, comparing QR and CG. Data file: re050068 (32 pulses, 28 weight vectors computed). Two panels: relative error vs. tolerance (error axis $10^{-9}$ to $10^{0}$) and FLOP count vs. tolerance (count axis $10^{8}$ to $10^{10}$); tolerance axis $10^{-8}$ to $10^{-1}$ in both.

12

Preliminary Numerical Studies

Figure: Relative error and FLOP count vs. tolerance for $N_R = 250$, comparing QR and CG. Data file: re050068 (32 pulses, 28 weight vectors computed). Two panels: relative error vs. tolerance (error axis $10^{-7}$ to $10^{-1}$) and FLOP count vs. tolerance (count axis $10^{8}$ to $10^{10}$); tolerance axis $10^{-8}$ to $10^{-1}$ in both.

13

Implementation of Conjugate Gradient on FPGAs

• Easier and more efficient to implement on FPGA hardware than the QR decomposition approach

Figure: State machine design with numerical operations and registers, taking $\psi$ and $w(j)$ and producing $w(j + 1)$.

• Floating point
• Block floating point
• Fixed point

• No. of variables
• No. of bits/variable
• Dynamic range
• Accuracy

14

QR Decomposition versus Conjugate Gradient

• QR Decomposition:

• Suitable for GPP/DSP implementation

• Good performance for small values of NR

• Conjugate Gradient:

• Suitable for either GPP/DSP or FPGA implementations

• Good performance for large values of NR

• Provides a way to balance desired precision and computational effort

• FPGA implementations offer many design parameters (e.g., data representation, no. of bits per variable)
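To make the data-representation parameter concrete, the sketch below quantizes a block of values to block floating point: one shared exponent per block plus a signed integer mantissa of a chosen width per value. It is an illustration under assumptions made here (function names, rounding to nearest, saturation at the mantissa limits) and does not describe the FPGA design in the slides.

```python
import math


def to_block_floating_point(values, mantissa_bits):
    """Return (mantissas, exponent) with one exponent shared by the whole block."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0] * len(values), 0
    exponent = math.frexp(peak)[1]            # peak = m * 2**exponent, 0.5 <= m < 1
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    high = 2 ** (mantissa_bits - 1) - 1       # largest signed mantissa
    low = -high - 1                           # smallest signed mantissa
    mantissas = [max(low, min(high, round(v * scale))) for v in values]
    return mantissas, exponent


def from_block_floating_point(mantissas, exponent, mantissa_bits):
    """Reconstruct approximate real values from the block representation."""
    scale = 2.0 ** (mantissa_bits - 1 - exponent)
    return [m / scale for m in mantissas]


# Hypothetical block of samples with an 8-bit mantissa.
block = [0.5, -0.25, 0.125]
mantissas, exponent = to_block_floating_point(block, 8)
restored = from_block_floating_point(mantissas, exponent, 8)
```

Fewer mantissa bits mean narrower registers and arithmetic units, at the cost of a larger quantization step for every value that shares the block exponent.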

15

Conceptual Illustration of Trade-Offs (graphs shown are hypothetical)

Figure: Two panels, Computational Complexity and Power Requirements, each plotted against {Precision, Accuracy, Dyn. Range, $L$, $N_R$} (multidimensional parameter space). Curves: CG on GPP/DSP; QR on GPP/DSP; CG on FPGA - Floating Point; CG on FPGA - Block Floating Point; CG on FPGA - Fixed Point.

16








17

References for FPGA Power Prediction

K. P. Parker and E. J. McCluskey, “Probabilistic Treatment of General Combinatorial Networks,” IEEE Trans. Computers, Vol. C-24, June 1975, pp. 668-670.

Kaushik Roy and Sharat Prasad, “Circuit Activity Based Logic Synthesis for Low Power Reliable Operations,” IEEE Trans. VLSI Systems, Vol. 1, No. 4, Dec. 1993, pp.

Kaushik Roy, “Power Dissipation Driven FPGA Place and Route under Timing Constraints,” School of Electrical and Computer Engineering, Purdue University.

“XC4000 Series Field Programmable Gate Arrays,” Xilinx, Inc., September 18, 1996.

18

FPGA Power Consumption

Interconnection fabric

Logic block

Most of the logic/area in the FPGA is used to route signals.

As signals traverse this network of transistors, significant power can be consumed.

19

Power Dissipation in CMOS

• Leakage Current
• Dynamic Capacitance Charging Current
  – Most important for CMOS
  – Dependent on clock frequency
  – Dependent on signal activity
• Transient Current

20

Time-Domain Modeling

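Figure: Two-gate logic circuit with inputs $x_1$, $x_2$, $x_3$ and output $y$, with time-domain waveforms for $x_1(t)$, $x_2(t)$, $x_3(t)$, $x_1x_2(t)$, and $x_1x_2x_3(t)$.

• Very precise results
• Computationally expensive

Calculation of instantaneous power: $p(t)$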

21

Probabilistic Modeling

$p(s)$: the probability that signal $s$ attains a logical value of true at any given clock cycle.

$A(s)$: the probability that signal $s$ transitions at any given clock cycle.

$p(\text{clock}) = 0.50$, $A(\text{clock}) = 1.0$
$p(x_1) = 0.88$, $A(x_1) = 0.10$
$p(x_2) = 0.29$, $A(x_2) = 0.17$
$p(x_3) = 0.69$, $A(x_3) = 0.27$
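As a minimal sketch of these two definitions, $p(s)$ and $A(s)$ can be estimated from a sampled bit stream by counting ones and counting cycle-to-cycle changes. The function name `signal_stats` and the sample streams are illustrative assumptions, not part of the original slides.

```python
def signal_stats(bits):
    """Estimate (p, A) for one signal from 0/1 samples, one sample per clock cycle.

    p: fraction of cycles in which the signal is true.
    A: fraction of cycle boundaries at which the signal changes value.
    """
    if len(bits) < 2:
        raise ValueError("need at least two samples to observe a transition")
    p = sum(bits) / len(bits)
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    activity = transitions / (len(bits) - 1)
    return p, activity


# Hypothetical sample streams (not data from the slides).
clock = [0, 1] * 8
mostly_high = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
clock_stats = signal_stats(clock)
high_stats = signal_stats(mostly_high)
```

An ideal clock sampled this way gives $p = 0.5$ and $A = 1.0$, the same pair listed for the clock on this slide.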

22

Probabilistic Modeling

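Figure: The same two-gate circuit with inputs $x_1$, $x_2$, $x_3$ and output $y$, and waveforms $x_1(t)$, $x_2(t)$, $x_3(t)$, $x_1x_2(t)$, $x_1x_2x_3(t)$, annotated with probabilistic values. Inputs: $p=0.88$, $A=0.10$; $p=0.29$, $A=0.17$; $p=0.69$, $A=0.27$. Further node annotations: $p=0.83$, $A=0.17$; $p=0.10$, $A=0.13$.

• Acceptable results
• Computationally inexpensive

Calculation of average power:

$$P_{avg} = \frac{1}{2}\,V^{2} \sum_{g \,\in\, \text{all gates}} C_g A_g$$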
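The average-power expression is a weighted sum and is cheap to evaluate once each gate has a capacitance and an activity. The sketch below evaluates it directly; the supply voltage and the capacitances are hypothetical placeholders, and only the two activity values come from the slide.

```python
def average_power(v_dd, gates):
    """Evaluate 0.5 * V^2 * sum(C_g * A_g) over all gates.

    gates: iterable of (capacitance_farads, activity) pairs, one per gate.
    """
    return 0.5 * v_dd ** 2 * sum(c * a for c, a in gates)


# Activities 0.17 and 0.13 appear on the slide; 5 V and the picofarad
# capacitances are assumed values for illustration only.
gates = [(1.0e-12, 0.17), (2.0e-12, 0.13)]
p_avg = average_power(5.0, gates)
```

With $A_g$ defined as a per-cycle transition probability, the sum is an energy per clock cycle; scaling by the clock frequency (an assumption about how the model is applied) converts it to watts.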

23

Probabilistic Model Implementation

p(s1), A(s1)

p(s2), A(s2)

p(s3), A(s3)

Step 1: Probabilistic information is distilled from the input data and presented to the model.

Step 2: Probabilistic data “propagates” throughout the model, depositing activity information as it does so.

Step 3: Power is estimated using activity measures and known CMOS gate capacitances.
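Step 2 can be sketched as a single pass over a gate list in topological order. The propagation rules below are a common simplification assumed here, not confirmed by the slides: inputs are treated as statistically independent, and a gate's output activity is each input's activity weighted by the probability that the other inputs hold their non-controlling value. The gate types in the slide's figure are not recoverable, so the two-AND netlist is hypothetical and its results are not expected to reproduce the internal values shown there.

```python
def propagate(gate, inputs):
    """Return (p, A) at a gate output from a list of (p, A) input pairs."""
    if gate == "NOT":
        (p, a), = inputs
        return 1.0 - p, a
    if gate == "AND":
        hold = [p for p, _ in inputs]            # non-controlling value is 1
    elif gate == "OR":
        hold = [1.0 - p for p, _ in inputs]      # non-controlling value is 0
    else:
        raise ValueError(f"unsupported gate: {gate}")
    all_hold = 1.0
    for h in hold:
        all_hold *= h
    activity = 0.0
    for i, (_, a) in enumerate(inputs):
        sensitive = 1.0                          # P(output follows input i)
        for j, h in enumerate(hold):
            if j != i:
                sensitive *= h
        activity += sensitive * a
    p_out = all_hold if gate == "AND" else 1.0 - all_hold
    return p_out, activity


def evaluate(netlist, primary_inputs):
    """Propagate (p, A) through gates listed in topological order."""
    signals = dict(primary_inputs)
    for name, gate, fanin in netlist:
        signals[name] = propagate(gate, [signals[s] for s in fanin])
    return signals


# Input statistics are the slide's values; the netlist is an assumption.
primary = {"x1": (0.88, 0.10), "x2": (0.29, 0.17), "x3": (0.69, 0.27)}
netlist = [("n1", "AND", ["x1", "x2"]), ("y", "AND", ["n1", "x3"])]
signals = evaluate(netlist, primary)
```

The resulting per-node activities are the quantities that Step 3 combines with gate capacitances.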

24








25

References for Optimal Configuration for SAR Processing

J. T. Muehring and J. K. Antonio, “Optimal Configuration of an Embedded Parallel System for Synthetic Aperture Radar Processing,” Proc. Int’l Conf. on Signal Processing Applications & Technology, Boston, MA, Oct. 1996, pp. 1489-1494. (http://hpcl.cs.ttu.edu/~antonio/pubs/conf033.pdf)

T. Einstein, “Realtime Synthetic Aperture Radar Processing on the RACE Multicomputer,” Application Note 203.0, Mercury Computing Systems, Inc., Chelmsford, MA, 1995.

J. C. Curlander and R. N. McDonough, Synthetic Aperture Radar: Systems and Signal Processing, John Wiley & Sons, New York, NY, 1991.

“SHARC DSP Compute Nodes (3.3-Volt),” Mercury Computing Systems, Inc., Chelmsford, MA, 1995.

26

GPP/DSP Approach for SAR Processing

Figure: Range processing (shown across 3 range processors), followed by a distributed corner-turn, followed by azimuth processing (shown across 4 azimuth processors). Data axes are Range Samples and Pulse No.; the regions are annotated with $K_r$ and $S_a$.

where $S_a$ is the azimuth section length and $K_r$ is the range reference kernel size

27

The Sectioned Convolution for the Azimuth Processing

Figure: One FFT block of length "FFT size", divided into the Overlap and the Section, with the Kernel and the Discard region marked.

• Large Overlap/Section ratio ⇒ small azimuth memory, large number of azimuth processors
• Small Overlap/Section ratio ⇒ large azimuth memory, small number of azimuth processors
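The relationship among kernel, overlap, section, and FFT size can be illustrated with a small overlap-save convolution. This sketch assumes the sectioned convolution is the standard overlap-save method (suggested by the Discard label) and that the overlap equals the kernel length minus one; the function names and the tiny radix-2 FFT are written here for illustration and are not taken from the slides.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def sectioned_convolution(signal, kernel, fft_size):
    """Overlap-save: first len(signal) samples of the linear convolution."""
    overlap = len(kernel) - 1
    section = fft_size - overlap               # new input samples per FFT
    if fft_size & (fft_size - 1) or section <= 0:
        raise ValueError("fft_size must be a power of two larger than the overlap")
    kernel_spectrum = fft(list(kernel) + [0] * (fft_size - len(kernel)))
    padded = [0] * overlap + list(signal)      # zeros seed the first overlap
    out = []
    for start in range(0, len(signal), section):
        block = padded[start:start + fft_size]
        block += [0] * (fft_size - len(block))
        product = [a * b for a, b in zip(fft(block), kernel_spectrum)]
        circular = [v / fft_size for v in fft(product, inverse=True)]
        out.extend(circular[overlap:])         # discard the wrapped-around samples
    return out[:len(signal)]


# Hypothetical data: 7 samples, 3-tap kernel, FFT size 4 (section of 2).
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, -1, 2]
filtered = sectioned_convolution(signal, kernel, 4)
```

Each FFT of length `fft_size` yields only `section` useful outputs, so a large overlap-to-section ratio means small blocks to store but more FFT work per output sample, which is the memory-versus-processors trade-off stated on the slide.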

28

Derivations for Memory and Processors for GPP/DSP Systems

[Equations: closed-form expressions for $P_r$, $P_a$, $M_r$, and $M_a$; the formulas are not recoverable from the extracted slide text.]

where $P_r$ and $P_a$ are the number of required processors and $M_r$ and $M_a$ are the memory requirements in Mbytes for range and azimuth processing, respectively

29

Determining Optimal Configurations for GPP/DSP Systems

• Determine configurations for the compute nodes (CNs), number of CNs of each configuration, and section size, to satisfy processor and memory requirements and minimize power consumption

• Notation and Definitions:
  – CN Configuration: Specifies the daughtercard type and number of range and azimuth processors (per configured CN)
  – $X$, $Y$: The two possible CN configurations
  – $X_T$, $Y_T$: Daughtercard type for each CN configuration

30

Determining Optimal Configurations for GPP/DSP Systems

• Notation and Definitions (continued):
  – $X_r$, $Y_r$: Number of range processors per CN (for each configuration)
  – $X_a$, $Y_a$: Number of azimuth processors per CN (for each configuration)
  – $N_X$, $N_Y$: Number of CNs of configurations $X$ and $Y$
  – $\Pi_{CN}(\cdot)$: Power per CN as a function of daughtercard type
  – $M_{CN}(\cdot)$: Memory per CN as a function of daughtercard type
  – $P_{CN}(\cdot)$: Processors per CN as a function of daughtercard type

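31

Optimization Formulation for GPP/DSP Systems

Minimize:

$$Z = N_X\,\Pi_{CN}(X_T) + N_Y\,\Pi_{CN}(Y_T)$$

Subject to:

$$P_r \le N_X X_r + N_Y Y_r, \qquad P_a(S_a) \le N_X X_a + N_Y Y_a$$

$$M_{CN}(X_T) \ge X_r\,\frac{M_r}{P_r} + X_a\,\frac{M_a(S_a)}{P_a(S_a)}$$

$$M_{CN}(Y_T) \ge Y_r\,\frac{M_r}{P_r} + Y_a\,\frac{M_a(S_a)}{P_a(S_a)}$$

$$X_r + X_a \le P_{CN}(X_T), \qquad Y_r + Y_a \le P_{CN}(Y_T)$$

$$F_a = 2^{k} \ge S_a + K_a, \qquad k = 1, 2, \ldots$$

$$N_X,\ N_Y,\ X_r,\ X_a,\ Y_r,\ Y_a \ge 0, \qquad S_a \ge 1$$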
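For a fixed azimuth section length, the formulation reduces to a small integer search over daughtercard types, per-CN processor splits, and CN counts. The sketch below enumerates that space by brute force, which is a substitute chosen here for clarity rather than the solution method used in the presentation. The daughtercard table, the requirement values, and the `max_cns` bound are hypothetical, and the FFT-size constraint is omitted because the section length is held fixed.

```python
from itertools import product


def cn_options(cards, mem_per_range_proc, mem_per_azimuth_proc):
    """All (type, n_range, n_azimuth) CN configurations that fit one daughtercard."""
    options = []
    for card, (_, memory, processors) in cards.items():
        for n_r in range(processors + 1):
            for n_a in range(processors + 1 - n_r):       # processors-per-CN limit
                needed = n_r * mem_per_range_proc + n_a * mem_per_azimuth_proc
                if needed <= memory:                       # memory-per-CN limit
                    options.append((card, n_r, n_a))
    return options


def minimize_power(p_r, p_a, m_r, m_a, cards, max_cns=8):
    """Return (Z, N_X, X, N_Y, Y) with the lowest total power, or None."""
    options = cn_options(cards, m_r / p_r, m_a / p_a)
    best = None
    for x, y in product(options, repeat=2):
        for n_x, n_y in product(range(max_cns + 1), repeat=2):
            if n_x * x[1] + n_y * y[1] < p_r:              # range processors needed
                continue
            if n_x * x[2] + n_y * y[2] < p_a:              # azimuth processors needed
                continue
            z = n_x * cards[x[0]][0] + n_y * cards[y[0]][0]
            if best is None or z < best[0]:
                best = (z, n_x, x, n_y, y)
    return best


# Hypothetical daughtercards: type -> (power in W, memory in MB, processors).
cards = {"A": (10.0, 16, 2), "B": (18.0, 64, 4)}
# Hypothetical requirements for one fixed section length.
best = minimize_power(p_r=3, p_a=4, m_r=12, m_a=64, cards=cards)
```

Repeating the search over candidate section lengths, with the processor and memory requirements recomputed for each, would cover the remaining decision variable.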

32

Power Consumption in Optimal Configuration

33

% Power Increase of Nominal Over Optimal Configuration

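34

Optimal CN Configurations

Figure: Optimal CN configurations over the ($\delta$, $v$) plane, with $\delta$ from 0.5 to 2 and $v$ from 50 to 400. Regions are labeled with the values of $X_T$, $X_r$, $X_a$, $Y_T$, $Y_r$, $Y_a$. Extracted label digits (row boundaries not preserved): 112211112 121112 201112 202112 211121 202130 202130 211202 211211 220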

35

Determining Optimal Configurations for GPP/DSP/FPGA Systems

• Assume FPGAs are used for range processing

• Additional Notation and Definitions:
  – $D_r$: Dynamic range required for range processing
  – $A_r$: Accuracy required for range processing
  – $I_r$: Incoming data rate (depends on $\delta$, $v$, $R$, $R_s$)
  – $T_u$: Data type used (floating, block, or fixed)
  – $B_u$: Number of bits used for data representation (depends on $D_r$, $A_r$, $T_u$)
  – $Cl_u$: Clock rate used (depends on $I_r$, $T_u$, $B_u$)

36

Determining Optimal Configurations for GPP/DSP/FPGA Systems

• Additional Notation and Definitions (continued):
  – $G_u$: Number of FPGA chips used for range processing (depends on $I_r$, $T_u$, $B_u$)
  – $\Pi_G$: Power consumption of FPGAs (depends on $Cl_u$, $T_u$, $B_u$, $G_u$)

• Ongoing Work
  – Deriving precise relationships among above terms
  – Extending current GPP/DSP optimization formulation to include FPGA utilization

37







