D. Aulagnier/P. Meyer/H.Schurer/X.Warzee, p1

Rapid Prototyping of RADAR Signal Processing Systems using Ptolemy Classic

Ptolemy MiniConference, UCB

Denis Aulagnier, Patrick Meyer, Hans Schurer, Xavier Warzee

THALES

CONTENTS

* The ESPADON programme & methodology
  – Environment & development process used for the benchmark
* ESPADON Ptolemy developments & the benchmark
  – Benchmarking application for ESPADON
  – Ptolemy developments
    • Library set-up and features
      – Improvements done after first use
    • MERCURY target development
  – Benchmark iterations and results
* Conclusions

ESPADON & INDUSTRIAL PARTNERS

* ESPADON: Environment for Signal Processing Application Development and PrOtotypiNg
* EUROFINDER PROGRAMME in France, UK, Netherlands:
  – FRANCE
    • THALES (formerly THOMSON-CSF)
      – THALES AIRBORNE SYSTEMS
      – THALES COMMUNICATION
      – THALES OPTRONIC
      – THALES AIR DEFENCE SYSTEMS
    • THOMSON MARCONI SONAR SAS
    • MATRA BAe Dynamics
  – UNITED KINGDOM
    • BAE SYSTEMS Advanced Technology Centres
    • THOMSON MARCONI SONAR Ltd
  – NETHERLANDS
    • THALES Naval Netherlands (formerly THOMSON-CSF SIGNAAL)

THE ESPADON METHODOLOGY

• Risk driven development life cycle
• Model Year approach
• Reuse and capitalisation
• Support for:
  – Traceability
  – Cost/performance trade-off

[Figure: Spiral Model Representation of SP (signal processing) development, with an increasing level of refinement. Labels recoverable from the figure:
 • Inputs and outputs: Requirements, Risk Register, Development Plan; From System Development / From Previous Process; To System Development / To Next/Previous Process; System Review
 • Steps of each cycle: Risk Analysis, Definition, Development, Validation, Review
 • Design steps: Plan SP Development, Functional Design, Architectural Design, Specification, Implementation
 • Phase 1: Analysis and selection of the requirements allocated to the SP subsystem
 • Phase 2: Definition of the SP subsystem
 • Phase 3: Development of the SP subsystem
 • Phase 4: Validation of the SP subsystem
 • Decision point: GO/NO GO
 • Examples of risk: real time performance; SP algorithms, ...; computing power; software development
 • Activities: SP functional definition; computer architecture choice; simulation; functional modelling; SP production; refinement of architecture choice; mapping description; hardware/software description; placement of functions; development of performance model; software/hardware development (synthesis); production integration; choice validation; validation of performance model; validation of virtual prototype; validation of manufactured computer]

ESPADON DESIGN ENVIRONMENT (EDE)

[Figure: EDE Framework, grouping tools, libraries and standards. Labels recoverable from the figure:
 • Algorithm prototyping: Matlab, Simulink/RTW
 • PTOLEMY (or GEDAE)
 • Target/Porting Kit
 • VSIP, ICS
 • Rapid prototyping machine: Mercury G4/RACE++
 • HANDEL-C
 • FPGA board, Target H/W]

PTOLEMY DEVELOPMENT PROCESS

[Figure: Ptolemy development process, drawn on a dataflow graph of functions F1 to F5. Labels recoverable from the figure:
 • Reusable components. Functional simulation: SDF (BDF, FSM)
   – Functional design library: SDF VSIP Stars
   – STIMULI, STIMULI GENERATOR (MATLAB), POST-PROC DISPLAY (MATLAB), FUNCTIONAL RESULTS
 • Architecture design: CG target selection / partitioning / performance analysis
   – F1 to F5 placed on PE1, PE2, PE3 and FPGA, linked by SEND/RECEIVE
   – Performance analysis: CGC VSIP Stars with performance info
   – PTOLEMY Gantt chart display (PE 1, PE 2, PE 3)
 • Implementation: CGC, C code generation for the target; run on the target (Target B, Target C)
   – Code generation libraries: CGC VSIP Stars, target optimised library, communication library
   – STIMULI, PROTOTYPE RESULTS, FUNCTIONAL RESULTS, POST-PROC DISPLAY, COMPARISON (MATLAB)
   – TATL real time trace display (PE 1, PE 2, PE 3)
 • Implementation: CGC with a “Handel-C” syntax
   – Code generation libraries: CGC VSIP Stars, VHDL library, VHDL drivers for communication
   – HANDEL-C: C to VHDL/EDIF conversion]

BEAMFORMER APPLICATION

* From a vertical array, e.g. 8 antenna channels, to 6 beams
* High level set-up of the radar beamformer application:
* Waveform: 16 pulses, PRF = 3-6 kHz, Fsample = 2.5 MHz
* Input: 8 IQ-channels, 32 bits complex float: 160 MB/sec
* Output: 6 beams, 32 bits complex float: 120 MB/sec

[Figure: receive antennas (Channel 1 to Channel 8) → Window: stabilization, tapering, calibration... → N-point FFT/FIR → Beam 1 to Beam 6. Elevation beams are output towards velocity filtering & detection.]
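The input and output rates quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes that "32 bits complex float" means one 32-bit float for I and one for Q (8 bytes per sample), which is the reading that matches the slide's figures, and that every channel or beam is sampled continuously at Fsample. The names are illustrative.

```python
F_SAMPLE_HZ = 2.5e6          # Fsample from the slide
BYTES_PER_COMPLEX = 2 * 4    # assumption: 32-bit float for I plus 32-bit float for Q


def data_rate_mb_per_s(n_streams):
    """Sustained rate in MB/s for n_streams complex streams at F_SAMPLE_HZ."""
    return n_streams * BYTES_PER_COMPLEX * F_SAMPLE_HZ / 1e6


input_rate = data_rate_mb_per_s(8)    # 8 IQ channels
output_rate = data_rate_mb_per_s(6)   # 6 beams
print(input_rate, output_rate)
```

Under these assumptions the 8 input channels give 160 MB/s and the 6 output beams give 120 MB/s.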

BEAMFORMER CALIBRATION

* Normal burst pattern is one clutter sweep + 16 air pulses
* Calibration is performed instead of clutter measurement, using 48 pulses (mode switch)

[Figure: timing of burst k. Normal mode: clutter measurement on the clutter pulse (s=0, at t=0), followed by the air-burst pulses 1, 2, 3, 4, ..., 16 (s=1 to s=16, intervals T1 to T16). Calibration mode: a test pulse and calibration pulses 1, 2, 3, ..., 10, ...; the first incoherent integration is the sum of 4 RQs, the second incoherent integration is the sum of 4 RQs, etc.]

BEAMFORMER DESIGN

[Figure: BEAMFORMER FUNCTIONAL DIAGRAM. Labels recoverable from the figure:
 • BEAMFORMER MONITOR & CONTROL
 • BEAM CALCULATION
 • CALIBRATION
 • Blocks: input interface to the filesystem; SIGNAL WEIGHTING (CMPLX MUL); BEAMFORMING (FFT); output interface to the filesystem; NOISE MEASUREMENT; CALCULATE WEIGHTING FUNCTION; INCOHERENT INTEGRATION; COHERENT INTEGRATION; CALCULATE PHASE DIFFERENCE; CALCULATE GAIN DIFFERENCE
 • Signals: I/Q VIDEO; level_stab_shift, CVE_phase, RF, receiver_STC; weighting_control; BF_control; BF_status]
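The two blocks at the core of the diagram, signal weighting (complex multiply) and beamforming (FFT), can be illustrated on a single range quant. The following is a minimal sketch, not the benchmark code: it uses a plain DFT from the Python standard library instead of VSIPL, a uniform weight vector, and it assumes that the 6 retained beams are simply the first 6 of the 8 DFT bins. All function names are hypothetical.

```python
import cmath


def weight_channels(samples, weights):
    """Signal weighting: element-wise complex multiply of one snapshot."""
    return [s * w for s, w in zip(samples, weights)]


def dft_beams(weighted, n_beams):
    """Beamforming as an N-point DFT across the channels.

    Keeps the first n_beams bins; which bins are kept is an assumption
    of this sketch, the slides do not say.
    """
    n = len(weighted)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * c / n) for c, x in enumerate(weighted))
        for k in range(n_beams)
    ]


n_channels, n_beams = 8, 6
# Test input: a plane wave whose phase advances by 2*pi/8 per channel.
snapshot = [cmath.exp(2j * cmath.pi * c / n_channels) for c in range(n_channels)]
weights = [1.0 + 0.0j] * n_channels   # uniform taper, no calibration correction
beams = dft_beams(weight_channels(snapshot, weights), n_beams)
strongest = max(range(n_beams), key=lambda k: abs(beams[k]))
print("strongest beam index:", strongest)
```

With this test input all the energy lands in beam index 1, which is a convenient sanity check for the weighting and DFT steps.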

ESPADON PTOLEMY

* Within Ptolemy we only use:
  – SDF (or BDF) Domain for functional simulation
  – CGC Domain for Code Generation (and implementation)
* What we have developed for the benchmark is:
  – An extension of the Library of stars (available in both SDF/BDF and CGC, total: 70)
    • Radar Library (5 components)
    • VSIP Core Light Library (partially, 11 components)
    • Support Library (e.g. components for parallel operation, 19 components)
  – Target for the MERCURY Machine (G2 and G4 processor)
    • VSIP vectors are allocated in one buffer (per processor)
    • Synchronized Inter-Processor Communication for Complex Vector (the Burst Message is always sent along with the data)

PTOLEMY LIBRARY

* Use of VSIPL standard library
  – Pass pointers to VSIPL views between stars instead of data (‘int’-type)
* Develop multi- and complex-interleave stars needed for the corner-turn process (in HOF domain)
* Extend CGC-BDF to handle multiprocessor architecture
* Important requirements on the developed elements:
  – Keep the library platform independent; dependency is only in the target
  – Make control flow explicit in the data-flow graphs
* Stars with vector output are provided with 2 extra parameters:
  – MAX_BUF_LENGTH: Maximum length of a vector
  – OUT_BUF_OPT: Number of output buffers used for each vector
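A corner turn redistributes a matrix that is split by rows over P processors so that each processor ends up owning a range of columns. The slides do not show how the interleave stars implement it, so the sketch below only illustrates the data movement, with plain Python lists standing in for processor memories; the function name and the equal-size chunking are assumptions.

```python
def corner_turn(blocks):
    """Redistribute row blocks held by P processors into column blocks.

    blocks[p] is the list of rows held by source processor p. Every row is
    cut into P equal chunks; destination q collects chunk q of every row
    and stores it column by column.
    """
    n_procs = len(blocks)
    row_len = len(blocks[0][0])
    chunk = row_len // n_procs          # assumes row_len is a multiple of P
    result = []
    for q in range(n_procs):
        pieces = [row[q * chunk:(q + 1) * chunk] for rows in blocks for row in rows]
        # Transpose the collected pieces: one list per column, across all rows.
        result.append([list(col) for col in zip(*pieces)])
    return result


# 2 processors, 4 rows (e.g. channels) of 4 samples, 2 rows per processor.
blocks = [
    [[0, 1, 2, 3], [10, 11, 12, 13]],
    [[20, 21, 22, 23], [30, 31, 32, 33]],
]
turned = corner_turn(blocks)
```

After the call, processor 0 holds columns 0 and 1 of all four rows and processor 1 holds columns 2 and 3, which corresponds to a 2->2 corner turn in the notation of the benchmark table.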

PTOLEMY LIBRARY FEATURES

* Support in-place operation (if possible)
* Support rate change, i.e. the output buffer is automatically duplicated as many times as needed
* Colours of the stars highlight the different kinds of stars used in the design:
  – Standard Ptolemy stars (WHITE) that use only the std C library,
  – VSIPL stars (GREEN) that use the std C library and the VSIPL Core Light library,
  – Application specific stars (RED) that also use the MERCURY library (ICS) and/or are specific to the ESPADON radar benchmark.

LIBRARY SET-UP (CGC)

PTOLEMY LIBRARY IMPROVEMENT

* All the stars allocate the required buffers in the “Global Buffer” during the setup phase:

[Figure: layout of the GLOBAL BUFFER (SMAB), with growing offsets starting at OFFSET 0. Labels recoverable from the figure:
 • COMMUNICATION CHANNELS: CHANNEL 1, CHANNEL 2, ..., CHANNEL N, holding DATA & BURST MESSAGE and SYNCHRONISATION FLAGS
 • Each channel is divided into SLOT 1, SLOT 2, SLOT 3, SLOT 4; ONE SLOT holds SIGNAL DATA and PARAMETERS
 • LIBRARY STARS: FIRST STAR, SECOND STAR, ..., LAST STAR
 • FREE SPACE]
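The allocation scheme suggested by the figure, where every star reserves its buffers at setup time from one global buffer with growing offsets, behaves like a simple bump allocator. The sketch below is a generic illustration of that idea, not the ESPADON implementation; the class name, the method names and the sizes are all hypothetical.

```python
class GlobalBuffer:
    """Bump allocator: hands out consecutive offsets from one buffer."""

    def __init__(self, size):
        self.size = size
        self.next_offset = 0     # allocation starts at offset 0
        self.regions = {}        # owner name -> (offset, length)

    def allocate(self, owner, length):
        """Reserve `length` units for `owner` and return the start offset."""
        if self.next_offset + length > self.size:
            raise MemoryError(f"global buffer too small for {owner}")
        offset = self.next_offset
        self.regions[owner] = (offset, length)
        self.next_offset += length   # offsets only grow during the setup phase
        return offset

    @property
    def free_space(self):
        return self.size - self.next_offset


buf = GlobalBuffer(size=1024)
first = buf.allocate("first_star", 256)
second = buf.allocate("second_star", 512)
print(first, second, buf.free_space)
```

Because nothing is freed at run time, the allocator needs no bookkeeping beyond a single offset, and whatever is left after the last star is the free space shown in the figure.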

PTOLEMY MERCURY TARGET (1)

* Features
  – Generate a C-file for each processor, compile, load and run the application on the machine
  – Use MERCURY ICS Library and VSIPL (exclusively)
    ⇒ Make it portable to any MERCURY machine
  – Arrange synchronisation and data transfer between PPCs
  – Data transfer uses DMA ⇒ efficient
    • Synchronisation protocol uses simple flags
    • Support Variable Vector Length: each communication buffer is duplicated N times (user defined) and the effective transfer length is set in real time
    • Memory is allocated for the maximum vector length (user defined)
    • Support both complex storage types (interleaved & split)
    • Support complex float vectors (only)
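The combination of flag-based synchronisation, N duplicated buffers and a per-transfer effective length can be pictured as a small ring of slots. The sketch below simulates one such channel inside a single Python process; the real target moves the data between processors by DMA through the MERCURY ICS library, which is not modelled here. The class and attribute names are hypothetical.

```python
class SlottedChannel:
    """One communication channel made of n_slots buffers of fixed capacity."""

    def __init__(self, n_slots, max_len):
        self.max_len = max_len
        # Memory is sized once, for the maximum vector length.
        self.data = [[0j] * max_len for _ in range(n_slots)]
        self.length = [0] * n_slots      # effective length of each transfer
        self.full = [False] * n_slots    # one synchronisation flag per slot
        self.write_slot = 0
        self.read_slot = 0

    def send(self, vector):
        """Copy a vector into the next slot; return False if the slot is busy."""
        slot = self.write_slot
        if self.full[slot]:
            return False                 # receiver has not released this slot yet
        if len(vector) > self.max_len:
            raise ValueError("vector longer than the allocated maximum")
        self.data[slot][:len(vector)] = vector
        self.length[slot] = len(vector)
        self.full[slot] = True           # raise the flag last, after the data
        self.write_slot = (slot + 1) % len(self.full)
        return True

    def receive(self):
        """Return the next vector, or None if nothing has arrived."""
        slot = self.read_slot
        if not self.full[slot]:
            return None
        vector = self.data[slot][:self.length[slot]]
        self.full[slot] = False          # release the slot to the sender
        self.read_slot = (slot + 1) % len(self.full)
        return vector


channel = SlottedChannel(n_slots=2, max_len=4)
sent = [channel.send([1j, 2j]), channel.send([3j]), channel.send([4j])]
received = channel.receive()
```

With two slots the third `send` is refused until the receiver has released a slot, and vectors of different lengths travel through buffers that were all sized for the maximum length.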

PTOLEMY MERCURY TARGET (2)

* Features (continued)
  – Implement TATL Trace Tool from MERCURY
  – Overview of the main parameters (to be set by the user):
    • Number of processors
    • CE id for each processor
    • Size of the Shared Memory Buffer (SMB) for each processor (only one SMB is created in each processor)
    • Size of the “heap” is set for all processors
    • Communication buffer length (only one parameter for all the communication channels)
    • ON/OFF switches for debug messages and TATL (trace for all stars possible)
    • Give any ‘runmc’ command line option

PTOLEMY MERCURY TARGET (3)

* Interface with VSIPL issues
  – If the input vector is already allocated inside the SMB and the stride of the vector view is equal to one, then the copy is not needed.
    ⇒ Efficient transfer is possible (using in-place operation)
    (Vector views with a stride > 1 are not supported; a 2D DMA would be required.)
  – But according to VSIPL policy, any VSIPL function is allowed to move the data to a more appropriate place (e.g. to internal memory for a DSP). Therefore the copy is always needed if we use the ‘VSIPL data’ space.
  – This problem is solved if we use only the ‘User data’ space. In doing this, however, we do not follow the defined VSIPL standard!
    ⇒ VSIPL does not fit well on a multi-processor machine like the MERCURY machine (the VSIPL - ICS interface is not efficient).
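The reasoning on this slide reduces to a small decision rule for whether a vector has to be copied before it is transferred. The function below restates that rule as a sketch; the function name and the string labels for the two data spaces are hypothetical, and the function is not part of VSIPL or of the MERCURY libraries.

```python
def copy_needed(in_smb, stride, data_space):
    """Decide whether a vector must be copied before an inter-processor transfer.

    in_smb:     True if the vector is already allocated inside the SMB
    stride:     stride of the vector view
    data_space: "vsipl" or "user" (labels chosen for this sketch)
    """
    if stride != 1:
        # Strided views would need a 2D DMA, which is not supported.
        raise ValueError("vector views with stride > 1 are not supported")
    if data_space == "vsipl":
        # A VSIPL function may have moved the data, so its location is unknown.
        return True
    # User data that already sits in the SMB can be sent in place.
    return not in_smb


print(copy_needed(in_smb=True, stride=1, data_space="user"))
print(copy_needed(in_smb=True, stride=1, data_space="vsipl"))
```

Only the combination of user data, unit stride and a vector already in the SMB avoids the copy, which is why the slide concludes that staying within the standard costs efficiency.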

ESPADON PTOLEMY ISSUES (1)

* Future work to solve known problems:
  – The same buffer size is applied to all communication channels
    ⇒ Memory allocation overhead
  – The Burst Message structure is hard-coded
    ⇒ Application dependent stars are used in the design
  – The BDF stars are available only for galaxies with single input & single output, and multi-rate is not supported
    ⇒ Strong design constraint
  – The BDF stars can only be used inside a processor
    ⇒ Design constraint
  – The CGC library elements are not calibrated in terms of execution time
    ⇒ Automatic mapping may fail

ESPADON PTOLEMY ISSUES (2)

* Future work to solve known problems (continued):
  – The Memory boards are implemented inside the I/O stars
    ⇒ Memory boards are not really integrated in the design environment
  – The inter-processor communication functions support only VSIPL complex float vectors
    ⇒ Design constraint
  – The TATL Tool cannot be used if the design counts more than 384 different stars (due to the limited number of event types)
    ⇒ Design constraint

ITERATION 1: BARE BEAMFORMER DESIGN

* Iteration 1 (6 processor design):

[Figure: Data in + Distribute → 4 parallel branches, each labelled “2 striplines” and “NofRQ/NofProcs” → Collect + Data out]

TATL RESULTS FOR ITERATION 4

* Bare beamformer on 8 processors

BENCHMARK PERFORMANCE METRICS

                        Iteration 1       Iteration 2           Iteration 3   Iteration 4
NofChannel              8                 8                     8             8
NofSweep                17                17                    17            17
NofProc                 4 (+2)            5 (+2)                4+4           8
Possible NofProc        1, 2, 4, 8 (+2)   2, 3, 5, 9, 17 (+2)   >1            1, 2, 4, 8
Input data              DMA               DMA                   PRE-LOADED    PRE-LOADED
Output data             (DMA)             (DMA)                 (DMA)         - (Pb MCS)
CORNER-TURN             4->4              NO                    4->4          8->8
RACE++ peak load        ?                 ?                     ?             53 %
LATENCY                 1 burst           1 burst               2 bursts      1 burst
PERFORMANCE*            25 ms             21 ms                 9.5 ms        9 ms
Support Var. Burst L.   YES               YES                   YES           YES
Design Time#            72 H              16 H                  12 H          16 H

* The performance is the average processing time for one burst. The measurement was done with TATL on 10 bursts of 19000 RQ of 400 ns (i.e. 7.6 ms).
# Time is without extensive functional testing.
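The footnote's burst duration and the PERFORMANCE row can be put side by side with a short calculation. This sketch assumes that real-time operation means processing one burst in no more time than the burst lasts (a ratio of at most 1); that criterion is an interpretation, not something the table states. The variable names are illustrative.

```python
RQ_PER_BURST = 19000
RQ_DURATION_S = 400e-9     # one range quant, i.e. one sample at 2.5 MHz

burst_duration_ms = RQ_PER_BURST * RQ_DURATION_S * 1e3

# Average processing time per burst, from the PERFORMANCE row (ms).
processing_ms = {
    "Iteration 1": 25.0,
    "Iteration 2": 21.0,
    "Iteration 3": 9.5,
    "Iteration 4": 9.0,
}

# Ratio of processing time to burst duration; <= 1 would keep up with the input.
load_ratio = {name: t / burst_duration_ms for name, t in processing_ms.items()}

for name, ratio in load_ratio.items():
    print(f"{name}: {ratio:.2f} x burst duration")
```

The burst duration comes out at 7.6 ms as in the footnote, and even the fastest iteration (9 ms) still takes somewhat longer than one burst.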

BENCHMARK FINAL DESIGN (1)

BENCHMARK FINAL DESIGN (2)

FINAL BURST TIMING RESULTS

CONCLUSIONS (1)

* Main functional requirements are met by the final design (12 of the 19 requirements)
* Throughput and latency requirements are almost met; they are expected to be met with full speed G4 daughter cards and/or a redesign of the VSIPL functions
* Review of graphical Ptolemy designs seems faster and more efficient than code reviews
  – Disadvantage is parameter handling and scope.
  – Design is highly multi-rate, but this is difficult to see
  – Some functionality is inside stars (hidden)
* Total design, validate & test time for the bare beamformer was 354.5 hours, while normal development takes 481 hours: approximately 36% faster (improvement ~1.36)
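The improvement figure in the last bullet follows directly from the two hour counts. The short check below shows both ways of expressing it; the variable names are illustrative.

```python
NORMAL_HOURS = 481.0      # normal development, from the slide
PTOLEMY_HOURS = 354.5     # design, validate & test time with Ptolemy

speed_up = NORMAL_HOURS / PTOLEMY_HOURS            # ratio of development times
time_saved = 1.0 - PTOLEMY_HOURS / NORMAL_HOURS    # fraction of hours saved

print(f"speed-up factor: {speed_up:.2f}")
print(f"time saved: {time_saved:.1%}")
```

The ratio of the two times is about 1.36, which is the "improvement ~1.36" of the slide; expressed as hours saved, the same numbers correspond to roughly 26% less development time.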

CONCLUSIONS (2)

* Development time from functional/architectural design to implementation is very short: a matter of days
* For which purpose can we use it?
  – Mainly for rapid prototyping of new algorithms
  – Rapid prototyping of demonstrators
  – Open source approach enables us to adapt the tool to our needs
* Many improvements are needed before it can be used for a complete application/project