Page 1: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

DiFX Performance Testing

Chris Phillips

eVLBI Project Scientist

25 June 2009

Page 2: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

DiFX history

• Developed by Adam Deller at Swinburne University of Technology (now at NRAO) to replace the LBA S2 correlator and allow disk-based correlation

• Production correlator of the LBA (Australia) since 2007

• Verified against LBA, VLBA and Bonn hardware correlators

Page 3: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

DiFX overview

• FX-style correlator implemented in C++

• 95% optimised C vector function calls (heavy reliance on Intel IPP libraries)

• Non-clocked system, unlike hardware correlators (HWCs)

• Maximum performance without compromising generality or ease of maintenance

• Modular design to support generality and enable “3rd party” contributors and local system optimisation

Page 4: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Capabilities

• Near-arbitrary time and frequency resolution

• Advanced pulsar gating

• eVLBI (LBA has done 1 Gbps eVLBI)

• Correlate anything it can unpack (1/2/4/X Gbps): most new formats easy to implement

Page 5: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Supported formats

• Input: LBA, Mk5A (Mk4/VLBA), K5 (via translation), Mk5B, VDIF (end 2009)

• Output: RPFITS, FITS-IDI

Page 6: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Current users

• Long Baseline Array (Australia)

• VLBA (USA)

• MPIfR (Bonn, Germany)

• AuScope geodetic array (Australia/NZ, 2009)

• E-LOFAR (EU)

Page 7: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Future/Imminent Capabilities

• Single pass, multiple phase centres

• Improved (faster) fringe rotation

• Band matching, e.g. 2 x 64 MHz with 1 x 128 MHz

• Baseband pulsar "folder"

• Native geodetic output format

• Phase cal extraction

• Frequency division multiplexing of VDIF

• Polyphase filterbank

Page 8: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

DiFX architecture

[Diagram: a Master Node, DataStreams 1…N and Cores 1…M. Labelled flows: "Timerange, destination", "Source data", "Baseband data", "Visibilities". Buffers shown: a large, segmented ring buffer (up to 100s of MB, a few seconds or more), processing buffers and visibility buffers.]

• MPI is used for inter-process communications

• Each data transfer is double buffered
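As an illustration of what "double buffered" means here, the following is a minimal sketch and not DiFX code (DiFX itself is C++ with MPI). It assumes two reusable buffer slots, a producer thread standing in for a data transfer, and a consumer standing in for processing. The names `run_double_buffered`, `segments` and `process` are hypothetical.

```python
import queue
import threading


def run_double_buffered(segments, process):
    """Overlap 'transfer' and 'processing' using exactly two buffer slots.

    segments: iterable of bytes objects (stand-in for incoming data).
    process:  function applied to each filled buffer (stand-in for correlation).
    """
    free = queue.Queue()    # slots that may be overwritten
    filled = queue.Queue()  # slots waiting to be processed
    for _ in range(2):      # two slots: one fills while the other is processed
        free.put(bytearray())

    def producer():
        for seg in segments:
            buf = free.get()   # blocks while both slots are still in use
            buf[:] = seg       # stand-in for receiving data into the slot
            filled.put(buf)
        filled.put(None)       # end-of-stream marker

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        buf = filled.get()
        if buf is None:
            break
        results.append(process(bytes(buf)))
        free.put(buf)          # hand the slot back to the producer
    worker.join()
    return results
```

The producer can only ever be one slot ahead of the consumer, so memory use stays bounded while transfer and processing overlap.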

Page 9: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Computational Distribution

• Currently: only time division multiplexing

• VDIF will allow frequency division multiplexing: implementation style?

• As currently implemented, all baselines must still be correlated on one Core
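Time division multiplexing can be pictured as cutting the observation into short time slices and handing each slice, with all of its baselines, to one Core. The sketch below is illustrative only: the round-robin order and the function names are assumptions, and the real master node may assign slices differently.

```python
from itertools import cycle


def assign_time_slices(start, stop, slice_len, n_cores):
    """Split [start, stop) into slices and deal them to cores in turn.

    Returns a list of (core_index, slice_start, slice_end) tuples.
    Round-robin order is an assumption made for illustration.
    """
    if slice_len <= 0 or n_cores <= 0:
        raise ValueError("slice_len and n_cores must be positive")
    cores = cycle(range(n_cores))
    assignments = []
    t = start
    while t < stop:
        end = min(t + slice_len, stop)
        assignments.append((next(cores), t, end))
        t = end
    return assignments


def n_baselines(n_stations):
    """Cross-correlation baselines each core must handle for one time slice."""
    return n_stations * (n_stations - 1) // 2
```

Because each slice carries every baseline, the per-slice work on a single Core grows with `n_baselines`, which is why the last bullet above matters for arrays with many stations.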

Page 10: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Benchmarking

• Need to eliminate disk I/O to get a clear indication of the potential speed of a specific setup

• eVLBI! But live eVLBI is not suitable, as it runs at a fixed data rate

• VLBIFAKE program generates an eVLBI data stream: LBADR, Mark5B and VDIF formats; TCP and UDP (only TCP usable for benchmarking)

• Shell script to run the correlator and save logs

• Rate determined by the median transfer rate from VLBIFAKE
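A small sketch of the last step, assuming the per-interval transfer rates have already been extracted from the saved logs into a list of numbers. The log format is not given in the slides, so parsing is left out, and `sustained_rate_mbps` is a hypothetical name.

```python
from statistics import median


def sustained_rate_mbps(samples_mbps):
    """Return the median of the per-interval transfer rates (in Mbps).

    The median is insensitive to a few outliers, such as a slow first
    interval while the correlator starts up.
    """
    if not samples_mbps:
        raise ValueError("no rate samples")
    return median(samples_mbps)
```

For example, with hypothetical samples `[512, 498, 505, 120, 510]` the single slow interval (120) does not drag the reported figure down the way a mean would.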

CSIRO. eVLBI-Aus

Page 11: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Cuppa

• 20 nodes, dual CPU quad core

• 6 stations

• Up to 12 processing nodes

• Testing number of threads and processing cores

Page 12: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Scaling with Cores

Page 13: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Data Rate Per Compute Node

Page 14: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Scaling with Threads

Page 15: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Scaling with Threads

Page 16: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Scaling with Spectral Points

Page 17: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Scaling with Stations

Page 18: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

APSR

• 18 compute nodes, dual CPU quad core

• 5 I/O nodes, dual CPU dual core

• 4 stations

• Up to 18 processing nodes

Page 19: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

APSR


Page 20: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009
Page 21: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Data Rate Per Compute Node

Page 22: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Code collaboration status

• Entire codebase has been organised on SVN (hosted by ATNF)

• DiFX wiki (hosted by Curtin): http://cira.ivec.org/dokuwiki/doku.php/difx/index

• Mailing list: [email protected]

• To get on the difx-users list, search for difx-users on Google Groups and request access, or email me

Page 23: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Contact Us

Phone: 1300 363 400 or +61 3 9545 2176

Email: [email protected]

Web: www.csiro.au

Thank you

ATNF

Chris Phillips

eVLBI Project Scientist

Phone: +61 2 9372 4608

Email: [email protected]

Web: www.atnf.csiro.au/vlbi

Page 24: DiFX Performance Testing Chris Phillips eVLBI Project Scientist 25 June 2009

Benchmarks

• Non-clocked system, unlike HWCs

• Indicative number of CPU cores required to correlate in real time:

  • LBA @ 1 Gbps (256 MHz agg. b/w, 2 bit): 100

  • VLBA @ 4 Gbps (1 GHz agg. b/w, 2 bit): 800

• Weak dependencies on e.g. number of channels

• 160 CPU core system (exceeding VLBA HWC capacity) costs <$100k including networking; annual electricity ~$10k
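The data rates quoted above follow from the aggregate bandwidth and bit depth if Nyquist sampling of real-valued data is assumed (two samples per Hz of bandwidth). The sketch below is a worked check under that assumption, reading "1 GHz" as 1024 MHz and "1 Gbps" as 1024 Mbps; the function names are hypothetical.

```python
def data_rate_mbps(bandwidth_mhz, bits_per_sample):
    """Per-station data rate for Nyquist-sampled real data.

    rate = 2 samples per Hz x aggregate bandwidth x bits per sample
    """
    return 2 * bandwidth_mhz * bits_per_sample


def cores_per_gbps(n_cores, rate_gbps):
    """Indicative CPU cores needed per Gbps of per-station data rate."""
    return n_cores / rate_gbps


lba_rate = data_rate_mbps(256, 2)    # 1024 Mbps, i.e. the 1 Gbps LBA case
vlba_rate = data_rate_mbps(1024, 2)  # 4096 Mbps, i.e. the 4 Gbps VLBA case
lba_cost = cores_per_gbps(100, 1)    # 100 cores per Gbps
vlba_cost = cores_per_gbps(800, 4)   # 200 cores per Gbps
```

On these figures the VLBA case needs about twice as many cores per Gbps as the LBA case. The slides do not give the reason; the station counts for the two arrays are not stated on this slide, so no further breakdown is attempted here.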