gtc 2017 green flash - nvidia · 2017. 5. 16. · gtc 2017 rtc real time controller legacy...

28
Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in H2020-FETHPC-2014 GTC 2017 Green Flash Persistent Kernel : Real-Time, Low-Latency and High- Performance Computation on Pascal Julien BERNARD

Upload: others

Post on 25-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in H2020-FETHPC-2014

GTC 2017Green Flash

Persistent Kernel : Real-Time, Low-Latency and High-Performance Computation on Pascal

Julien BERNARD

Page 2: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

Green Flash

● Public and private actors– Paris Observatory– University of Durham– Microgate– PLDA

● Part of Horizon 2020 : EU Researchand Innovation programme

● 3 years project● 3,8 million €● Involve about 30 people

● Research axes– Real time HPC with accelerators and smart interconnects– Energy efficient platform based on FPGA– Real Time Controller (RTC) prototype for European – Extremely Large Telescope Adaptive

Optics (AO) system

Page 3: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Contributors

Maxime Lainé : software engineer

Denis Perret : FPGA expert

Arnaud Sevin : software lead

Damien Gratadour : project lead

Christophe Rouaud : PLDA project lead

Gaetan Dufourcq : QuickPlay expert

Page 4: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

E-ELT :Adaptive Optics

● Compensate in real-time the wavefront perturbations

● Using a wavefront sensor - WFS to measure them

● Using a deformable mirror – DM to reshape the wavefront

● Commands to the mirror must be computed in real-time (~ms rate)

Page 5: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

RTC concept for ELT AO

Page 6: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

RTC concept for ELT AO

Page 7: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

RTC

Real Time controller

Legacy architecture● IE. SPARTA architecture

– DSP & CPU

– VXS backplane

Instrument WFS meas. DM com. Freq (Hz)

Performance (GMAC/s)

Sphere 1 2.6K 1 1.3k 1.5k 5.2

AOF 4 2.4k 1 1.2k 1k 11.8

Active elements

Sensor

Switch

Page 8: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

RTCNode 0

RTCNode N-1

RTCNode ...

Real Time controller

Cluster network architecture

Instrument WFS meas. DM com. Freq (Hz)

Performance (GMAC/s)

Sphere 1 2.6K 1 1.3k 1.5k 5.2

AOF 4 2.4k 1 1.2k 1k 11.8

ELT 6 80k 3 15k 500 1.2k

Sensor 0

Active elements 0

Active elements 1

Sensor 2

Sensor 3

Sensor 4

Sensor 5

Sensor 1

Active elements 2

Switch

Page 9: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Legacy GPU programming

GPUGPU RAM

CPUCPU RAM

PCIe

10GbE NIC

main { setup(); while(run){ recv(…); cudaMemcpy(…, HostToDevice); computing_kernel<<<>>>(…); cudaMemcpy(…, DeviceToHost); send(…); }}

Page 10: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Legacy GPU programming

cudaMemcopy() overhead times (5.12Mo in, 64Ko out)

Kernel launches overhead times

Both cases : jitter of 20 to 30 µsec (40 µsec sometimes)

Page 11: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Legacy GPU programming

Leaves not enough time for computations

Page 12: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Improvement

GPU direct & I/O Memory mapping

Persistent Kernel

Page 13: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

GPU direct & I/O Memory mapping

Page 14: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

GPU direct & I/O Memory mapping

● FPGA writes/reads directly to/from GPU memory● CPU free for other kind of computations

FPGA NIC

Host ram

CPU app

Camera controlFPGA controlMeas. Comp.

GPU ram GPU

Camera protocol handler

DMADMC protocol

handler

DMA

UDP Offload Engine Pixels

buffer

DM com buffer

DMA

start

PCI-e 3.0

DMAanswers

Latency measurement

DMAmeasures

Pixels buffer

computekernels

Page 15: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

FPGA Development platform

Eased devel. Process using the QuickPlay tool from PLDA

Page 16: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

FPGA Development platform

● Single generic design / multiple target boards– ExpressK-US board

(hosting a Kintex UltraScale from Xilinx)

– ExpressGX V board (hosting a Stratix V from Altera)

– μXlink board from microgate (hosting a Arria 10 board from Altera)

Page 17: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Persistent Kernel

Page 18: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Classic implementation

Page 19: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Persistent kernel implementation

Page 20: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

GPU direct, I/O Memory mapping & Persistent kernel

GPUGPU RAM

CPUCPU RAM

PCIe

10GbE FPGA NIC

start

main { setup(); persistent_kernel <<<>>>(…); …}

persistent_kernel(…){ while(run){ pollMemory(…); computation(...); startDMATransfer(…); }}

Page 21: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Pipelining I/O and compute

FPGA PLDA XPressG5GPU Tesla C2070OS Debian wheezy

Camera EVT HS-2000M10GbE network

µsec

iterations

No GPUDirect

GPUDirect + persistent kernel

SCAO Pyramid case: 240 x 240 pixels, encoded on 16b

Page 22: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Pipelining I/O and compute

Page 23: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

DGX-1 benchmark

● FPGA is replace by CPU

● Each node master receive frame data

● Work is shared between all devices

● RTC master send back final resut

Node mastersRTC Master

Slaves

Page 24: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

Result 1/2 : Time and jitterHistogram4 devices case with10,048 slopes x 15,000 commands

Average : 0.45ms

Jitter peak to peak : 17µs

Variation : 1.8 %

Time in ms

Page 25: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

Result 2/2 : Sync & Intercom time

Intercommunication time Synchronize time

Average : 15µs Jitter : 8.8µsAverage : 24µs Jitter : 12µs

Page 26: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

Conclusion & future work

● Conclusion– Using GPUDirect and a

persistent kernel allow efficient data delivery to the RTC

– Lower jitter

– Simpler execution stream

– QuickPlay tool from PLDA● Eased FPGA development cycle● Mix communication protocols and

data processing into the same streams

● Expandable ecosystem, with QuickStore / QuickAliance

● Future– Test on AO bench (with DM

and WFS)

– Use multi nodes architecture

– Test with fp16

Page 27: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

Project #671662 funded by European Commission under program H2020-EU.1.2.2 coordinated in H2020-FETHPC-2014

Thank you

Question ?

Page 28: GTC 2017 Green Flash - NVIDIA · 2017. 5. 16. · GTC 2017 RTC Real Time controller Legacy architecture IE. SPARTA architecture – DSP & CPU – VXS backplane Instrument WFS meas

GTC 2017

● DGX-1 benchmark● Result 1/2 : Time and jitter● Result 2/2 : Sync & Intercom time● Conclusion & future work● Thank you● RTC AO prototype for E-ELT● Test pipeline● Time measurement strategies● Conclusion : Persistent kernel● future work● New features● Test architecture