a remote memory access infrastructure for global address ...willenbe/publications/fpga... · global...

21
High-Performance Reconfigurable Computing Group University of Toronto A Remote Memory Access Infrastructure for Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February 13, 2013

Upload: others

Post on 08-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

High-Performance Reconfigurable Computing Group

University of Toronto

A Remote Memory Access Infrastructure for

Global Address Space Programming Models

in FPGAs

Ruediger Willenberg and Paul Chow

February 13, 2013

Page 2: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Parallelizing Computation

• How to partition and communicate data

• How to synchronize computation and data access

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

2

Page 3: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Parallel Programming Models

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

3

Page 4: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Partitioned Global Address Space

• Shared Memory access, but…

• …visible difference between local and remote data

• PGAS allows one-sided communication

• Many PGAS implementations

– Modified standard languages (Unified Parallel C)

– New parallel languages (Chapel)

– Application libraries (Global Arrays)

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

4

Page 5: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Our programming model philosophy

• Use a common API for Software and Hardware

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

5 F

PG

A

Host CPU (x86)

Application

Common API

Network

Drivers

Network

Kernel migration

FPG

A

Host CPU (x86)

Application

Common API

Embedded CPU

Application

Common API

Common API

Hardware

Network

Drivers

Network

Kernel migration

FPG

A

Host CPU (x86)

Application

Common API

Embedded CPU

Application

Common API

Common API

Hardware

Custom

Processing

Element

Common API

Hardware

Network

Drivers

Network

Kernel migration

FPG

A

Host CPU (x86)

Application

Common API

Embedded CPU

Application

Common API

Common API

Hardware

Custom

Processing

Element

Common API

Hardware

Network

Drivers

Network

Kernel migration

Page 6: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Common SW/HW API

• CPU and FPGA components can initiate data

transfers

• SW and HW components use similar call formats

• For distributed memory and message-passing,

this was implemented by TMD-MPI

• Our contribution:

Building hardware infrastructure for a

common API for PGAS

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

6

Page 7: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Rationale

• Why a common API ?

– Easier development: SW Prototyping Migration

– Model makes no distinction between CPUs and FPGAs

– FPGA-initiated communication benefits performance

• Why PGAS instead of Message-passing ?

– Higher programming productivity, easier maintenance

– Performance benefits through one-sided communication

– Shared memory more suitable for certain applications

• Graph-based algorithms, pointer chasing

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

7

Page 8: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GASNet instead of MPI

• Global Address Space Networking

• Software communication library for (P)GAS

• Developed as a communication layer for UPC,

Co-Array Fortran and Titanium (Java)

• GASNet’s core API is built on the concept of

“Active Messages”

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

8

Page 9: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GASNet Active Messages

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

9

Page 10: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GASNet on FPGA: GAScore

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

10

BlockRAM

GAScore

GA

SNet.h

App.c

CPU

FPGA

On-ChipNetwork

Page 11: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GAScore

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

11

BlockRAM

Token

Buffer

Memory

Interface

Transmit

Handlercall

AMrequest

To

Network

From

Network

GAScore

Receive

CPU

Page 12: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GAScore

• Remote memory communication engine (RDMA)

• Controlled through FIFOs (FSLs)

• Configuration parameters are the same as for

GASNet Active Message function calls

– Simple software layer for embedded CPUs

• Independent receive/transmit channels

– No deadlocks

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

12

Page 13: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

GASNet on FPGA: GAScore

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

13

BlockRAM

GAScore

GA

SNet.h

App.c

CPU

FPGA

On-ChipNetwork

BlockRAM

GAScore

GA

SNet.h

App.c

CPU

FPGA

On-ChipNetwork

Custom

Hardware

BlockRAM

GAScore

GA

SNet.h

App.c

CPU

FPGA

On-ChipNetwork

Custom

Hardware

PAMS

Custom

Core

Page 14: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Programmable Active Message Sequencer

• Controls Custom Hardware core

• Handles reception/transmission of Active Messages

• Custom core operations and Active Messages are

initiated based on:

– Custom Hardware state

– Number of received messages with a specific code

– Amount of received data

– Timer

• (Re-)programmable through Active Messages

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

14

Page 15: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

On-chip GASNet system

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

15

BlockRAM

Gc

BlockRAM

CH GcM

em

ory

Contr

olle

r CH

Gc

CPU

GcOCCC

OC

CCOn-Chip

NetworkDR

AM Host

CPUvia PCIeEthernetQPI

Other FPGAs viaPCIe / Ethernet / Infiniband

FPGA

CPU

Page 16: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

CH CH

CH

CH

CH

CH

CH CH CH

CH

CH

CH

OCCC

OC

CC

OC

CC

OC

CC

OCCC OCCC

BlockRAM

µB Gc

BlockRAM

CH Gc

BlockRAM

CH Gc

BlockRAM

CH GcOCCC

OC

CC

Network

Net Net

Net

BlockRAM

CH Gc

BEE3 multi-FPGA platform

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

16

fOp=100MHz

Page 17: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Active Message latencies

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

17

FPGA hops 1-way 2-way

0 17 35

1 24 49

2 31 63

FPGA hops 1-way

(remote write)

2-way

(remote read)

0 29 47

1 36 61

2 43 75

1-word memory transfer latency (cycles)

Short message latency (cycles)

Page 18: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Barrier latencies

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

18

Latency to node 0 87

Latency to node 0 and back 196

Latency to node 0 62

Latency to node 0 and back 148

Simple barrier latency (cycles)

“Staggered” barrier latency (cycles)

Page 19: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Future Work

• Hardware components:

– External memory support

– Strided/vectored memory accesses

– Host interface (PCIe, QPI, HyperTransport)

– Latency/bandwidth optimizations

• Software components:

– GASNet integration on CPU side

– Application benchmarks

– PGAS C++ library for heterogeneous systems

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

19

Page 20: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Heterogeneous C++ PGAS library

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

20

CPU-based

Host

CPU-based

Host GASNet

CPU-based

Host

GASNet Library

Heterogeneous

C++ PGAS Library

C++ PGAS Application C++ generated code

DSL application

Compile Static

generation

Dynamic generation

P A M S

Custom

FPGA

Hardware

Manual

or

HLS

Page 21: A Remote Memory Access Infrastructure for Global Address ...willenbe/publications/FPGA... · Global Address Space Programming Models in FPGAs Ruediger Willenberg and Paul Chow February

Thank you for your attention!

Questions ?

February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto

21