High-Performance Reconfigurable Computing Group
University of Toronto
A Remote Memory Access Infrastructure for
Global Address Space Programming Models
in FPGAs
Ruediger Willenberg and Paul Chow
February 13, 2013
Parallelizing Computation
• How to partition and communicate data
• How to synchronize computation and data access
February 13, 2013 High-Performance Reconfigurable Computing Group ∙ University of Toronto
Parallel Programming Models
Partitioned Global Address Space
• Shared Memory access, but…
• …visible difference between local and remote data
• PGAS allows one-sided communication
• Many PGAS implementations
– Modified standard languages (Unified Parallel C)
– New parallel languages (Chapel)
– Application libraries (Global Arrays)
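The local/remote distinction and one-sided access can be illustrated with a small behavioural model. This is a sketch only, not the API of UPC, Chapel or Global Arrays: the class `PgasArray`, its block distribution and the `get`/`put` methods are hypothetical names chosen for illustration.

```python
class PgasArray:
    """Toy model of a block-distributed global array (hypothetical)."""

    def __init__(self, length, num_nodes):
        # Ceiling division so that every element has an owner.
        self.block = -(-length // num_nodes)
        self.length = length
        # One private memory segment per node.
        self.segments = [
            [0] * max(0, min(self.block, length - n * self.block))
            for n in range(num_nodes)
        ]
        self.remote_accesses = 0

    def owner(self, index):
        """Return (node, local offset) for a global index."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        return divmod(index, self.block)

    def put(self, caller, index, value):
        """One-sided write: the owning node does not take part in the call."""
        node, offset = self.owner(index)
        if node != caller:
            self.remote_accesses += 1  # remote data is visibly different
        self.segments[node][offset] = value

    def get(self, caller, index):
        """One-sided read from whichever node owns the element."""
        node, offset = self.owner(index)
        if node != caller:
            self.remote_accesses += 1
        return self.segments[node][offset]


array = PgasArray(length=10, num_nodes=4)   # blocks of 3, 3, 3 and 1
array.put(caller=0, index=1, value=11)      # local to node 0
array.put(caller=0, index=9, value=99)      # owned by node 3: remote
print(array.owner(9), array.get(caller=3, index=9), array.remote_accesses)
```

Counting remote accesses separately is the point of the model: the address space is shared, but the program can still see which accesses leave the local partition.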
Our programming model philosophy
• Use a common API for Software and Hardware
[Figure, built up over four steps: kernel migration from a host CPU (x86) to an FPGA. Labels: Host CPU (x86) with Application, Common API and Network Drivers; FPGA with Embedded CPU (Application, Common API), Custom Processing Element and Common API Hardware; Network.]
Common SW/HW API
• CPU and FPGA components can initiate data transfers
• SW and HW components use similar call formats
• For distributed memory and message-passing, this was implemented by TMD-MPI
• Our contribution: building hardware infrastructure for a common API for PGAS
Rationale
• Why a common API?
– Easier development: SW Prototyping → Migration
– Model makes no distinction between CPUs and FPGAs
– FPGA-initiated communication benefits performance
• Why PGAS instead of message-passing?
– Higher programming productivity, easier maintenance
– Performance benefits through one-sided communication
– Shared memory more suitable for certain applications
• Graph-based algorithms, pointer chasing
GASNet instead of MPI
• Global Address Space Networking
• Software communication library for (P)GAS
• Developed as a communication layer for UPC, Co-Array Fortran and Titanium (Java)
• GASNet’s core API is built on the concept of “Active Messages”
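An Active Message names a handler function that the receiving node runs when the message arrives, so the receiver needs no matching receive call. The sketch below models that idea in a few lines. It is not the GASNet API: `Node`, `register_handler`, `am_request` and `poll` are hypothetical names used only for illustration.

```python
from collections import deque


class Node:
    """Toy node with a handler table and an inbox (hypothetical model)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.handlers = {}
        self.inbox = deque()
        self.memory = [0] * 8

    def register_handler(self, index, function):
        self.handlers[index] = function

    def poll(self):
        """Run the handler named by each pending message, in arrival order."""
        while self.inbox:
            source, index, args = self.inbox.popleft()
            self.handlers[index](self, source, *args)


def am_request(source, destination, handler_index, *args):
    """Send an Active Message: a handler index plus a few arguments."""
    destination.inbox.append((source.node_id, handler_index, args))


def store_handler(node, source, address, value):
    # Runs on the receiving node; here it performs a remote write.
    node.memory[address] = value


node0, node1 = Node(0), Node(1)
node1.register_handler(1, store_handler)
am_request(node0, node1, 1, 3, 42)   # ask node 1 to store 42 at address 3
node1.poll()
print(node1.memory)
```

The sender decides what happens at the destination by choosing the handler index, which is what makes the communication one-sided.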
GASNet Active Messages
GASNet on FPGA: GAScore
[Figure: FPGA containing a CPU running App.c with GASNet.h, a GAScore, BlockRAM and the On-Chip Network.]
GAScore
[Figure: GAScore block diagram. Labels: CPU, AM request, Handler call, Transmit, Receive, Token Buffer, Memory Interface, BlockRAM, To Network, From Network.]
GAScore
• Remote memory communication engine (RDMA)
• Controlled through FIFOs (FSLs)
• Configuration parameters are the same as for GASNet Active Message function calls
– Simple software layer for embedded CPUs
• Independent receive/transmit channels
– No deadlocks
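Because the GAScore is controlled through FIFOs with the same parameters as an Active Message call, a software layer mainly has to serialize those parameters into words. The sketch below shows one way this could look. The word layout, field widths and function names are assumptions made for illustration; the slides do not specify the actual GAScore command format.

```python
def pack_am_request(dest, handler, args):
    """Serialize an AM request into 32-bit FIFO words (hypothetical layout).

    Word 0: destination node (bits 31..16), handler index (bits 15..8),
            argument count (bits 7..0). Words 1..n: one argument each.
    """
    if not (0 <= dest < 1 << 16 and 0 <= handler < 1 << 8
            and len(args) < 1 << 8):
        raise ValueError("field out of range")
    header = (dest << 16) | (handler << 8) | len(args)
    return [header] + [a & 0xFFFFFFFF for a in args]


def unpack_am_request(words):
    """Inverse of pack_am_request, as a receiving side might decode it."""
    header = words[0]
    count = header & 0xFF
    return header >> 16, (header >> 8) & 0xFF, words[1:1 + count]


fifo = pack_am_request(dest=2, handler=5, args=[0x100, 7])
print([hex(w) for w in fifo])
print(unpack_am_request(fifo))
```

With a format of this kind, an embedded CPU and a custom hardware block would issue the same sequence of words, which is the sense in which the call formats are similar.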
GASNet on FPGA: GAScore
[Figure, built up over three steps. Step 1: FPGA with CPU (App.c, GASNet.h), GAScore, BlockRAM and On-Chip Network. Step 2 adds: Custom Hardware. Step 3 adds: PAMS and Custom Core.]
Programmable Active Message Sequencer
• Controls Custom Hardware core
• Handles reception/transmission of Active Messages
• Custom core operations and Active Messages are initiated based on:
– Custom Hardware state
– Number of received messages with a specific code
– Amount of received data
– Timer
• (Re-)programmable through Active Messages
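The trigger conditions listed above can be pictured as a small rule table. The following is a behavioural sketch only: the rule fields, the `Sequencer` class and the action strings are hypothetical, and the slides do not describe the PAMS instruction format.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Fire `action` once all listed conditions hold (hypothetical format)."""
    action: str
    message_code: int = 0
    min_messages: int = 0     # messages received with `message_code`
    min_words: int = 0        # amount of received data
    min_time: int = 0         # timer value in cycles
    core_idle: bool = False   # require the custom core to be idle


@dataclass
class Sequencer:
    rules: list
    counts: dict = field(default_factory=dict)
    words: int = 0
    time: int = 0
    core_busy: bool = False
    fired: list = field(default_factory=list)

    def receive(self, code, num_words):
        self.counts[code] = self.counts.get(code, 0) + 1
        self.words += num_words

    def step(self, cycles=1):
        """Advance the timer and fire every rule whose conditions are met."""
        self.time += cycles
        for rule in list(self.rules):
            if (self.counts.get(rule.message_code, 0) >= rule.min_messages
                    and self.words >= rule.min_words
                    and self.time >= rule.min_time
                    and not (rule.core_idle and self.core_busy)):
                self.fired.append(rule.action)
                self.rules.remove(rule)   # each rule fires once in this model


pams = Sequencer(rules=[Rule("start_core", message_code=7, min_messages=2,
                             min_words=8, core_idle=True)])
pams.receive(code=7, num_words=4)
pams.step()
print(pams.fired)    # conditions not yet met
pams.receive(code=7, num_words=4)
pams.step()
print(pams.fired)
```

In this model, reprogramming would amount to replacing the contents of `rules`, which mirrors the idea of loading a new program through Active Messages.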
On-chip GASNet system
[Figure: on-chip GASNet system. Labels: FPGA, CPU, CH, Gc, BlockRAM, Memory Controller, DRAM, OCCC, On-Chip Network, Host CPU via PCIe / Ethernet / QPI, other FPGAs via PCIe / Ethernet / Infiniband.]
BEE3 multi-FPGA platform
[Figure: BEE3 multi-FPGA platform. Labels: µB, CH, Gc, BlockRAM, OCCC, Net, Network. Operating frequency fOp = 100 MHz.]
Active Message latencies
Short message latency (cycles)

| FPGA hops | 1-way | 2-way |
|-----------|-------|-------|
| 0         | 17    | 35    |
| 1         | 24    | 49    |
| 2         | 31    | 63    |

1-word memory transfer latency (cycles)

| FPGA hops | 1-way (remote write) | 2-way (remote read) |
|-----------|----------------------|---------------------|
| 0         | 29                   | 47                  |
| 1         | 36                   | 61                  |
| 2         | 43                   | 75                  |
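The cycle counts on this slide grow linearly with the number of FPGA hops: each hop adds 7 cycles one way and 14 cycles for a round trip. The sketch below reproduces the measured values from that fit and converts them to time, assuming the 100 MHz operating frequency given for the BEE3 platform. The linear model is inferred from the six measured rows and is not a claim about larger hop counts; the function names are illustrative.

```python
CLOCK_HZ = 100_000_000          # fOp = 100 MHz, i.e. 10 ns per cycle
CYCLES_PER_HOP = 7              # inferred from the measured values

# Zero-hop latencies in cycles, taken from the tables.
BASE = {
    ("short", "1-way"): 17,
    ("short", "2-way"): 35,
    ("memory", "1-way"): 29,    # remote write
    ("memory", "2-way"): 47,    # remote read
}


def latency_cycles(kind, direction, hops):
    """Linear fit: base latency plus 7 cycles per hop per traversal."""
    traversals = 2 if direction == "2-way" else 1
    return BASE[(kind, direction)] + CYCLES_PER_HOP * traversals * hops


def cycles_to_ns(cycles):
    return cycles * 1_000_000_000 // CLOCK_HZ


for hops in range(3):
    cycles = latency_cycles("memory", "2-way", hops)
    print(hops, cycles, cycles_to_ns(cycles), "ns")
```

At 100 MHz, a remote read across two FPGA hops therefore takes 75 cycles, or 750 ns.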
Barrier latencies
Latency to node 0 87
Latency to node 0 and back 196
Latency to node 0 62
Latency to node 0 and back 148
Simple barrier latency (cycles)
“Staggered” barrier latency (cycles)
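The "to node 0" and "to node 0 and back" measurements suggest a centralized barrier in which every node notifies node 0 and node 0 then releases all nodes. The sketch below models that pattern, with plain function calls standing in for Active Messages. It is an assumption about the scheme, the names are hypothetical, and it does not attempt to model the "staggered" variant, which the slides do not describe.

```python
class BarrierNode:
    """Toy node taking part in a barrier coordinated by node 0."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.waiting = set()     # used on node 0 only: who has arrived
        self.released = False


def arrive(nodes, node_id):
    """A node reaches the barrier and notifies node 0."""
    root = nodes[0]
    root.waiting.add(node_id)
    if len(root.waiting) == len(nodes):
        root.waiting.clear()             # reset for the next barrier
        for node in nodes:               # node 0 sends a release to everyone
            node.released = True


nodes = [BarrierNode(i) for i in range(4)]
for i in (2, 0, 3):
    arrive(nodes, i)
print([n.released for n in nodes])   # one node is still missing
arrive(nodes, 1)
print([n.released for n in nodes])
```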
Future Work
• Hardware components:
– External memory support
– Strided/vectored memory accesses
– Host interface (PCIe, QPI, HyperTransport)
– Latency/bandwidth optimizations
• Software components:
– GASNet integration on CPU side
– Application benchmarks
– PGAS C++ library for heterogeneous systems
Heterogeneous C++ PGAS library
[Figure: tool flow for the heterogeneous C++ PGAS library. Labels: DSL application, C++ PGAS Application, C++ generated code, Compile, Static generation, Dynamic generation, Heterogeneous C++ PGAS Library, GASNet Library, CPU-based Host, GASNet, PAMS, Custom FPGA Hardware, Manual or HLS.]
Thank you for your attention!
Questions?