Reconfigurable Computing: A First Look at the Cray-XD1
Craig Ulmer
Mitch Sukalski, David Thompson, Rob Armstrong, Curtis Janssen, and Matt Leininger
Orgs: 8961 & 8963
September 1, 2004
Outline
• Reconfigurable computing refresher
– Progress update
• Cray XD1
– Architecture
– General message passing
– Reconfigurable Computing and the XD1
Reconfigurable Computing Update
Reconfigurable Computing
• Use reconfigurable hardware devices to implement key computations in hardware
double doX(double *a, int n) {
  int i;
  double x = 0;
  for (i = 0; i < n; i += 3) {
    x += a[i] * a[i+1] + a[i+2];
    …
  }
  …
  return x;
}
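[Figure: hardware datapath for the loop body. A multiplier takes a[i] and a[i+1], an adder adds a[i+2], and a second adder with a Z⁻¹ register accumulates the result.]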
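The mapping from the loop to the circuit can be modeled in software. The following is a minimal sketch in C, not the implementation behind the slide. It assumes n is a multiple of 3 (the loop as written would otherwise read past the end of a), and the names dox_reference, dox_dataflow, and acc_reg are hypothetical. Each pass through the second loop stands for one clock cycle of the datapath: a multiplier, an adder, and an accumulating adder whose output is held in the Z⁻¹ register.

```c
#include <assert.h>
#include <stddef.h>

/* Software reference: the loop from the slide, minus the elided parts. */
double dox_reference(const double *a, size_t n)
{
    assert(n % 3 == 0);            /* assumption: whole triples only */
    double x = 0.0;
    for (size_t i = 0; i < n; i += 3)
        x += a[i] * a[i + 1] + a[i + 2];
    return x;
}

/* Hypothetical model of the datapath: one triple enters per "clock". */
double dox_dataflow(const double *a, size_t n)
{
    assert(n % 3 == 0);
    double acc_reg = 0.0;          /* the Z^-1 register, cleared at reset */
    for (size_t i = 0; i < n; i += 3) {
        double product = a[i] * a[i + 1];     /* multiplier */
        double sum     = product + a[i + 2];  /* first adder */
        acc_reg        = acc_reg + sum;       /* second adder feeds the register */
    }
    return acc_reg;
}
```

The model ignores the multi-cycle latency of real floating-point cores, which complicates the accumulator feedback path in an actual design.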
First Year Progress
• Computation (Underwood SNL/NM)
– Double-precision Floating Point Cores
• Communication
– Multi-gigabit Transceiver (MGT) interface
– Gigabit Ethernet work
• Early application experiments
– Simplified isosurfacing
– Networked pattern matching
Peak Floating-Point Performance
                Single Precision                                    Double Precision
Core            Speed     Cores per V2P100-6   Peak Performance     Speed     Cores per V2P100-6   Peak Performance
Addition        195 MHz   89                   17 GFLOPS            143 MHz   40                   5.7 GFLOPS
Multiplication  176 MHz   74                   13 GFLOPS            142 MHz   27                   3.8 GFLOPS
Division        120 MHz   22                   2.6 GFLOPS           98 MHz    6                    0.58 GFLOPS
From Underwood, “FPGAs vs. CPUs: Trends in Peak Floating-Point Performance,” in FPGA ’04
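The peak figures in the table are consistent with a simple product of clock rate and core count. The sketch below assumes each core completes one operation per clock cycle; that assumption matches the table's numbers but is not stated on the slide, and the function name peak_gflops is hypothetical.

```c
#include <assert.h>

/* Peak GFLOPS for one FPGA, assuming one result per core per clock cycle. */
double peak_gflops(double clock_mhz, int cores)
{
    assert(clock_mhz > 0.0 && cores > 0);
    return clock_mhz * (double)cores / 1000.0;   /* MHz * cores -> GFLOPS */
}
```

For example, peak_gflops(195.0, 89) gives about 17.4 for single-precision addition, and peak_gflops(143.0, 40) gives about 5.7 for double-precision addition.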
Connecting FPGAs to the Network Fabric
• Modern FPGAs feature multi-gigabit transceivers
– Experimented with GigE, Myrinet 2000, and IB
– Implemented TCP Offload Engine (TOE) in hardware
– Working on OpenTOE and OpenGigE cores
[Figure: block diagram of the SNL_OpenTOE and SNL_OpenGigE cores attached to the Rocket I/O MGT (GT_Ethernet_2). Labeled blocks: MGT Control, Tx, Rx, IP Header, ARP Ping, ARP Cache, ARP Reply, Ping Reply, MAC Framer, Align, Pad, CRC, CRC Gen, Decode, Incoming Data Queue, Outgoing Data Queue, Timeout Monitor, SEQ Gen, ACK Monitor, TCP I/F, Socket I/F.]
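The CRC blocks in the diagram presumably compute the Ethernet frame check sequence; the slides do not show the circuit. As a rough software illustration of that arithmetic, here is a bit-serial CRC-32 in C using the standard IEEE 802.3 parameters. The function name crc32_ethernet is hypothetical, and a hardware core would typically process many bits per clock rather than one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-serial CRC-32 with the IEEE 802.3 parameters: reflected polynomial
 * 0xEDB88320, initial value 0xFFFFFFFF, final bitwise inversion. */
uint32_t crc32_ethernet(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                      /* fold in the next byte */
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}
```

The standard check string "123456789" (nine ASCII bytes) yields 0xCBF43926 with these parameters, which is a convenient sanity test for any reimplementation.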
Cray XD1 Overview
NDA Notice
We do have an NDA with Cray Canada
The XD1 we have on loan is an early Beta system
Cray XD1 Overview
• Dense MP system
– 12 AMD Opterons on 6 blades
– 6 Xilinx Virtex-II/Pro FPGAs
– InfiniBand-like interconnect
– 6 SATA hard drives
– 4 PCI-X slots
– 3U Rack
Individual Blade
[Figure: block diagram of one blade. Two Opterons, each with DDR memory; two RAPNI network interfaces, each attached to a RapidArray Fabric (24 4x IB ports); and the “Einstein” chip. Link labels: HT 3.2 GB/s, HT 6.4 GB/s, “HT” 3.2 GB/s, 4x IB 2 GB/s.]
* All data rates are aggregates (i.e., 3.2 GB/s = 1.6 GB/s + 1.6 GB/s)
Message Passing
• MPICH 1.2.5
– Latency: 2.25 μs
– Bandwidth: 1.3 GB/s (82% of HT-IB link)
• RapidArray message layer
– Open source
– MP, RDMA
– Global address space
[Figure: MPI Bandwidth. Bandwidth (Million Bytes/s, 0 to 1,800) vs. Message Size (Bytes, 1 to 10,000,000, log scale), with reference lines for PCI-X 133 and 1.6 GB/s HT.]
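The shape of a bandwidth-versus-message-size curve like this one is often approximated with a linear cost model: transfer time equals a fixed latency plus bytes divided by peak bandwidth. The sketch below applies that model to the two figures quoted on the slide (2.25 μs and 1.3 GB/s). The model itself and all function and macro names are assumptions for illustration; it is not the measured curve.

```c
#include <assert.h>

#define XD1_MPI_LATENCY_S  2.25e-6   /* latency quoted on the slide */
#define XD1_MPI_PEAK_BPS   1.3e9     /* peak bandwidth quoted on the slide */

/* Modeled time to move `bytes` bytes: fixed latency plus streaming time. */
double transfer_time_s(double bytes)
{
    assert(bytes >= 0.0);
    return XD1_MPI_LATENCY_S + bytes / XD1_MPI_PEAK_BPS;
}

/* Modeled bandwidth seen by the application for a given message size. */
double effective_bandwidth_bps(double bytes)
{
    assert(bytes > 0.0);
    return bytes / transfer_time_s(bytes);
}

/* Message size at which the model reaches half of peak bandwidth. */
double half_bandwidth_bytes(void)
{
    return XD1_MPI_LATENCY_S * XD1_MPI_PEAK_BPS;
}
```

Under this model, half of peak bandwidth is reached at roughly 2,925 bytes, and messages of several megabytes come within a fraction of a percent of the peak.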
System Administration
• Active manager
– Synchronize each node’s OS
– Partition blade functionality
– Control access rights
• Embedded processor
– Monitors health (heartbeats)
– Can restart nodes
• Issues?
Reconfigurable Computing and the Cray XD1
Connecting to the “Einstein” Accelerator
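[Figure: connection to the “Einstein” accelerator FPGA. The RAPNI (host HT side, net IB side) exposes an FPGA port and a fabric port, each labeled 1.6 + 1.6 GB/s. Inside the FPGA, an HT interface connects to the user-defined circuits, which reach four QDR2 interfaces, each with 2 MB of SRAM.]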
Example: Random Number Generator
• Monte Carlo app in need of good random numbers
– Mersenne twister
• Implemented in FPGA
– FPGA pushes to host memory
– 301 vs 101 Million Integers/s
– ~1.2 GB/s
[Figure: the RNG circuit in the FPGA pushes data through the NI into host memory, where the CPU reads it.]
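For readers unfamiliar with the generator, the following is a compact software sketch of the standard 32-bit Mersenne Twister (MT19937) in C. The slide does not say which variant or word size the FPGA circuit used, so treating it as MT19937 is an assumption; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MT_N 624   /* state size in 32-bit words */
#define MT_M 397   /* offset used when regenerating the state */

typedef struct {
    uint32_t state[MT_N];
    int index;
} mt19937_t;

/* Initialize the state from a 32-bit seed (standard MT19937 seeding). */
void mt19937_seed(mt19937_t *mt, uint32_t seed)
{
    assert(mt != NULL);
    mt->state[0] = seed;
    for (int i = 1; i < MT_N; i++) {
        uint32_t prev = mt->state[i - 1];
        mt->state[i] = 1812433253u * (prev ^ (prev >> 30)) + (uint32_t)i;
    }
    mt->index = MT_N;   /* force a state regeneration on first use */
}

/* Regenerate all 624 state words in place. */
static void mt19937_twist(mt19937_t *mt)
{
    for (int i = 0; i < MT_N; i++) {
        uint32_t y = (mt->state[i] & 0x80000000u)
                   | (mt->state[(i + 1) % MT_N] & 0x7FFFFFFFu);
        uint32_t next = mt->state[(i + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u)
            next ^= 0x9908B0DFu;
        mt->state[i] = next;
    }
    mt->index = 0;
}

/* Return the next tempered 32-bit output. */
uint32_t mt19937_next(mt19937_t *mt)
{
    if (mt->index >= MT_N)
        mt19937_twist(mt);
    uint32_t y = mt->state[mt->index++];
    y ^= y >> 11;
    y ^= (y << 7) & 0x9D2C5680u;
    y ^= (y << 15) & 0xEFC60000u;
    y ^= y >> 18;
    return y;
}
```

If the integers are 32 bits wide, 301 million integers per second corresponds to about 1.2 GB/s (301 × 10⁶ × 4 bytes), which agrees with the bandwidth figure on the slide.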
General XD1 Comments
• Reconfigurable computing
– FPGA in memory
– Fast local memory
• Other accelerators
– ClearSpeed
• Global address space
– Opteron limits (40-bit physical addresses)
• Vendor lock-in
– Incompatible network
– All-in-one box?
• Current NI is a bottleneck
• Density vs. Reliability
• Value-added features
[Slide column headings: Good | Not-so-Good]
Friendly Users?
• We have a month left on evaluation
– Could use feedback from other users
http://cdulmer.ran.sandia.gov/xd1