A Reconfigurable Accelerator Card for High Performance Computing

Michael Aitken
Supervisor: Prof M. Inggs
Co-Supervisor: Dr A. Langman

Post on 04-Jan-2016



Reconfigurable Co-processing

What is an accelerator card?

The Power of FPGAs

• Example: Virtex-5 LX330T

– 331,000 Logic Cells
– 207,000 Flip-flops
– 11.6 Mb hardwired RAM
– 960 I/O pins
– Theoretical I/O Bandwidth: 960 I/O pins at 800 Mbps = 768 Gbps
– 24 on-board 3.2 Gbps transceivers giving 153.6 Gbps (24 × 3.2 Gbps = 76.8 Gbps in each direction, counting transmit and receive)
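Both aggregate figures above are products of lane count and per-lane rate. The short Python sketch below reproduces them; reading the 153.6 Gbps transceiver figure as transmit plus receive is an interpretation of the slide's arithmetic, and the helper names are illustrative only.

```python
# Back-of-envelope I/O bandwidth for the Virtex-5 LX330T figures quoted above.
# Assumption: the 153.6 Gbps transceiver total counts both TX and RX directions.


def parallel_io_gbps(pins: int, mbps_per_pin: float) -> float:
    """Aggregate parallel I/O bandwidth in Gbps (pins x per-pin rate)."""
    return pins * mbps_per_pin / 1000.0


def transceiver_gbps(lanes: int, gbps_per_lane: float, full_duplex: bool = True) -> float:
    """Aggregate serial transceiver bandwidth; doubled when TX and RX are both counted."""
    per_direction = lanes * gbps_per_lane
    return per_direction * (2 if full_duplex else 1)


if __name__ == "__main__":
    print(f"Parallel I/O: {parallel_io_gbps(960, 800):.1f} Gbps")
    print(f"Transceivers, per direction: {transceiver_gbps(24, 3.2, full_duplex=False):.1f} Gbps")
    print(f"Transceivers, TX + RX: {transceiver_gbps(24, 3.2):.1f} Gbps")
```

These are theoretical peaks; protocol overhead such as line coding is not subtracted.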

Commercial Product – Celoxica RCHTX-XV4 (HPC Accelerator Card)

Cost: R60,000+

Courtesy: Celoxica

Why do we want our own accelerator?

• An advancement on existing cards:

– Latest FPGAs available (Xilinx Virtex-5: more logic, faster clock rate)
– Faster I/O interface needed for HPC (1 GbE is not good enough)
– Faster memory devices available
– No compulsory engineering costs or bundled software

• Core computing concept for the Advanced Computer Engineering Laboratory at the CHPC.

Methodology

• Background Investigation
• Conceptual Design
• Design Review and Adjustment
• Component Sourcing begins
• Schematic Capture
• Design Specification & Layout Outsourcing
• Gateware development
• PCB Fabrication and Assembly
• Testing

AMD’s Direct Connect Architecture

• Improved latency
• Peripheral HTX device has direct access to system RAM via DMA

Conceptual Design

[Figure: block diagram of the card. Main FPGA: Virtex-5 LX110T or SX95T, with an HTX (HyperTransport) host interface. Memory: six QDRII+ 4-word-burst SRAM blocks (2M × 18-bit each). Networking: two CX4 connectors, each carrying XAUI (4 × 3.125 Gbps) over RocketIO transceivers. Also shown: Virtex-5 LX50, clock generation, configuration via an XCF16P PROM (according to http://www.xilinx.com/products/silicon_solutions/proms/pfp/virtex.htm), status LEDs, JTAG chain and JTAG test port, serial RS232.]
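A few numbers follow directly from the labels in the block diagram. This Python sketch assumes XAUI's standard 8b/10b line coding (so each 4 × 3.125 Gbaud link carries 10 Gbps of payload per direction) and counts the six QDRII+ blocks as drawn; the constant and function names are hypothetical.

```python
# Rough link-rate and capacity figures for the conceptual design.
# Assumptions: XAUI uses 8b/10b coding; six QDRII+ banks as drawn in the diagram.

XAUI_LANES = 4
XAUI_LANE_GBAUD = 3.125      # line rate per lane, from the diagram
CODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 payload bits per 10 line bits

QDR_WORDS = 2 * 2**20        # "2M" words per device
QDR_WORD_BITS = 18           # word width per device
QDR_BURST = 4                # 4-word burst
QDR_BANKS = 6                # as drawn; the final board may differ


def xaui_payload_gbps() -> float:
    """Payload rate of one XAUI link in one direction."""
    return XAUI_LANES * XAUI_LANE_GBAUD * CODING_EFFICIENCY


def qdr_bank_mbits() -> float:
    """Capacity of one QDRII+ device in Mb (2**20 bits), including all 18 bits per word."""
    return QDR_WORDS * QDR_WORD_BITS / 2**20


if __name__ == "__main__":
    print(f"XAUI payload per CX4 port, per direction: {xaui_payload_gbps():.1f} Gbps")
    print(f"Bits returned per QDRII+ read burst: {QDR_BURST * QDR_WORD_BITS}")
    print(f"Per bank: {qdr_bank_mbits():.0f} Mb; all banks: {QDR_BANKS * qdr_bank_mbits():.0f} Mb")
```

Memory bandwidth is left out because it depends on the QDRII+ clock frequency, which the slides do not give.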

Power

30–50 W, supplied over either:

• a 6-pin 12 V connector, or
• the HTX 12 V supply
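The power estimate translates into a supply-current requirement through I = P / V. This Python sketch assumes the whole 30–50 W load is drawn from the single 12 V rail named on the slide; the helper name is illustrative.

```python
# Supply current implied by the 30-50 W estimate.
# Assumption: the entire load comes from one 12 V rail (6-pin connector or HTX supply).

RAIL_VOLTS = 12.0


def rail_current_amps(watts: float, volts: float = RAIL_VOLTS) -> float:
    """Ohm's-law power relation: I = P / V."""
    return watts / volts


if __name__ == "__main__":
    for watts in (30.0, 50.0):
        print(f"{watts:.0f} W at {RAIL_VOLTS:.0f} V -> {rail_current_amps(watts):.2f} A")
```

At the upper estimate this is roughly 4.2 A. For comparison, the PCI Express specification rates a 6-pin auxiliary power connector at 75 W, so 50 W would fit within that budget if the card's 6-pin connector follows the same convention.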

Hierarchy: Top-Level Schematic

Hierarchy: Low-Level Schematic

Layout Design: MTE, Pune, India

Fabricated Board: StreamLine Circuits, CA

Assembled Board: Tellumat, Cape Town

QDRII+ Memory Tests
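The slides do not describe which patterns the QDRII+ tests used. As one common approach, the Python sketch below models a walking-ones data-bus test and an address-in-address test against a software stand-in for an 18-bit-wide SRAM; `SimulatedSram` and its stuck-address-line fault are hypothetical and are not part of the project's gateware.

```python
from typing import Optional

WORD_BITS = 18                   # QDRII+ word width from the block diagram
WORD_MASK = (1 << WORD_BITS) - 1


class SimulatedSram:
    """Hypothetical word-addressed SRAM model with an optional stuck-at-0 address line."""

    def __init__(self, depth: int, stuck_addr_bit: Optional[int] = None):
        self.depth = depth
        self.cells = [0] * depth
        self.stuck_addr_bit = stuck_addr_bit

    def _decode(self, addr: int) -> int:
        if self.stuck_addr_bit is not None:
            addr &= ~(1 << self.stuck_addr_bit)  # the faulty line always reads 0
        return addr % self.depth

    def write(self, addr: int, data: int) -> None:
        self.cells[self._decode(addr)] = data & WORD_MASK

    def read(self, addr: int) -> int:
        return self.cells[self._decode(addr)]


def walking_ones(mem: SimulatedSram, addr: int = 0) -> bool:
    """Data-bus test: each of the 18 single-bit patterns must read back unchanged."""
    for bit in range(WORD_BITS):
        pattern = 1 << bit
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            return False
    return True


def address_in_address(mem: SimulatedSram) -> bool:
    """Address test: write each address as its own data, then verify every location.

    A stuck address line makes two addresses share one cell, so the later write
    overwrites the earlier one. Only valid while depth <= 2**WORD_BITS; a 2M-deep
    part needs an extra pattern to cover the upper address bits.
    """
    for addr in range(mem.depth):
        mem.write(addr, addr)
    return all(mem.read(addr) == addr for addr in range(mem.depth))


if __name__ == "__main__":
    healthy = SimulatedSram(depth=4096)
    faulty = SimulatedSram(depth=4096, stuck_addr_bit=3)
    print(walking_ones(healthy), address_in_address(healthy))  # True True
    print(address_in_address(faulty))                          # False
```

On hardware the same loops would run in gateware or host software against the real QDRII+ controller; the model only shows why each pattern catches its class of fault.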

CX4 Connector Test: Differential Probe at 10 GSa/s

XAUI TX over 0.5m cable

XAUI TX over 5m cable

CX4 Connector Test: Cable Loopback Test

Status Bits indicate both cores are synchronized to all 4 incoming signals
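A cable loopback test is often driven by a pseudo-random bit sequence so that bit errors can be counted at the receiver. The slides do not name a pattern; the Python sketch below assumes PRBS-7 (a 7-bit maximal-length LFSR, polynomial x^7 + x^6 + 1) and a self-synchronising checker, and its function names are hypothetical.

```python
from typing import Iterable, List


def prbs7(n_bits: int, seed: int = 0x7F) -> List[int]:
    """Return n_bits of PRBS-7 from a 7-bit Fibonacci LFSR (taps at stages 7 and 6)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")  # an all-zero LFSR never leaves zero
    bits = []
    for _ in range(n_bits):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        bits.append(bit)
    return bits


def loopback(tx: List[int], flipped: Iterable[int] = ()) -> List[int]:
    """Model the cable: copy tx, inverting the bits at the given positions."""
    rx = list(tx)
    for pos in flipped:
        rx[pos] ^= 1
    return rx


def prbs7_errors(rx: List[int]) -> int:
    """Self-synchronising check: every bit must equal rx[n-6] XOR rx[n-7].

    No seed or alignment is needed, but one flipped bit produces up to three
    mismatches (at n, n+6 and n+7).
    """
    return sum(1 for n in range(7, len(rx)) if rx[n] != (rx[n - 6] ^ rx[n - 7]))


if __name__ == "__main__":
    tx = prbs7(1000)
    print(prbs7_errors(loopback(tx)))         # 0: clean link
    print(prbs7_errors(loopback(tx, [100])))  # 3: one injected bit error
```

A self-synchronising checker suits a loopback over cables of different lengths (0.5 m and 5 m here) because it needs no knowledge of the link delay.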

HTX Interface Test

Testing and Gateware to be done by Nick Thorne

Usage Scenario 1: Single Reconfigurable Node

Usage Scenario 2: Cluster Configuration 1

Usage Scenario 2: Cluster Configuration 2

Related work by other students

• Jane Hewitson – Preparing a FORTRAN processing engine

• Nick Thorne – Preparing a HyperTransport controller core and drivers

• Brandon Hamilton – Preparing the BORPH reconfigurable operating system

Questions?