© 2006 Regents University of California. All Rights Reserved RAMP Blue Status Andrew Schultz, John Wawrzynek June 21, 2006 RAMP MIT Summer Workshop

© 2006 Regents University of California. All Rights Reserved

RAMP Blue Status

Andrew Schultz, John Wawrzynek

June 21, 2006

RAMP MIT Summer Workshop

MIT Workshop - RAMP Blue Status 2 © 2006 Regents University of California. All Rights Reserved

Contributors

• Andrew Schultz
• Dave Patterson and the Spring 2006 CS252 (grad computer architecture) class:

Mitch Harwell, David Tylman, Xiaofen Jiang, Neelima Balakrishnan, Khang Tran, Matt Brockmeyer, Marghoob Mohiyuddin, Jue Sun, Zhangxi Tan, Wei Xu, Gary Voronel, Luke Beamer

Outline

• Review of project goal and requirements
• RAMP Blue architecture
  – Design principles
  – Processor infrastructure
  – Network interface and on-chip switch
  – Double-precision floating point
  – Software support
• Implementation experience
• Future work

Project Goal and Requirements

• Goal: a 1000-node cluster of MicroBlaze cores running uClinux and real MPI benchmarks
• Requirements:
  – Infrastructure to boot uClinux on MicroBlaze cores situated on the BEE2 user FPGAs
  – A double-precision floating-point unit for real MPI benchmarks
  – An on-chip switch capable of routing packets between FPGAs on and off module
  – A port of a message-passing framework (MPI, UPC, etc.)

BEE2 Module Design

[Figure: BEE2 module block diagram with five Virtex-II Pro 70 (2VP70) FPGAs]

• Per module:
  – 5 Virtex-II Pro 70 FPGAs
  – 20 GB DRAM
  – 20 10 Gbps connections
• Supports 10 Gigabit Ethernet/InfiniBand
• System I/O
• Inter-module connections
• RAMP Blue maps the target MicroBlazes (MBs) to the four "user" FPGAs and uses the hard PowerPC on the "control" FPGA as the host maintenance processor.
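As a rough sizing check, the number of BEE2 modules needed for the 1000-node goal follows from the four user FPGAs per module and the number of cores per FPGA. The sketch below tries 8 cores per FPGA (the current implementation) and 16 (the near-term target). It assumes the control FPGA hosts no MicroBlazes and ignores routing, memory, and power limits; `modules_needed` is a hypothetical helper.

```python
import math

USER_FPGAS_PER_MODULE = 4  # MBs are mapped only to the four "user" FPGAs


def modules_needed(target_cores: int, cores_per_fpga: int) -> int:
    """Number of BEE2 modules needed to host target_cores MicroBlazes."""
    cores_per_module = USER_FPGAS_PER_MODULE * cores_per_fpga
    # Round up: a partially filled module still counts as a whole module.
    return math.ceil(target_cores / cores_per_module)


for cores_per_fpga in (8, 16):
    print(cores_per_fpga, modules_needed(1000, cores_per_fpga))
```

This prints `8 32` and then `16 16`: doubling core density halves the number of modules needed.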

Andrew’s Design Principles

• KISS: We tried to keep everything simple. Don’t over-engineer the network, FPU, or infrastructure until we have a working design.
• Share the wealth: Resources are tight and MicroBlazes are wimpy. Share infrastructure such as interchip pins, memory controllers, and even FPUs.
• Cut the fat: Wherever possible, remove logic and interfaces that MicroBlaze does not need in this context.
• FSL everywhere: FSL is simply FIFO-based communication (very similar to a basic RAMP channel). Using it everywhere eases routing and provides an easy migration path to RDL.
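FSL (Fast Simplex Link) is a unidirectional, point-to-point FIFO. The sketch below is a hypothetical software model of that behavior, showing the full/empty flow control that makes such channels easy to compose. The class name, `depth` parameter, and method names are illustrative assumptions rather than the Xilinx interface, and the control bit that accompanies each FSL word is not modeled.

```python
from collections import deque
from typing import Optional


class FslLink:
    """Toy model of a unidirectional FIFO link (hypothetical, not Xilinx's API)."""

    def __init__(self, depth: int = 16) -> None:
        self.depth = depth      # assumed, configurable FIFO depth
        self._fifo = deque()

    def full(self) -> bool:
        return len(self._fifo) >= self.depth

    def empty(self) -> bool:
        return not self._fifo

    def try_put(self, word: int) -> bool:
        """Non-blocking write; returns False (backpressure) when the FIFO is full."""
        if self.full():
            return False
        self._fifo.append(word & 0xFFFFFFFF)  # keep only a 32-bit data word
        return True

    def try_get(self) -> Optional[int]:
        """Non-blocking read; returns None when the FIFO is empty."""
        return self._fifo.popleft() if self._fifo else None


link = FslLink(depth=2)
link.try_put(1)
link.try_put(2)
print(link.try_put(3))  # False: the producer must wait until the consumer drains a word
```

Because the only contract is "write when not full, read when not empty", the same model fits a processor port, a switch port, or a RAMP-style channel.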

Processor Interfaces

Console Network

• The console network serves several purposes:
  – Download the application/kernel from the control FPGA
  – Provide a terminal to the booted uClinux
  – Act as a network conduit to route packets from an MB to the control FPGA (or even off board via 10/100 Ethernet)
• Simple, general-purpose, FSL-based network with an OPB FIFO attachment at the PPC
• Linux driver for TTY, character device, and Ethernet abstraction
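The slide says one FSL-based console network carries terminal, character-device, and Ethernet traffic, but it does not describe how these streams are told apart. One plausible approach is to prefix each message with a header word naming a logical channel and the payload length. The sketch below is entirely hypothetical: the channel IDs, header layout, and big-endian word order are assumptions for illustration.

```python
from __future__ import annotations

import struct

# Hypothetical logical channel IDs sharing one console link (illustrative only).
CH_TTY, CH_CHAR, CH_ETH = 0, 1, 2


def frame(channel: int, payload: bytes) -> list[int]:
    """Pack payload into 32-bit words behind a header word (channel, byte length)."""
    assert len(payload) < (1 << 16), "length must fit in the 16-bit header field"
    header = (channel << 16) | len(payload)
    padded = payload + b"\x00" * (-len(payload) % 4)  # pad to a whole word
    words = list(struct.unpack(f">{len(padded) // 4}I", padded))
    return [header] + words


def deframe(words: list[int]) -> tuple[int, bytes]:
    """Recover (channel, payload) from a framed word sequence."""
    header, body = words[0], words[1:]
    channel, length = header >> 16, header & 0xFFFF
    data = struct.pack(f">{len(body)}I", *body)
    return channel, data[:length]  # drop the padding bytes


print(deframe(frame(CH_TTY, b"hi")))  # (0, b'hi')
```

On the receiving side, a driver could dispatch on the channel field to the TTY, character-device, or Ethernet code path.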

MB/MB Network Interface

• The current network interface is a raw FSL connected directly to an on-chip switch
  – Interrupt-driven, programmed I/O approach
  – A simple Linux driver provides an Ethernet interface so applications can use the network via the traditional socket interface
  – Very inefficient, yet very simple for a first network implementation
• Discussion and paper design of a second-generation network interface
  – Direct memory access through a direct port to the memory controller
  – Possible RDMA support for UPC as well
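To see why programmed I/O is inefficient, count the work the processor does itself. With a raw FSL interface, every 32-bit word of a packet passes through a processor instruction (plus interrupt handling on receive), whereas a DMA engine would move the words without the core. The sketch counts those transfers for a 1500-byte Ethernet payload. `pio_send` and the `fsl_put` callback are hypothetical stand-ins, not the actual driver, and big-endian word order is assumed.

```python
import struct


def pio_send(frame: bytes, fsl_put) -> int:
    """Programmed I/O: the CPU itself pushes every 32-bit word into the FSL.

    Returns the number of put operations the processor had to execute.
    """
    padded = frame + b"\x00" * (-len(frame) % 4)  # pad to a whole word
    count = 0
    for (word,) in struct.iter_unpack(">I", padded):
        fsl_put(word)  # on a real MB this would be one FSL put instruction
        count += 1
    return count


sent = []
print(pio_send(bytes(1500), sent.append))  # 375 processor-driven transfers
```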

On-Chip Switch

• The switch provides drop-free transmission of variable-length packets from MB to MB
• Composed of two units: a buffer unit and a switch
  – The buffer unit provides buffering at each hop and address lookup logic
  – The switch provides crossbar connectivity between input and output ports, plus arbitration for each port
• Packets are source routed (currently encapsulated Ethernet packets)
• CRCs are end-to-end, so endpoints must manage retransmits or fail-stop
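The slide states that packets are source routed and that CRCs are checked only at the endpoints, but it does not give a header format. The sketch below assumes a hypothetical format in which the sender writes the full list of output ports into the packet and each switch pops the next one. It uses zlib's CRC-32 as a stand-in for whatever CRC the hardware computes. The class, function, and field names are illustrative.

```python
import zlib
from dataclasses import dataclass, field


class CrcError(Exception):
    """Raised by the receiving endpoint when the end-to-end CRC check fails."""


@dataclass
class Packet:
    route: list  # remaining output ports, one per hop (hypothetical format)
    payload: bytes
    crc: int = 0
    hops_taken: list = field(default_factory=list)


def send(route, payload: bytes) -> Packet:
    # The source computes the CRC once; switches never check or regenerate it.
    return Packet(route=list(route), payload=payload, crc=zlib.crc32(payload))


def switch_hop(pkt: Packet) -> int:
    """Each switch consumes the head of the route to choose its output port."""
    port = pkt.route.pop(0)
    pkt.hops_taken.append(port)
    return port


def receive(pkt: Packet) -> bytes:
    # End-to-end check: on mismatch the endpoint must retransmit or fail-stop.
    if zlib.crc32(pkt.payload) != pkt.crc:
        raise CrcError("CRC mismatch: request retransmit or fail-stop")
    return pkt.payload


pkt = send([2, 0, 3], b"data")
while pkt.route:
    switch_hop(pkt)
print(pkt.hops_taken, receive(pkt))  # [2, 0, 3] b'data'
```

Since intermediate switches do no lookup beyond popping a port number and never verify data, they stay small. The cost is that recovery from corruption falls entirely on the endpoints.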

Double Precision FPU

• The FPU is treated as a co-processor
  – We investigated integrating the FPU with the register file (RF), as the single-precision (SP) FPU does, but this was too complicated and did not facilitate sharing
• Operands are transferred via FSL in four instructions, and the MB blocks until the result returns
• The FPU is highly pipelined, so it makes sense to share it for better utilization (which also saves a lot of resources)
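MicroBlaze is a 32-bit processor and FSL carries 32-bit data words, so two 64-bit double-precision operands occupy four words. That plausibly accounts for the four-instruction operand transfer. The sketch below shows the split and its inverse, assuming big-endian word order. How the operation itself is selected is not described on the slide and is not modeled; the function names are hypothetical.

```python
import struct


def double_to_words(x: float) -> tuple:
    """Split an IEEE 754 double into (high, low) 32-bit words, big-endian."""
    return struct.unpack(">II", struct.pack(">d", x))


def words_to_double(hi: int, lo: int) -> float:
    """Reassemble a double from its (high, low) 32-bit words."""
    return struct.unpack(">d", struct.pack(">II", hi, lo))[0]


def fpu_operand_words(a: float, b: float) -> list:
    """The four 32-bit words a core would push over FSL for one two-operand op."""
    return [*double_to_words(a), *double_to_words(b)]


print([hex(w) for w in fpu_operand_words(1.0, 2.0)])
# ['0x3ff00000', '0x0', '0x40000000', '0x0']
```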

Initial FPU Performance

[Figure: 2D FFT (ffbench) execution times for MicroBlaze with FP emulation, MicroBlaze with the DP FPU, and a Sun 386/250; data from Mitch Harwell & David Tylman]

Main Memory

• Clusters of MBs share a single physical DIMM (1 GB)
• Memory is partitioned so each core has its own physical address space
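The slide doesn't say how the DIMM is divided. Assuming equal, static regions, each core's physical address space is just a base and a size computed from its index. The sketch below takes 1 GB as 2^30 bytes and assumes four cores per DIMM, which is what the floor plan (four memory arbiters, sixteen MicroBlazes) suggests. `partition` is a hypothetical helper.

```python
def partition(dimm_bytes: int, cores: int) -> list:
    """Equal, non-overlapping physical regions, one per core (assumed policy).

    Returns a list of (base_address, size_in_bytes) tuples indexed by core.
    """
    size = dimm_bytes // cores
    return [(i * size, size) for i in range(cores)]


GiB = 1 << 30
for core, (base, size) in enumerate(partition(1 * GiB, 4)):
    print(f"core {core}: base=0x{base:08x} size={size >> 20} MiB")
```

With these assumptions each core gets 256 MiB, starting at 0x00000000, 0x10000000, 0x20000000, and 0x30000000.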

Other Infrastructure

• Bootstrapping: Reduced the bootstrap block RAM from four BRAMs to one, fitting a simple boot loader and cache-invalidation code in a single, read-only BRAM.
• Peripherals (pending): Remove the OPB bus and port the interrupt controller and timer to the LMB to save logic.
• Debugging: Using the existing opb_mdm core and JTAG, we can use the existing debugging infrastructure (i.e., XMD/GDB) to debug up to 8 cores. A group of students also worked on ideas for real-time instruction tracing.

Software Support

• The MBs boot a relatively unmodified version of uClinux and run stably
• MPICH2 has been successfully compiled and run on an XUP test system with a pair of MicroBlaze cores
• UPC has also been built and run on an XUP test system
• GCC has been modified to emit instructions that use the double-precision FPU co-processor (when the SOFTFPU flag is turned on)
• We are currently finishing the final modifications to the first network driver to allow proper source routing of packets between FPGAs and to other BEE2 boards

RAMP Blue FPGA Floor-plan

[Figure: RAMP Blue FPGA floor plan showing DIMMs 1 to 4, XAUI ports 1 to 4, LVCMOS links (left, right, up), sixteen MicroBlaze cores, four memory arbiters, and a global switch with two local switches]

Implementation Experience

• A system with 8 MicroBlaze cores per user FPGA is running on the BEE2
  – This system has an integrated SP FPU per core. We haven’t yet integrated the DP FPU core into this base system, although we expect it to use fewer resources with sharing (each SP FPU is ~1300 slices and the DP FPU is ~2000 slices)
• Early attempts to implement a 16-MicroBlaze system have failed in placement, although there are enough raw resources
  – We expect that with some simple floor planning we should be able to reach a 16-core system
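The expected savings from sharing follow directly from the slide's approximate slice counts. The sketch compares one SP FPU per core against shared DP FPUs. The sharing ratio of four cores per DP FPU is an assumption chosen for illustration, not a figure from the slide, and `fpu_slices` is a hypothetical helper.

```python
SP_FPU_SLICES = 1300  # approximate slices per private SP FPU (from the slide)
DP_FPU_SLICES = 2000  # approximate slices per shared DP FPU (from the slide)


def fpu_slices(cores: int, cores_per_dp_fpu: int) -> tuple:
    """Return (slices with one SP FPU per core, slices with shared DP FPUs)."""
    private_sp = cores * SP_FPU_SLICES
    shared_units = -(-cores // cores_per_dp_fpu)  # ceiling division
    shared_dp = shared_units * DP_FPU_SLICES
    return private_sp, shared_dp


print(fpu_slices(8, 4))   # (10400, 4000)
print(fpu_slices(16, 4))  # (20800, 8000)
```

Under this assumption, sharing frees roughly 6,400 slices in the 8-core system while also upgrading the cores from single to double precision.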

8 MicroBlaze cores (SP FPU each), 4 memory controllers, 4 XAUI controllers

• 4-LUTs: 40,625 of 66,176 (61%)
• Flip-flops: 27,085 of 66,176 (40%)
• BRAMs: 116 of 328 (35%)
• MULTs: 56 of 328 (17%)

Near-term Work

• Improve density to get a 16-core system
  – Analysis of data paths and floor planning should let us increase core density, since the current design does very little deliberate area optimization
  – Integrate the shared DP FP core
• Convert the design to RDL
  – The present design is XPS-only (however, it is fully parametrized with embedded Tcl to allow fast changes to topology)
  – We have a version of the network switch in RDL; the rest still needs to be wrapped in RDL
• Improve performance of known bottlenecks
  – Second-generation NIC with direct memory access to take load off the MB
  – Add buffering of FPU operands to allow single-cycle sharing of the FPU
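Buffering operands per core lets a pipelined FPU accept a new operation every cycle even when several cores share it. The toy cycle model below assumes one operand buffer per core, a round-robin arbiter, and a fixed pipeline latency (the values used here are illustrative). None of these details come from the slide, and `simulate_shared_fpu` is a hypothetical helper.

```python
from collections import deque


def simulate_shared_fpu(requests: dict, latency: int = 4) -> dict:
    """Cycle-level toy model of one pipelined FPU shared by several cores.

    requests maps core id -> list of operations. Each core's operands wait in
    its own buffer, and a round-robin arbiter issues at most one op per cycle.
    Returns core id -> list of (op, completion_cycle).
    """
    queues = {core: deque(ops) for core, ops in requests.items()}
    order = sorted(queues)
    done = {core: [] for core in order}
    cycle, rr = 0, 0
    while any(queues.values()):
        # Start searching at the core after the last winner; issue the first buffered op.
        for k in range(len(order)):
            core = order[(rr + k) % len(order)]
            if queues[core]:
                op = queues[core].popleft()
                done[core].append((op, cycle + latency))
                rr = (rr + k + 1) % len(order)
                break
        cycle += 1  # the pipeline accepts a new op every cycle
    return done


print(simulate_shared_fpu({0: ["a", "b"], 1: ["c"]}, latency=4))
# {0: [('a', 4), ('b', 6)], 1: [('c', 5)]}
```

With operands already buffered, the FPU issues back-to-back operations from different cores. Without buffering, each core would occupy the shared port for the several cycles it takes to push its operand words.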

Spares

Processor Infrastructure

• Each processor requires these key components:
  – MicroBlaze core
  – Console interface
  – Network interface
  – Floating-point unit
  – Memory interface
  – Debug port
  – Miscellaneous infrastructure (timer, interrupt controller)
• Build one, then replicate it and connect the copies with the on-chip switch

Farther Future Work

• System-level possibilities?
  – Hardware paging of memory (à la VMware ESX) to better utilize memory capacity and allow content-based sharing
  – Coherent shared memory between MicroBlazes
  – More exploration of tracing and system-level debugging
• Networking possibilities?
  – Highly FPGA-optimized switch
  – More complicated routing mechanisms
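Content-based sharing, as done in VMware ESX, finds pages with identical contents and backs them with a single physical frame. The sketch below shows the idea in software only: it hashes each page, confirms a match with a full comparison, and maps duplicates to the existing frame. The 4 KB page size, the SHA-256 hash, and the `share_pages` helper are assumptions for illustration. A real implementation would also mark shared frames copy-on-write.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size


def share_pages(pages: list) -> tuple:
    """Map logical pages to physical frames, storing identical contents once.

    Returns (list mapping logical page index -> frame index, list of frames).
    """
    frames = []
    by_hash = {}  # digest -> indices of frames with that digest
    mapping = []
    for page in pages:
        digest = hashlib.sha256(page).digest()
        match = None
        for idx in by_hash.get(digest, []):
            if frames[idx] == page:  # full compare guards against hash collisions
                match = idx
                break
        if match is None:
            match = len(frames)
            frames.append(page)
            by_hash.setdefault(digest, []).append(match)
        mapping.append(match)
    return mapping, frames


zero = bytes(PAGE_SIZE)
code = b"\x01" * PAGE_SIZE
mapping, frames = share_pages([zero, code, zero, zero])
print(mapping, len(frames))  # [0, 1, 0, 0] 2
```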

Conclusion

• Close to a functional multi-MB system:
  – Successfully provided infrastructure to boot multiple MicroBlaze cores on a single FPGA with full uClinux support
  – Determined the ease of porting and running MPI and UPC on uClinux
  – Identified areas to target for both performance (network interface, FPU integration) and on-chip density