1 retreat (advance) john wawrzynek uc berkeley january 15, 2009
1
Retreat (Advance)
John Wawrzynek, UC Berkeley
January 15, 2009
2
Research Accelerator for Multiple Processors
Rapid design space exploration - A new set of architecture parameters can be tried each day leading to highly efficient (power, cost) designs.
High confidence verification of design specification (conventional software simulators are either too slow or not trustworthy).
An early platform for software development while waiting for machine to be built.
REVIEW
Problem with manycore processor design trend: compilers, operating systems, and architectures are not ready for 1000s of CPUs per chip.
How do we do research on 1000-CPU systems in architecture, OS, compilers, and applications?
Develop an infrastructure to build cycle-accurate multi-core and many-core architecture emulators using FPGAs
• Not FPGA computing
• Not a gate-level verification platform
3
Promised “deliverables”
1. Provide infrastructure to support collaboration among researchers
• Standardization of interfaces
• Development of reusable modules
2. Produce relatively inexpensive FPGA platforms & freely available gateware and software
BEE2 Module: Virtex 2 VPro70
BEE3 Module: Virtex 5
Start with BEE2 as “RAMP1” platform, design BEE3 as “RAMP2” platform for broad deployment using 3rd party
3. Provide a set of “target” architecture models
• Reference designs for further development
• Out-of-the-box parallel machines for software development
4
Partnerships
Co-PIs: Krste Asanović (UCB), Derek Chiou (UT Austin), Joel Emer (MIT/Intel), James Hoe (CMU), Christos Kozyrakis (Stanford), Shih-Lien Lu (Intel), Mark Oskin (Washington), David Patterson (Berkeley), and John Wawrzynek (Berkeley)
RAMP hardware development activity centered at Berkeley Wireless Research Center.
Three year NSF grant for staff (awarded 3/06).
Small miscellaneous funding: GSRC, Par Lab (UCB), other.
DARPA seedling at UCB.
Major continuing commitment from Xilinx
Collaboration with MSR (Chuck Thacker) on BEE3 FPGA platform.
Sun, IBM contributing processor designs, IBM faculty awards, collaboration.
Co-PIs continue to “meet” bi-weekly in teleconferences, and in person 3-4 times per year.
Some History
Spring 2005: ISCA, “hallway” discussions
Summer 2005: Serious planning
Fall 2005: NSF proposal submitted
Jan 2006: First RAMP Workshop at UC (hands-on lab based on BEE2)
March 2006: NSF grant awarded
Summer 2006: MIT retreat
Winter 2007: UCB retreat
Summer 2007: Tutorial/workshop/retreat at SD
Winter 2008: UCB retreat
Summer 2008: Stanford retreat
5
Summer 2008 Retreat
In anticipation of a new funding proposal to the NSF (larger this time) …
Evaluated where we were technically in the project, reevaluated our project goals, and brainstormed on what we needed to do going forward.
Summary:
a) Made a big splash in the computer architecture community
b) Had many technical successes in proving the concept and understanding how to build efficient, scalable infrastructure
c) Had not collaborated as well as we would have liked
d) Had less outside adoption than we had expected
6
Technical Accomplishments
Built several FPGA-based processor core and memory models, including models for novel architectural features such as transactional memory (Stanford/RAMP Red, UT/RAMP White);
Demonstrated a 1008 core parallel processor system (Berkeley/RAMP Blue);
Developed next generation FPGA emulation platform (BEE3) and transferred its design and responsibility for manufacturing to a third party (BEEcube) (MSR, UCB);
Implemented several approaches to standard interfaces for module-level interchangeability and sharing (Aports/MIT, RDL/UCB);
Demonstrated approaches to simulation “time dilation” for cycle-accurate simulation (Hasim/MIT, RDL/UCB);
Built several prototype hybrid host systems combining FPGA based emulation with conventional processors and software (Protoflex/CMU, FAST/UTAustin);
Investigated several novel processor modeling techniques for improved simulation efficiency and scaling: virtualization/split timing-functional modeling (FAST/UTAustin, Protoflex/CMU).
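The “time dilation” idea in the list above can be illustrated with a small sketch. It is not the A-Ports or RDL design; the `Channel` class, the `run` function, and the schedule strings are hypothetical names made up for this illustration. The assumption is that modules exchange exactly one token per target cycle over a fixed-latency channel, so a module may only simulate a target cycle once its input token for that cycle exists. The host may then run modules at uneven rates without changing target-cycle behavior.

```python
from collections import deque

class Channel:
    """Hypothetical fixed-latency link between two simulated modules.

    The sender pushes exactly one token per target cycle (in order), and a
    token sent in target cycle t is consumed by the receiver in cycle
    t + latency.
    """
    def __init__(self, latency):
        self.latency = latency
        self.tokens = deque()

    def send(self, payload):
        self.tokens.append(payload)

    def can_recv(self, cycle):
        # Nothing can arrive before `latency` cycles have elapsed;
        # after that the receiver must wait for the sender's token.
        return cycle < self.latency or len(self.tokens) > 0

    def recv(self, cycle):
        return None if cycle < self.latency else self.tokens.popleft()

def run(schedule, n_cycles, latency):
    """Advance a producer (P) and a consumer (C) in the host order given by
    `schedule`. The schedule must contain both 'P' and 'C'.
    Returns the consumer's per-target-cycle log and the host steps used."""
    ch = Channel(latency)
    p_cycle = c_cycle = host_steps = 0
    log = []
    while c_cycle < n_cycles:
        who = schedule[host_steps % len(schedule)]
        host_steps += 1
        if who == "P" and p_cycle < n_cycles:
            ch.send(p_cycle * 10)          # producer's output for this cycle
            p_cycle += 1
        elif who == "C" and ch.can_recv(c_cycle):
            log.append((c_cycle, ch.recv(c_cycle)))
            c_cycle += 1
        # otherwise the chosen module stalls for this host step

    return log, host_steps

# Two very different host schedules give the same target-cycle behavior.
log_even, steps_even = run("PC", n_cycles=5, latency=2)
log_skew, steps_skew = run("CCCP", n_cycles=5, latency=2)
print(log_even == log_skew, steps_even, steps_skew)
```

In this sketch only the number of host steps depends on the schedule; the consumer's view of each target cycle does not, which is the property that makes the simulation cycle-accurate even when host time is stretched.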
7
Lessons Motivating New Work (1/2)
"Simulation" is better than "emulation":
Original idea was to build direct RTL-level implementations of target architectures, starting with available RTL of full processors and peripherals.
We did some of this but discovered:
• Suitable RTL models rarely exist.
• Once completed, such a system is not flexible and is difficult to scale up.
The space of possibilities in emulation techniques (essentially building “simulators” in FPGAs) is quite broad, and has emerged as a first-class intellectual pursuit in the computer architecture research community.
In this proposed next phase of the project we plan to aggressively pursue this space.
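One technique in this space, listed among the accomplishments as split timing-functional modeling, can be sketched in software. This is a minimal illustration under stated assumptions, not the FAST or ProtoFlex design: the four-instruction toy ISA, the class names, and the latency tables are all hypothetical. The functional model decides what each instruction does; the timing model decides how many target cycles it costs. Changing the timing parameters therefore never requires touching the functional model.

```python
from dataclasses import dataclass, field

@dataclass
class FunctionalModel:
    """Architectural state only: computes WHAT each instruction does."""
    regs: dict = field(default_factory=dict)
    mem: dict = field(default_factory=dict)

    def step(self, instr):
        op, dst, a, b = instr
        if op == "li":                      # load immediate: dst <- a
            self.regs[dst] = a
        elif op == "add":                   # dst <- regs[a] + regs[b]
            self.regs[dst] = self.regs[a] + self.regs[b]
        elif op == "store":                 # mem[regs[a]] <- regs[dst]
            self.mem[self.regs[a]] = self.regs[dst]
        elif op == "load":                  # dst <- mem[regs[a]]
            self.regs[dst] = self.mem.get(self.regs[a], 0)
        else:
            raise ValueError(f"unknown op {op!r}")
        return op                           # handed to the timing model

@dataclass
class TimingModel:
    """Computes WHEN: charges target cycles per instruction class."""
    latency: dict
    target_cycles: int = 0

    def account(self, op):
        self.target_cycles += self.latency[op]

def simulate(program, latency):
    func, timing = FunctionalModel(), TimingModel(latency)
    for instr in program:
        timing.account(func.step(instr))
    return func, timing.target_cycles

PROGRAM = [
    ("li", "r1", 5, None),
    ("li", "r2", 7, None),
    ("add", "r3", "r1", "r2"),
    ("li", "r4", 100, None),
    ("store", "r3", "r4", None),
    ("load", "r5", "r4", None),
]

# Two hypothetical target machines: same ISA, different memory latencies.
FAST_MEM = {"li": 1, "add": 1, "store": 2, "load": 3}
SLOW_MEM = {"li": 1, "add": 1, "store": 20, "load": 30}

f_fast, cycles_fast = simulate(PROGRAM, FAST_MEM)
f_slow, cycles_slow = simulate(PROGRAM, SLOW_MEM)
print(f_fast.regs["r5"], cycles_fast, cycles_slow)
```

Both runs produce the same architectural result and differ only in the target-cycle count, which is the flexibility the slide contrasts with a fixed RTL implementation.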
8
Lessons Motivating New Work (2/2)
Most parallel systems researchers are well practiced in using and modifying software simulators but have little experience at the hardware level, and are reluctant to dig into the low-level hardware details needed to build efficient FPGA-based simulators.
Furthermore, most researchers are reluctant to acquire and maintain large FPGA computing platforms and the associated FPGA development tools.
In order to be successful, we believe that we must provide ready-to-go, easy-to-use simulation models with easy shared access in the form of online services.
9
Proposed Work
1 Create a set of “standards” and reference implementations: module interfaces and communication; simulator user interface; monitoring/debug; power/thermal modeling.
2 Use these specifications and reference implementations to build several full simulators, including ones that model 1K+ processor nodes.
3 Offer RAMP infrastructure (both HW and development software) as an online service.
4 Aggressive outreach to hardware architects, RAMP developers, and software developers, with online and live tutorials, all the necessary resources freely available, and full-time staff support services.
10
11
Project Timeline (2/2)
12
Retreat Agenda, day 1
13
Retreat Agenda, day 2
14
Breakout Topics
Simulator interoperability
• Which are the important simulators with which to be compatible?
• How should RAMP infrastructure play with other simulators?
RAMP-supported FPGA platforms:
• Nallatech (FSB), BEE3, …
• Xilinx / Altera
• How do we deal with all the choices?
• Lower-cost alternatives, and what is the impact of platform cost (does it affect the value prop.)?
• What are the price points?
RAMP and Heterogeneous Architectures
Value prop. of RAMP capabilities:
• What should you be able to do only on RAMP? Can’t be done any other way.
• Functional versus RTL versus simulation. When to do which? How to transition from one to the other?
15