TRANSCRIPT
Blue Gene Project Update
January 2002
In December 1999, IBM Research announced a five-year, $100M (US) effort to build a petaflop-scale supercomputer to attack problems such as protein folding. The Blue Gene project has two primary goals:
Advance the state of the art in biomolecular simulation.
Advance the state of the art in computer design and software for extremely large-scale systems.
In November 2001, a partnership with Lawrence Livermore National Laboratory was announced.
The Blue Gene Project
Problem: Changing Workload Paradigms
Data-intensive workloads drive emerging commercial opportunities.
Power consumption, cooling and size are increasingly important to users.
Drivers: Data-Intensive Computing & Technology
Growth in traditional unstructured data (text, images, graphs, pictures, ...)
Enormous increase in complex data types (video, audio, 3D images, ...)
Increased dependency on optimized data management, analysis, search and distribution.
Research Opportunity: A New Way to Build Servers
Increase total system performance per cubic foot, per watt and per dollar by two orders of magnitude.
Minimize development time and expense via common/reusable cores.
Distribute workloads by placing compute power in memory and I/O subsystems.
Focus on key issues: power consumption, system management, compute density, autonomic operation, scalability, RAS.
Cellular System Architecture
Cellular System Projects
Super Dense Server (SDS)
Uses industry-standard components
Linux Web-hosting environments
Cluster-in-a-box using 10s-100s of low-power processors
Workload-specific functionality per node with a shared storage system
Blue Gene/L System
Targets scientific and emerging commercial applications.
Collaboration with LLNL for a ~200 TF system for S&TC workloads.
Extended distributed programming model.
Scalable from small to large systems (2048 processors/rack).
Leverages high-speed interconnect and system-on-a-chip technologies.
Blue Gene / Petaflop Supercomputer
Science and research project
Explore the limits of protein science, computer science and computer architecture.
Applications: molecular dynamics, complex simulations.
Supporting research: distributed systems management, self-healing, problem determination, fast initialization, visualization, system monitoring, packaging, etc.
Supercomputer Peak Performance
[Chart: peak speed (flops, 1E+2 to 1E+16) versus year introduced (1940-2010); doubling time = 1.5 yr. Systems plotted include ENIAC (vacuum tubes), UNIVAC, IBM 701, IBM 704, IBM 7090 (transistors), IBM Stretch, CDC 6600 (ICs), CDC 7600, CDC STAR-100 (vectors), CRAY-1, Cyber 205, X-MP2 (parallel vectors), CRAY-2, X-MP4, Y-MP8, i860 (MPPs), Delta, CM-5, Paragon, NWT, ASCI Red, Blue Pacific, ASCI White, Blue Gene/L and Blue Gene/P.]
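The quoted doubling time can be sanity-checked with a simple projection; the sketch below is illustrative only, and the 1945 starting year and the 10**2-10**3 flop/s starting range are assumptions chosen to stand in for the ENIAC-era machines on the chart.

```python
# "Doubling time = 1.5 yr" implies peak speed multiplies by 2**(years / 1.5).
years = 2005 - 1945                # roughly the ENIAC era to the BG/L-BG/P era
growth = 2 ** (years / 1.5)        # ~1.1e12
print(f"{years} years at a 1.5-year doubling time -> x{growth:.1e} in peak speed")
# A machine in the 10**2-10**3 flop/s range in the late 1940s therefore projects
# to roughly 10**14-10**15 flop/s by the mid-2000s, in line with the chart's
# trajectory toward petaflop-class systems.
```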
[Chart: performance versus architecture (special-purpose to general-purpose), comparing QCDSP (0.6 TFLOP), QCDOC in CMOS7SF (20 TFLOP), CP-PACS, ASCI Red, ASCI Blue (PPC 604, 3.3 TFLOP), ASCI White (PPC 630, 10 TFLOP), ASCI-Q (30 TFLOP), BG/L in CU-11 (180 TFLOP), BG/C in CU-11 (1000 TFLOP) and BG/P in CU-08 (1000 TFLOP).]
Supercomputing Landscape
Two cellular computing architectures: Blue Gene/L and Blue Gene/C (formerly Cyclops)
Software stack: kernels, host, middleware, simulators, OS; self-healing, autonomic computing
Application program: molecular dynamics application software; partnerships, external advisory board
Blue Gene Project Components
Massively parallel architecture applicable to a wide class of problems.
Third generation of low-power, high-performance supercomputers:
QCDSP (600 GF, based on the Texas Instruments C31 DSP): Gordon Bell Prize for Most Cost-Effective Supercomputer in '98.
QCDOC (20 TF, based on IBM system-on-a-chip technology): collaboration between Columbia University and IBM Research.
Blue Gene/L (180/360 TF): processor architecture included in the optimization; outstanding price/performance; partnership between IBM, the ASCI Tri-Labs and universities.
Initial focus on numerically intensive scientific problems.
Blue Gene/L
Chip (CU-11 design, 2 processors: two 440 cores plus embedded DRAM and I/O): 2.8/5.6 GF/s, 4 MB
Board (8 chips, 2x2x2): 22.4/44.8 GF/s, 2.08 GB
Rack (128 boards, 8x8x16): 2.9/5.7 TF/s, 266 GB
System (64 cabinets, 32x32x64): 180/360 TF/s, 16 TB
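These packaging figures compose multiplicatively; as a quick check, the sketch below (plain Python, using only the numbers quoted above) re-derives the board, rack and system peaks from the per-chip figure.

```python
# Re-deriving the BG/L board, rack, and system peaks from the per-chip figures above.
chip_gf = (2.8, 5.6)                 # GF/s per chip (the two peak figures quoted)
chips_per_board = 8                  # 2 x 2 x 2
boards_per_rack = 128                # 8 x 8 x 16 torus section per rack
racks_per_system = 64                # 32 x 32 x 64 overall

board_gf  = [g * chips_per_board for g in chip_gf]
rack_tf   = [g * boards_per_rack / 1000 for g in board_gf]
system_tf = [t * racks_per_system for t in rack_tf]

print("board:  %.1f/%.1f GF/s" % tuple(board_gf))    # 22.4/44.8 GF/s
print("rack:   %.1f/%.1f TF/s" % tuple(rack_tf))     # ~2.9/5.7 TF/s
print("system: %.1f/%.1f TF/s" % tuple(system_tf))   # ~183/367 TF/s, consistent with the quoted 180/360 TF/s
```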
Blue Gene/L
65,536 nodes interconnected with three integrated networks:
3D Torus: virtual cut-through hardware routing to maximize efficiency; 2.8 Gb/s on all 12 node links (total of 4.2 GB/s per node); communication backbone; 134 TB/s total torus interconnect bandwidth (a rough consistency check of these figures follows below).
Global Tree: one-to-all or all-to-all broadcast functionality; arithmetic operations implemented in the tree; ~1.4 GB/s of bandwidth from any node to all other nodes; latency of tree traversal less than 1 µsec.
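A back-of-envelope check of the torus numbers, with two stated assumptions: the aggregate figure counts each of a node's six outbound links once, and uses 1 TB = 1024 GB.

```python
# Back-of-envelope check of the 3D torus bandwidth figures quoted above.
nodes = 65536
link_gbps = 2.8                    # Gb/s per link, as quoted
links_per_node = 12                # 6 neighbours, send + receive (assumption)

per_node_GBps = links_per_node * link_gbps / 8
print(f"per node:  {per_node_GBps:.1f} GB/s")        # 4.2 GB/s, as quoted

# Counting each physical link direction once (6 outbound links per node):
total_GBps = nodes * 6 * link_gbps / 8
print(f"aggregate: {total_GBps / 1024:.0f} TB/s")    # ~134 TB/s with 1 TB = 1024 GB
```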
Blue Gene/L - The Networks
Ethernet: incorporated into every node ASIC; disk I/O; host control, booting and diagnostics.
Software environment includes:
High-performance scientific kernel
MPI-2 subset, defined by users (a schematic example follows below)
Math libraries subset, defined by users
Compiler support for the DFPU (C, C++, Fortran)
Parallel file system
System management
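The compute-node programming model is message passing over the MPI-2 subset noted above. As a purely schematic illustration of that style (not BG/L code; the mpi4py binding is used here only for brevity, whereas BG/L itself targets C, C++ and Fortran), a parallel computation with a global reduction looks like this:

```python
# Schematic message-passing sketch in the style the MPI-2 subset supports;
# mpi4py is used for brevity only (BG/L targets C, C++ and F95 compilers).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank computes a partial result on its slice of the problem ...
partial = float(rank)

# ... and a collective reduction combines them; on BG/L such reductions
# can map onto the global tree network described earlier.
total = comm.allreduce(partial, op=MPI.SUM)

if rank == 0:
    print(f"{size} ranks, reduced total = {total}")
```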
External collaborations: Boston University, Caltech, Columbia University, Oak Ridge National Labs, San Diego Supercomputer Center, Universidad Politecnica de Valencia, University of Edinburgh, University of Maryland, Texas A&M, Tech. Univ. of Vienna, ...
Blue Gene/L System Software
[Diagram: BG/L operating environment. 64K compute nodes (each running kernel services and the application with MPI, C/C++, F95 and math libraries) connect to 1K I/O nodes (kernel services, file I/O, checkpoint, graphics, network), which attach to RAID file servers at roughly 1 Tb/s aggregate and to a host/system console over a 1 Gb/s Ethernet I/O and RAS network.]
BG/L - Operating Environment
Blue Gene Science
Explore simulation methodologies, analysis tools, and biological systems.
Thermodynamic and kinetic studies of peptide systems: "The free energy landscape for hairpin folding in explicit water," R. Zhou et al. (to appear in PNAS).
Solvation free energy/partition coefficient calculations for peptides in solution using a variety of force fields for comparison with experiment
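For context, the usual way such solvation free energies are compared with experiment is through the partition coefficient; the relation in the sketch below is standard thermodynamics rather than anything specific to this project, and the free-energy values are placeholders, not computed results.

```python
# Standard relation between solvation free energies and a partition coefficient:
#   log10 P = (dG_solv(water) - dG_solv(organic)) / (2.303 * R * T)
# The free-energy values below are hypothetical placeholders.
R = 1.987e-3          # kcal/(mol K)
T = 298.0             # K
dG_water = -8.0       # kcal/mol, hypothetical solvation free energy in water
dG_octanol = -10.0    # kcal/mol, hypothetical solvation free energy in octanol

log10_P = (dG_water - dG_octanol) / (2.303 * R * T)
print(f"log10 P = {log10_P:.2f}")   # ~1.5: solvation is more favorable in octanol
```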
Explore scientific and technical computing applications in other domains such as climate, materials science, ...
Drivers for Application
Aggressive machine architecture requires:
small memory footprint
fine-grained concurrency in application decomposition
exploration of reusable frameworks for application development
Biological science program requires:
multiple force field support
architectural support for algorithmic and methodological investigations
framework to support migration of analysis modules from the externally hosted application to the parallel core
Algorithm: Scientific Importance / Mapping to BG/L Architecture
Ab initio molecular dynamics in biology and materials science (JEEP): A / A
Three-dimensional dislocation dynamics (MICRO3D and PARANOID): A / A
Molecular dynamics (MDCASK code): A / A
Kinetic Monte Carlo (BIGMAC): A / C
Computational gene discovery (MPGSS): A / B
Turbulence: Rayleigh-Taylor instability (MIRANDA): A / A
Shock turbulence (AMRH): A / A
Turbulence and instability modeling (sPPM benchmark code): B / A
Hydrodynamic instability (ALE hydro): A / A
BG/L Application to ASCI Algorithms
Blue Matter - a Molecular Dynamics Code
Separate the MD program into three subpackages (offload function to the host where possible):
MD core engine (massively parallel, minimal in size)
Setup programs to set up force field assignments, etc.
Analysis tools to analyze MD trajectories, etc.
Multiple force field support:
CHARMM force field (done)
OPLS-AA force field (done)
AMBER force field (done)
Polarizable force field (desired)
Potential parallelization strategies (a toy sketch follows this list):
Interaction-based
Volume-based
Atom-based
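As a toy illustration of how atom-based and volume-based decompositions differ (hypothetical Python, not Blue Matter code; the 4,000-atom system and 512 ranks are arbitrary choices), the sketch below assigns atom ownership either by index or by spatial cell; an interaction-based scheme would instead distribute the pair interactions themselves.

```python
# Toy sketch of two of the decomposition strategies named above
# (hypothetical illustration only; not Blue Matter code).
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 10.0, size=(4000, 3))   # e.g. a small solvated peptide
n_ranks = 512

# Atom-based: each rank owns a fixed block of atom indices,
# regardless of where those atoms sit in space.
atom_owner = np.arange(len(positions)) % n_ranks

# Volume-based: the box is cut into cells and each rank owns the atoms
# currently inside its cell (8 x 8 x 8 = 512 cells here).
cells_per_side = 8
cell = np.floor(positions / 10.0 * cells_per_side).astype(int).clip(0, cells_per_side - 1)
volume_owner = (cell[:, 0] * cells_per_side + cell[:, 1]) * cells_per_side + cell[:, 2]

print("max atoms per rank (atom-based):  ", np.bincount(atom_owner, minlength=n_ranks).max())
print("max atoms per rank (volume-based):", np.bincount(volume_owner, minlength=n_ranks).max())
```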
Phenomenon: system/size (with solvent); time scale; time-step count
β-hairpin kinetics: β-hairpin / 4,000 atoms; 5 µsec; 10**9
Peptide thermodynamics: α-helix, β-hairpin / 4,000 atoms; 0.1-1 µs; 10**8
Protein thermodynamics: 60-100 residues / 20,000-30,000 atoms; 1-10 µs; 10**9
Protein kinetics: 60-100 residues / 20,000-30,000 atoms; 500 µsec; 10**11
Time Scales for Protein Folding Phenomena
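The time-step counts in the table above follow directly from the target time scales once an integration step of a few femtoseconds is assumed; the 5 fs step in the sketch below is an illustrative assumption, not a Blue Matter parameter.

```python
# Time-step counts implied by the folding time scales in the table above,
# assuming (illustratively) a 5 fs integration step.
dt_fs = 5.0
cases = {
    "beta-hairpin kinetics (5 usec)":       5e-6,
    "peptide thermodynamics (0.1-1 usec)":  1e-6,
    "protein thermodynamics (1-10 usec)":   10e-6,
    "protein kinetics (500 usec)":          500e-6,
}
for name, seconds in cases.items():
    steps = seconds / (dt_fs * 1e-15)
    print(f"{name}: ~{steps:.0e} steps")
# -> 1e+09, 2e+08, 2e+09 and 1e+11 steps, in line with the 10**8-10**11 column above.
```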
Simulation Capacity
[Chart: simulation capacity, time-steps/month (1E+6 to 1E+13) versus system size (1,000 to 100,000 atoms), for four platforms: 1 rack of Power3 ('01), a 512-node BG/L partition (2H03), 40 x 512-node BG/L partitions (4Q04) and a 1,000,000 GFLOP/second machine (2H06).]
[Chart: data volume, bytes/month (1E+12 to 1E+18) versus system size (1E+3 to 1E+5 atoms), for the same four platforms: 1 rack of Power3, a 512-node BG/L partition, 40 x 512-node BG/L partitions and a 1,000,000 GFLOP/s machine.]
Data Volumes (assuming every time-step written out)
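The bytes/month curves follow from the time-steps/month figures once a per-snapshot size is assumed; in the sketch below, both the 24 bytes/atom snapshot size (three double-precision coordinates) and the steps-per-month values are illustrative assumptions, not numbers read off the charts.

```python
# Data volume implied by writing every time step, assuming (illustratively)
# 24 bytes per atom per snapshot (3 double-precision coordinates) and the
# step rates below, which are placeholders within the charts' plotted range.
bytes_per_atom = 24
for atoms, steps_per_month in ((4_000, 1e9), (30_000, 1e10), (100_000, 1e11)):
    volume = atoms * bytes_per_atom * steps_per_month
    print(f"{atoms:>7} atoms, {steps_per_month:.0e} steps/month -> {volume:.1e} bytes/month")
# -> 9.6e+13, 7.2e+15 and 2.4e+17 bytes/month, within the 1E+12-1E+18 span of the chart,
# which is why the "every time-step written out" assumption is flagged explicitly.
```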
External Scientific Interactions
First Blue Gene Protein Science workshop held at the San Diego Supercomputer Center, March 2001.
Second Blue Gene Protein Science workshop to be held at the Maxwell Institute, University of Edinburgh, in March 2002.
Collaborations with ORNL, Columbia, UPenn, Maryland, Stanford, ETH-Zurich, ...
Blue Gene seminar series has hosted over 25 speakers at the T.J. Watson Research Center.
Blue Gene Applications Advisory Board formed with 15 members from the external scientific and HPC communities.
Blue Gene Project Update
January 2002