© 2004 IBM Corporation
IBM Systems Group
IBM Deep Computing Strategy
IBM’s Deep Computing Strategy
Aggressively evolve the POWER-based Deep Computing product line
Develop advanced systems based on loosely coupled clusters
Deliver supercomputing capability with new access models and financial flexibility
Research and overcome obstacles to parallelism and other revolutionary approaches to supercomputing
Solving Problems More Quickly at Lower Cost
Creating the Future of Deep Computing

ASCI Purple and Blue Gene/L
► The two systems will provide over 460 TF

Track record for delivering the world's largest production-quality supercomputers
► ASCI Blue (3.9 TF) & ASCI White (12.3 TF)
► ASCI Pathforward (Federation 4GB Switch)

DARPA's HPCS initiative
► Awarded $53M in funding for phase 2 of DARPA's High Productivity Computing Systems initiative
► IBM PERCS program aimed at bringing sustained multi-petaflop performance and autonomic capabilities to commercial supercomputers
IBM Systems – Industry Leadership & Choice

Spanning Scale Up / SMP Computing, Clusters / Virtualization, and Scale Out / Distributed Computing:

• e325 – 1U, Opteron-based (high-density rack mount)
• x335 – 1U/2p density, Intel-based
• x382 – 2-way Itanium 2
• x445 – Scalable nodes, static LPAR, VMware
• p655 – High density, POWER4-based
• p690/p690+ – High BW, SSI, LPAR, RAS (large SMP)
• BladeCenter™ – Denser form factors, rapid deployment, flexible architectures, switch integration
• Clusters – Linux cluster (e1350), AIX clusters (e1600), High Performance Switch, Linux
POWER / PowerPC: The Most Scalable Architecture

• Servers: POWER2, POWER3, POWER4, POWER4+, POWER5
• Desktop / Games: PPC 603e, PPC 750, PPC 750FX, PPC 970
• Embedded: PPC 401, PPC 405GP, PPC 440GP, PPC 440GX
IBM eServer pSeries IBM Confidential
POWER4 / POWER5 Differences

Feature: POWER4 design → POWER5 design (benefit)

• Floating-point registers: 72 → 120 (better performance)
• Die size: 412 mm² → 389 mm² (50% more transistors in the same space)
• Processor partitioning: 1 processor → 1/10 of processor (better usage of processor resources)
• Simultaneous multi-threading: No → Yes (better processor utilization; 40% system improvement)
• Memory bandwidth: 4 GB/sec/chip → ~16 GB/sec/chip (4X improvement; faster memory access)
• Chip interconnect: Distributed switch → Enhanced distributed switch; intra-MCM data bus ½ proc. speed → proc. speed; inter-MCM data bus ½ proc. speed → ½ proc. speed (better system throughput; better performance)
• L3 cache: 32 MB / shared, 8-way associative, 118 clock cycles → 36 MB / private, 12-way associative, ~80 clock cycles (better cache performance; 40% improvement)
• L2 cache: 1.44 MB, 8-way associative → 1.9 MB, 10-way associative (fewer L2 cache misses; better performance)
• L1 cache: 2-way associative, FIFO → 4-way associative, LRU (improved L1 cache performance)
POWER5 Enhancements

• SMT cores
• Integrated memory controller
• Integrated L3 controller
• Larger caches (L2 & L3)
• Mega bandwidth
• Scalable buses
• Virtualization support

POWER5 design: 276M transistors, 389 mm². Two SMT cores share the 1.9 MB L2 cache and the enhanced distributed switch; the memory controller and L3 directory/control sit on chip, with the 36 MB L3 and memory attached off chip. External connections: GX bus, MCM–MCM, and chip–chip interconnects.

DCM: Dual-Chip Module, used in entry and midrange systems.
POWER4 and POWER5 Storage Hierarchy

Component: POWER4 → POWER5

L2 cache
• Capacity, line size: 1.44 MB, 128 B line → 1.92 MB, 128 B line
• Associativity, replacement: 8-way, LRU → 10-way, LRU

Off-chip L3 cache
• Capacity, line size: 32 MB, 512 B line → 36 MB, 256 B line
• Associativity, replacement: 8-way, LRU → 12-way, LRU

Chip interconnect
• Type: Distributed switch → Enhanced distributed switch
• Intra-MCM data buses: ½ processor speed → processor speed
• Inter-MCM data buses: ½ processor speed → ½ processor speed

Memory
• 512 GB maximum → 1024 GB (1 TB) maximum
Latency and Bandwidth Comparison: POWER4 and POWER5

Unit: POWER4 latency → POWER5 latency (POWER4-to-POWER5 bandwidth increase at the same frequency)

• GPR, FPR register files: 1 cycle access → 1 cycle access (same)
• Instruction cache: 1 cycle access → 1 cycle access (same)
• Data cache: 2 cycles → 2 cycles (same)
• L2 cache: 12 cycles → 13 cycles (same)
• L3 cache: 123 cycles* → 87 cycles* (1.5x)
• Memory: 351 cycles* → 220 cycles* (up to 2.7x read, 1.3x write)
• Chip-chip: 4x bandwidth increase

* Assumes data accessed from local L3 and locally attached memory
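The cycle counts above imply the following improvement factors; a one-liner of arithmetic confirms them (values taken directly from the table, L1/L2 latencies being essentially unchanged):

```python
# Latency improvement implied by the table's cycle counts (same
# frequency; starred rows assume local L3 and locally attached memory).
latency = {"L3 cache": (123, 87), "memory": (351, 220)}  # (POWER4, POWER5)

for level, (p4, p5) in latency.items():
    print(f"{level}: {p4 / p5:.2f}x faster access on POWER5")
# L3 cache: 1.41x faster access on POWER5
# memory: 1.60x faster access on POWER5
```

Note these are latency reductions; the "up to 2.7x read" figure in the table is a bandwidth increase, a separate metric.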
Simultaneous Multi-Threading in POWER5

• Each chip appears as a 4-way SMP to software
• Processor resources optimized for enhanced SMT performance
• Software-controlled thread priority
• Dynamic feedback of runtime behavior to adjust priority
• Dynamic switching between single- and multithreaded mode

With both thread 0 and thread 1 active, the two threads issue simultaneously to the core's shared execution units: FX0, FX1 (fixed point), FP0, FP1 (floating point), LS0, LS1 (load/store), BRX (branch), and CRL (condition register).
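The sharing described above can be illustrated with a toy model: two in-order instruction streams filling one core's execution units each cycle. The unit counts follow the slide's labels, but the alternating thread priority is an illustrative assumption, not the actual POWER5 dispatch logic:

```python
from collections import deque

# Shared execution units per the slide: two fixed-point (FX), two
# floating-point (FP), two load/store (LS), one branch (BRX) and one
# condition-register (CRL) unit serve both hardware threads.
UNITS = {"FX": 2, "FP": 2, "LS": 2, "BRX": 1, "CRL": 1}

def run_smt(thread0, thread1, max_cycles):
    """Issue ops (unit-class names) from two in-order threads each cycle.

    Returns a per-cycle log of (thread_id, op) issues. The alternating
    priority is a stand-in for POWER5's software-controlled thread
    priority, not the real hardware policy.
    """
    queues = [deque(thread0), deque(thread1)]
    log, first = [], 0
    for _ in range(max_cycles):
        if not queues[0] and not queues[1]:
            break
        free = dict(UNITS)        # unit slots available this cycle
        issued = []
        for tid in (first, 1 - first):
            q = queues[tid]
            # in-order issue: stop when the head op's unit class is busy
            while q and free[q[0]] > 0:
                free[q[0]] -= 1
                issued.append((tid, q.popleft()))
        log.append(issued)
        first = 1 - first         # alternate which thread issues first
    return log

# Two threads with different instruction mixes fill the core together:
log = run_smt(["FX", "FX", "FP", "LS"], ["FP", "FP", "LS", "BRX"], 8)
for cycle, issued in enumerate(log):
    print(cycle, issued)
```

In this run both four-op streams retire in two cycles, because thread 1's floating-point work slots into units thread 0 leaves idle; that overlap is the source of the "40% system improvement" claimed for SMT.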
POWER5 SF4 Entry System

Deskside / 4U rack chassis; rack: 19" x 26"

• Architecture: POWER5, 1-, 2-, 4-way; 36 MB L3 cache
• Memory: 1 GB – 64 GB
• Packaging: Deskside / 4U (19" rack)
• DASD / bays: 8 DASD (hot plug)
• I/O expansion: 5 / 4 slots (hot plug)
• Integrated SCSI: Dual Ultra 320
• Integrated Ethernet: Dual ports, 10/100/1000
• RIO2 drawers: Yes / 8
• Dynamic LPAR: 40
• Redundant power: Feature
• Redundant cooling: Yes
• Internal RAID: Optional
• Media bays: 3
POWER5 ML4 Entry System

Deskside 4U rack chassis; rack: 19" x 26"

• Architecture: POWER5; 2 / 4-way base system, 4-way secondary systems; 36 MB L3 cache
• Memory*: 1 GB – 64 GB (base system)
• Packaging: 4U (19" rack)
• DASD / bays: 6 DASD (base system)
• I/O expansion: 6 / 5 slots (hot plug)
• Integrated SCSI: Dual Ultra 320 (base system)
• Integrated Ethernet: Dual ports, 10/100/1000 (base system)
• RIO2 drawers: Yes / 8 (base system)
• Dynamic LPAR: 40 (base system)
• Redundant power: Yes
• Redundant cooling: Yes
• Internal RAID: Optional
• Media bays: 3 (base system)
Planned POWER5 IH System

2.5U rack chassis; rack: 24" x 43" deep, full drawer; 10 systems / rack, 80 processors / rack

• Architecture: POWER5, 8-way
• L3 cache: 288 MB (total)
• Memory: 2 GB – 256 GB
• Packaging: 2U (24" rack), 10 systems / rack
• DASD / bays: 2 DASD (hot plug)
• I/O expansion: 6 / 4 / 2 PCI-X; 2 GX bus & 2 RIO
• Integrated SCSI: Ultra 320
• Integrated Ethernet: 4 ports, 10/100/1000
• RIO2 drawers: Yes (1/2 or 1) / drawer; 0 – 5 / rack
• Dynamic LPAR: Yes
• Cluster interconnect: HPS, 0 – 2 / rack; Gigabit Ethernet
• OS: AIX 5.2 & Linux
POWER5 Improves HPC Performance
• Higher sustained-to-peak FLOPS ratio compared to POWER4
• Dedicated memory bus
• Reduction in L3 and memory latency
• Integrated memory controller
• Increased rename resources allow higher instruction-level parallelism in compute-intensive applications
• Fast barrier synchronization operation
• Enhanced data prefetch mechanism
IBM Research
BlueGene/L © 2003 IBM Corporation
BlueGene/L system architecture
IBM Research
BlueGene/L | System Software Overview © 2003 IBM Corporation
BlueGene/L fundamentals
• A large number of nodes (65,536)
• Low-power (20 W) nodes for density
• High floating-point performance
• System-on-a-chip technology
• Nodes interconnected as a 64x32x32 three-dimensional torus
  – Easy to build large systems, as each node connects only to its six nearest neighbors; full routing in hardware
  – Bisection bandwidth per node is proportional to n²/n³ = 1/n
• Auxiliary networks for I/O and global operations
• Applications consist of multiple processes with message passing
  – Strictly one process per node
  – Minimal OS involvement and overhead
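The torus wiring described above is simple enough to sketch in a few lines. This is an illustration of the topology only (the `neighbors` helper is hypothetical, not BlueGene/L system software):

```python
# Sketch of the 3-D torus: each node (x, y, z) in the 64x32x32 grid
# links to its six nearest neighbors, with wraparound so every node
# has exactly six links and no edge cases.
DIMS = (64, 32, 32)

def neighbors(x, y, z):
    """Return the six torus neighbors of node (x, y, z)."""
    coord = (x, y, z)
    out = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % DIMS[axis]  # torus wraparound
            out.append(tuple(n))
    return out

# Even a "corner" node has six neighbors, thanks to the wrap links:
assert len(neighbors(0, 0, 0)) == 6
assert (63, 0, 0) in neighbors(0, 0, 0)   # wraps around in x

# Bisection scaling for an n x n x n torus: halving the machine cuts
# ~2*n^2 links (an n x n plane of nodes, doubled by the wrap links),
# while each half holds ~n^3/2 nodes -- so per-node bisection
# bandwidth falls off as n^2 / n^3 = 1/n, as the slide notes.
n = 32
print(2 * n * n)  # 2048 links cut for n = 32
```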
BlueGene/L interconnection networks
3-dimensional torus
• Interconnects all compute nodes (65,536)
• Virtual cut-through hardware routing
• 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
• Communications backbone for computations
• 350/700 GB/s bisection bandwidth

Global tree
• One-to-all broadcast functionality
• Reduction operations functionality
• 2.8 Gb/s of bandwidth per link
• Latency of tree traversal on the order of 2 µs
• Interconnects all compute and I/O nodes (1,024)

Gigabit Ethernet
• Incorporated into every node ASIC
• Active in the I/O nodes (1:64)
• All external communication (file I/O, control, user interaction, etc.)
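The torus figures quoted above are internally consistent; a quick check, using nothing beyond unit conversions and the cut-plane geometry of a 64x32x32 torus:

```python
# Sanity-checking the slide's bandwidth figures from the link speed.
LINK_GBPS = 1.4                 # Gb/s per torus link

# Per-node bandwidth: 12 links (6 neighbors, send + receive each):
per_node_gbs = 12 * LINK_GBPS / 8        # Gb/s -> GB/s
print(round(per_node_gbs, 1))            # 2.1 GB/s per node, as quoted

# Bisection: halving the 64x32x32 torus across its long axis cuts a
# 32x32 plane of nodes, each with 2 links in that direction (the
# direct link plus the torus wrap-around link).
links_cut = 32 * 32 * 2
bisection_gbs = links_cut * LINK_GBPS / 8
print(round(bisection_gbs))              # 358 GB/s one way, matching
                                         # the quoted 350/700 GB/s
```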
Applications that map well to ultra-parallel (>10,000 CPU) systems

• Important applications can leverage increased compute/memory ratios
• The best applications have high ratios of MIPS & FLOPs vs. memory size & cluster bandwidth
• Intense, mostly-local interactions among 100,000s to 1,000,000,000s of simple units: e.g., atoms (protein folding / drug design), air pockets (weather and climate), logic gates (VLSI simulation), etc.
• Purely-parallel searches over huge data or parameter spaces: genome searching, cryptography, data mining, etc.
BG/L Applications Classes

Basic algorithms & numerical methods; petroleum reservoirs; molecular modelling; biomolecular dynamics / protein folding; rational drug design; nanotechnology; fracture mechanics; flows in porous media; fluid dynamics; reaction-diffusion; multiphase flow; weather and climate; structural mechanics; seismic processing; aerodynamics; VLSI design; genome processing; large-scale data mining; cryptography; symbolic processing; pattern matching; raster graphics; Monte Carlo; discrete events; N-body; Fourier methods; graph-theoretic methods; transport; partial diff. eqs.; ordinary diff. eqs.; fields.
Project Three Rivers | IBM Confidential | © 2004 IBM Corporation
Target TOP500 ranking for SC2004

System: make, procs, processor, type; Linpack TFlops, MW, frames, cost (M$)

• #1 Blue Gene/L (1/2): IBM, 64k procs, 750 MHz POWER, special purpose; 100 TFlops, 0.8 MW, 32 frames, cost ?
• #2 Earth Simulator: NEC, 5,120 procs, 500 MHz NEC, special purpose; 35.9 TFlops, 5.1 MW, 640+ frames, $350M
• #3 Red Storm: Cray, 10k procs, 2 GHz Opteron, special purpose; 20-35 TFlops (40 peak), <2 MW, 108 frames, $90M
• #4 Three Rivers: IBM, 4,564 procs, 2.2 GHz POWER, commercially available; 20+ TFlops, <<1 MW, 45 frames, $30M
• #5 ASCI Q: HP, 8,192 procs, 1.25 GHz Alpha, special purpose; 13.9 TFlops, 2 MW, 600 frames, $100M
• #6 Big Mac: Apple, 2,200 procs, 2 GHz POWER, homebuilt; 9.8 TFlops, <1.5 MW, 130+ frames, $5.2M
Major Three Rivers Innovative Technologies

• IBM 970 POWER processor: industry-leading 64-bit commodity processor
• IBM BladeCenter integration: record cluster density; improved cluster operating efficiency (power, space, cooling)
• Advanced semiconductor technology (CMOS10S): record price/performance in HPC workloads; record system throughput
• New dense Myrinet interconnect and new MPI software (MX): significant reduction in switching hardware; faster parallel processing
• Enterprise scale-out FAStT IBM storage: improved subsystem cost and reliability
• Network root file system: improved node reliability; reduced installation and maintenance costs
DARPA award (Defense Advanced Research Projects Agency)

• IBM to receive $53.3M grant to develop a new generation of supercomputing: multi-petaflop sustained performance by 2010 (petaflop = 1 quadrillion calculations / sec)
• IBM Research with a consortium of 12 leading universities (Univ. of Illinois, Univ. of New Mexico, etc.)
• Continuation of IBM technology leadership (processors, systems design, etc.)
• IBM proposal: PERCS (Productive, Easy-to-use, Reliable Computing Systems)
  – Highly adaptable systems that configure hardware & software to match application needs
  – Revolutionary chip technology
  – New computer architecture