SOS7
What will Cray do for Supercomputing in this Decade?
Asaph Zemach, Cray Inc.
SOS7 – Durango, CO – March 6, 2003
An Apology
• I am not Burton. Sorry.
• Where is Burton?
Maybe?
More likely…
What Have You Done For Me Lately?
• Cray MTA-2
  – Accepted by NRL Sept ’02
  – UMA Shared Memory
  – Latency Tolerant: 128 contexts in processor
• Red Storm for Sandia
  – Contract signed Oct ’02
  – 10,000 AMD X86-64
  – High Speed Network
• Cray X1
  – FCS Dec 31, 2002
  – Scalable vector MSPs
  – NUMA Shared Memory
Cray Products: The Near Future
• 2003: X1
  – 12.8 GF
  – 35 GB/s/p mem BW, 76 GB/s/p cache BW
• End of ’04: X1e
  – Technology Upgrade
  – Faster clock, Denser Package
  – Mix & Match with X1
• 2005+: X2 (Blackwidow)
  – Bigger, Faster, Cheaper
• 2003: Red Storm (Development)
• End of ’04: Red Storm (Install)
  – Catamount LWK, Linux service
  – AMD 2GHz X86-64
• 2005: Red Storm Product (?)
  – Linux Service, Compute OS?
  – Synergy? I/O?
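The X1 figures quoted on this slide can be turned into a bytes-per-flop balance with one division each. The sketch below is illustrative arithmetic only: it assumes that the 12.8 GF figure and both bandwidth figures are peak values for the same single processor, which the slide implies ("/p") but does not state outright. The function and variable names are made up for the example.

```python
# Figures as quoted on the slide for the X1.
# Assumption: all three are peak values for one processor.
PEAK_GFLOPS = 12.8    # "12.8 GF"
MEM_BW_GBS = 35.0     # "35GB/s/p mem BW"
CACHE_BW_GBS = 76.0   # "76GB/s/p cache BW"


def bytes_per_flop(bandwidth_gbs: float, peak_gflops: float) -> float:
    """Bandwidth available per peak floating-point operation."""
    return bandwidth_gbs / peak_gflops


mem_balance = bytes_per_flop(MEM_BW_GBS, PEAK_GFLOPS)
cache_balance = bytes_per_flop(CACHE_BW_GBS, PEAK_GFLOPS)

print(f"memory: {mem_balance:.2f} bytes/flop")   # memory: 2.73 bytes/flop
print(f"cache:  {cache_balance:.2f} bytes/flop")  # cache:  5.94 bytes/flop
```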
Cray Products: Not So Near Future
• Shared Memory Locales
  – UMA, NUMA
• Heavy Weight Processors
  – Multi threading, Vectors, Streams
• PIM (LWP)
Timeline: 2006(?), 2008(?), 2010
• Cascade
• X2e: BIGGER, FASTER, CHEAPER
• X2f: BIGGER!! FASTER!! CHEAPER!!
Cascade Locale
[Diagram. Labels: Heavy Weight Proc (Vector, MT, Streams); SW Controlled Data Cache; Multithreaded PIM DRAM (×6); Locale Interconnect; Router (to other Locales)]
Cascade: Lazy Localization
• Initially all data is considered generic – equally far from everywhere.
• To improve performance, stage generic data near the HWP that manipulates it.
• To improve performance even more, partition data between PIMs.
• All data is always universally accessible, but performance varies.
[Diagram. Labels: HWP; Memory; Generic Data; Somewhat Localized Data; Highly Localized Data]
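The lazy-localization idea on this slide can be sketched as a toy model. This is not Cascade's actual mechanism or API: the `Placement` and `Array` names and the relative cost numbers are hypothetical, and only the ordering of the costs (generic slowest, PIM-partitioned fastest) is taken from the slide.

```python
from dataclasses import dataclass
from enum import IntEnum


class Placement(IntEnum):
    GENERIC = 0          # equally far from everywhere (the initial state)
    NEAR_HWP = 1         # staged near the HWP that manipulates it
    PIM_PARTITIONED = 2  # partitioned between PIMs


# Hypothetical relative access costs: only their ordering reflects the slide.
RELATIVE_COST = {
    Placement.GENERIC: 100,
    Placement.NEAR_HWP: 10,
    Placement.PIM_PARTITIONED: 1,
}


@dataclass
class Array:
    values: list
    placement: Placement = Placement.GENERIC  # localization is deferred

    def read(self, i):
        """Every element is readable at every placement; only the cost differs."""
        return self.values[i], RELATIVE_COST[self.placement]

    def localize(self):
        """Move one step closer; a no-op once fully localized."""
        if self.placement < Placement.PIM_PARTITIONED:
            self.placement = Placement(self.placement + 1)


a = Array([3, 1, 4])
costs = []
for _ in range(3):
    value, cost = a.read(2)  # the same element stays reachable throughout
    costs.append(cost)
    a.localize()

print(costs)  # [100, 10, 1]
```

The point of the model is that a program is correct before any localization is done, so placement can be tuned later, and only where it pays off.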
Cascade: Software Investigations
• Compiler controlled cache
• Compartmentalized OS-es
  – Introspection using PIM
• Relative Debugging
• Abstract locales: virtualize locality management
  – What needs to be near what
  – What can/should be distributed (& how)
Cascade People
• Burton Smith
• David Callahan
• Steve Scott
  – Cray
• Thomas Sterling
• Larry Bergman
• Hans Zima
  – JPL, Caltech
• Jay Brockman
• Peter Kogge
  – Notre Dame
• Bill Dally
  – Stanford