the path from petascale to exascale hardware and

16
The Path from Petascale to Exascale Hardware and Applica8ons Issues Rick Stevens Argonne Na8onal Laboratory University of Chicago

Upload: others

Post on 17-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

ThePathfromPetascaletoExascaleHardwareandApplica8onsIssues

RickStevensArgonneNa8onalLaboratory

UniversityofChicago

Supercompu8ng&CloudCompu8ng

•  Twodominantmacroarchitecturesdominatelarge‐scale(inten8onal)compu8nginfrastructures(vsembedded&adhoc)

•  Supercompu8ngtypeStructures– Large‐scaleintegratedcoherentsystems– Managedforhighu8liza8onandefficiency

•  EmergingcloudtypeStructures– Large‐scalelooselycoupled,lightlyintegrated– Managedforavailability,throughput,reliability

3

Top500Trends

LookingtoExascale

AThreeStepPathtoExascale

AThreeStepPathtoExascale

TopPinchPoints

•  PowerConsump8on– Proc/mem,I/O,op8cal,memory,delivery

•  Chip‐to‐ChipInterfaceScaling(pin/wirecount)•  Package‐to‐PackageInterfaces(op8cs)•  FaultTolerance(FITratesandFaultManagement)– Reliabilityofirregularlogic,designprac8ce

•  CostPressureinOp8csandMemory

ProgrammingModels:TwentyYearsandCoun8ng

•  Inlarge‐scalescien8ficcompu8ngtodayessen8allyallcodesaremessagepassingbased(CSPandSPMD)

• Mul8coreischallengingthesequen8alpartofCSPbuttherehasnotemergedadominatemodeltoaugmentmessagepassing

QuasiMainstreamProgrammingModels

•  C,Fortran,C++andMPI•  OpenMP,pthreads•  CUDA,RapidMind•  ClearspeedsCn•  PGAS(UPC,CAF,Titanium)•  HPCSLanguages(Chapel,Fortress,X10)•  HPCResearchLanguagesandRun8me•  HLL(ParallelMatlab,GridMathema8ca,etc.)

DriverApplica8ons:BasicScienceandEmerging

NERSC2007RankAbundance

0

0.2

0.4

0.6

0.8

1

1.21 11

21

31

41

51

61

71

81

91

101

111

121

131

141

151

161

171

181

191

201

211

221

231

241

251

261

271

281

291

301

311

321

331

341

351

361

Series1

Top6use20%

Top17use40%

Top40use60%

Top85use80%

<100groupsusetheMajorityoftheCycles

MillionWayConcurrencyToday

•  Lijle’slawdrivenneedforconcurrency– Tocoverlatencyinmemorypath– Func8onofaggregatememorybandwidthandclockspeed–  Independentoftechnologyandarchitecturetofirstorder

•  MainstreamCPUs(e.g.x86,PPC,SPARC)– 8‐16cores,4‐8hardwarethreadspercore,– Totalsystemwith103–105nodes=>32K–12Mthreads– BG/Pexampleat1PF72x4K=300,000(buteachthreadhastodo4ops/clock)=>1.2Mopsperclock

•  GPUbasedcluster(e.g.1000Tesla1Unodes)– 3x128coresx(32‐96)threadspercorex1000nodes=12M–36Mthreads

Exis8ngBodyofParallelSopware

•  Howmanyexis8ngHPCscienceandengineeringcodesscalebeyond1000processors?– Myes8mateisthatitislessthan1000worldwide– TopusersatNERSC,OLCFandALCF<200groups–  ItappearslikelythatthebulkofcyclesonTop500areusedincapacitymodewiththeexcep8onofasiteswithpoliciesthatenforcecapabilityruns

•  Howquicklyarenewcodesbeinggenerated?– Abini8odevelopment– Migra8onandpor8ngfrompreviousgenera8ons

•  Therearedifferentchoicesfacedbylarge‐establishedprojectsandpersonalexplora8onsofnewtechnologies

NumberofProcessorsIntheTop500

Specula8onsonTheShip

•  Provisioningbythekilogramdiscreteunits–  I/Osurfacetovolumeeffects,flexibletopologies,thecomputeristhe

computer•  Reconfigurablehardwarepor8ngsopware

–  Basedonprogrammingmodelsthatareinherentlyparallelandscaleinvarianttoshiptheproblemtoemula8onnotdiscoveryofconcurrency

•  Internallyselfpoweredexternalpowersources–  Metaboliclogic?Photodriven?Betadecay?Accous8c?

•  Longservicelife8me(100yr+,ZeroM)fewyears+maint–  Massivelyredundantcompu8ngelementsembeddedinstructurally

usefulmaterials?•  Adiaba8clogicdissipatorylogic

–  Ambientenvironment,noinfrastructure