READEX – Runtime Exploitation of Application Dynamism for Energy-efficient eXascale computing
EE HPC WG @ SC'17
Andreas Gocht, Technische Universität Dresden




Overview: Project Partners

• Grant agreement No 671657
• Officially started September 1st, 2015

• Technische Universität Dresden / ZIH (Coordinator)
• Norwegian University of Science and Technology
• Technische Universität München
• IT4Innovations, VSB - Technical University of Ostrava
• NUI Galway, Irish Centre for High-End Computing
• Intel France
• Gesellschaft für numerische Simulation mbH


Project Motivation

Applications exhibit dynamic behaviour
• Changing resource requirements
• Computational characteristics
• Changing load on processors over time


Overview

READEX creates a tools-aided methodology for automatic tuning of parallel applications
• Dynamically adjust system parameters to actual resource requirements

Join technologies from Embedded Systems and HPC
• HPC: PTF, Score-P, and HDEEM
• Embedded Systems: System scenario methodology


Overview

Co-design approach
• Manual tuning for energy efficiency as a baseline
• Automatic tuning for comparison
• Applications
   • PERMON and ESPRESO (FETI tools from IT4Innovations)
   • Indeed (GNS)
   • CORAL benchmark suite
   • Proxy Apps


Terminology: Region and Region Instance

int main(void) {
    // Initialize application
    int num_iterations = 2;
    for (int iter = 1; iter <= num_iterations; iter++) {
        laplace3D();
        reduction();
        fftw_execute();
    }
    // Post processing and finalization
    return 0;
}

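Annotations on the code: the body of the loop is the phase region, and one iteration of it is a phase. A function called inside the loop (for example laplace3D()) is a significant region, and one execution of it is a runtime situation.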
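The distinction between a region (a piece of code) and a region instance (one execution of it) can be made concrete with a small counter. The sketch below is only an illustration: the `Region` type and the `enter` and `run` helpers are hypothetical names, not part of READEX or Score-P.

```c
#include <assert.h>
#include <stddef.h>

/* A region is a piece of code; every execution of it is one instance. */
typedef struct {
    const char *name;
    unsigned instances;   /* how many times the region has been entered */
} Region;

/* Record one more instance of the given region. */
static void enter(Region *r)
{
    assert(r != NULL);
    r->instances++;
}

/* Mimics the loop on the slide: each iteration is one phase, and each call
 * inside it is one runtime situation of the corresponding region. */
static void run(Region *phase, Region *calls, size_t n_calls, int num_iterations)
{
    for (int iter = 1; iter <= num_iterations; iter++) {
        enter(phase);
        for (size_t i = 0; i < n_calls; i++) {
            enter(&calls[i]);
        }
    }
}
```

With num_iterations = 2, as on the slide, the phase region has two phases and each of the three called functions has two runtime situations.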


Terminology: Tuning Parameter

int main(void) {
    // Initialize application
    int num_iterations = 2;
    for (int iter = 1; iter <= num_iterations; iter++) {
        laplace3D();
        reduction();
        fftw_execute();
    }
    // Post processing and finalization
    return 0;
}

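Annotations on the code: different regions run with different values of a tuning parameter, here FREQ = 1.5 GHz for one region and FREQ = 2 GHz for another.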
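One way to picture a tuning parameter that changes from region to region is a lookup table consulted on region entry. The sketch below is hypothetical: the `RegionSetting` type, the `freq_for_region` helper, and the assignment of 1.5 GHz and 2 GHz to particular functions are assumptions made for illustration. The slide only shows that different regions can run at different values of FREQ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tuning table: one core-frequency setting per region. */
typedef struct {
    const char *region;
    int freq_mhz;
} RegionSetting;

/* Region names come from the slide; which region gets which value is assumed. */
static const RegionSetting settings[] = {
    { "laplace3D",    1500 },  /* FREQ = 1.5 GHz (assumed assignment) */
    { "fftw_execute", 2000 },  /* FREQ = 2 GHz (assumed assignment) */
};

/* Return the frequency configured for a region, or default_mhz if none is listed. */
static int freq_for_region(const char *region, int default_mhz)
{
    assert(region != NULL);
    for (size_t i = 0; i < sizeof settings / sizeof settings[0]; i++) {
        if (strcmp(settings[i].region, region) == 0) {
            return settings[i].freq_mhz;
        }
    }
    return default_mhz;
}
```

A runtime library could call such a helper on region entry and hand the result to whatever mechanism actually sets the frequency; regions without an entry keep the default.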


Overview: Workflow

1. Instrument application
   Score-P provides different kinds of instrumentation

2. Detect dynamism
   Check whether runtime situations could benefit from tuning

3. Detect energy saving potential and configurations (DTA)
   Use tuning plugin and power measurement infrastructure to search for optimal configuration
   Create tuning model

4. Runtime application tuning (RAT)
   Apply tuning model, use optimal configuration
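The search in step 3 can be sketched in C as follows. This is a minimal illustration, not the READEX implementation: the `Measurement` struct, the function name `best_configuration`, and the exhaustive search are assumptions, since the slide does not say how the search is performed. The caller supplies the energy measured for each candidate configuration of one region.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate configuration tried for a region, together with the
 * energy measured for it. All values are supplied by the caller. */
typedef struct {
    int core_freq_mhz;
    int uncore_freq_mhz;
    double energy_joules;
} Measurement;

/* Exhaustive search (an assumed strategy): return the index of the
 * measurement with the lowest energy. Ties keep the earliest entry. */
static size_t best_configuration(const Measurement *m, size_t count)
{
    assert(m != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (m[i].energy_joules < m[best].energy_joules) {
            best = i;
        }
    }
    return best;
}
```

In this picture, the tuning model of step 3 would record the winning configuration per region, and step 4 would look it up and apply it whenever that region runs.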


Current Status

• Application instrumentation, dynamism detection, DTA, and RAT are implemented

• Promising results:

[Figure: bar charts for NPB BT.C and NPB MG.D comparing energy and runtime without READEX and with READEX; value axis from 20 to 110]


Example: NAS-OMP BT.C Benchmark

Comparison of power consumption and runtime of the BT benchmark

[Figure: two panels, "With READEX" and "Without READEX"]


Example: NAS-OMP BT.C Benchmark

Selected tuning parameter: core and uncore frequency of the BT benchmark


Example: NAS-OMP MG.D Benchmark

Comparison of power consumption and runtime of the MG benchmark

[Figure: two panels, "With READEX" and "Without READEX"]


Example: NAS-OMP MG.D Benchmark

Selected tuning parameter: core and uncore frequency of the MG benchmark


Work in Progress

• Evaluating Application Tuning Parameters (ATPs)
   • Allow the tuning of program internal "decisions"
   • Example: preconditioner in ESPRESO, a solver for FETI problems:

Preconditioner type        Number of    Single iteration cost          Total solution cost
                           iterations   Time          Energy           Time      Energy
No preconditioner          172          130 + 0 ms    32.3 + 0.00 J    21.4 s    5.50 kJ
Weight function            100          130 + 2 ms    32.3 + 0.53 J    12.9 s    3.28 kJ
Lumped                     45           130 + 10 ms   32.3 + 3.86 J    6.3 s     1.64 kJ
Light Dirichlet            39           130 + 10 ms   32.3 + 3.74 J    5.5 s     1.41 kJ
Full Dirichlet (default)   30           130 + 80 ms   32.3 + 20.6 J    6.3 s     1.59 kJ

Note: 130 ms and 32.3 J is the baseline cost of a single iteration without a preconditioner.
The light Dirichlet preconditioner gives 11.3% energy savings against the default full Dirichlet preconditioner (1.41 kJ versus 1.59 kJ).
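The 11.3% figure can be reproduced from the totals in the table. The sketch below is only a worked calculation: the `Preconditioner` struct and the helper names are hypothetical, and the numbers are copied from the table above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    int iterations;
    double total_time_s;
    double total_energy_kj;
} Preconditioner;

/* Totals copied from the table on this slide. */
static const Preconditioner table[] = {
    { "No preconditioner", 172, 21.4, 5.50 },
    { "Weight function",   100, 12.9, 3.28 },
    { "Lumped",             45,  6.3, 1.64 },
    { "Light Dirichlet",    39,  5.5, 1.41 },
    { "Full Dirichlet",     30,  6.3, 1.59 },  /* default */
};
enum { TABLE_LEN = 5, DEFAULT_INDEX = 4 };

/* Index of the entry with the lowest total energy. */
static size_t lowest_energy(const Preconditioner *p, size_t count)
{
    assert(p != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (p[i].total_energy_kj < p[best].total_energy_kj) {
            best = i;
        }
    }
    return best;
}

/* Relative saving of `tuned` against `baseline`, in percent. */
static double savings_percent(double baseline, double tuned)
{
    assert(baseline > 0.0);
    return 100.0 * (baseline - tuned) / baseline;
}
```

Here `lowest_energy(table, TABLE_LEN)` selects Light Dirichlet, and `savings_percent(1.59, 1.41)` evaluates to about 11.3, which matches the saving quoted on the slide.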


Work in Progress

• Runtime Calibration
   • Automatically finds a good configuration for unseen scenarios
   • Applied at runtime
   • Machine Learning techniques


Thank you for your attention

Questions?