
GEOPM Progress Updates
(Global Extensible Open Power Manager)
https://geopm.github.io/geopm

Jonathan Eastep [[email protected]], Principal Engineer and PhD

14 November 2017

SC17 BoF: Power API, GEOPM, and Redfish


Synergies Between Power API, GEOPM, and Redfish

§ Power API is a specification for power monitoring and control interfaces
  § Proposes common interfaces for interoperability between power mgmt implementations

§ Redfish is a specification for data center management
  § Provides convenient RESTful interface for power monitoring, control, and broader data center management functions

§ GEOPM is a runtime for power management
  § Implements monitoring and control, and importantly: optimizes job power/performance
  § Would sit under Power API / Redfish to implement relevant power controls and monitors

§ Ongoing collaboration between GEOPM, Power API, and Redfish
  § Redfish and Power API working toward compatibility
  § Power API and GEOPM have compatibility in their app-facing interfaces (mostly)
  § Would love to see community Power API / Redfish implementations using GEOPM


§ Runtime for in-band power management and optimization
  § On-the-fly monitoring of HW counters & application profiling
  § Feedback-guided optimization of HW control knob settings

§ Open source software (flexible BSD three-clause license)

§ Extensible through plugin architecture
  § Add new energy optimization strategies
  § Add support for new architectures beyond x86 (truly open)

§ Designed for holistic optimization
  § Job-wide global optimization of HW control knob settings
  § Application-awareness for max speedup or energy savings

§ Scalable via distributed tree-hierarchical design, algorithms

[Figure: GEOPM architecture diagram. Labels: Power-Aware RM / Scheduler; GEOPM Root; GEOPM Aggregator (x2); GEOPM Leaf (x4); GEOPM Controller; MPI Comms Overlay; Shared Mem Region (SHM); MPI Ranks 0 to i-1, i to j-1, j to k-1, k to n-1; Processor (x4); MSR; Msr-safe (or Other Drivers for Non-x86 Platforms)]

Project url: http://geopm.github.io/geopm
Contact: [email protected]
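
To make the tree-hierarchical idea concrete, here is a minimal sketch and not GEOPM code. The `Node` class, the fan-out of 2, and the even budget split are all hypothetical, chosen only to show samples being aggregated up a tree and a power budget being divided down it, so that no single controller has to talk to every node.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; only leaves carry a measured power sample."""
    name: str
    power_w: float = 0.0    # measured (leaf) or aggregated (inner node) power
    budget_w: float = 0.0   # power budget handed down from the parent
    children: List["Node"] = field(default_factory=list)


def aggregate_up(node: Node) -> float:
    """Sum power bottom-up; each level only reads its direct children."""
    if node.children:
        node.power_w = sum(aggregate_up(c) for c in node.children)
    return node.power_w


def distribute_down(node: Node, budget_w: float) -> None:
    """Split a budget evenly among children (an assumed, simplistic policy)."""
    node.budget_w = budget_w
    for child in node.children:
        distribute_down(child, budget_w / len(node.children))


def build_tree(leaf_powers: List[float], fanout: int = 2) -> Node:
    """Group leaves into aggregators level by level until one root remains."""
    level = [Node(f"leaf{i}", power_w=p) for i, p in enumerate(leaf_powers)]
    depth = 0
    while len(level) > 1:
        depth += 1
        level = [
            Node(f"agg{depth}_{i // fanout}", children=level[i:i + fanout])
            for i in range(0, len(level), fanout)
        ]
    return level[0]


# Made-up leaf samples in watts, one per compute node.
root = build_tree([180.0, 210.0, 195.0, 205.0])
total_power_w = aggregate_up(root)
distribute_down(root, budget_w=720.0)
```

With a fixed fan-out, each controller in this sketch handles a constant number of children regardless of job size, which is the scalability argument behind a tree design.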


GEOPM Use Cases

§ Turn-key (requires no app annotation):
  § Automatic online job profiling
    § Node-level: trace samples of processor counters and correlate HW activity to each OpenMP parallel region
    § Job-level: aggregate the energy counters across all job compute nodes to monitor overall job power or energy
  § Automatic offline or online optimization
    § Will talk more about this today
  § Offline visualization of profile data
    § Python scripts leveraging pandas for data analysis
    § Helpful for debugging new plugins or understanding how they optimize energy or runtime
    § Plot trace of plugin decisions and data they’re based on

§ Advanced (requires using GEOPM profiling API for app annotation):
  § Automatic online rebalancing of power & perf among nodes
    § Purpose: accelerate critical path nodes in MPI bulk-synchronous applications
    § Refer to ISC’17 paper on GEOPM by Eastep et al. for more info
    § Note: work in progress to make the annotation automatic / turn-key too

Intel Corporation
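
The job-level profiling bullet (aggregating energy counters across all compute nodes) can be sketched as follows. This is an illustration in plain Python rather than the pandas-based GEOPM scripts. The input layout (one start and one end reading of a cumulative energy counter per node) is an assumption, not the GEOPM trace format, and counter wraparound is ignored.

```python
from typing import Dict, Tuple

# Hypothetical layout: node -> ((start_time_s, start_energy_j), (end_time_s, end_energy_j))
Reading = Tuple[float, float]
NodeReadings = Dict[str, Tuple[Reading, Reading]]


def job_energy_and_power(readings: NodeReadings) -> Tuple[float, float]:
    """Return (total job energy in joules, average job power in watts).

    Energy is the sum of per-node counter deltas. Average power divides
    that energy by the span from the earliest start to the latest end.
    """
    total_energy_j = sum(end[1] - start[1] for start, end in readings.values())
    job_start_s = min(start[0] for start, _ in readings.values())
    job_end_s = max(end[0] for _, end in readings.values())
    return total_energy_j, total_energy_j / (job_end_s - job_start_s)


# Made-up readings for a two-node job that ran for 10 seconds.
readings = {
    "node0": ((0.0, 1000.0), (10.0, 3000.0)),
    "node1": ((0.0, 500.0), (10.0, 2700.0)),
}
energy_j, avg_power_w = job_energy_and_power(readings)
```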


GEOPM Community (1)

| Institution | Principal Investigator | Project Name | Project Scope | Contribution Type | Time Span | Quality Level | Funded? |
|---|---|---|---|---|---|---|---|
| Argonne | Kalyan Kumaran, Vitali Morozov | CORAL | 1. GEOPM 1.0 product development | Sponsor | Q2’15–Q4’17 | Product | Yes |
| IBM, STFC-Hartree | Vadim Elisseev, Milos Puzovic, Neil Morgan | | 1. GEOPM port to Power8 + NVLink; 2. Integration of GEOPM with EAS | Contributor | Q4’16–TBD | Research | Yes |
| LLNL | Barry Rountree, Aniruddha Marathe | CRADA | 1. Integration of GEOPM and Conductor runtime tech; 2. Studies to motivate GEOPM/HW codesign | Contributor | Q3’13–TBD | Research | Yes |
| LLNL, Argonne, U. of Arizona | Tapasya Patki, Pete Beckman, Dave Lowenthal | ECP PS, ECP Argo-GRM | 1. Exascale power stack leveraging GEOPM; 2. Integration of GEOPM + Caliper framework; 3. Integration of GEOPM with EAS; 4. Port of GEOPM to non-x86 architecture | Contributor | Q1’17–Q4’19 | Near-Product | Yes |
| LRZ | Dieter Kranzlmüller, Herbert Huber, Torsten Wilde | | 1. Energy optimization plugin for GEOPM 1.0; 2. Power ramp limiting plugin for GEOPM 1.x | Contributor | Q3’17–Q4’20 | Near-Product | Yes |
| Sandia | James Laros, Ryan Grant | Power API | 1. GEOPM and Power API xface compatibility; 2. Power API community WG kickoff at Intel | User | Q4’14–TBD | Industry Standard | Yes |

* = collaborator will be sharing their GEOPM usages and experiences at SC17: BoF on Power API, GEOPM, and Redfish (two collaborators are marked on the slide)


GEOPM Community (2)

| Institution | Principal Investigator | Project Name | Project Scope | Contribution Type | Time Span | Quality Level | Funded? |
|---|---|---|---|---|---|---|---|
| Argonne | Kalyan Kumaran, Vitali Morozov, Kevin Harms | | 1. GEOPM >1.0 feature development; 2. GEOPM enablement for system power capping + EAS; 3. Studies to motivate GEOPM/hardware codesign | Sponsor | Q1’18–Q4’21 | Product | WIP |
| CINECA | Carlo Cavazzoni | | 1. System level runtime for power capping and power ramp limiting leveraging GEOPM | Contributor | Q2’18–Q1’21 | Near-Product | WIP† |
| IT4I | Lubomir Riha | | 1. GEOPM ports to OpenPOWER and ARM; 2. Extensions to GEOPM application profiler; 3. Integration of GEOPM with EAS | Contributor | Q2’18–Q1’21 | Near-Product | WIP† |
| E4 | Fabrizio Magugliani | | 1. GEOPM port to OpenPOWER | Contributor | Q2’18–Q1’21 | Near-Product | WIP† |
| PNNL | Leon Song | | 1. GEOPM extensions to tune new HW control knob settings; 2. GEOPM extensions for coordinated tuning of SW params and HW control knob settings | Contributor | Q1’19–Q4’20 | Research | WIP† |

† = letter of intent or equivalent in-hand (non-binding)


GEOPM Release Schedule

Commitment: Alpha Q2’17, Beta Q2’18, v1.0 Q4’18

Stretch Goal: Alpha Q2’17, Beta Q1’18, v1.0 Q2’18

Timeline markers: TOSS 3.x, ISC’18, SC’18

Announcement: OpenHPC application has been submitted. Under consideration.


GEOPM Core Team Acknowledgements

Lead Architect:
• Jonathan Eastep, Principal Engineer

Hardware Team:
• Processor Firmware
  • Revathy Rajasree
• Hardware Architecture and Design
  • Fede Ardanaz
  • Fuat Keceli
  • Kelly Livingston
  • Lowren Lawson

Software Team:
• GEOPM Development
  • Chris Cantalupo
  • Diana Guttman
  • Brad Geltz
  • Brandon Baker
• Research
  • Sid Jana
  • Asma Al-Rawi
  • Matthias Maiterth


Backup Slides


What Problems Does GEOPM Address?

1. At-scale load imbalance due to manufacturing variation in power-capped systems. This problem is deemed one of the key Exascale-era power challenges. Developing GEOPM and techniques to address this problem over the past 6 years made me a Principal Engineer at Intel.

2. Gap in community energy management research tools. There was previously no platform for energy management research that was open, scalable, robust, flexible, portable (truly open), and backed by serious engineering resources. Now the community is using GEOPM, porting to non-x86 architectures, integrating their optimization techniques into it, and integrating it with other software components.

3. Gap in industry server power management roadmaps and technical directions. Power management was previously done node-locally. Techniques were oblivious to application-level information such as bottlenecks on remote nodes that could limit overall performance and were unable to forecast what computation was going to happen in the future and optimize power-performance policy accordingly. GEOPM adds a critical layer of global optimization across nodes, application and application phase awareness, and forecasting capabilities. See ISC’17 paper for demo of benefits (up to 32% speedup).


Experimental Setup: 3 Investigations

1. Opportunity Analysis
  • Use proxy app (model application) to determine envelope of energy-to-solution and time-to-solution impact we’ll see over the landscape of BSP applications
  • Measure energy-to-solution decrease and time-to-solution tradeoff relative to running at sticker on the JLSE cluster at Argonne
  • Compare two different use-cases for the offline technique we developed:
    • ‘Offline automatic application best-fit:’ all phases run at common frequency (best-fit across all)
    • ‘Offline automatic per-phase best-fit:’ each phase runs at the best frequency for it

2. Benchmark offline energy optimization technique
  • Target FT, miniFE, and Nekbone workloads
  • Same as above but targets less synthetic workloads and performs experiments on LLNL Quartz system

3. Benchmark online energy optimization technique
  • Target the proxy app and perform experiments on JLSE cluster at Argonne
  • Compare the online and offline techniques we developed:
    • ‘Offline automatic per-phase best-fit:’ scripts identify best frequency via offline characterization
    • ‘Online automatic per-phase best-fit:’ GEOPM plugin performs characterization/tuning online
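
The difference between the two offline use-cases can be sketched in a few lines. The slides do not state the exact selection criterion, so this sketch assumes "best-fit" means the lowest-energy frequency whose runtime stays within a tolerance of the runtime at sticker frequency. The function names, the tolerance, and all characterization numbers below are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical offline characterization: phase -> {frequency_hz: (runtime_s, energy_j)}
Profile = Dict[str, Dict[float, Tuple[float, float]]]


def per_phase_best_fit(profile: Profile, sticker_hz: float,
                       max_slowdown: float) -> Dict[str, float]:
    """Pick, for each phase, the lowest-energy frequency within the slowdown limit."""
    best = {}
    for phase, runs in profile.items():
        limit_s = runs[sticker_hz][0] * (1.0 + max_slowdown)
        allowed = {f: e for f, (t, e) in runs.items() if t <= limit_s}
        best[phase] = min(allowed, key=allowed.get)
    return best


def application_best_fit(profile: Profile, sticker_hz: float,
                         max_slowdown: float) -> float:
    """Pick one common frequency, judged on whole-application totals."""
    def totals(freq_hz: float) -> Tuple[float, float]:
        time_s = sum(runs[freq_hz][0] for runs in profile.values())
        energy_j = sum(runs[freq_hz][1] for runs in profile.values())
        return time_s, energy_j

    # Only frequencies characterized for every phase are candidates.
    freqs = set.intersection(*(set(runs) for runs in profile.values()))
    limit_s = totals(sticker_hz)[0] * (1.0 + max_slowdown)
    allowed = {f: totals(f)[1] for f in freqs if totals(f)[0] <= limit_s}
    return min(allowed, key=allowed.get)


# Made-up data: a compute-bound phase and a memory-bound phase.
profile = {
    "dgemm": {2.1e9: (10.0, 2000.0), 1.7e9: (12.0, 1900.0), 1.3e9: (15.5, 1850.0)},
    "stream": {2.1e9: (10.0, 1800.0), 1.7e9: (10.1, 1500.0), 1.3e9: (10.3, 1300.0)},
}
phase_freqs = per_phase_best_fit(profile, sticker_hz=2.1e9, max_slowdown=0.15)
app_freq = application_best_fit(profile, sticker_hz=2.1e9, max_slowdown=0.15)
```

In this toy data the compute-bound phase stays at sticker frequency while the memory-bound phase drops to the lowest frequency, whereas the application-wide choice has to compromise on a single intermediate frequency.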


Results: Opportunity Analysis

Big energy savings are possible with frequency optimization in GEOPM vs running workloads at sticker: up to 16.5% energy savings at 0.3% increase in time-to-solution

With per-phase optimization, energy savings increase with increase in % time in memory-limited phase

Per-phase optimization simultaneously offers better energy-to-solution AND time-to-solution versus optimizing frequency across the blended characteristics of all application phases

[Figure: Energy-to-Solution Decrease. x-axis: % time in STREAM phase (18%, 32%, 40%, 49%, 56%, 64%, 75%); y-axis: % decrease in energy-to-solution (0 to 20); series: offline auto application best-fit, offline auto per-phase best-fit]

[Figure: Time-to-Solution Increase. x-axis: % time in STREAM phase (18% to 75%); y-axis: % increase to time-to-solution (-2 to 10); series: offline auto application best-fit, offline auto per-phase best-fit]

[Figure: Offline Auto App Best-Fit Frequency. x-axis: % time in STREAM phase (18% to 75%); y-axis: best-fit frequency (Hz) (1.1E+09 to 2.3E+09); reference levels: DGEMM Best-Fit, STREAM Best-Fit]


Results: Offline App vs Per-Phase Best-Fit

Energy-to-Solution and Time-to-Solution Comparison on Quartz

| Workload | Offline Automatic Application Best-Fit: EtS Decrease vs Sticker | Offline Automatic Application Best-Fit: TtS Increase vs Sticker | Offline Automatic Per-Phase Best-Fit: EtS Decrease vs Sticker | Offline Automatic Per-Phase Best-Fit: TtS Increase vs Sticker |
|---|---|---|---|---|
| FT | 9.5% | 6.8% | 15.8% | 4.8% |
| miniFE | 8.5% | 5.8% | Collecting data now | Collecting data now |
| Nekbone | 7.9% | 2.4% | Collecting data now | Collecting data now |

Results starting to confirm that GEOPM provides benefits for a number of workloads beyond our proxy app

More data on the way, but data starting to suggest per-phase frequency optimization simultaneously offers better energy-to-solution AND time-to-solution vs optimizing frequency across blended characteristics of whole app


Results: Online vs Offline Technique

Remember, offline approach is brittle. The goal: same (or better) results via more robust online approach

We think much of the EtS and TtS gap can be closed via addressing frequency latency & doing longer runs

Fine-tuning needed, but already seeing promising decreases in energy-to-solution with online approach

Explanation of EtS and TtS gaps:
• Runs were shorter than real apps -> noticeable “learning” overhead
• Reduced # samples in learning period to reduce overhead -> more noise-related control errors
• Observed latency between frequency change requests and enactment (10s of milliseconds) -> not running at desired frequency immediately, confusing algorithm

[Figure: Energy-to-Solution Decrease. x-axis: % time in STREAM phase (18%, 32%, 40%, 49%, 56%, 64%, 75%); y-axis: % decrease in energy-to-solution (0 to 18); series: online auto per-phase best-fit, offline auto per-phase best-fit]

[Figure: Time-to-Solution Increase. x-axis: % time in STREAM phase (18% to 75%); y-axis: % increase to time-to-solution (-2 to 10); series: online auto per-phase best-fit, offline auto per-phase best-fit]


Results: Inter-Node Power Balancing Use Case

§ See GEOPM ISC’17 paper by Eastep et al. for details of experimental setup and further analysis
§ Paper demonstrates power balancing plugin: it leverages annotation of application’s outer synchronization loop to detect critical path nodes and then reallocates power among nodes in order to equalize their time to complete a loop iteration
§ Compared overall time-to-solution when capping job power on 12-node KNL cluster with power balancer plug-in vs. static uniform power division (baseline); swept over a range of different job power caps
§ Region of interest in job power caps: low-end of job power caps was selected to avoid inefficient clock throttling and the high-end of the job power caps equals the unconstrained power consumption of the workload
§ Main result: up to 30% improvement in time-to-solution at low end of caps (miniFE, CoMD, AMG), with up to 9-23% for the rest. Improvement generally increases as power is more constrained
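
The balancing idea (shift power toward nodes on the critical path while keeping the job under its cap) can be illustrated with a single update step. This is a sketch under stated assumptions and not the algorithm from the ISC’17 paper: the proportional update rule, the `gain` parameter, and the sample numbers are hypothetical, and per-node hardware power limits are ignored.

```python
from typing import List


def rebalance(budgets_w: List[float], loop_times_s: List[float],
              job_cap_w: float, gain: float = 0.5) -> List[float]:
    """One balancing step: slower nodes get more power, faster nodes less.

    Each node's budget is scaled by its loop time relative to the mean,
    then all budgets are rescaled so they sum to the job power cap.
    Assumes 0 < gain <= 1 so that every budget stays positive.
    """
    mean_t = sum(loop_times_s) / len(loop_times_s)
    raw = [b * (1.0 + gain * (t - mean_t) / mean_t)
           for b, t in zip(budgets_w, loop_times_s)]
    scale = job_cap_w / sum(raw)
    return [r * scale for r in raw]


# Made-up example: node 3 is the straggler in the outer loop.
budgets_w = [100.0, 100.0, 100.0, 100.0]
loop_times_s = [1.0, 1.0, 1.0, 1.4]
new_budgets_w = rebalance(budgets_w, loop_times_s, job_cap_w=400.0)
```

Repeating such a step once per outer-loop iteration would move the allocation toward the point where all nodes finish the iteration at about the same time, which is the stated goal of the plugin.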



Results: Four Additional Workloads


GEOPM Speedup Analysis (using included GEOPM Trace and Python Visualization Tools)

Take-away points:
• Results demonstrate robustness of power balancing algorithm against time-varying amounts of work in the outer loop and sharp shifts in computational-intensity (top graphs)
• Node 8, with lowest power efficiency in our KNL cluster, is allocated more power (middle graphs)
• Power balancing algorithm improves critical path loop time by finding the power allocation that roughly equalizes the frequencies of all nodes (bottom graphs)


Research on GEOPM/HW/FW Codesign

§ GEOPM project is not just a software project. It also drives codesign of the features in Intel hardware for power-performance monitoring and control

§ Goals are to significantly advance the state-of-the-art in HPC power management technology and to ensure GEOPM runs best on Intel

§ Research areas:
  § Processor: improvements to granularity, reaction time, and interfaces for existing features
  § Processor: hooks for GEOPM to guide allocation of Turbo headroom among cores
  § Memory: hooks for GEOPM to hint to mem controller when it’s best to enter low-power states
  § Network: hooks for GEOPM to estimate power, manage tradeoffs between power and bandwidth in HFI and switches, and hint to HFI when it’s best to enter low-power states


GEOPM New Business Opportunities

§ GEOPM software package is open source, provides a rich feature set free of charge
  § Intent is for Intel’s future work on the software to be open source as well

§ 3rd parties are able to make proprietary extensions of GEOPM (BSD 3-clause license)
  § Enables integrators like Dell/Cray/HPE to develop commercial for-profit plugins (i.e. add power management secret sauce to differentiate your systems vs the competition)
  § GEOPM team can help integrators with this in a consulting capacity

§ Intel can explore developing custom processor firmware enhancements for customers
  § Enables processor power management firmware and GEOPM plugins to be co-optimized for individual customer needs
  § Enables management of hardware control knob settings which are not (yet) publicly available
  § Providing GEOPM NRE funding in a system contract is a good way to establish such an engagement

Inquire with Jonathan Eastep for more information: [email protected]