an#early#prototype#of#an#autonomic# …...operating system distributed framework hardware...
Post on 30-May-2020
1 Views
Preview:
TRANSCRIPT
An Early Prototype of an Autonomic Performance Environment for
Exascale Kevin Huck, Sameer Shende,Allen Malony
University of Oregon Hartmut Kaiser
Louisiana State University Allan Porterfield, Rob Fowler
RENCI Ron Brightwell
Sandia NaKonal Labs
Outline
• IntroducKon/MoKvaKon • XPRESS Overview • Relevant Components – HPX RunKme – APEX Concepts
• APEX Prototype • Experimental Results • Future Work
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 2
IntroducKon/MoKvaKon • Extreme-‐scale compuKng requires a new perspecKve on the role of performance observaKon in the Exascale system soVware stack
• High concurrency and dynamic operaKon • Post-‐mortem performance measurement and analysis is not good enough
• Requirements: – first-‐ and third-‐person observaKon – In situ analysis – IntrospecKon across stack layers – Dynamic feedback and adaptaKon
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 3
XPRESS • RunKme system implemenKng ParalleX, co-‐designed with an OperaKng System. – Sandia NaKonal Laboratories (Ron Brightwell) – Indiana University (Thomas Sterling, Andrew Lumsdane) – Lawrence Berkeley NaKonal Laboratory (Alice Koniges) – Louisiana State University (Hartmut Kaiser) – Oak Ridge NaKonal Laboratory (Chris Baker) – University of Houston (Barbara Chapman) – University of North Carolina (Allan Porterfield, Rob Fowler) – University of Oregon (Allen Malony)
• h]p://xstack.sandia.gov/xpress/
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 4
LegacyApplications
New Model Applications
MPIMetaprogramming
FrameworkDomain SpecificActive Library
Compiler
AGASname spaceprocessor
LCOdataflow, futuressynchronization
LightweightThreads
context manager
Parcelsmessage driven
computation
...
OpenMP
XPI
Task recognition Address space control
Memory bankcontrol
OSthread In
stru
men
tatio
n
Networkdrivers
Distributed FrameworkOperating System
HardwareArchitecture
OperatingSystem
Instances
RuntimeSystem
Instances
PRIME MEDIUMInterace / Control
{{
Domain SpecificLanguage
+106 nodes ⇥ 103 cores / node + integration network
...“APEX”
OpenX
• Integrated Exascale soVware stack for ParalleX
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 5
XPRESS: Exascale System SoVware
• HPX-‐4: runKme based on OpenX modular soVware architecture – HPX-‐3: Current ParalleX implementaKon
• LXK: lightweight kernel OS – Based on the Ki]en OS Legacy
Applications
New Model Applications
MPIMetaprogramming
FrameworkDomain SpecificActive Library
Compiler
AGASname spaceprocessor
LCOdataflow, futuressynchronization
LightweightThreads
context manager
Parcelsmessage driven
computation
...
OpenMP
XPI
Task recognition Address space control
Memory bankcontrol
OSthread In
stru
men
tatio
n
Networkdrivers
Distributed FrameworkOperating System
HardwareArchitecture
OperatingSystem
Instances
RuntimeSystem
Instances
PRIME MEDIUMInterace / Control
{{
Domain SpecificLanguage
+106 nodes ⇥ 103 cores / node + integration network
...
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 6
XPRESS: Programming Models and Languages
• XPI – low-‐level imperaKve programming interface
• Meta-‐programming framework to support rapid deployment of future embedded DSL formalisms – Example DSL using meta-‐DSL toolkit
LegacyApplications
New Model Applications
MPIMetaprogramming
FrameworkDomain SpecificActive Library
Compiler
AGASname spaceprocessor
LCOdataflow, futuressynchronization
LightweightThreads
context manager
Parcelsmessage driven
computation
...
OpenMP
XPI
Task recognition Address space control
Memory bankcontrol
OSthread In
stru
men
tatio
n
Networkdrivers
Distributed FrameworkOperating System
HardwareArchitecture
OperatingSystem
Instances
RuntimeSystem
Instances
PRIME MEDIUMInterace / Control
{{
Domain SpecificLanguage
+106 nodes ⇥ 103 cores / node + integration network
...
• Legacy miKgaKon through MPI & OpenMP programming interfaces to the OpenX soVware stack
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 7
XPRESS: ApplicaKons
• Climate Science – Community Earth System Model (CESM)
• Nuclear Energy – Denovo
• Plasma Physics – GTC (HPX implementaKon)
• Trilinos libraries
LegacyApplications
New Model Applications
MPIMetaprogramming
FrameworkDomain SpecificActive Library
Compiler
AGASname spaceprocessor
LCOdataflow, futuressynchronization
LightweightThreads
context manager
Parcelsmessage driven
computation
...
OpenMP
XPI
Task recognition Address space control
Memory bankcontrol
OSthread In
stru
men
tatio
n
Networkdrivers
Distributed FrameworkOperating System
HardwareArchitecture
OperatingSystem
Instances
RuntimeSystem
Instances
PRIME MEDIUMInterace / Control
{{
Domain SpecificLanguage
+106 nodes ⇥ 103 cores / node + integration network
...
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 8
XPRESS: Crosscuhng Issues
• AutomaKc control and introspecKon • Resilience – In memory checkpoinKng, compute-‐validate-‐commit cycle that defers side-‐effects (isolaKng error propagaKon)
• Power management – Greater resource uKlizaKon per Joule while reducing data movement through ac*ve locality management
• Heterogeneity
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 9
HPX
• First open-‐source implementaKon of ParalleX • Modular runKme system for convenKonal architectures (NUMA machines, clusters)
• Supports all key ParalleX paradims – Parcels & parcel transport layer – PX-‐threads and management – Local Control Objects (LCO) – AcKve Global Address Space (AGAS)
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 10
Process Manager Local Memory Management
AGASTranslation
Interconnect
Parcel Port Parcel
Handler
Action Manager Thread
ManagerThread
Pool
LocalControlObjects
Performance Counters
Performance Monitor
...
HPX-‐3 architecture
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 11
Thread queue length, PAPI, etc.
APEX: Autonomic Performance Environment for eXascale
• Node, core count, heterogeneity all increasing • Cores interacKng through shared resources – Manifested by bo]lenecks and queuing at scarce resources both on chip (node) and between nodes
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 12
LegacyApplications
New Model Applications
MPIMetaprogramming
FrameworkDomain SpecificActive Library
Compiler
AGASname spaceprocessor
LCOdataflow, futuressynchronization
LightweightThreads
context manager
Parcelsmessage driven
computation
...
OpenMP
XPI
Task recognition Address space control
Memory bankcontrol
OSthread In
stru
men
tatio
n
Networkdrivers
Distributed FrameworkOperating System
HardwareArchitecture
OperatingSystem
Instances
RuntimeSystem
Instances
PRIME MEDIUMInterace / Control
{{
Domain SpecificLanguage
+106 nodes ⇥ 103 cores / node + integration network
...
• Measurement and analysis need to be available in real Kme to all levels of soVware stack
APEX: Autonomic Performance Environment for eXascale
• Performance observaKon requirements had been saKsfied by local measurements and offline opKmizaKon (focus on first person)
• Inadequate for exascale – Shared resources operate independently of the cores that have hardware monitors built in
• Requirement for third person observaKon • Make node-‐wide resource uKlizaKon data and analysis, energy consumpKon, health available in real Kme
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 13
Autonomic Performance
• Top down and bo]om up performance mapping • OS (LXK) tracks system resource assignment, uKlizaKon, job contenKon, overhead
• RunKme (HPX) tracks threads, queues, concurrency, remote operaKons, parcels, memory management
• ParalleX, DSLs and Legacy codes allow language-‐level performance semanKcs to be measured
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 14
Bo]om-‐Up Development Approach
• Performance introspecKon across layers to enable dynamic, adapKve operaKon and decision control
• Extends previous work on building decision support instrumentaKon (RCRToolkit) for introspecKve adapKve scheduling
• Resource Centric ReflecKon • Daemon monitors shared, non-‐core resources • Real-‐Kme analysis, raw/processed data published to shared memory region, clients subscribe
• Display/logging tool, adapKve thread scheduler
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 15
Top-‐Down Development Approach
• Couples first person and third person performance views by translaKng applicaKon context down the OpenX stack
• Associates the execuKon performance state back up
• TAU Performance System used as base for measurement in HPX
• Wrapper interposiKon libraries for legacy measurement, XPI measurement
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 16
APEX: monitoring and wriKng to RCR
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 17
RCR Blackboard
HW State OS State
HPX State
XPI State
DSL or Legacy State
APEX Profiles, Traces
RCR Daemon
TAU
APEX: -‐ “broadcasts” performance state -‐ monitors performance state -‐ “triggers” an acKon
RCR client
APEX: Prototype ImplementaKon
• Mapping of HPX performance counters to APEX (TAU) counters
• Intel® ITT InstrumentaKon of HPX runKme (thread scheduler) mapped to APEX Kmers
• APEX as RCR client to observe power use • RCR / HPX thread scheduler integraKon underway
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 18
HPX “main”
HPX user level
HPX “helper” OS threads
HPX thread sc
hedu
ler loo
p
GTCX on 1 node of ACISS cluster at UO * 21 / 84 OS threads * 12 cores (user)
HPX Profile Example
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 19
GTC Profile with 32 threads, 1 node
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 20
15.24-‐19.43% of Kme spent in the thread scheduler
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 21
gtc_parKKon_loop_acKon
thread_scheduler_loop
Future Work
• DisKnguish between HPX task execuKon and preempKon/migraKon
• Performance data not stored in ParalleX model • InstrumentaKon of HPX parcel transport, local control objects, global address space
• Expanded informaKon sharing between components • RunKme analysis and disKllaKon of wide-‐scale performance data
• IntegraKon with Ki]en/LXK OS • Event based model for triggering runKme correcKons
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 22
Acknowledgements • Support for this work was provided through ScienKfic Discovery
through Advanced CompuKng (SciDAC) program funded by U.S. Department of Energy, Office of Science, Advanced ScienKfic CompuKng Research (and Basic Energy Sciences/Biological and Environmental Research/High Energy Physics/Fusion Energy Sciences/Nuclear Physics) under award numbers DE-‐SC0008638, DE-‐SC0008704, DE-‐ FG02-‐11ER26050 and DE-‐SC0006925.
• The ACISS cluster is supported by a Major Research InstrumentaKon grant from the NaKonal Science FoundaKon (NSF), Office of Cyber Infrastructure, grant number OCI-‐0960354.
• Sandia NaKonal Laboratories is a mulK-‐program laboratory managed and operated by Sandia CorporaKon, a wholly owned subsidiary of Lockheed MarKn CorporaKon, for the U.S. DOE’s NaKonal Nuclear Security AdministraKon under contract DE-‐ AC04-‐94AL85000.
June 10, 2013 ROSS 2013 -‐ Eugene, Oregon 23
top related