an#early#prototype#of#an#autonomic# …...operating system distributed framework hardware...

23
An Early Prototype of an Autonomic Performance Environment for Exascale Kevin Huck, Sameer Shende,Allen Malony University of Oregon Hartmut Kaiser Louisiana State University Allan Porterfield, Rob Fowler RENCI Ron Brightwell Sandia NaKonal Labs

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

An  Early  Prototype  of  an  Autonomic  Performance  Environment  for  

Exascale  Kevin  Huck,  Sameer  Shende,Allen  Malony  

University  of  Oregon  Hartmut  Kaiser  

Louisiana  State  University  Allan  Porterfield,  Rob  Fowler  

RENCI  Ron  Brightwell  

Sandia  NaKonal  Labs  

Page 2: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Outline  

•  IntroducKon/MoKvaKon  •  XPRESS  Overview  •  Relevant  Components  – HPX  RunKme  – APEX  Concepts  

•  APEX  Prototype  •  Experimental  Results  •  Future  Work  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   2  

Page 3: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

IntroducKon/MoKvaKon  •  Extreme-­‐scale  compuKng  requires  a  new  perspecKve  on  the  role  of  performance  observaKon  in  the  Exascale  system  soVware  stack  

•  High  concurrency  and  dynamic  operaKon  •  Post-­‐mortem  performance  measurement  and  analysis  is  not  good  enough  

•  Requirements:  –  first-­‐  and  third-­‐person  observaKon  –  In  situ  analysis  –  IntrospecKon  across  stack  layers  –  Dynamic  feedback  and  adaptaKon  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   3  

Page 4: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

XPRESS  •  RunKme  system  implemenKng  ParalleX,  co-­‐designed  with  an  OperaKng  System.  –  Sandia  NaKonal  Laboratories  (Ron  Brightwell)  –  Indiana  University  (Thomas  Sterling,  Andrew  Lumsdane)  –  Lawrence  Berkeley  NaKonal  Laboratory  (Alice  Koniges)  –  Louisiana  State  University  (Hartmut  Kaiser)  –  Oak  Ridge  NaKonal  Laboratory  (Chris  Baker)  –  University  of  Houston  (Barbara  Chapman)  –  University  of  North  Carolina  (Allan  Porterfield,  Rob  Fowler)  –  University  of  Oregon  (Allen  Malony)  

•  h]p://xstack.sandia.gov/xpress/    

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   4  

Page 5: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

LegacyApplications

New Model Applications

MPIMetaprogramming

FrameworkDomain SpecificActive Library

Compiler

AGASname spaceprocessor

LCOdataflow, futuressynchronization

LightweightThreads

context manager

Parcelsmessage driven

computation

...

OpenMP

XPI

Task recognition Address space control

Memory bankcontrol

OSthread In

stru

men

tatio

n

Networkdrivers

Distributed FrameworkOperating System

HardwareArchitecture

OperatingSystem

Instances

RuntimeSystem

Instances

PRIME MEDIUMInterace / Control

{{

Domain SpecificLanguage

+106 nodes ⇥ 103 cores / node + integration network

...“APEX”  

OpenX    

•  Integrated  Exascale  soVware  stack  for  ParalleX  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   5  

Page 6: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

XPRESS:  Exascale  System  SoVware  

•  HPX-­‐4:  runKme  based  on  OpenX  modular  soVware  architecture  – HPX-­‐3:  Current  ParalleX  implementaKon  

•  LXK:  lightweight  kernel  OS  – Based  on  the  Ki]en  OS   Legacy

Applications

New Model Applications

MPIMetaprogramming

FrameworkDomain SpecificActive Library

Compiler

AGASname spaceprocessor

LCOdataflow, futuressynchronization

LightweightThreads

context manager

Parcelsmessage driven

computation

...

OpenMP

XPI

Task recognition Address space control

Memory bankcontrol

OSthread In

stru

men

tatio

n

Networkdrivers

Distributed FrameworkOperating System

HardwareArchitecture

OperatingSystem

Instances

RuntimeSystem

Instances

PRIME MEDIUMInterace / Control

{{

Domain SpecificLanguage

+106 nodes ⇥ 103 cores / node + integration network

...

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   6  

Page 7: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

XPRESS:  Programming  Models  and  Languages  

•  XPI  –  low-­‐level  imperaKve  programming  interface  

•  Meta-­‐programming  framework  to  support  rapid  deployment  of  future  embedded  DSL  formalisms  – Example  DSL  using  meta-­‐DSL  toolkit  

LegacyApplications

New Model Applications

MPIMetaprogramming

FrameworkDomain SpecificActive Library

Compiler

AGASname spaceprocessor

LCOdataflow, futuressynchronization

LightweightThreads

context manager

Parcelsmessage driven

computation

...

OpenMP

XPI

Task recognition Address space control

Memory bankcontrol

OSthread In

stru

men

tatio

n

Networkdrivers

Distributed FrameworkOperating System

HardwareArchitecture

OperatingSystem

Instances

RuntimeSystem

Instances

PRIME MEDIUMInterace / Control

{{

Domain SpecificLanguage

+106 nodes ⇥ 103 cores / node + integration network

...

•  Legacy  miKgaKon  through  MPI  &  OpenMP  programming  interfaces  to  the  OpenX  soVware  stack  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   7  

Page 8: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

XPRESS:  ApplicaKons  

•  Climate  Science  – Community  Earth  System  Model  (CESM)  

•  Nuclear  Energy  – Denovo  

•  Plasma  Physics  – GTC  (HPX  implementaKon)  

•  Trilinos  libraries  

LegacyApplications

New Model Applications

MPIMetaprogramming

FrameworkDomain SpecificActive Library

Compiler

AGASname spaceprocessor

LCOdataflow, futuressynchronization

LightweightThreads

context manager

Parcelsmessage driven

computation

...

OpenMP

XPI

Task recognition Address space control

Memory bankcontrol

OSthread In

stru

men

tatio

n

Networkdrivers

Distributed FrameworkOperating System

HardwareArchitecture

OperatingSystem

Instances

RuntimeSystem

Instances

PRIME MEDIUMInterace / Control

{{

Domain SpecificLanguage

+106 nodes ⇥ 103 cores / node + integration network

...

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   8  

Page 9: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

XPRESS:  Crosscuhng  Issues  

•  AutomaKc  control  and  introspecKon  •  Resilience  –  In  memory  checkpoinKng,  compute-­‐validate-­‐commit  cycle  that  defers  side-­‐effects  (isolaKng  error  propagaKon)  

•  Power  management  – Greater  resource  uKlizaKon  per  Joule  while  reducing  data  movement  through  ac*ve  locality  management  

•  Heterogeneity  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   9  

Page 10: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

HPX  

•  First  open-­‐source  implementaKon  of  ParalleX  •  Modular  runKme  system  for  convenKonal  architectures  (NUMA  machines,  clusters)  

•  Supports  all  key  ParalleX  paradims  – Parcels  &  parcel  transport  layer  – PX-­‐threads  and  management  – Local  Control  Objects  (LCO)  – AcKve  Global  Address  Space  (AGAS)  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   10  

Page 11: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Process Manager Local Memory Management

AGASTranslation

Interconnect

Parcel Port Parcel

Handler

Action Manager Thread

ManagerThread

Pool

LocalControlObjects

Performance Counters

Performance Monitor

...

HPX-­‐3  architecture  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   11  

Thread  queue  length,  PAPI,  etc.    

Page 12: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

APEX:  Autonomic  Performance  Environment  for  eXascale  

•  Node,  core  count,  heterogeneity  all  increasing  •  Cores  interacKng  through  shared  resources  – Manifested  by  bo]lenecks  and  queuing  at  scarce  resources  both  on  chip  (node)  and  between  nodes  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   12  

LegacyApplications

New Model Applications

MPIMetaprogramming

FrameworkDomain SpecificActive Library

Compiler

AGASname spaceprocessor

LCOdataflow, futuressynchronization

LightweightThreads

context manager

Parcelsmessage driven

computation

...

OpenMP

XPI

Task recognition Address space control

Memory bankcontrol

OSthread In

stru

men

tatio

n

Networkdrivers

Distributed FrameworkOperating System

HardwareArchitecture

OperatingSystem

Instances

RuntimeSystem

Instances

PRIME MEDIUMInterace / Control

{{

Domain SpecificLanguage

+106 nodes ⇥ 103 cores / node + integration network

...

•  Measurement  and  analysis  need  to  be  available  in  real  Kme  to  all  levels  of  soVware  stack  

Page 13: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

APEX:  Autonomic  Performance  Environment  for  eXascale  

•  Performance  observaKon  requirements  had  been  saKsfied  by  local  measurements  and  offline  opKmizaKon  (focus  on  first  person)    

•  Inadequate  for  exascale  –  Shared  resources  operate  independently  of  the  cores  that  have  hardware  monitors  built  in  

•  Requirement  for  third  person  observaKon  •  Make  node-­‐wide  resource  uKlizaKon  data  and  analysis,  energy  consumpKon,  health  available  in  real  Kme  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   13  

Page 14: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Autonomic  Performance  

•  Top  down  and  bo]om  up  performance  mapping  •  OS  (LXK)  tracks  system  resource  assignment,  uKlizaKon,  job  contenKon,  overhead  

•  RunKme  (HPX)  tracks  threads,  queues,  concurrency,  remote  operaKons,  parcels,  memory  management  

•  ParalleX,  DSLs  and  Legacy  codes  allow  language-­‐level  performance  semanKcs  to  be  measured  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   14  

Page 15: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Bo]om-­‐Up  Development  Approach  

•  Performance  introspecKon  across  layers  to  enable  dynamic,  adapKve  operaKon  and  decision  control  

•  Extends  previous  work  on  building  decision  support  instrumentaKon  (RCRToolkit)  for  introspecKve  adapKve  scheduling  

•  Resource  Centric  ReflecKon  •  Daemon  monitors  shared,  non-­‐core  resources  •  Real-­‐Kme  analysis,  raw/processed  data  published  to  shared  memory  region,  clients  subscribe  

•  Display/logging  tool,  adapKve  thread  scheduler  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   15  

Page 16: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Top-­‐Down  Development  Approach  

•  Couples  first  person  and  third  person  performance  views  by  translaKng  applicaKon  context  down  the  OpenX  stack  

•  Associates  the  execuKon  performance  state  back  up  

•  TAU  Performance  System  used  as  base  for  measurement  in  HPX  

•  Wrapper  interposiKon  libraries  for  legacy  measurement,  XPI  measurement  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   16  

Page 17: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

APEX:  monitoring  and  wriKng  to  RCR  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   17  

RCR Blackboard

HW  State OS  State

HPX  State

XPI  State

DSL  or  Legacy  State

APEX Profiles, Traces

RCR Daemon

TAU

APEX:  -­‐  “broadcasts”  performance  state  -­‐  monitors  performance  state  -­‐  “triggers”  an  acKon  

RCR client

Page 18: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

APEX:  Prototype  ImplementaKon  

•  Mapping  of  HPX  performance  counters  to  APEX  (TAU)  counters  

•  Intel®  ITT  InstrumentaKon  of  HPX  runKme  (thread  scheduler)  mapped  to  APEX  Kmers  

•  APEX  as  RCR  client  to  observe  power  use  •  RCR  /  HPX  thread  scheduler  integraKon  underway  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   18  

Page 19: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

HPX  “main”  

HPX  user  level  

HPX  “helper”  OS  threads  

HPX  thread  sc

hedu

ler  loo

p  

GTCX  on  1  node  of  ACISS  cluster  at  UO  *  21  /  84  OS  threads  *  12  cores  (user)  

HPX  Profile  Example  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   19  

Page 20: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

GTC  Profile  with  32  threads,  1  node  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   20  

15.24-­‐19.43%  of  Kme  spent  in  the  thread  scheduler  

Page 21: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   21  

gtc_parKKon_loop_acKon  

thread_scheduler_loop  

Page 22: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Future  Work  

•  DisKnguish  between  HPX  task  execuKon  and  preempKon/migraKon  

•  Performance  data  not  stored  in  ParalleX  model  •  InstrumentaKon  of  HPX  parcel  transport,  local  control  objects,  global  address  space  

•  Expanded  informaKon  sharing  between  components  •  RunKme  analysis  and  disKllaKon  of  wide-­‐scale  performance  data  

•  IntegraKon  with  Ki]en/LXK  OS  •  Event  based  model  for  triggering  runKme  correcKons  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   22  

Page 23: An#Early#Prototype#of#an#Autonomic# …...Operating System Distributed Framework Hardware Architecture Operating System Instances Runtime System Instances PRIME MEDIUM Interace / Control

Acknowledgements  •  Support  for  this  work  was  provided  through  ScienKfic  Discovery  

through  Advanced  CompuKng  (SciDAC)  program  funded  by  U.S.  Department  of  Energy,  Office  of  Science,  Advanced  ScienKfic  CompuKng  Research  (and  Basic  Energy  Sciences/Biological  and  Environmental  Research/High  Energy  Physics/Fusion  Energy  Sciences/Nuclear  Physics)  under  award  numbers  DE-­‐SC0008638,  DE-­‐SC0008704,  DE-­‐  FG02-­‐11ER26050  and  DE-­‐SC0006925.  

•  The  ACISS  cluster  is  supported  by  a  Major  Research  InstrumentaKon  grant  from  the  NaKonal  Science  FoundaKon  (NSF),  Office  of  Cyber  Infrastructure,  grant  number  OCI-­‐0960354.  

•  Sandia  NaKonal  Laboratories  is  a  mulK-­‐program  laboratory  managed  and  operated  by  Sandia  CorporaKon,  a  wholly  owned  subsidiary  of  Lockheed  MarKn  CorporaKon,  for  the  U.S.  DOE’s  NaKonal  Nuclear  Security  AdministraKon  under  contract  DE-­‐  AC04-­‐94AL85000.  

June  10,  2013   ROSS  2013  -­‐  Eugene,  Oregon   23