Source: theory.stanford.edu/~aiken/WEST/Talks/Darma.pdf
DARMA

Janine C. Bennett, Jonathan Lifflander, David S. Hollman, Jeremiah Wilke, Hemanth Kolla, Aram Markosyan, Nicole Slattengren, Robert L. Clay (PM)

PSAAP-WEST, February 22, 2017

Unclassified SAND2017-1430 C

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXXP


What is DARMA?

DARMA is a C++ abstraction layer for asynchronous many-task (AMT) runtimes.

It provides a set of abstractions to facilitate the expression of tasking that map to a variety of underlying AMT runtime system technologies.

Sandia’s ATDM program is using DARMA to inform its technical roadmap for next generation codes.

2015 study to assess leading AMT runtimes led to DARMA

- Broad survey of many AMT runtime systems
- Deep dive on Charm++, Legion, Uintah
- Programmability: Does this runtime enable efficient expression of ATDM workloads?
- Performance: How performant is this runtime for our workloads on current platforms, and how well suited is this runtime to address future architecture challenges?
- Mutability: What is the ease of adopting this runtime and modifying it to suit our code needs?

2015 study to assess leading AMT runtimes led to DARMA

- Conclusions
  - AMT systems show great promise
  - Gaps in requirements for Sandia applications
  - No common user-level APIs
  - Need for best practices and standards
- Survey recommendations led to DARMA
  - C++ abstraction layer for AMT runtimes
  - Requirements driven by Sandia ATDM applications
  - A single user-level API
  - Support multiple AMT runtimes to begin identification of best practices

Aim: inform Sandia’s technical roadmap for next generation codes

Mapping to a variety of AMT runtime system technologies

DARMA provides a unified API to application developers for expressing tasks

[Figure: layered stack of Application, DARMA, Runtime, and OS/Hardware. Labels: Front End API (Application User), Translation Layer, Back End API (Specification for Runtime), Glue Code (Specific to each runtime). Two annotations read "Common API across runtimes".]

Application code is translated into a series of back end API calls to an AMT runtime

Not all runtimes provide the same functionality

[Figure: the same Application / DARMA / Runtime / OS/Hardware stack diagram.]

Application code is translated into a series of back end API calls to an AMT runtime

Challenge: design a back end API that maps to a variety of runtimes

[Figure: the same Application / DARMA / Runtime / OS/Hardware stack diagram.]

Considerations when developing a back end API that maps to a variety of runtimes

- AMT runtimes often operate with a directed acyclic graph (DAG)
  - Captures relationships between application data and inter-dependent tasks
- DAGs can be annotated to capture additional information
  - Tasks’ read/write usage of data
  - Task needs a subset of data
- Additional information enables runtime to reason more completely about
  - When and where to execute a task
  - Whether to load balance
- Existing runtimes leverage DAGs with varying degrees of annotation

[Figure: example DAG with data nodes d1 through d7 and task nodes t1 through t5; edge annotations include "subset" and "reads".]
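The annotated DAG described above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not DARMA's or any runtime's actual data structure: the names `Usage`, `DataUse`, `Task`, and `conflicts` are hypothetical, and only the read/write annotation is modeled (not subsetting).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical annotation: how a task uses a piece of data.
enum class Usage { Read, Write };

// One annotated edge of the DAG: a task uses one data item in some manner.
struct DataUse {
    std::string data;  // e.g. "d1"
    Usage usage;
};

// A task node together with its annotated data edges.
struct Task {
    std::string name;  // e.g. "t1"
    std::vector<DataUse> uses;
};

// Two tasks conflict (and so must be ordered) when they touch the same
// data item and at least one of them writes it. Read/read sharing is safe.
bool conflicts(const Task& a, const Task& b) {
    for (const auto& ua : a.uses) {
        for (const auto& ub : b.uses) {
            if (ua.data == ub.data &&
                (ua.usage == Usage::Write || ub.usage == Usage::Write)) {
                return true;
            }
        }
    }
    return false;
}
```

With this annotation, a scheduler could run any pair of tasks for which `conflicts` returns false at the same time. Without the read/write annotation it would have to assume that every shared data item forces an ordering.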

DARMA captures data-task dependency information and the runtime builds and executes the DAG

- DARMA: captures data-task dependency information
- Runtime: controls construction and execution of the DAG

[Figure: the same Application / DARMA / Runtime / OS/Hardware stack diagram.]

Abstractions that facilitate the expression of tasking

DARMA front end abstractions for data and tasks are co-designed with Sandia ATDM application scientists

Provide abstractions to simplify capturing of data-task dependencies

[Figure: the same Application / DARMA / Runtime / OS/Hardware stack diagram.]

DARMA Data Model

How are data collections/data structures described?
- Asynchronous smart pointers wrap application data
- Track meta-data used to build and annotate the DAG
  - Currently permissions information
  - Subsetting information under development
- Enable extraction of parallelism in a data-race-free manner

How are data partitioning and distribution expressed?
- There is an explicit, hierarchical, logical decomposition of data
- AccessHandle<T>
  - Does not span multiple memory spaces
  - Must be serialized to be transferred between memory spaces
- AccessHandleCollection<T, R>
  - Expresses a collection of data
  - Can be mapped across memory spaces in a scalable manner
- Distribution of data is up to individual back end runtime

DARMA Control Model

How is parallelism achieved?
- create_work
  - A task that doesn’t span multiple execution spaces
  - Sequential semantics: the order and manner (e.g., read, write) in which data (AccessHandle) is used determines what tasks may be run in parallel
- create_concurrent_work
  - Scalable abstraction to launch across distributed systems
  - A collection of tasks that must make simultaneous forward progress
  - Sequential semantics supported across different task collections based on order and manner of AccessHandleCollection usage

How is synchronization expressed?
- DARMA does not provide explicit temporal synchronization abstractions
- DARMA does provide data coordination abstractions
  - publish/fetch semantics between participants in a task collection
  - Asynchronous collectives between participants in a task collection

Using DARMA to inform Sandia’s technical roadmap

Currently there are three back ends in various stages of development

Back ends: Charm++, Threads, HPX (in progress)

[Figure: the stack diagram showing the Charm++, Threads, and HPX back ends; status labels read "fully distributed", "development tool", and "prototype".]

2017 study: Explore programmability and performance of the DARMA approach in the context of ATDM codes

Kernels and proxies under study:
- Electromagnetic Plasma Particle-in-cell Kernels
- Multiscale Proxy
- Multi Level Monte Carlo Uncertainty Quantification Proxy

- Kernels and proxies
  - Form basis for programmability assessments
  - Will be used to explore performance characteristics of the DARMA-Charm++ back end
- Simple benchmarks enable studies on
  - Task granularity
  - Overlap of communication and computation
  - Runtime-managed load balancing
- These early results are being used to identify and address bottlenecks in the DARMA-Charm++ back end in preparation for studies with kernels/proxies

DARMA’s programming model enables runtime-managed, measurement-based load balancing

Initial strong scaling results with the DARMA-Charm++ back end are promising

2D Newtonian particle simulation that starts highly imbalanced.

[Figure: percent utilization (0% to 100%) versus time (0 s to 165 s), starting from an initial load imbalance, with four load balancing (LB) steps marked. Annotated iteration times, in extraction order: 1.74 s/iter before LB, 1.66 s/iter after LB, 1.70 s/iter before LB, 1.47 s/iter before LB, 1.54 s/iter after LB, 1.41 s/iter after LB.]

The Charm++ load balancer incrementally runs as particles migrate and the work distribution changes.
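To make "measurement-based load balancing" concrete, here is a minimal sketch of one textbook strategy: greedy assignment of the heaviest measured task to the least loaded processor. This is not the Charm++ load balancer and not part of DARMA. The function names `greedy_balance` and `max_load` are hypothetical, and the measured loads are assumed to be per-task timings collected by a runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assign tasks to processors greedily: heaviest measured task first, each
// to the processor with the least load so far. Returns owner[task] = proc.
std::vector<std::size_t> greedy_balance(const std::vector<double>& measured_load,
                                        std::size_t num_procs) {
    assert(num_procs > 0);
    std::vector<std::size_t> order(measured_load.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) {
                  return measured_load[a] > measured_load[b];
              });

    std::vector<double> proc_load(num_procs, 0.0);
    std::vector<std::size_t> owner(measured_load.size(), 0);
    for (std::size_t task : order) {
        std::size_t target = static_cast<std::size_t>(
            std::min_element(proc_load.begin(), proc_load.end()) -
            proc_load.begin());
        owner[task] = target;
        proc_load[target] += measured_load[task];
    }
    return owner;
}

// Largest per-processor load for a given assignment; the iteration time of
// a synchronized step is bounded below by this value.
double max_load(const std::vector<double>& measured_load,
                const std::vector<std::size_t>& owner, std::size_t num_procs) {
    assert(num_procs > 0);
    std::vector<double> proc_load(num_procs, 0.0);
    for (std::size_t t = 0; t < owner.size(); ++t) {
        proc_load[owner[t]] += measured_load[t];
    }
    return *std::max_element(proc_load.begin(), proc_load.end());
}
```

Because the decision uses only measured task loads and a task-to-processor map, the application code does not need to change when the runtime rebalances.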

Summary: DARMA seeks to accelerate discovery of best practices

- Application developers
  - Use a unified interface to explore a variety of different runtime system technologies
  - Directly inform DARMA’s user-level API via co-design requirements/feedback
- System software developers
  - Acquire a synthesized set of requirements via the back end specification
  - Directly inform back end specification via co-design feedback
  - Can experiment with proxy applications written in DARMA
- Sandia ATDM is using DARMA to inform its technology roadmap in the context of AMT runtime systems

Backup Slides

Asynchronous smart pointers enable extraction of parallelism in a data-race-free manner

darma::AccessHandle<T> enforces sequential semantics: it uses the order in which data is accessed in your program and how it is accessed (read/write/etc.) to automatically extract parallelism

Permission Level
- None
- Read
- Write
- Reduce

Permission Type
- Scheduling: A task with scheduling permission can create deferred tasks that can access the data at the specified permission level.
- Immediate: A task with immediate permission can dereference the AccessHandle<T> and use it according to the permission level.

Tasks are annotated in the code via a lambda or functor interface

Tasks can be recursively nested within each other to generate more subtasks

C++ Lambdas

    darma::create_work([=]{
      /* do some work */
    });

This is the C++11 syntax for writing an anonymous function that captures variables by value.

C++ Functors

    struct MyFun {
      void operator()(...) { /* do some work */ }
    };

    darma::create_work<MyFun>(...)

Functors are for larger blocks of code that may be reused and migrated by the back end to another memory space.

Example: Putting tasks and data together

Example Program

    AccessHandle<int> my_data;

    darma::create_work([=]{
      my_data.set_value(29);
    });

    darma::create_work(reads(my_data), [=]{
      cout << my_data.get_value();
    });

    darma::create_work(reads(my_data), [=]{
      cout << my_data.get_value();
    });

    darma::create_work([=]{
      my_data.set_value(31);
    });

Sequential Semantics

[Figure: DAG (Directed Acyclic Graph) with nodes Modify my_data, Read my_data, Read my_data, Modify my_data.]

The two read tasks are concurrent and can be run in parallel by a DARMA back end!
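The way sequential semantics turns program order into a DAG can be sketched without DARMA. The following is a minimal illustration for a single piece of data, assuming only two kinds of access; `Access` and `extract_dependencies` are hypothetical names, not part of the DARMA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical access kinds for one piece of data.
enum class Access { Read, Modify };

// For tasks issued in program order against one piece of data, return for
// each task the indices of the earlier tasks it must wait for.
std::vector<std::vector<std::size_t>>
extract_dependencies(const std::vector<Access>& program) {
    std::vector<std::vector<std::size_t>> deps(program.size());
    bool has_writer = false;
    std::size_t last_writer = 0;
    std::vector<std::size_t> readers_since_write;

    for (std::size_t i = 0; i < program.size(); ++i) {
        if (program[i] == Access::Read) {
            // A reader waits only for the most recent writer.
            if (has_writer) deps[i].push_back(last_writer);
            readers_since_write.push_back(i);
        } else {
            // A writer waits for all readers of the previous value, or for
            // the previous writer if nobody has read that value.
            if (!readers_since_write.empty()) {
                deps[i] = readers_since_write;
            } else if (has_writer) {
                deps[i].push_back(last_writer);
            }
            has_writer = true;
            last_writer = i;
            readers_since_write.clear();
        }
    }
    return deps;
}
```

For the modify, read, read, modify sequence of the example program, both reads depend only on the first modify, so nothing orders them relative to each other. The final modify depends on both reads.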

Smart pointer collections can be mapped across memory spaces in a scalable manner

    AccessHandleCollection<vector<double>, Range1D> mycol =
      darma::initial_access_collection(index_range = Range1D(10));

Range1D is a potentially user-defined (or domain-specific) index range, a C++ object that describes the extents of the collection along with providing a corresponding index class for accessing an element.

Every element in the collection contains a vector<double>.

[Figure: mycol drawn as elements Index 0, Index 1, ..., Index 9.]

Each indexed element is an AccessHandle<vector<double>>.

AccessHandleCollection<T, R> is an extension to AccessHandle<T> that expresses a collection of data

Tasks can be grouped into collections that make concurrent forward progress together

    create_concurrent_work<MyFun>(
      index_range = Range1D(5)
    );

This call to create_concurrent_work launches a set of tasks, the size of which is specified by an index range, Range1D, that is passed as an argument.

    struct MyFun {
      void operator()(Index1D i) {
        int me = i.value;
        /* do some work */
      }
    };

Each element in the task collection is passed an Index1D within the range, used by the programmer to express communication patterns across elements in the collection.

Task collections are a scalable abstraction to efficiently launch communicating tasks across large-scale distributed systems

Putting task collections and data collections together

Example Program

    auto mycol = initial_access_collection(index_range = Range1D(10));

    create_concurrent_work<MyFun>(mycol, index_range = Range1D(10));

    create_concurrent_work<MyFun>(mycol, index_range = Range1D(10));

Sequential Semantics

[Figure: generated DAG with two "Modify mycol" nodes; a "Scalable Graph Refinement" step shows "Modify mycol" nodes for Index 0, Index 1, ..., Index 9.]

A mapping must exist between the data index ranges and task index range. In this case, since the three ranges are identical in size and type, a one-to-one identity map is automatically applied.
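The identity mapping between a task index range and a data index range can be sketched as follows. This is a hypothetical stand-in written for illustration: `RangeSketch` and `identity_map` are not DARMA types or functions, and the rule "equal sizes imply an identity map" is simplified from the statement above (which also requires identical range types).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 1D index range; this is not DARMA's Range1D.
struct RangeSketch {
    std::size_t size;
};

// Map a task index to the data index it owns. When the two ranges have the
// same size, the identity map applies and the function returns true.
// Otherwise this sketch has no default map and returns false.
bool identity_map(const RangeSketch& tasks, const RangeSketch& data,
                  std::size_t task_index, std::size_t& data_index) {
    if (tasks.size != data.size || task_index >= tasks.size) {
        return false;
    }
    data_index = task_index;  // task i gets data element i
    return true;
}
```

In this sketch, a task range of size 10 paired with a data range of size 5 is rejected, since no default map is defined for mismatched ranges.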

Tasks in different execution streams can communicate via publish/fetch semantics

Execution Stream A

    AccessHandle<int> my_data = initial_access<int>("my_key");

    darma::create_work([=]{
      my_data.set_value(29);
    });

    my_data.publish(version="a");

    darma::create_work([=]{
      my_data.set_value(31);
    });

Execution Stream B

    AccessHandle<int> other_data = read_access("my_key", version="a");

    darma::create_work([=]{
      cout << other_data.get_value();
    });

    other_data = nullptr;

Potential DAG 1

[Figure: nodes Modify my_data, Modify my_data, Read my_data, Copy my_data, Read my_data.]

If the read_access is on another node, the data might be sent across the network.

Potential DAG 2

[Figure: nodes Modify my_data, Modify my_data, Read my_data, Read my_data, with no copy node.]

If the read_access is on the same node, a back end runtime can generate an alternative DAG without the transfer.
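The publish/fetch pattern above can be illustrated with a small in-memory store keyed by (key, version). This is a sketch under stated assumptions, not how any DARMA back end is implemented: `PublishedStore` and its methods are hypothetical, the store always copies the value, and a fetch of an unpublished version simply fails. A real runtime would defer the reader instead, and could avoid the copy when both streams share a node.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a key/version store.
class PublishedStore {
public:
    // Publish a snapshot of `value` under (key, version).
    void publish(const std::string& key, const std::string& version, int value) {
        snapshots_[std::make_pair(key, version)] = value;
    }

    // Fetch a published snapshot into `out`. Returns false if nothing has
    // been published under (key, version) yet.
    bool fetch(const std::string& key, const std::string& version, int& out) const {
        auto it = snapshots_.find(std::make_pair(key, version));
        if (it == snapshots_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, int> snapshots_;
};
```

Publishing the value 29 as version "a" and then modifying the original to 31 leaves the fetched value at 29. That matches the reader in stream B seeing the published version rather than the later modification.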

A mapping between data and task collections determines access permissions between tasks and data

    auto mycol = initial_access_collection<int>(index_range = Range1D(10));
    create_concurrent_work<MyFun>(mycol, index_range = Range1D(10));

    struct MyFun {
      void operator()(Index1D i, AccessHandleCollection<int> col) {
        int me = i.value, mx = i.max_value;
        auto my_elm = col[i].local_access();
        my_elm.publish(version="x");
        auto neighbor = me-1 < 0 ? mx : me-1;
        auto other_elm = col[neighbor].read_access(version="x");
        create_work([=]{
          cout << "neighbor = " << other_elm.get_value() << endl;
        });
      }
    };

There is an identity map between these data and tasks. Thus, index i has local access to data index i.

Any other index must be read using read_access, which may be a remote or local operation depending on the back end mapping, but is always a deferred operation.
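The neighbor computation in the functor above is a wrap-around on a ring of indices. A minimal standalone sketch, assuming that `i.max_value` denotes the largest valid index (for example 9 for a range of 10 elements); the function name `left_neighbor` is hypothetical.

```cpp
#include <cassert>

// Left neighbor on a ring of indices 0..max_value, mirroring the slide's
// expression `me-1 < 0 ? mx : me-1`.
// ASSUMPTION: max_value is the largest valid index, not the element count.
int left_neighbor(int me, int max_value) {
    return me - 1 < 0 ? max_value : me - 1;
}
```

Index 0 wraps around to the last index, and every other index reads from the element just before it.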

Sandia ATDM applications drive requirements and developers play active role in informing front end API

- Application feature requests
  - Sequential semantics
  - MPI interoperability
  - Node-level performance portability layer interoperability (Kokkos)
  - Collectives
  - Runtime-enabled load-balancing schemes

A latency-intolerant benchmark highlights overheads as grain size decreases

At this scale, each iteration is less than 5 ms long.

This benchmark has tight synchronization every iteration.

Increased asynchrony in the application enables the runtime to overlap communication and computation

Scalability improves with asynchronous iterations. Requires only minor changes to application code.

DARMA’s programming model enables runtime-managed, measurement-based load balancing

Load balancing does not require changes to the application code.

Stencil benchmark is not latency tolerant and highlights runtime overheads when task-granularity is small

At this scale, each iteration is less than 5 ms long.

Increased asynchrony in application enables runtime to overlap communication and computation

Scalability improves with asynchronous iterations. Requires only minor changes to DARMA code.