loop scheduling in openmp · 2017-11-24 · •loop scheduling in openmp. •a primer of a loop...

24
Loop Scheduling in OpenMP Vivek Kale University of Southern California/Information Sciences Institute

Upload: others

Post on 21-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

LoopSchedulinginOpenMPVivekKale

UniversityofSouthernCalifornia/InformationSciencesInstitute

Page 2: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

Overview• LoopSchedulinginOpenMP.• Aprimerofaloopconstruct• DefinitionsforschedulesforOpenMP loops.

• Aproposalforuser-definedloopscheduleforOpenMP• Needtoallowforrapiddevelopmentofnovelloopschedulingstrategies.• WesuggestgivingusersofOpenMP applicationscontroloftheloopschedulingstrategytodoso.• Wecalltheschemeuser-definedloopschedulingandproposetheschemeasanadditiontoOpenMP.

Page 3: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

OpenMP loops:Aprimer• OpenMP providesaloopconstructthatspecifiesthattheiterationsofoneormoreassociatedloopswillbeexecutedin parallelbythreadsintheteaminthecontextoftheirimplicittasks.1

#pragma omp for [clause[ [,] clause] ... ]for (int i=0; i<100; i++){}

• Loopneedstobeincanonicalform.• Theclause canbeoneormoreofthefollowing:private(…), firstprivate(…), lastprivate(…), linear(…), reduction(…), schedule(…), collapse(...), ordered[…], nowait, allocate(…)

• Wefocusontheclauseschedule(…) inthistalk.1:OpenMP TechnicalReport6.November2017.http://www.openmp.org/press-release/openmp-tr6/

Page 4: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

AScheduleforanOpenMP loop#pragma omp parallel for schedule([modifier [modifier]:]kind[,chunk_size])

• Aschedule inOpenMP is:• aspecificationofhowiterationsofassociatedloopsaredividedintocontiguousnon-emptysubsets• Wecalleachofthecontiguousnon-emptysubsetsachunk

• and howthesechunksaredistributedtothreadsoftheteam.1

• Thesizeofachunk,denotedaschunk_sizemustbeapositiveinteger.

1:OpenMP TechnicalReport6.November2017.http://www.openmp.org/press-release/openmp-tr6/

Page 5: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

TheKindofaSchedule• Aschedulekind ispassedtoanOpenMP loopscheduleclause:• providesahintforhowiterationsofthecorrespondingOpenMP loopshouldbeassignedtothreadsintheteamoftheOpenMP regionsurroundingtheloop.

• FivekindsofschedulesforOpenMP loop1:• static• dynamic• guided• auto• runtime

• TheOpenMP implementationand/orruntimedefineshowtoassignchunkstothreadsofateamgiventhekindofschedulespecifiedbyasahint.

1:OpenMP TechnicalReport6.November2017.http://www.openmp.org/press-release/openmp-tr6/

Page 6: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

ModifiersoftheClauseSchedule• simd:thechunk_size mustbeamultipleofthesimd width.1

• monotonic:Ifathreadexecutediterationi,thenthethreadmustexecuteiterationslargerthani subsequently.1

• non-monotonic:Executionordernotsubjecttothemonotonicrestriction.1

1:OpenMP TechnicalReport6.November2017.http://www.openmp.org/press-release/openmp-tr6/

Page 7: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

NeedNovelLoopSchedulingSchemesinOpenMP• Supercomputerarchitecturesandapplicationsarechanging.• Largenumberofcorespernode.• Speedvariabilityacrosscores.• Complexdynamicbehaviorinapplicationsthemselves.

• So,weneednewmethodsofdistributinganapplication’scomputationalworktoanode’scores1,specificallytoscheduleanapplication’sparallelizedloop’siterationstocores.• Suchmethodsneedto• Ensuredatalocalityandreducesynchronizationoverheadwhilemaintainingloadbalance2.• Beawareofinter-nodeparallelismhandledbylibrariessuchasMPICH3.• Adaptduringanapplication’sexecution.

1:R.D.Blumofe andC.E.Leiserson.SchedulingMultithreadedComputationsbyWorkStealing.JournalofACM46(5):720–748,1999.2:S.Donfack,L.Grigori,W.D.Gropp,andV.Kale.HybridStatic/DynamicSchedulingforAlreadyOptimizedDenseMatrixFactorizations.InIEEEInternationalParallelandDistributedProcessingSymposium,IPDPS2012,Shanghai,China,2012.3:E.Lusk,N.Doss,andA.Skjellum.AHigh-performance,PortableImplementationoftheMessagePassingInterfaceStandard.ParallelComputing,22:789–828,1996.

Page 8: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

UtilityofNovelStrategiesShown• TheutilityofnovelstrategiesisdemonstratedinpublishedworkbyV.Kaleetal 1,2 andothers.• Forexample,static-dynamicschedulingmixedstrategywithanadjustablestaticfraction.• Motivation:tolimittheoverheadofdynamicscheduling,whilehandlingimbalances,suchasthoseduetonoise.

1:S.Donfack,L.Grigori,W.D.Gropp,andV.Kale.HybridStatic/DynamicSchedulingtoImprovePerformanceofAlreadyOptimizedDenseMatrixFactorizations,IPDPS2012.2:V.Kale,S.Donfack,L.Grigori,andW.D.Gropp.LightweightSchedulingforBalancingtheTradeoffBetweenLoadBalanceandLocality2014.

CALUusingstaticscheduling(top)andfd =0.1(bottom)with2-levelblocklayoutrunonAMDOpteron16corenode.

Diagramofstatic(top)andmixedstatic/dynamicscheduling(bottom)wherefdisthedynamicfraction.

13

Scheduling CALU’s Task Dependency Graph• Static scheduling

+ Good locality of data - Ignores OS jitter

Slack MPI

1234

Threads

Tp

1234

1234

Time

Slack

Noise

ThreadsThreads

MPI

MPI

Amplification

t1q(1� fd) · Tp S

White:idletime

Page 9: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

NeedtoSupportAUser-definedScheduleinOpenMP

• PracticeofburyingalloftheschedulingstrategyinsideanOpenMPimplementation,withlittlevisibilityinapplicationcodebeyondtheuseofkeywordssuchas’dynamic’or’guided’,isn’tadequateforthispurpose.• OpenMP’s implementations,e.g.,GCC’slibgomp 1andLLVM’sOpenMPlibrary 2 aredifficulttochange,whichcanhinderdevelopmentofloopschedulingstrategiesatarapidpace.

1. DocumentationforGCC’sOpenMP library.https://gcc.gnu.org/onlinedocs/libgomp/ .2. DocumentationforLLVM’sOpenMP library.http://openmp.llvm.org/Reference.pdf/ .

Page 10: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

ReasonsforUser-definedSchedules• Flexibility.

• GiventhevarietyofOpenMP implementations,havingastandardizedwayofdefiningauser-levelstrategyprovidesflexibilitytoimplementschedulingstrategiesforOpenMP programseasilyandeffectively.

• EmergenceofThreadedRuntimeSystems.• EmergenceofthreadedlibrariessuchasArgobots1 andQuickThreads2 arguesinfavorofaflexiblespecificationofschedulingstrategiesalso.

• Notethatkeywordsauto andruntime aren’tadequate.• Specifyingautoorruntimeschedulesisn’tsufficientbecausetheydon’tallowforuser-levelscheduling.

1. S.Seo,A.Amer,P.Balaji,C.Bordage,G.Bosilca,A.Brooks,A.Castello,D.Genet,T.Herault,P.Jindal,L.Kale,S.Krishnamoorthy,J.Lifflander,H.Lu,E.Meneses,M.Snir,Y.Sun,andP.H.Beckman.Argobots:Alightweightthreadingtaskingframework.2016.

2. D.Keppel.Toolsandtechniquesforbuildingfastportablethreadspackages.TechnicalReportUWCSE93-05-06,UniversityofWashingtonDepartmentofComputerScienceandEngineering,May1993.

Page 11: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

SchedulingCodeoflibgomp• Theschedulingcodeinlibgomp,forexample,supportsthestatic,dynamic,andguidedschedulesnaturally.• However,itscodestructurecan’taccommodatethenumberandsophisticationofthestrategiesthatwewouldliketoexplore.

àAddingauser-definedscheduleintoOpenMP libraries:• ispossiblewitheffortforagivenlibrary.• mustbedonedifferentlyforeachlibrary.

Page 12: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

SpecificationofUser-definedSchedulingScheme

• Weaimtospecifyauser-definedschedulingschemewithintheOpenMPstandard1 .• Theschemeshouldaccommodateanarbitraryuser-definedscheduler.• Thesearetheelementsrequiredtodefineascheduler.• Scheduler-specificdatastructures.• Historyrecord.• Specificationofschedulingbehaviorofthreads.

1. OpenMP ApplicationProgrammingInterface.November2015.

Page 13: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

PotentialSchedulingDataStructures• Thestrategiesshouldbeallowedtouseasubsetorcombinationof:• Shareddatastructures.• Low-overheadsteal-queues.• Exclusivequeuesmeantforeachthread.• Sharedqueuesfromwhichmultiplethreadscandequeue work,eachrepresentingachunkofloopiterations.

Page 14: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

HistoryTrackingfromPriorInvocations1. Tofacilitatetheabilitytolearnfromrecenthistory,e.g.,valuesof

slackinMPIcommunication1,2 frompreviousouteriterations,theschedulingschemeshouldallowforpassingacall-sitespecifichistory-trackingobject3 tothescheduler.

2. Examplesofhistoryinformation,i.e.,attributestotrackviahistoryobjects:• Previousvaluesofdynamicfraction.• Iteration-to-coreaffinities.• Runtimeperformanceprofiles.

1. A.Faraj,P.Patarasuk,andX.Yuan.AStudyofProcessArrivalPatternsforMPICollectiveOperations.InProceedingsofthe2006ACM/IEEEConferenceonSupercomputing,SC’06,Tampa,FL,USA,2006.ACM.

2. B.Rountree,D.K.Lowenthal,B.R.deSupinski,M.Schulz,V.W.Freeh,andT.Bletsch.Adagio:MakingDVSPracticalforComplexHPCApplications.InProceedingsofthe23rdInternationalConferenceonSupercomputing,ICS’09,pages460–469,YorktownHeights,NY,USA.2009.ACM.

3. V.Kale,T.Gamblin,T.Hoefler,B.R.deSupinski andW.D.Gropp.Slack-consciousLightweightLoopSchedulingforScalingPasttheNoiseAmplificationProblem.Poster.SC2012

Page 15: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

Example:LibrarytoSupportStaggeredScheduling• Inearlierwork1,aloopschedulinglibrarythatsupportsa“staggered”schedulingstrategywasimplemented.

• ItwasimplementedwithinanOpenMP parallelregionbyenclosingtheloopbodywithmacrosFORALL_BEGIN()andFORALL_END()withappropriateparameters.

1:V.Kale,A.P.Randles,V.Kale,andW.D.Gropp.Locality-OptimizedSchedulingforImprovedLoadBalancingonSMPs.InProceedingsofthe21stEuropeanMPIUsers’GroupMeetingConferenceonRecentAdvancesintheMessagePassingInterface,volume0,pages1063–1074.AssociationforComputingMachinery,2014.

1.Theschedulingstrategy’snameandassociatedparametersarespecifiedintheparametersofthemacro.2.Bothmacrosinvokelibraryfunctionscorrespondingtothestrategy’snameasspecifiedinthemacrocall.Themacro’sparametersarepassedtotheuser-definedschedulerfunctions.

int start, end = 0;static LoopTimeRecord* ltr;double fd = 0.3; #pragma omp parallel

{ int tid = omp_get_thread_num();int numThrds = omp_get_num_threads();FORALL_BEGIN(sds,tid,numThrds,0,n,start,end,fd)for(int i=start;i<end;i++)c[i] += a[i]*b[i];

FORALL_END(sds,tid,start,end)}

Page 16: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

Example:LibrarytoSupportStaggeredScheduling• Inearlierwork1,aloopschedulinglibrarythatsupportsthis“staggered”schedulingstrategywasimplemented.

• ItwasimplementedwithinanOpenMP parallelregionbyenclosingtheloopbodywithmacrosFORALL_BEGIN()andFORALL_END()withappropriateparameters.

1:V.Kale,A.P.Randles,V.Kale,andW.D.Gropp.Locality-OptimizedSchedulingforImprovedLoadBalancingonSMPs.InProceedingsofthe21stEuropeanMPIUsers’GroupMeetingConferenceonRecentAdvancesintheMessagePassingInterface,volume0,pages1063–1074.AssociationforComputingMachinery,2014.

1.Theschedulingstrategy’snameandassociatedparametersarespecifiedintheparametersofthemacro.2.Bothmacrosinvokelibraryfunctionscorrespondingtothestrategy’snameasspecifiedinthemacrocall.Themacro’sparametersarepassedtotheuser-definedschedulerfunctions.

int start, end = 0;static LoopTimeRecord* ltr;double fd = 0.3; #pragma omp parallel

{ int tid = omp_get_thread_num();int numThrds = omp_get_num_threads();FORALL_BEGIN(sds,tid,numThrds,0,n,start,end,fd)for(int i=start;i<end;i++)c[i] += a[i]*b[i];

FORALL_END(sds,tid,start,end)}

FORALL_BEGIN(strat,…)macroexpansionstrat_##LoopStart(..&start,&end,..);do(loopnotdone){

FORALL_END(strat,…)macroexpansion:}

while(strat_##LoopNext(… &start,&end,..));barrier;

Page 17: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

ProposalforUser-definedScheduleinOpenMP• Weproposeauser-definedschedulingschemethatisanadaptationoftheabovemacro-basedscheme.• Overcomeslimitationsofsimplemacro-basedscheme.

• TheproposedAPIusedforasimplecodeisillustratedbelow.

double dynamicFraction = 0.3;static LoopTimeRecord* ltr; // for history.int chunkSize = 4;#pragma omp parallel for schedule(user, staggered:chunkSize:dynamicFraction:ltr)for(int i = 0; i < n; i++)

c[i] = a[i]*b[i];

• Thefirstparameteroftheclause’schedule’specifiesanewschedulekinduser.• Thesecondparameterspecifiestheschedulingstrategy’snamestaggered,optionallywiththestrategy-specificparameters.

Page 18: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

ImplementationofUser-definedSchedule• Whenauserspecifiesaschedulekinduser andastrategynamedX

• Theyneedtolinkalibrarythatdefinesfunctions:• X_loopStart(), X_loopNext() andX_init().

• X_init()allowsauser-levelschedulertoallocateandinitializeitsdatastructuresthataretobeusedcommonlyacrossparallelloops thatuseX.• ThefunctionsX_loopStart()andX_loopNext()determinealoop’sindicesthatathreadshouldworkonbasedontheparametervaluesfortheschedulingstrategyandoftheloop.

Everythreadexecuteswhenstartinganewloop:…X_loopStart();do X_loopNext();untildone;//doneflagissetbytheuser-defined//schedulerfunctions

• Aslongasoneisallowedtodefinethesefunctions,onecanimplementauser-definedscheduler.

• EverythreadshouldcallX_loopNext() repeatedly.

Page 19: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

SoftwareArchitectureforUser-definedSchedule

#include <mpi.h>

#include <omp-mod.h>

int main(){double dynamicFraction = 0.3;static LoopTimeRecord* ltr; // for history.int chunkSize = 4;while(timestep <1000) {#pragma omp parallel for schedule(user,staggered:chunkSize:dynamicFraction:ltr)for(int i = 0; i < n; i++)

c[i] = a[i]*b[i];}

}

#include <userDefSched.h>

#defineFORALL_BEGIN(strat,…)strat_##LoopStart(..&start,&end,..);do(loopnotdone){

#defineFORALL_END(strat,…)}

while(strat_##LoopNext(… &start,&end,..));barrier();

switch(clause){ ‘static’:‘dynamic’:‘guided’: ‘auto’:‘runtime’:‘user’:

}

strat_##LoopStart(..&start,&end,..);{}

myApplication.Comp-mod.h

sd_Init():- allocateshareddatastructuresforloopsthatusestrategysd;

sd_LoopStart():ifIamthemasterthread,- setupadatastructureloopParams,alongwithalock;- signalotherthreadstostart;

else- waitforthesignalfrommasterthread;

usethesharedloopParams datastructuretocalculatemystaticiterationsandexecuteloop_body forthatrange;

sd_LoopNext():lockloopParams;extractachunktoworkon;unlockloopParams;if(nochunkavailable)waitforbarrier;done=1;

elseexecuteloop_body fortheextractedchunk;

Page 20: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

sd_LoopStart():ifIamthemasterthread,- setupadatastructureloopParams,alongwithalock;- signalotherthreadstostart;

else- waitforthesignalfrommasterthread;

- usethesharedloopParams datastructuretocalculatemystaticiterationsandexecuteloop_body forthatrange;

sd_LoopNext():lockloopParams;extractachunktoworkon;unlockloopParams;if(nochunkavailable)waitforbarrier;done=1;

elseexecuteloop_body fortheextractedchunk;

Static/dynamicSchedulingUsingaSharedDataStructure

Page 21: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

sd2_LoopStart():ifIamthemasterthread

— setupadatastructureloopParams withloopparameters;— enqueue asingleentrycorrespondingtodynamicrangeofiterations intothread0’sstealqueue;— signalotherthreadstostart;

else— waitforthesignalfrommasterthread;

- UsetheshareddatastructuretocalculatemystaticiterationsandexecuteloopBody forthatrange;

sd2_LoopNext():- nextRange =myQueue.dequeue();if(nextRange ==NULL)nextRange =steal(random_neighbor);if(nextRange !=NULL)L=nextRange->low;U=nextRange->high;if((U-L)>Threshold)- splittherangein2, enqueue theminmystealqueue;

else- executeloopbodyforiterationsL:U;- updatecountofiterationscompletedtosetthe“done”flagwhen done;

AnAlternativeImpl.ofStatic/DynamicStrategyUsingStealQueuesforDynamicIterations

Page 22: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

staggered_LoopStart():ifIamthemasterthread

- setupadatastructureloopParams;- signalotherthreadstostart;

else- waitforthesignalfrommasterthread;

- enqueue entriescorrespondingtochunksofmydynamicrangeofiterationsintomythread’sstealqueue;- calculatemystaticiterationsandexecuteloopBody forthatrange;

staggered_LoopNext():nextRange =myQueue.dequeue();if(nextRange ==NULL)nextRange =steal(random_neighbor);if(nextRange !=NULL)- L=nextRange->low;U=nextRange->high;- executeloopbodyformyiterationsL:U;- updatecountofiterationscompletedtosetthe“done”flagwhendone;

AnImplementationofStaggeredStatic/DynamicStrategy

tion. The name of the scheduling strategy and its associated parameters isspecified in the parameter of the macro function. The parameters passedto the macro function are passed to the scheduler functions defined by theuser. This macro-based approach is of limited utility because of its inabilityto use the compiler support for features such as reductions. The approachwe propose eliminates these limitations and leads to concise code.

T0 T1 T2 T3T0 T0 T0 T0T0 T0T0 T1 T1 T1 T1 T1T1T1 T2 T2 T2 T2T2 T2T2

Increasing loop iteration number

T3 T3T3T3T3T3T3

Figure 1: Diagram of the iteration space of a threaded computation regionshowing loop iterations distributed to threads when using the technique ofstaggered mixed static/dynamic scheduling. In the figure, the smaller, redcolored, rectangles correspond to dynamic iterations tentatively assignedto a specific thread.

double static_fraction = 0.5; double constraint =0.2; int

chunk_size = 4;#pragma omp parallel for schedule(user, statdynstaggered:

chunk_size:static_fraction:constraint)for (int i=0; i<n; i++)

sum+= a[i]*b[i];

Figure 2: An illustration of the use of the user-defined scheduler in a dotproduct code parallelized with OpenMP.

We propose the user-defined scheduling scheme to be an adaptation ofthe scheme of using macro-based functions. The schedule kind ’user’ andthe name of the scheduling strategy with the strategy’s parameter value(s),e.g., the name ’staggered’ and the parameter values for the static frac-tion, chunk size and constraint, are specified as parameters of the OpenMP’schedule’ clause. Figure 2 illustrates the proposed API for using a user-defined schedule through the API’s use in the dot product code. The user-defined scheduler named ’staggered’, specified in the second parameter ofthe schedule clause, defines how the work of a loop in the code is enqueuedinto queues or data structures and the strategy of dequeuing (or obtaining)a chunk of loop iterations to execute next using the scheduling strategyillustrated in Figure 1.

When the user specifies a schedule clause with kind ’user’ and a

3

Page 23: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

DiscussionofAPI’sdetails• WehavesketchedaboveasyntaxfortheAPIforuser-levelschedulers.• However,weexpectthattheAPI’sdetailswillbeworkedoutwiththecommunity’sconsensus.

• Notethatthereisaprecedentforaddinguser-definedfunctionsinOpenMP standard.• The combinerfunctioninuser-definedreductions.

Page 24: Loop Scheduling in OpenMP · 2017-11-24 · •Loop Scheduling in OpenMP. •A primer of a loop construct •Definitions for schedules for OpenMPloops. •A proposal for user-defined

Summary• Needforexperimentationwithsophisticatedloopschedulingstrategies.• OpenMP communityshoulddiscusshowtoallowflexiblespecificationofsuchstrategiesinauser’scodeandhowtodesignauser-levelschedulerlibrarysothatitcanbeportablyusedwithanyconformingOpenMP implementation.• Supportinguser-definedschedulersinthiswaywillfacilitaterapiddevelopmentofschedulingstrategies.• Ihopetheexpertswilldiscusstheseideasatthisconference.

Acknowledgements:UniversityofSouthernCalifornia/ISI’sTechnicalComputingNewDirectionsprogram.