
Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Increasing Coherence Between Simulation and Data Analytics
Chesapeake Large Scale Data Analytics Conference
Annapolis, MD
October 25, 2016

Rob Leland

Vice President, Science & Technology
Chief Technology Officer
Sandia National Laboratories

SAND2016-10762 C

Outline


§ A tale of two visions

§ Some background

§ A charge from the National Strategic Computing Initiative

§ Answers to three key questions
  § Why is an increasing coherence between simulation and analytics important?
  § What is really meant by "increasing coherence" between the two?
  § How might coherence be furthered in practice?

§ A unifying vision

Vision 1: From a scientific perspective

From The Fourth Paradigm: Data-Intensive Scientific Discovery, by Jim Gray

Data analysis complements theory, experiment, and computation

Graph matching example of data analytics
A key analytic primitive: used to find a specific instance of an abstract pattern of interest

From Coffman, Greenblatt, and Marcus, "Graph-Based Technologies for Intelligence Analysis," Communications of the ACM, 47, March 2004.
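Graph matching of this kind is commonly framed as subgraph isomorphism: find an injective mapping from the nodes of a small labeled pattern graph into a large data graph that preserves labels and edges. The sketch below is a plain backtracking matcher for small undirected graphs, not the method of the cited paper; the function names and the "person, two accounts, shared transaction" pattern are hypothetical, and real analytics systems use far more scalable algorithms.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def make_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def find_match(pattern: Graph, p_labels: Dict[str, str],
               data: Graph, d_labels: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return one label-preserving, edge-preserving, injective mapping from
    pattern nodes to data nodes, or None if the pattern does not occur."""
    # Try high-degree pattern nodes first: they prune the search fastest.
    order = sorted(pattern, key=lambda n: -len(pattern[n]))
    mapping: Dict[str, str] = {}
    used: Set[str] = set()

    def consistent(p: str, d: str) -> bool:
        if d in used or p_labels[p] != d_labels[d]:
            return False
        # Every already-mapped pattern neighbor of p must map to a neighbor of d.
        return all(mapping[q] in data[d] for q in pattern[p] if q in mapping)

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for d in data:
            if consistent(p, d):
                mapping[p] = d
                used.add(d)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack and try the next candidate
                used.discard(d)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical pattern: one person controlling two accounts that share a transaction.
pattern = make_graph([("P", "A1"), ("P", "A2"), ("A1", "T"), ("A2", "T")])
p_labels = {"P": "person", "A1": "account", "A2": "account", "T": "transaction"}

data = make_graph([("alice", "acct1"), ("alice", "acct2"), ("acct1", "tx9"),
                   ("acct2", "tx9"), ("bob", "acct3"), ("acct3", "tx7")])
d_labels = {"alice": "person", "bob": "person", "acct1": "account", "acct2": "account",
            "acct3": "account", "tx9": "transaction", "tx7": "transaction"}

print(find_match(pattern, p_labels, data, d_labels))
```

Backtracking is exponential in the worst case, which is one reason graph matching at scale stresses memory and interconnect in ways dense simulation kernels do not.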

Vision 2: From a national security perspective

Some background


§ Simulation
  § Computations to understand physical phenomena or to conduct engineering analysis

§ Large Scale Data Analytics (LSDA)
  § Data analytics = discovering meaningful patterns in data
  § Large scale = requiring leading-edge processing and storage capabilities

§ LSDA is increasing in importance
  § Pervasive
    § Commerce, finance, healthcare, science, engineering, national security, ...
  § Lasting societal significance
    § Internet search, genomics, climate modeling, Higgs boson, ...

§ LSDA is getting "harder"
  § Captured data growing exponentially with time
  § Individual analyses becoming more sophisticated
  § More people examining more data more frequently
  § Aggregate work growing much faster than Moore's Law

(Figure from The Economist)
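The claim that aggregate work outgrows Moore's Law follows from the growth factors in the last bullet group multiplying together rather than adding. The sketch below uses hypothetical annual growth rates (not figures from the talk) and compares their product against a simplified Moore's Law of doubling every two years, treated here, as an assumption, as a proxy for hardware capability.

```python
# Hypothetical annual growth factors, chosen only for illustration (not measured data).
DATA_GROWTH = 1.4            # captured data volume
SOPHISTICATION_GROWTH = 1.2  # work per byte for an individual analysis
USAGE_GROWTH = 1.3           # number of analyses run
MOORE_GROWTH = 2 ** 0.5      # doubling every two years, about 1.41x per year


def aggregate_work_growth(years: int) -> float:
    """Total analytic work relative to year 0; independent factors compound multiplicatively."""
    return (DATA_GROWTH * SOPHISTICATION_GROWTH * USAGE_GROWTH) ** years


def moore_growth(years: int) -> float:
    """Hardware capability relative to year 0 under the simplified Moore's Law assumption."""
    return MOORE_GROWTH ** years


for y in (1, 5, 10):
    print(f"year {y:2d}: work x{aggregate_work_growth(y):9.1f}   hardware x{moore_growth(y):6.1f}")
```

Even modest individual rates compound into a combined factor well above the hardware curve, so the gap widens every year.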

National Strategic Computing Initiative (NSCI)


NSCI Strategic Objectives


§ (1) Accelerating delivery of a capable exascale computing system that integrates hardware and software capability to deliver approximately 100 times the performance of current 10-petaflop systems across a range of applications representing government needs.

§ (2) Increasing coherence between the technology base used for modeling and simulation and that used for data analytic computing.

§ (3) Establishing, over the next 15 years, a viable path forward for future HPC systems even after the limits of current semiconductor technology are reached (the "post-Moore's Law era").

§ (4) Increasing the capacity and capability of an enduring national HPC ecosystem by employing a holistic approach that addresses relevant factors such as networking technology, workflow, downward scaling, foundational algorithms and software, accessibility, and workforce development.

§ (5) Developing an enduring public-private collaboration to ensure that the benefits of the research and development advances are, to the greatest extent, shared between the United States Government and industrial and academic sectors.
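For scale, and taking "performance" loosely as a floating-point rate (an assumption; objective (1) speaks of application performance), 100 times a 10-petaflop system works out to about one exaflop per second, using the standard prefixes peta = 10^15 and exa = 10^18:

```latex
100 \times 10\ \text{PFLOP/s} = 10^{3}\ \text{PFLOP/s} = 10^{3} \times 10^{15}\ \text{FLOP/s} = 10^{18}\ \text{FLOP/s} = 1\ \text{EFLOP/s}
```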

Q1: Why is increasing coherence between simulation and analytics important?


§ For simulation
  § HPC simulation must ride on some commodity curve
  § Larger market forces behind analytics
  § Can exploit commodity component technology from analytics

§ For analytics
  § Large Scale Data Analytics problems becoming ever more sophisticated
  § Requiring more coupled methods
  § Can exploit architectural lessons from HPC simulation

§ For both: Integration of simulation and analytics in the same workflow
  § Automation of analysis of data from simulation
  § Creation of synthetic data via simulation to augment analysis
  § Automated generation and testing of hypotheses
  § Exploration of new scientific and technical scenarios
  § ...
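A minimal sketch of "automation of analysis of data from simulation" within one workflow: a toy 1-D heat-diffusion simulation hands its live state to an analysis callback after every step (often called in-situ analysis) instead of writing it to disk for later post-processing. The `simulate` function, the `PeakMonitor` class, and all parameter values are hypothetical illustrations, not part of the talk.

```python
from typing import Callable, List, Optional

Field = List[float]


def simulate(steps: int, n: int, alpha: float,
             on_step: Callable[[int, Field], None]) -> Field:
    """Explicit 1-D heat diffusion with fixed zero boundaries.
    After every step the live state is passed to an analysis callback."""
    u = [0.0] * n
    u[n // 2] = 100.0  # initial hot spot
    for t in range(steps):
        u = ([u[0]]
             + [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, n - 1)]
             + [u[-1]])
        on_step(t, u)  # in-situ analytics hook: no file round trip
    return u


class PeakMonitor:
    """Hypothetical in-situ analysis: track the peak value per step and record
    the first step at which it falls below a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.peaks: List[float] = []
        self.first_below: Optional[int] = None

    def __call__(self, t: int, u: Field) -> None:
        peak = max(u)
        self.peaks.append(peak)
        if self.first_below is None and peak < self.threshold:
            self.first_below = t


monitor = PeakMonitor(threshold=50.0)
simulate(steps=200, n=101, alpha=0.25, on_step=monitor)
print("peak fell below threshold at step", monitor.first_below)
```

Passing a callable keeps the simulation loop unaware of what analysis runs, so the same loop could drive a different monitor, or a generator of synthetic training data, without modification.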

Mutual inspiration, technical synergy, and economies of scale in the creation, deployment, and use of HPC resources


A challenge, because simulation and analytics differ in many respects…

Data structures describing simulation and analytics differ
Graphs from simulations may be irregular, but they have more locality than those derived from analytics

Computational simulation of physical phenomena: climate modeling, car crash

Large Scale Data Analytics: Internet connectivity, yeast protein interactions

Figures from Leland et al., courtesy of Yelick, LBNL.

The U.S. road map, which has spatial locality and is thus the most similar of the three in structure to the computational patterns that arise in typical physical simulations.
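One crude way to quantify the locality difference is to ask how far apart, in memory order, the two endpoints of each edge are stored. The sketch compares a 2-D mesh with row-major numbering (a stand-in for a simulation grid) against a uniformly random graph of the same size (a stand-in for an analytics graph; real analytics graphs are usually scale-free rather than uniform). The "mean index distance" metric and all function names are illustrative choices, not taken from the talk.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]


def grid_edges(rows: int, cols: int) -> List[Edge]:
    """Edges of a 2-D mesh with row-major vertex numbering (a stand-in for a simulation grid)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            v = r * cols + c
            if c + 1 < cols:
                edges.append((v, v + 1))     # east neighbor
            if r + 1 < rows:
                edges.append((v, v + cols))  # south neighbor
    return edges


def random_edges(n: int, m: int, seed: int = 0) -> List[Edge]:
    """m distinct edges chosen uniformly at random among n vertices (no self-loops)."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


def mean_index_distance(edges: List[Edge]) -> float:
    """Average |u - v|: how far apart in memory order the endpoints of an edge sit."""
    return sum(abs(u - v) for u, v in edges) / len(edges)


mesh = grid_edges(100, 100)                     # 10,000 vertices, 19,800 edges
scrambled = random_edges(100 * 100, len(mesh))  # same vertex and edge counts
print(f"mesh:   {mean_index_distance(mesh):8.1f}")
print(f"random: {mean_index_distance(scrambled):8.1f}")
```

Short edge spans mean neighboring data tends to share cache lines and memory pages; long spans mean nearly every edge traversal lands somewhere new.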

Computation and communication patterns differ

Black = time spent computing; Green = time spent communicating; White = time spent waiting for data to be communicated

The Erdős-Rényi graph, a well-studied example in graph theory work.

A scale-free graph, an example more reflective of real-world networks.

Figure from Leland et al., courtesy of Johnson, PNNL.

Simulation

Analytics
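One plausible contributor to the extra waiting time in the scale-free case is skew: a few hub vertices carry far more edges than average, so whoever owns a hub has much more work. The sketch generates an Erdős-Rényi graph and a preferential-attachment (Barabási-Albert) graph with similar mean degree and compares maximum to mean degree as a rough imbalance proxy. The generators are textbook constructions written here for illustration; function names and parameters are hypothetical, and the proxy is our simplification of what the figure measures.

```python
import random
from collections import Counter


def erdos_renyi_degrees(n: int, p: float, seed: int = 1) -> Counter:
    """Degrees of a G(n, p) graph: each vertex pair is linked independently with probability p."""
    rng = random.Random(seed)
    deg: Counter = Counter()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
    return deg


def barabasi_albert_degrees(n: int, m: int, seed: int = 1) -> Counter:
    """Degrees of a preferential-attachment (scale-free) graph: each new vertex links
    to m distinct existing vertices chosen with probability proportional to degree."""
    rng = random.Random(seed)
    deg: Counter = Counter()
    targets = list(range(m))  # the first new vertex links to m seed vertices
    endpoints: list = []      # one entry per edge end, so uniform sampling is degree-proportional
    for new in range(m, n):
        for t in targets:
            deg[new] += 1
            deg[t] += 1
            endpoints.extend((new, t))
        chosen: set = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return deg


def skew(deg: Counter, n: int) -> float:
    """Max degree divided by mean degree: a rough proxy for per-vertex load imbalance."""
    return max(deg.values()) / (sum(deg.values()) / n)


n = 1000
er = erdos_renyi_degrees(n, p=4 / (n - 1))  # mean degree about 4
ba = barabasi_albert_degrees(n, m=2)        # mean degree about 4
print(f"Erdos-Renyi max/mean degree: {skew(er, n):.1f}")
print(f"scale-free  max/mean degree: {skew(ba, n):.1f}")
```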

Standard benchmarks include:
• LINPACK (smallest data intensiveness; barely visible on graph)
• STREAM
• SPECfp
• SPECint

Memory performance demands differ
A key differentiator in the performance of simulation and analytics

Figure from Murphy & Kogge, with the radius of the LINPACK data point doubled to make it visible.

Area of the circle = relative data intensiveness (i.e., total amount of unique data accessed over a fixed interval of instructions)

Simulation

Analytics
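The data-intensiveness measure above (unique data touched over a fixed interval) can be illustrated on synthetic address traces. The sketch counts distinct 64-byte blocks per window of memory accesses, using accesses as a stand-in for instructions; this is an assumption made for simplicity, not Murphy & Kogge's exact methodology, and the three traces are hypothetical.

```python
import random
from typing import List


def unique_blocks_per_window(addresses: List[int], window: int, block_bytes: int = 64) -> float:
    """Average number of distinct block_bytes-sized blocks touched per window of
    `window` consecutive memory accesses (accesses stand in for instructions)."""
    counts = []
    for start in range(0, len(addresses) - window + 1, window):
        blocks = {a // block_bytes for a in addresses[start:start + window]}
        counts.append(len(blocks))
    return sum(counts) / len(counts)


N = 100_000
WORD = 8  # bytes per 64-bit word

# Three synthetic address traces (hypothetical, for illustration only):
reuse = [(i % 512) * WORD for i in range(N)]                # small working set reused over and over
stream = [i * WORD for i in range(N)]                        # unit-stride sweep through a large array
rng = random.Random(42)
scatter = [rng.randrange(1 << 30) * WORD for _ in range(N)]  # random touches over a huge footprint

for name, trace in (("reuse", reuse), ("stream", stream), ("scatter", scatter)):
    print(f"{name:8s} {unique_blocks_per_window(trace, 10_000):8.1f} unique 64-byte blocks per 10,000 accesses")
```

The reuse trace behaves loosely like a cache-blocked dense kernel (small circle), the streaming trace like a bandwidth benchmark, and the scattered trace like irregular graph analytics (largest circle).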

Application code characteristics differ

Contrasting properties:

| Application code property | Simulation | Analytics |
|---|---|---|
| Spatial locality | High | Low |
| Temporal locality | Moderate | Low |
| Memory footprint | Moderate | High |
| Computation type | May be floating-point dominated* | Integer intensive |
| Input-output orientation | Output dominated | Input dominated |

*Increasingly, simulation work has become less floating-point dominated.
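Two toy kernels make the "spatial locality" and "computation type" rows concrete. The stencil update does floating-point arithmetic on neighbors that sit next to each other in memory; breadth-first search does mostly integer bookkeeping, and which vertex it touches next depends on data it has just read. Both kernels are illustrative choices, not code from Sandia or the cited studies.

```python
from collections import deque
from typing import Dict, List


def heat_step(u: List[float], alpha: float = 0.1) -> List[float]:
    """One explicit finite-difference step of 1-D diffusion (simulation-style kernel):
    floating-point arithmetic on unit-stride neighbors, boundary values held fixed."""
    interior = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def bfs_levels(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Breadth-first search (analytics-style kernel): integer bookkeeping, and the next
    vertex to touch depends on data just read, so accesses are hard to predict."""
    level = {source: 0}
    frontier = deque([source])
    while frontier:
        v = frontier.popleft()
        for w in adj[v]:
            if w not in level:
                level[w] = level[v] + 1
                frontier.append(w)
    return level


print(heat_step([0.0, 0.0, 1.0, 0.0, 0.0]))
print(bfs_levels({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))
```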

Q2: So what do we really mean by "increasing coherence" between simulation and analytics?


§ NOT one system ostensibly optimized for both simulation and analytics

§ Greater commonality in underlying componentry and design principles

§ Greater interoperability, allowing interleaving of both types of computations

…A more common hardware and software roadmap between simulation and analytics


And yet, there is hope…

Simulation and analytics are evolving to become more similar in their architectural needs


§ Current challenges for the LSDA community
  § Data movement
  § Power consumption
  § Memory/interconnect bandwidth
  § Scaling efficiency

§ Instruction mix for Sandia's HPC engineering codes
  § Memory operations: 40%
  § Integer operations: 40%
  § Floating point: 10%
  § Other: 10%

§ Common design impacts of energy cost trends
  § Increased concurrency (processing threads, cores, memory depth)
  § Increased complexity and burden on system software, languages, tools, runtime support, codes

…similar to HPC simulation

…similar to LSDA
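One way to read the instruction mix for Sandia's engineering codes is through Amdahl's law: if floating point is only 10% of the work, even infinitely fast floating-point hardware cannot speed the code up by more than about 1.11x. The sketch below makes the strong simplifying assumption that each instruction class's share of instructions equals its share of run time; in practice memory operations usually cost more per instruction, which would make the floating-point bound even tighter.

```python
# Instruction mix quoted above for Sandia's HPC engineering codes.
MIX = {"memory": 0.40, "integer": 0.40, "floating_point": 0.10, "other": 0.10}


def amdahl_bound(fraction: float, speedup: float = float("inf")) -> float:
    """Overall speedup when only `fraction` of the run time is made `speedup` times
    faster (Amdahl's law). With the default, that fraction costs nothing at all."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


for kind, frac in MIX.items():
    # Assumes instruction share == time share (a simplification, see lead-in).
    print(f"{kind:15s} made infinitely fast -> at most {amdahl_bound(frac):.2f}x overall")
```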

Energy cost of moving data is becoming dominant

[Figure: Energy cost for various common operations. Vertical axis: energy cost, in picojoules (pJ), per 64-bit floating-point operation. Legend: cost estimates for technology year.]

From Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.
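A back-of-the-envelope model shows why data movement comes to dominate: total energy is roughly (operations x energy per operation) + (words moved x energy per word moved). The per-operation energies below are hypothetical round numbers chosen only to illustrate the shape of the argument; they are not read off the cited figure. The matrix-multiply traffic count assumes ideal cache reuse.

```python
from typing import Tuple

# Placeholder energies in picojoules: hypothetical values, NOT taken from the cited figure.
PJ_PER_FLOP = 20.0         # one 64-bit floating-point operation
PJ_PER_DRAM_WORD = 2000.0  # moving one 64-bit word between off-chip DRAM and the processor


def energy_split(flops: float, dram_words: float) -> Tuple[float, float]:
    """Return (compute_pJ, data_movement_pJ) for a kernel."""
    return flops * PJ_PER_FLOP, dram_words * PJ_PER_DRAM_WORD


def movement_share(flops: float, dram_words: float) -> float:
    """Fraction of total energy spent moving data rather than computing."""
    compute, movement = energy_split(flops, dram_words)
    return movement / (compute + movement)


n = 1_000_000
# daxpy (y = a*x + y): 2 flops per element, 3 words of DRAM traffic (load x, load y, store y).
daxpy_share = movement_share(2 * n, 3 * n)

m = 1_000
# Dense m-by-m matrix multiply with ideal reuse: 2*m^3 flops, about 3*m^2 words of DRAM traffic.
matmul_share = movement_share(2 * m**3, 3 * m**2)

print(f"daxpy : {daxpy_share:.1%} of energy spent moving data")
print(f"matmul: {matmul_share:.1%} of energy spent moving data")
```

Under these assumptions a streaming kernel spends almost all its energy on data movement, while a kernel with heavy reuse does not; low-reuse analytics workloads tend to sit closer to the first case.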

Emerging architectural and system software synergies

Similar needs:

| Architectural characteristic | Simulation | Analytics |
|---|---|---|
| Computation | Memory address generation dominated | Same |
| Primary memory | Low power, high bandwidth, semi-random access | Same |
| Secondary memory | Emerging technologies may offset cost, allowing much more memory | …require extremely large memory spaces |
| Storage | Integration of another layer of memory hierarchy to support checkpoint/restart | …to support out-of-core data set access |
| Interconnect technology | High bisection bandwidth (for relatively coarse-grained access) | …(for fine-grained access) |
| System software (node-level) | Low dependence on system services, increasingly adaptive, resource management for structured parallelism | …highly adaptive, resource management for unstructured parallelism |
| System software (system-level) | Increasingly irregular workflows | Irregular workflows |
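The "out-of-core data set access" row refers to computing over data larger than memory by streaming it through a bounded buffer. The sketch shows only the access pattern at small scale: a reduction over a binary file of 64-bit floats that never holds more than one chunk in memory. Ordinary Python file I/O stands in for a dedicated storage tier, and `write_dataset`, `chunked_mean`, and the chunk sizes are hypothetical.

```python
import os
import tempfile
from array import array

ITEM_BYTES = array("d").itemsize  # 8 bytes per float64


def write_dataset(path: str, n: int, chunk_items: int = 4096) -> None:
    """Write the values 0.0, 1.0, ..., n-1 as native float64, one chunk at a time."""
    with open(path, "wb") as f:
        for start in range(0, n, chunk_items):
            array("d", (float(i) for i in range(start, min(n, start + chunk_items)))).tofile(f)


def chunked_mean(path: str, chunk_items: int = 4096) -> float:
    """Mean of a float64 file, holding at most chunk_items values in memory at once."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * ITEM_BYTES)
            if not raw:
                break
            values = array("d")
            values.frombytes(raw)
            total += sum(values)
            count += len(values)
    return total / count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.bin")
    write_dataset(path, 100_000)
    print(chunked_mean(path))
```

The same chunked pattern, run in reverse, is what a checkpoint writer does, which is one reason a shared storage layer can serve both workloads.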

Q3: How might coherence be furthered in practice?


§ Making it an element of national strategy
  § Check: via the NSCI

§ Building this into exascale computing efforts
  § Also a component of the NSCI

§ Communicating with and enlisting the technical communities concerned
  § This forum and similar events

§ Further developing the vision
  § Today's dialogue session!

Acknowledgements


Additional references


§ The Economist, "Data, Data, Everywhere," Feb. 25, 2010.

§ R. C. Murphy and P. M. Kogge, "On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications," IEEE Transactions on Computers 56(7), July 2007: 937–945.

§ R. Murphy, "Power Issues," presentation to JASON 2012, June 2012.

§ Peter Kogge (editor) et al., ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, DARPA, 2008.

§ Dan McMorrow, Technical Challenges of Exascale Computing, JSR-12-310, JASON, MITRE Corporation, April 2013.

§ Tony Hey, Stewart Tansley, and Kristin Tolle (editors), The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009.

§ Jim Gray, The Fourth Paradigm: Data-Intensive Scientific Discovery.
