TRANSCRIPT
Workflow Adaptation on Networked Cloud Infrastructures Using Intent and Performance Models
Anirban Mandal, Paul Ruth, Ilya Baldin (RENCI, UNC – Chapel Hill)
Rafael Ferreira da Silva, Gideon Juve, Ewa Deelman (USC/ISI)
Vickie Lynch (ORNL), Jeffrey Vetter (ORNL), Chris Carothers (RPI), Mariam Kiran (LBL), Brian Tierney (LBL)
Internet2 Technology Exchange 2017, San Francisco, CA, Oct 16, 2017
Cloud Providers — Virtual Compute and Storage Infrastructure
Network Transit Providers — Virtual Network Infrastructure
Cloud APIs (Amazon EC2, …); Network Provisioning APIs (DOE ESnet OSCARS, Internet2 DRAGON, OGF NSI, …)
NIaaS: Compute and Network Virtualization
Background: Networked Clouds
[Figure: mutually isolated virtual networks of VMs across edge providers (compute clouds and network providers) — mutually isolated slices of virtual resources supporting science workflows (observatory, wind tunnel)]
Background: ExoGENI NIaaS Testbed
Background: Scientific Workflows
[Figure: a workflow dynamic slice over time — a Condor head node handles initial workflow staging; compute nodes and network are dynamically provisioned for the network-intensive workflow staging when the workflow starts; compute nodes are dynamically created and added for the parallel compute-intensive step; unneeded compute nodes are freed after the compute step; compute nodes and the provisioned network are dynamically destroyed when the workflow ends]
Background: Pegasus Workflow Management System
Slide courtesy: Pegasus team @ ISI (Karan Vahi)
Challenges
• Lack of tools to integrate workflow management systems like Pegasus with dynamic infrastructure
— Orchestrate infrastructure in response to the workflow application
— Manage the workflow application in response to the infrastructure
• Abstraction gap
— Resource representations used in NIaaS systems are suited to infrastructure management
— Applications seldom require full visibility, but do require intermediate abstractions
• Autonomous adaptation for complex workflows on dynamic infrastructure is relatively unexplored
• Projecting dynamic resource requirements as workflows execute is a hard problem
Key Contributions
• An approach to predicting dynamic resource needs for workflows
— ‘application-aware’ controller called Pegasus ShadowQ
• An approach to automatically transforming high-level resource requirements into low-level provisioning operations
— ‘application-independent’ controller called Mobius++
• A split-controller architecture and end-to-end system for automatic provisioning and adaptation of science workflows on dynamic infrastructures
• Evaluation of
— accuracy of predictions
— interplay of prediction and adaptation to balance performance and resource utilization
Mapping Applications to NIaaS – End-to-end System
[Architecture figure: the Pegasus Workflow Management System sends resource requests, data transfer requests, and data transfer QoS requirements through an AMQP space to Mobius++ — Request Manager, AHAB Library (libndl + libtransport), and virtual SDX API (SDXTreeNetwork, …) — which drives ORCA/ExoGENI provisioning; Workflow Models (Aspen) + Simulation (CODES) inform the Resource Provisioner]

[Panorama figure: the ultimate WMS combines workflow execution with workflow and infrastructure monitoring, anomaly detection, and performance modeling & simulation — it predicts workflow behavior on resources, queries for task resource needs, and provisions needed resources]
“Application-aware” Controller – Pegasus ShadowQ
• Pegasus ShadowQ
— Monitors the running workflow
— Makes predictions about future events in the workflow
— Communicates resource requirements to the Request Manager
• Reconstructs the current state of the running workflow from logs
• Performs a set of discrete event simulations to predict future events
• Periodically re-evaluates predictions by running additional simulations
• Estimates the finish time of the workflow based on current resources and determines the resource level needed to complete the workflow within a deadline
• Communicates priorities of workflow data transfers
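The finish-time estimation and resource-level determination steps can be sketched with a toy list-scheduling simulation (an illustrative assumption: independent tasks and uniform slots; the real ShadowQ simulator replays the workflow DAG from logs):

```java
import java.util.PriorityQueue;

// Hypothetical sketch of ShadowQ-style estimation: simulate the remaining
// workflow tasks on k slots (list scheduling of independent tasks) and pick
// the smallest slot count whose simulated finish time meets the deadline.
public class FinishTimeEstimator {

    // Simulated makespan of independent tasks on 'slots' workers.
    public static double simulateFinish(double[] taskDurations, int slots) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) freeAt.add(0.0);
        double finish = 0.0;
        for (double d : taskDurations) {
            double start = freeAt.poll();  // earliest-free slot
            double end = start + d;
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    // Smallest resource level that completes the workflow within 'deadline'.
    public static int slotsForDeadline(double[] taskDurations, int maxSlots,
                                       double deadline) {
        for (int k = 1; k <= maxSlots; k++) {
            if (simulateFinish(taskDurations, k) <= deadline) return k;
        }
        return maxSlots;  // deadline unreachable: use the maximum allowed
    }

    public static void main(String[] args) {
        double[] tasks = {300, 300, 300, 300, 120, 120, 60, 60};
        System.out.println(slotsForDeadline(tasks, 16, 700.0));
    }
}
```

Re-running this simulation periodically with the tasks that remain (rather than the full task list) is what lets the predictions be re-evaluated as the workflow progresses.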
Application Requests to NIaaS
• Requests span a spectrum from general to specific
— Abstract and simple to support the majority of users
— Customizable for advanced users
• Abstract requests
— “An HTCondor pool”, “An MPI cluster”, “A Hadoop cluster”
— “An HTCondor pool with on-ramp to an external data-set (Cenic@ION)”
— “An HTCondor pool using an external data-set (Cenic@ION), needing storage”
— Prioritized data flow requests between end-points
• Complex requests customized from abstract requests
— #workers, image, bandwidth, storage, postboot scripts, multi-site
• Adaptation requests
— deadline, #resources to meet deadline, deadline difference, etc.
Request Schema
• Initial resource request (mandatory elements)
— “requestType” -> {new | modifyCompute | modifyNetwork}
— “req_templateType” -> {condor | hadoop | mpi}[_sp][_storage][_multi]
— “req_sliceID”, “req_wfuuid”
• Initial resource request (optional elements)
— “req_numWorkers”, “req_storage”, “req_BW”
— “req_linkID”, “req_stitchportID”, “req_linkBW”
— “req_imageUrl”, “req_imageHash”, “req_imageName”, “req_postboot..”
• Adaptation request
— compute – “requestType”, “req_sliceID”, “req_wfuuid”, “req_numResReqToMeetDeadline”, “req_numCurrentRes”
— network – “requestType”, “req_sliceID”, “req_wfuuid”, “req_endpointSrc”, “req_endpointDst”, “req_flowPriority”, …
— optional – “req_deadline”, “req_deadlineDiff”, “req_numResUtilMax”
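Assembled per the schema above, a compute adaptation request might look like the following message (a hedged illustration: the field names come from the schema, but the concrete values and the exact JSON framing are hypothetical):

```json
{
  "requestType": "modifyCompute",
  "req_sliceID": "example-slice",
  "req_wfuuid": "0f2a7a1c-example-uuid",
  "req_numResReqToMeetDeadline": 8,
  "req_numCurrentRes": 4,
  "req_deadline": 3600
}
```

The mandatory elements identify the request, slice, and workflow; the optional deadline fields let the Request Manager's adaptation policies decide how aggressively to grow or shrink the pool.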
“Application-independent” Controller – Mobius++
• Ahab Library
— libndl, libtransport, extras
• Virtual SDX API
• Request Manager
— Orchestrates Pegasus ShadowQ, Ahab, virtual SDX, and stitchport mapper to transform high-level resource requests into NIaaS constructs
— Manages resource adaptation across layers

[Figure: Mobius++ (Request Manager, AHAB Library (libndl + libtransport), virtual SDX API (SDXTreeNetwork, …)) receives resource requests, data transfer requests, and data transfer QoS requirements and drives ORCA/ExoGENI provisioning of ExoGENI slices plus applications]
• Programmatic abstraction for the ExoGENI/ORCA Networked IaaS (NIaaS) system
— Written in Java
— Has several components
• LIBNDL
— Create and modify NDL slice resource requests
— Read NDL slice manifests
• LIBTransport
— Submits requests to ExoGENI controllers
• Extras
— Framework for building SDN topologies (e.g. Express Flow Network)
AHAB Library
AHAB – Simple Example

ISliceTransportAPIv1 sliceProxy = getSliceProxy(pemLocation, controllerUrl);
SliceAccessContext<SSHAccessToken> sliceContext = getSliceContext(sshPubKey, sshUserName);

Slice s = Slice.create(sliceProxy, sliceContext, "MySlice");

ComputeNode node0 = s.addComputeNode("Node0");
node0.setImage(imageURL, imageHash, imageName);
node0.setDomain(domainName);
node0.setNodeType("XO Medium");

ComputeNode node1 = s.addComputeNode("Node1");
node1.setImage(imageURL, imageHash, imageName);
node1.setDomain(domainName);
node1.setNodeType("XO Medium");

Network net0 = s.addBroadcastLink("net0");
InterfaceNode2Net ifaceNode0 = (InterfaceNode2Net) net0.stitch(node0);
ifaceNode0.setIpAddress("172.16.0.100");
ifaceNode0.setNetmask("255.255.255.0");

InterfaceNode2Net ifaceNode1 = (InterfaceNode2Net) net0.stitch(node1);
ifaceNode1.setIpAddress("172.16.0.101");
ifaceNode1.setNetmask("255.255.255.0");
s.commit();
ExoGENI Virtual SDX
• Software Defined Exchanges (SDX) – meeting points of networks to exchange traffic, securely and with QoS, using SDN protocols
• Virtual SDX – a virtual overlay acting as an SDX without a persistent physical location
• The ExoGENI virtual SDX can modify compute, network, and storage to support the changing demands of the SDX
[Figure: ExoGENI slice(s) — a virtual SDX with an OpenFlow controller provides dynamic connectivity among Sites 1–4]
ExoGENI Virtual SDX: Panorama Workflows
Panorama modeling and simulation tools enable Pegasus to monitor and manipulate network connectivity & performance.

[Figure: ExoGENI slice with virtual SDX + HTCondor pools + data site — an OpenFlow controller provides dynamic connectivity among HTCondor pools at Sites 1–3 and a data site]
ExoGENI Virtual SDX: Express Flow Network
The virtual SDX transparently arbitrates workflow data flows communicated by Pegasus.

[Figure: SDX Tree Network with Express Flows — a RYU OpenFlow controller (OpenFlow v1.3) connects Sites 0–6, with bandwidth-provisioned express flows]
Compute Adaptation with ShadowQ and Mobius
Pegasus WMS + ShadowQ (Resource Prediction):
1. ShadowQ makes predictions about resource requirements of the workflow
2. ShadowQ sends modify requests to the messaging space
Mobius (Request Manager (RM), ndlLib, Adaptation policies):
3. RM runs adaptation policies and uses ndlLib to generate modify requests
4. RM sends modify requests to the NIaaS
NIaaS/Infrastructure (ORCA/ExoGENI):
5. ExoGENI slice modified (nodes added or deleted) per workflow needs
6. ExoGENI controller queried for the new slice state
7. RM sends elements of the new slice manifest to ShadowQ
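One plausible shape for such an adaptation policy, in step 3, is a comparison of the predicted and current resource levels (class and method names here are invented for illustration, not the actual Mobius API):

```java
// Hypothetical sketch of a Request Manager adaptation policy: compare the
// resource level ShadowQ predicts is needed to meet the deadline with the
// current level, capped by a utilization limit, and decide how many worker
// nodes to add or delete.
public class AdaptationPolicy {

    // Positive result: add that many nodes; negative: delete that many.
    public static int nodesToModify(int numResReqToMeetDeadline,
                                    int numCurrentRes,
                                    int numResUtilMax) {
        int target = Math.min(numResReqToMeetDeadline, numResUtilMax);
        return target - numCurrentRes;
    }

    public static void main(String[] args) {
        System.out.println(nodesToModify(8, 4, 16));  // grow the pool
        System.out.println(nodesToModify(2, 6, 16));  // shrink the pool
    }
}
```

The cap mirrors the optional "req_numResUtilMax" element of the request schema, which bounds how far the pool may grow even when the deadline would call for more.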
Evaluation: Application and Infrastructure
• Data-driven workflow application called Montage
— Astronomy image application
— Developed at NASA's Infrared Processing and Analysis Center at Caltech
— Creates an output mosaic of a portion of the sky from input images
— Used Pegasus to run Montage
• Infrastructure
— 2 racks from the ExoGENI testbed (FIU and WVN)
— Slices provisioned to instantiate HTCondor master and workers (4–16)
— Connected with 500 Mb/s bandwidth
Evaluation: Accuracy of ShadowQ Predictions
[Plot: estimated and actual workflow finish times (seconds since start of workflow) vs. time of simulation (seconds since start of workflow)]

The majority of the workflow finish times estimated by the ShadowQ were within 60 seconds of the actual finish time.

A. Mandal, P. Ruth, I. Baldin, Y. Xin, C. Castillo, G. Juve, M. Rynge, E. Deelman, and J. Chase, "Adapting Scientific Workflows on Networked Clouds Using Proactive Introspection," in Proceedings of the IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC 2015)
Evaluation: Accuracy of ShadowQ Predictions
Job start times predicted by the ShadowQ 10 minutes prior to the actual job start time were within 60 seconds of the actual start for the majority of jobs.

[Scatter plot: absolute error (seconds) vs. time before actual job start (seconds), for stage_in, compute, stage_out, cleanup, and create_dir jobs]
Evaluation: Scaling (No ShadowQ/Adaptation)
[Plot: workflow makespan (seconds) vs. number of VMs or slots, without ShadowQ predictions, annotated with the minimum deadline for each VM count]
• Configuration used for scaling experiment
— 4-degree Montage workflow
— 1 through 16 HTCondor worker VMs (== HTCondor slots)
— Pegasus workflow clustering factor of 8
• Results show
— Makespan decreases up to 8 slots
— Expected because of the clustering factor
— The one-VM case determines the minimum "CPU seconds" needed to complete the workflow, which corresponds to 100% utilization
Using ShadowQ/Mobius++ for Adaptation
• Plot of resources used for different deadline relaxation factors
• Results show
— ShadowQ can predict dynamic resource requirements
— Mobius satisfies requirements by dynamically (de-)provisioning resources
— The resource usage pattern depends on the value of the deadline relaxation factor
[Plots: resources used vs. workflow execution time (seconds since start of workflow) — starting with 8 slots, for deadlines of 1.00, 1.25, 1.50, and 2.00 × min_deadline(8); and starting with 1 slot, for deadlines of 0.33, 0.50, and 0.75 × min_deadline(1)]
Using ShadowQ/Mobius for Adaptation
• Plot of resource usage factor for different deadline relaxation factors
• Results show
— Higher deadline relaxation factors result in a lower resource usage factor (i.e. higher resource utilization), but only up to an inflection point
— At a relaxation factor of 1.5, we sacrifice 50% in execution time, but stay within 20% of the best resource utilization and save 50% in resource usage
[Plot: resource usage factor vs. deadline relaxation factor, starting with 8 slots]

Resource Usage Factor = CPU seconds (used) / CPU seconds (min)
A higher Resource Usage Factor implies lower resource utilization.
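As a worked example of this metric (the numbers below are illustrative, not measured values from the experiments):

```java
// Worked example of the Resource Usage Factor metric: CPU seconds actually
// held by provisioned slots, divided by the minimum CPU seconds the
// workflow needs (the one-slot, 100%-utilization case).
public class ResourceUsageFactor {

    public static double usageFactor(double cpuSecondsUsed, double cpuSecondsMin) {
        return cpuSecondsUsed / cpuSecondsMin;
    }

    public static void main(String[] args) {
        // e.g. 8 slots held for 350 s each, against a 1560 s serial minimum
        System.out.println(usageFactor(8 * 350.0, 1560.0));
    }
}
```

A factor of 1.0 means every provisioned CPU second did useful work; larger factors mean slots sat partly idle, i.e. lower utilization.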
Workflow Data Flow Arbitration Use Case: 1000Genome
https://github.com/pegasus-isi/1000genome-workflow
This workflow identifies mutational overlaps using data from the 1000 Genomes Project in order to provide a null distribution for rigorous statistical evaluation of potential disease-related mutations.

A full run of the workflow consumes/produces over 4.4 TB of data and requires over 24 TB of memory.

For the experiments, we limited the workflow to the execution of 2 populations:
• 22 Individual jobs
• 2 Sifting jobs
• 14 Frequency jobs
• 14 Mutations_overlap jobs
"Toward Prioritization of Data Flows for Scientific Workflows Using Virtual Software Defined Exchanges," Anirban Mandal, Paul Ruth, Ilya Baldin, Rafael Ferreira Da Silva and Ewa Deelman, WoWS workshop at eScience 2017.
[Figure: virtualized SDX — an OpenFlow controller provides connectivity among HTCondor Pool 1 (StarLight ExoGENI rack), HTCondor Pool 2 (UMass ExoGENI rack), and a data node (PSC ExoGENI rack)]
Virtual SDX + HTCondor Pools for Workflows
Network Adaptation with Pegasus and Mobius++
Resource requests flow through a messaging space for high-level provisioning requests:
0. Scientist tools/portal send high-level provisioning requests (sdxcondor | condor, #workers) to the messaging space
Mobius++ (Request Manager (RM), AHAB, VirtualSDX API):
1. Create SDN controller node
2. Create core SDX Tree Network
3. Create HTCondor pools and data site on the NIaaS/Infrastructure (ORCA/ExoGENI)
Network Adaptation with Pegasus and Mobius++
Pegasus WMS:
1. Pegasus determines "Express flow" requirements for workflow data transfers
2. Pegasus sends "modifyNetwork" QoS requests to the messaging space
Mobius++ (Request Manager (RM), AHAB, VirtualSDX API):
3. RM uses AHAB and the VirtualSDX API to send SDN QoS requests to the SDX Tree Network
4. SDX Tree Network actuates QoS actions using the REST API of the SDN controller
5. SDN controller contacts SDX switches for each site
NIaaS/Infrastructure (VirtualSDX + ExoGENI):
6. Slice modified with the required bandwidth QoS
7. RM sends an ack to Pegasus when the Express flow QoS is set
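The "modifyNetwork" QoS request of step 2 could be framed as below (a hedged sketch: field names follow the request schema presented earlier, while the endpoint names, priority value, and exact JSON framing are hypothetical):

```json
{
  "requestType": "modifyNetwork",
  "req_sliceID": "example-slice",
  "req_wfuuid": "0f2a7a1c-example-uuid",
  "req_endpointSrc": "htcondor-pool-1",
  "req_endpointDst": "data-node",
  "req_flowPriority": 70
}
```

The flow priority is what the SDX Tree Network translates into bandwidth-provisioned express flows between the named endpoints.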
Evaluation: Data Flow Prioritization
Effect of relative flow priorities on observed data transfer times for data-intensive workflow tasks, for different vSDX provisioned bandwidths:

[Plot: average transfer time for "individual" jobs (secs) vs. relative data flow priorities (Workflow1-Workflow2, 10-80 through 80-10), for Workflow1 and Workflow2 on HTCondor pools 1 and 2 at vSDX bandwidths of 250 Mb/s and 500 Mb/s]
[Plot: workflow execution time (secs) vs. relative data flow priorities (Workflow1-Workflow2), for Workflow1 on HTCondor pool 1 and Workflow2 on HTCondor pool 2, vSDX bandwidth = 250 Mb/s]

Effect of relative flow priorities on overall workflow execution times.
Conclusions
• Presented an end-to-end system with application-aware and application-independent controllers
— Pegasus ShadowQ, a controller that predicts dynamic workflow requirements
— Mobius++, a controller that facilitates the use of NIaaS systems by workflow management systems like Pegasus
• Showed how ShadowQ predictions can be used by Mobius++ to actuate infrastructure adaptation
• Showed that interplay of prediction and adaptation can balance performance and resource utilization of representative workflows
Thank you
DOE Award #: DoE ASCR Panorama (DE-SC0012390)
NSF Award #: NSF ACI 1245926 (CC-NIE: ADAMANT)
DOE Award #: DoE SciDAC SUPER (DE-FG02-11ER26050 / DE-SC0006925)
GENI Project Office Award #: 1872