TRANSCRIPT
Workflow Adaptation on Networked Cloud Infrastructures Using Intent and Performance Models
Anirban Mandal, Paul Ruth, Ilya Baldin (RENCI, UNC – Chapel Hill)
Rafael Ferreira da Silva, Gideon Juve, Ewa Deelman (USC/ISI)
Vickie Lynch (ORNL), Jeffrey Vetter (ORNL), Chris Carothers (RPI), Mariam Kiran (LBL), Brian Tierney (LBL)
Internet2 Technology Exchange 2017, San Francisco, CA, Oct 16, 2017
Cloud Providers — Virtual Compute and Storage Infrastructure
Network Transit Providers — Virtual Network Infrastructure
Cloud APIs (Amazon EC2, …); Network Provisioning APIs (DOE ESnet OSCARS, Internet2 DRAGON, OGF NSI, …)
NIaaS: Compute and Network Virtualization
Background: Networked Clouds
[Figure: mutually isolated virtual networks of VMs across edge providers (compute clouds and network providers) — mutually isolated slices of virtual resources supporting science workflows (observatory, wind tunnel)]
Background: ExoGENI NIaaS Testbed
Background: Scientific Workflows
[Figure: a workflow dynamic slice over time — a Condor head node handles initial workflow staging; compute nodes and network are dynamically provisioned for the network-intensive workflow staging when the workflow starts; compute nodes are dynamically created and added for the parallel compute-intensive step; unneeded compute nodes are freed after the compute step; compute nodes and the provisioned network are dynamically destroyed when the workflow ends]
Background: Pegasus Workflow Management System
Slide courtesy: Pegasus team @ ISI (Karan Vahi)
Challenges
• Lack of tools to integrate workflow management systems like Pegasus with dynamic infrastructure
— Orchestrate infrastructure in response to the workflow application
— Manage the workflow application in response to the infrastructure
• Abstraction gap
— Resource representations used in NIaaS systems are suited to infrastructure management
— Applications seldom require full visibility, but do require intermediate abstractions
• Autonomous adaptation for complex workflows on dynamic infrastructure is relatively unexplored
• Projecting dynamic resource requirements as workflows execute is a hard problem
Key Contributions
• An approach to predicting dynamic resource needs for workflows
— ‘application-aware’ controller called Pegasus ShadowQ
• An approach to automatically transforming high-level resource requirements into low-level provisioning operations
— ‘application-independent’ controller called Mobius++
• A split-controller architecture and end-to-end system for automatic provisioning and adaptation of science workflows on dynamic infrastructures
• Evaluation of
— accuracy of predictions
— interplay of prediction and adaptation to balance performance and resource utilization
Mapping Applications to NIaaS – End-to-end System
[Architecture figure: the Pegasus Workflow Management System sends resource requests, data transfer requests, and data transfer QoS requirements through an AMQP space to Mobius++ — Request Manager, AHAB Library (libndl + libtransport), and virtual SDX API (SDXTreeNetwork, …) — which drives ORCA/ExoGENI provisioning; Workflow Models (Aspen) + Simulation (CODES) inform the Resource Provisioner]

[Panorama figure: the ultimate WMS combines workflow execution with workflow and infrastructure monitoring, anomaly detection, and performance modeling & simulation — it predicts workflow behavior on resources, queries for task resource needs, and provisions needed resources]
“Application-aware” Controller – Pegasus ShadowQ
• Pegasus ShadowQ
— Monitors the running workflow
— Makes predictions about future events in the workflow
— Communicates resource requirements to the Request Manager
• Reconstructs the current state of the running workflow from logs
• Performs a set of discrete event simulations to predict future events
• Periodically re-evaluates predictions by running additional simulations
• Estimates the finish time of the workflow based on current resources and determines the resource level needed to complete the workflow within a deadline
• Communicates priorities of workflow data transfers
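The finish-time estimation and resource-level determination steps can be sketched with a toy list-scheduling simulation (an illustrative assumption: independent tasks and uniform slots; the real ShadowQ simulator replays the workflow DAG from logs):

```java
import java.util.PriorityQueue;

// Hypothetical sketch of ShadowQ-style estimation: simulate the remaining
// workflow tasks on k slots (list scheduling of independent tasks) and pick
// the smallest slot count whose simulated finish time meets the deadline.
public class FinishTimeEstimator {

    // Simulated makespan of independent tasks on 'slots' workers.
    public static double simulateFinish(double[] taskDurations, int slots) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();
        for (int i = 0; i < slots; i++) freeAt.add(0.0);
        double finish = 0.0;
        for (double d : taskDurations) {
            double start = freeAt.poll();  // earliest-free slot
            double end = start + d;
            freeAt.add(end);
            finish = Math.max(finish, end);
        }
        return finish;
    }

    // Smallest resource level that completes the workflow within 'deadline'.
    public static int slotsForDeadline(double[] taskDurations, int maxSlots,
                                       double deadline) {
        for (int k = 1; k <= maxSlots; k++) {
            if (simulateFinish(taskDurations, k) <= deadline) return k;
        }
        return maxSlots;  // deadline unreachable: use the maximum allowed
    }

    public static void main(String[] args) {
        double[] tasks = {300, 300, 300, 300, 120, 120, 60, 60};
        System.out.println(slotsForDeadline(tasks, 16, 700.0));
    }
}
```

Re-running this simulation periodically with the tasks that remain (rather than the full task list) is what lets the predictions be re-evaluated as the workflow progresses.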
Application Requests to NIaaS
• Requests span a spectrum from general to specific
— Abstract and simple to support the majority of users
— Customizable for advanced users
• Abstract requests
— “An HTCondor pool”, “An MPI cluster”, “A Hadoop cluster”
— “An HTCondor pool with on-ramp to an external data-set (Cenic@ION)”
— “An HTCondor pool using an external data-set (Cenic@ION), needing storage”
— Prioritized data flow requests between end-points
• Complex requests customized from abstract requests
— #workers, image, bandwidth, storage, postboot scripts, multi-site
• Adaptation requests
— deadline, #resources to meet deadline, deadline difference, etc.
Request Schema
• Initial resource request (mandatory elements)
— “requestType” -> {new | modifyCompute | modifyNetwork}
— “req_templateType” -> {condor | hadoop | mpi}[_sp][_storage][_multi]
— “req_sliceID”, “req_wfuuid”
• Initial resource request (optional elements)
— “req_numWorkers”, “req_storage”, “req_BW”
— “req_linkID”, “req_stitchportID”, “req_linkBW”
— “req_imageUrl”, “req_imageHash”, “req_imageName”, “req_postboot..”
• Adaptation request
— compute – “requestType”, “req_sliceID”, “req_wfuuid”, “req_numResReqToMeetDeadline”, “req_numCurrentRes”
— network – “requestType”, “req_sliceID”, “req_wfuuid”, “req_endpointSrc”, “req_endpointDst”, “req_flowPriority”, …
— optional – “req_deadline”, “req_deadlineDiff”, “req_numResUtilMax”
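Assembled per the schema above, a compute adaptation request might look like the following message (a hedged illustration: the field names come from the schema, but the concrete values and the exact JSON framing are hypothetical):

```json
{
  "requestType": "modifyCompute",
  "req_sliceID": "example-slice",
  "req_wfuuid": "0f2a7a1c-example-uuid",
  "req_numResReqToMeetDeadline": 8,
  "req_numCurrentRes": 4,
  "req_deadline": 3600
}
```

The mandatory elements identify the request, slice, and workflow; the optional deadline fields let the Request Manager's adaptation policies decide how aggressively to grow or shrink the pool.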
“Application-independent” Controller – Mobius++
• Ahab Library
— libndl, libtransport, extras
• Virtual SDX API
• Request Manager
— Orchestrates Pegasus ShadowQ, Ahab, virtual SDX, and stitchport mapper to transform high-level resource requests into NIaaS constructs
— Manages resource adaptation across layers

[Figure: Mobius++ (Request Manager, AHAB Library (libndl + libtransport), virtual SDX API (SDXTreeNetwork, …)) receives resource requests, data transfer requests, and data transfer QoS requirements and drives ORCA/ExoGENI provisioning of ExoGENI slices plus applications]
• Programmatic abstraction for the ExoGENI/ORCA Networked IaaS (NIaaS) system
— Written in Java
— Has several components
• LIBNDL
— Create and modify NDL slice resource requests
— Read NDL slice manifests
• LIBTransport
— Submits requests to ExoGENI controllers
• Extras
— Framework for building SDN topologies (e.g. Express Flow Network)
AHAB Library
AHAB – Simple Example

ISliceTransportAPIv1 sliceProxy = getSliceProxy(pemLocation, controllerUrl);
SliceAccessContext<SSHAccessToken> sliceContext = getSliceContext(sshPubKey, sshUserName);

Slice s = Slice.create(sliceProxy, sliceContext, "MySlice");

ComputeNode node0 = s.addComputeNode("Node0");
node0.setImage(imageURL, imageHash, imageName);
node0.setDomain(domainName);
node0.setNodeType("XO Medium");

ComputeNode node1 = s.addComputeNode("Node1");
node1.setImage(imageURL, imageHash, imageName);
node1.setDomain(domainName);
node1.setNodeType("XO Medium");

Network net0 = s.addBroadcastLink("net0");
InterfaceNode2Net ifaceNode0 = (InterfaceNode2Net) net0.stitch(node0);
ifaceNode0.setIpAddress("172.16.0.100");
ifaceNode0.setNetmask("255.255.255.0");

InterfaceNode2Net ifaceNode1 = (InterfaceNode2Net) net0.stitch(node1);
ifaceNode1.setIpAddress("172.16.0.101");
ifaceNode1.setNetmask("255.255.255.0");
s.commit();
ExoGENI Virtual SDX
• Software Defined Exchanges (SDX) – meeting points of networks to exchange traffic, securely and with QoS, using SDN protocols
• Virtual SDX – a virtual overlay acting as an SDX without a persistent physical location
• The ExoGENI virtual SDX can modify compute, network, and storage to support the changing demands of the SDX
[Figure: ExoGENI slice(s) — a virtual SDX with an OpenFlow controller provides dynamic connectivity among Sites 1–4]
ExoGENI Virtual SDX: Panorama Workflows
Panorama modeling and simulation tools enable Pegasus to monitor and manipulate network connectivity & performance.

[Figure: ExoGENI slice with virtual SDX + HTCondor pools + data site — an OpenFlow controller provides dynamic connectivity among HTCondor pools at Sites 1–3 and a data site]
ExoGENI Virtual SDX: Express Flow Network
The virtual SDX transparently arbitrates workflow data flows communicated by Pegasus.

[Figure: SDX Tree Network with Express Flows — a RYU OpenFlow controller (OpenFlow v1.3) connects Sites 0–6, with bandwidth-provisioned express flows]
Compute Adaptation with ShadowQ and Mobius
Pegasus WMS + ShadowQ (Resource Prediction):
1. ShadowQ makes predictions about resource requirements of the workflow
2. ShadowQ sends modify requests to the messaging space
Mobius (Request Manager (RM), ndlLib, Adaptation policies):
3. RM runs adaptation policies and uses ndlLib to generate modify requests
4. RM sends modify requests to the NIaaS
NIaaS/Infrastructure (ORCA/ExoGENI):
5. ExoGENI slice modified (nodes added or deleted) per workflow needs
6. ExoGENI controller queried for the new slice state
7. RM sends elements of the new slice manifest to ShadowQ
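One plausible shape for such an adaptation policy, in step 3, is a comparison of the predicted and current resource levels (class and method names here are invented for illustration, not the actual Mobius API):

```java
// Hypothetical sketch of a Request Manager adaptation policy: compare the
// resource level ShadowQ predicts is needed to meet the deadline with the
// current level, capped by a utilization limit, and decide how many worker
// nodes to add or delete.
public class AdaptationPolicy {

    // Positive result: add that many nodes; negative: delete that many.
    public static int nodesToModify(int numResReqToMeetDeadline,
                                    int numCurrentRes,
                                    int numResUtilMax) {
        int target = Math.min(numResReqToMeetDeadline, numResUtilMax);
        return target - numCurrentRes;
    }

    public static void main(String[] args) {
        System.out.println(nodesToModify(8, 4, 16));  // grow the pool
        System.out.println(nodesToModify(2, 6, 16));  // shrink the pool
    }
}
```

The cap mirrors the optional "req_numResUtilMax" element of the request schema, which bounds how far the pool may grow even when the deadline would call for more.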
Evaluation: Application and Infrastructure
• Data-driven workflow application called Montage
— Astronomy image application
— Developed at NASA's Infrared Processing and Analysis Center at Caltech
— Creates an output mosaic of a portion of the sky from input images
— Used Pegasus to run Montage
• Infrastructure
— 2 racks from the ExoGENI testbed (FIU and WVN)
— Slices provisioned to instantiate HTCondor master and workers (4–16)
— Connected with 500 Mb/s bandwidth
Evaluation: Accuracy of ShadowQ Predictions
[Plot: estimated and actual workflow finish times (seconds since start of workflow) vs. time of simulation (seconds since start of workflow)]

The majority of the workflow finish times estimated by the ShadowQ were within 60 seconds of the actual finish time.

A. Mandal, P. Ruth, I. Baldin, Y. Xin, C. Castillo, G. Juve, M. Rynge, E. Deelman, and J. Chase, "Adapting Scientific Workflows on Networked Clouds Using Proactive Introspection," in Proceedings of the IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC 2015)
Evaluation: Accuracy of ShadowQ Predictions
Job start times predicted by the ShadowQ 10 minutes prior to the actual job start time were within 60 seconds of the actual start for the majority of jobs.

[Scatter plot: absolute error (seconds) vs. time before actual job start (seconds), for stage_in, compute, stage_out, cleanup, and create_dir jobs]
Evaluation: Scaling (No ShadowQ/Adaptation)
[Plot: workflow makespan (seconds) vs. number of VMs or slots, without ShadowQ predictions, annotated with the minimum deadline for each VM count]
• Configuration used for scaling experiment
— 4-degree Montage workflow
— 1 through 16 HTCondor worker VMs (== HTCondor slots)
— Pegasus workflow clustering factor of 8
• Results show
— Makespan decreases up to 8 slots
— Expected because of the clustering factor
— The one-VM case determines the minimum "CPU seconds" needed to complete the workflow, which corresponds to 100% utilization
Using ShadowQ/Mobius++ for Adaptation
• Plot of resources used for different deadline relaxation factors
• Results show
— ShadowQ can predict dynamic resource requirements
— Mobius satisfies requirements by dynamically (de-)provisioning resources
— The resource usage pattern depends on the value of the deadline relaxation factor
[Plots: resources used vs. workflow execution time (seconds since start of workflow) — starting with 8 slots, for deadlines of 1.00, 1.25, 1.50, and 2.00 × min_deadline(8); and starting with 1 slot, for deadlines of 0.33, 0.50, and 0.75 × min_deadline(1)]
Using ShadowQ/Mobius for Adaptation
• Plot of resource usage factor for different deadline relaxation factors
• Results show
— Higher deadline relaxation factors result in a lower resource usage factor (i.e. higher resource utilization), but only up to an inflection point
— At a relaxation factor of 1.5, we sacrifice 50% in execution time, but stay within 20% of the best resource utilization and save 50% in resource usage
[Plot: resource usage factor vs. deadline relaxation factor, starting with 8 slots]

Resource Usage Factor = CPU seconds (used) / CPU seconds (min)
A higher Resource Usage Factor implies lower resource utilization.
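As a worked example of this metric (the numbers below are illustrative, not measured values from the experiments):

```java
// Worked example of the Resource Usage Factor metric: CPU seconds actually
// held by provisioned slots, divided by the minimum CPU seconds the
// workflow needs (the one-slot, 100%-utilization case).
public class ResourceUsageFactor {

    public static double usageFactor(double cpuSecondsUsed, double cpuSecondsMin) {
        return cpuSecondsUsed / cpuSecondsMin;
    }

    public static void main(String[] args) {
        // e.g. 8 slots held for 350 s each, against a 1560 s serial minimum
        System.out.println(usageFactor(8 * 350.0, 1560.0));
    }
}
```

A factor of 1.0 means every provisioned CPU second did useful work; larger factors mean slots sat partly idle, i.e. lower utilization.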
Workflow Data Flow Arbitration Use Case: 1000Genome
https://github.com/pegasus-isi/1000genome-workflow
This workflow identifies mutational overlaps using data from the 1000 Genomes Project in order to provide a null distribution for rigorous statistical evaluation of potential disease-related mutations.

A full run of the workflow consumes/produces over 4.4 TB of data and requires over 24 TB of memory.

For the experiments, we limited the workflow to the execution of 2 populations:
• 22 Individual jobs
• 2 Sifting jobs
• 14 Frequency jobs
• 14 Mutations_overlap jobs
"Toward Prioritization of Data Flows for Scientific Workflows Using Virtual Software Defined Exchanges," Anirban Mandal, Paul Ruth, Ilya Baldin, Rafael Ferreira Da Silva and Ewa Deelman, WoWS workshop at eScience 2017.
[Figure: virtualized SDX — an OpenFlow controller provides connectivity among HTCondor Pool 1 (StarLight ExoGENI rack), HTCondor Pool 2 (UMass ExoGENI rack), and a data node (PSC ExoGENI rack)]
Virtual SDX + HTCondor Pools for Workflows
Network Adaptation with Pegasus and Mobius++
Resource requests flow through a messaging space for high-level provisioning requests:
0. Scientist tools/portal send high-level provisioning requests (sdxcondor | condor, #workers) to the messaging space
Mobius++ (Request Manager (RM), AHAB, VirtualSDX API):
1. Create SDN controller node
2. Create core SDX Tree Network
3. Create HTCondor pools and data site on the NIaaS/Infrastructure (ORCA/ExoGENI)
Network Adaptation with Pegasus and Mobius++
Pegasus WMS:
1. Pegasus determines "Express flow" requirements for workflow data transfers
2. Pegasus sends "modifyNetwork" QoS requests to the messaging space
Mobius++ (Request Manager (RM), AHAB, VirtualSDX API):
3. RM uses AHAB and the VirtualSDX API to send SDN QoS requests to the SDX Tree Network
4. SDX Tree Network actuates QoS actions using the REST API of the SDN controller
5. SDN controller contacts SDX switches for each site
NIaaS/Infrastructure (VirtualSDX + ExoGENI):
6. Slice modified with the required bandwidth QoS
7. RM sends an ack to Pegasus when the Express flow QoS is set
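The "modifyNetwork" QoS request of step 2 could be framed as below (a hedged sketch: field names follow the request schema presented earlier, while the endpoint names, priority value, and exact JSON framing are hypothetical):

```json
{
  "requestType": "modifyNetwork",
  "req_sliceID": "example-slice",
  "req_wfuuid": "0f2a7a1c-example-uuid",
  "req_endpointSrc": "htcondor-pool-1",
  "req_endpointDst": "data-node",
  "req_flowPriority": 70
}
```

The flow priority is what the SDX Tree Network translates into bandwidth-provisioned express flows between the named endpoints.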
Evaluation: Data Flow Prioritization
Effect of relative flow priorities on observed data transfer times for data-intensive workflow tasks, for different vSDX provisioned bandwidths:

[Plot: average transfer time for "individual" jobs (secs) vs. relative data flow priorities (Workflow1-Workflow2, 10-80 through 80-10), for Workflow1 and Workflow2 on HTCondor pools 1 and 2 at vSDX bandwidths of 250 Mb/s and 500 Mb/s]
[Plot: workflow execution time (secs) vs. relative data flow priorities (Workflow1-Workflow2), for Workflow1 on HTCondor pool 1 and Workflow2 on HTCondor pool 2, vSDX bandwidth = 250 Mb/s]

Effect of relative flow priorities on overall workflow execution times.
Conclusions
• Presented an end-to-end system with application-aware and application-independent controllers
— Pegasus ShadowQ, a controller that predicts dynamic workflow requirements
— Mobius++, a controller that facilitates the use of NIaaS systems by workflow management systems like Pegasus
• Showed how ShadowQ predictions can be used by Mobius++ to actuate infrastructure adaptation
• Showed that interplay of prediction and adaptation can balance performance and resource utilization of representative workflows
Thank you
DOE Award #: DoE ASCR Panorama (DE-SC0012390)
NSF Award #: NSF ACI 1245926 (CC-NIE: ADAMANT)
DOE Award #: DoE SciDAC SUPER (DE-FG02-11ER26050 / DE-SC0006925)
GENI Project Office Award #: 1872