XCo: Explicit Coordination to Prevent Network Fabric Congestion in Cloud Computing Cluster Platforms
Presented by Wei Dai
Reasons for Congestion in Cloud
• Cloud operators use virtualization to consolidate thousands of VMs on shared hardware platforms due to cost concerns.
• Most VMs host service‐oriented applications that are inherently communication intensive.
Reasons for Congestion in Cloud
• Cloud computing infrastructures consist of large data center clusters using commodity servers and networking hardware.
  Pros:
    Cheap
    Easy to install and manage
    Can be shared by a wide range of network services and protocols
Reasons for Congestion in Cloud
• Cloud computing infrastructures consist of large data center clusters using commodity servers and networking hardware.
  Cons:
    Higher latency
    Smaller / lower‐performance packet buffers
• Switch buffers can easily become overwhelmed by high‐throughput traffic that can be bursty and synchronized, leading to significant packet losses.
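A rough sketch of why synchronized bursts overwhelm a shallow switch buffer. It uses a simple fluid model: several senders burst into one output port during the same window, the port drains at line rate, and whatever does not fit in the buffer is dropped. The function name and all numbers are illustrative assumptions, not measurements from the paper.

```python
def dropped_bytes(senders, burst_bytes, buffer_bytes, link_mbps, window_ms):
    """Bytes lost at one switch output port under a synchronized burst.

    Assumed fluid model: every sender delivers `burst_bytes` within the same
    `window_ms`, while the output link drains at `link_mbps`.
    """
    arriving = senders * burst_bytes
    drained = link_mbps * 125 * window_ms      # 1 Mbps = 125 bytes per millisecond
    backlog = max(0, arriving - drained)       # bytes that must wait in the buffer
    return max(0, backlog - buffer_bytes)      # overflow beyond the buffer is dropped


if __name__ == "__main__":
    # Hypothetical port: 1 Gbps link, 128 KB buffer, 64 KB bursts within 1 ms.
    for n in (1, 3, 8):
        print(n, "senders ->", dropped_bytes(n, 64_000, 128_000, 1000, 1), "bytes dropped")
```

Under these assumed numbers a few senders are absorbed by the buffer, while eight synchronized senders overflow it, which is the pattern behind the packet losses described above.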
Types of Congestion
• TCP throughput collapse (also known as Incast)
  Well‐known example of congestion experienced by barrier‐synchronized traffic, e.g. synchronous reads in networked storage
• Congestion caused by non‐TCP traffic, e.g. UDP
• Congestion caused by traffic that is not TCP‐friendly, e.g. voice/video over IP and peer‐to‐peer traffic
• Congestion caused by a large number of short TCP sessions
How to Solve the Problem
• Root cause: transient overload of buffers within switches
• Hardware and software mechanisms are hard to deploy at scale.
• Ethernet flow control in IEEE 802.3x helps in low‐end edge switches, but is counter‐productive in backbone switches.
How to Solve the Problem
• Current industry practice:
  Add higher‐capacity network switches
  Multi‐port network cards
  Physically separate networks for data and control traffic
• Drawback: increases cost and complexity without addressing the root cause
XCo – Explicit Coordination
• Coordinate network transmissions from multiple VMs to avoid throughput collapse and increase network utilization
• Advantages: simple, effective, feasible, and independent of switch‐level hardware support; transparent implementation without modifying any applications, standard protocols, network switches or VMs
XCo – Explicit Coordination
Central Controller
• Resides in the same switched network as other nodes
• Takes as input:
  Switch interconnection topology and link capacities
  Location of VMs on physical nodes
  Current traffic matrix of the network
  Administrative policies
• Whenever it detects congestion buildup at any link, it computes and sends transmission directives to the local coordinators at each end‐host that is contributing to the congestion
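The controller's detection step can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the topology and VM placement have already been reduced to a per-flow list of traversed links, and it treats "offered load above link capacity" as a stand-in for congestion buildup. The names `find_congested_links`, `link_capacity`, `flow_paths` and `traffic` are hypothetical.

```python
from collections import defaultdict


def find_congested_links(link_capacity, flow_paths, traffic):
    """Return {link: [flows]} for every link whose offered load exceeds capacity.

    link_capacity: {link: Mbps}
    flow_paths:    {flow: [links the flow traverses]}  (assumed precomputed)
    traffic:       {flow: Mbps}                         (the traffic matrix)
    """
    load = defaultdict(float)
    users = defaultdict(list)
    for flow, rate in traffic.items():
        for link in flow_paths[flow]:
            load[link] += rate
            users[link].append(flow)
    # Only flows crossing an overloaded link would receive directives.
    return {link: users[link] for link in load if load[link] > link_capacity[link]}


if __name__ == "__main__":
    capacity = {"h1-s1": 1000, "h2-s1": 1000, "s1-h3": 1000}
    paths = {("vm1", "vm3"): ["h1-s1", "s1-h3"], ("vm2", "vm3"): ["h2-s1", "s1-h3"]}
    matrix = {("vm1", "vm3"): 700, ("vm2", "vm3"): 600}
    print(find_congested_links(capacity, paths, matrix))
```

In this example only the shared downlink `s1-h3` is overloaded, so the two V2V flows crossing it are the ones whose end-hosts would be sent directives.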
Local Coordinator
• Intercepts and regulates the outgoing traffic aggregates (VM‐to‐VM flows) from all VMs within the corresponding end‐host according to transmission directives
• Provides traffic feedback to the central controller
• The specific regulation pattern is dictated by transmission directives.
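A minimal sketch of the local coordinator's two roles, gating outgoing traffic and reporting feedback. It assumes the directive is a timeslice given as a start time and a duration, and that flows without a directive are left unregulated; the class and method names are hypothetical, and a real implementation would sit in the host's network path rather than in application code.

```python
from collections import defaultdict


class LocalCoordinator:
    """Toy per-host gate for V2V flows (illustrative only)."""

    def __init__(self):
        self.slices = {}                      # flow -> (start_ms, duration_ms)
        self.sent_bytes = defaultdict(int)    # per-flow counters used as feedback

    def apply_directive(self, flow, start_ms, duration_ms):
        """Store the latest timeslice directive for a flow."""
        self.slices[flow] = (start_ms, duration_ms)

    def try_send(self, flow, nbytes, now_ms):
        """Forward a packet only inside the flow's timeslice; otherwise hold it."""
        slot = self.slices.get(flow)
        if slot is not None:
            start, duration = slot
            if not (start <= now_ms < start + duration):
                return False                  # outside the slice: packet stays queued
        self.sent_bytes[flow] += nbytes
        return True

    def feedback(self):
        """Return and reset the traffic counters for the central controller."""
        report = dict(self.sent_bytes)
        self.sent_bytes.clear()
        return report
```

For example, after `apply_directive("a", 10, 5)` the flow `"a"` may send at time 10 through 14 ms only, while an unregulated flow `"b"` sends at any time.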
Transmission Directives
• Explicit instructions for transmission
• Various forms:
  Explicit timeslice scheduling
    which V2V flow transmits when and for how long
  Explicit rate limiting
    at what rate a V2V flow should transmit for the next N ms
  Combination of the above two or other forms
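The two directive forms can be sketched as plain tuples. The policies used here, round-robin order for timeslices and an equal split of link capacity for rate limits, are assumptions chosen for illustration; the slides do not say which policy the controller applies. Function names are hypothetical.

```python
def timeslice_schedule(flows, slice_ms, start_ms=0):
    """Explicit timeslice scheduling: (flow, start_ms, duration_ms) per flow.

    Assumed policy: flows transmit one at a time in round-robin order.
    """
    return [(flow, start_ms + i * slice_ms, slice_ms) for i, flow in enumerate(flows)]


def rate_limits(flows, link_capacity_mbps, window_ms):
    """Explicit rate limiting: (flow, rate_mbps, window_ms) per flow.

    Assumed policy: the link capacity is split equally for the next window_ms.
    """
    share = link_capacity_mbps / len(flows)
    return [(flow, share, window_ms) for flow in flows]


if __name__ == "__main__":
    print(timeslice_schedule(["f1", "f2", "f3"], slice_ms=2))
    print(rate_limits(["f1", "f2"], link_capacity_mbps=1000, window_ms=10))
```

Timeslices keep at most one flow on the congested link at a time, while rate limits let all flows send concurrently at a reduced rate; a combined directive could carry both fields.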
Explicit Timeslice Scheduling
Work Conservation
• Some nodes may finish early with their timeslice.
• Local coordinators return the remaining part of the timeslice back to the central controller.
• The central controller then permits another node to transmit.
• Local coordinators introduce a small hysteresis delay before returning the timeslice, in case more packets arrive during the delay.
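The hysteresis rule above can be sketched as a small function that decides when a local coordinator hands its timeslice back. It assumes the coordinator knows the times at which the flow had packets to send, and that a packet arriving exactly at the end of the hysteresis delay still counts as activity; the name `slice_return_time` and its parameters are hypothetical.

```python
def slice_return_time(send_times_ms, slice_start_ms, slice_len_ms, hysteresis_ms):
    """Time (ms) at which the timeslice is returned to the central controller.

    send_times_ms: sorted times at which the flow had a packet to send.
    The slice is returned once the flow has been idle for hysteresis_ms,
    or at the normal end of the slice, whichever comes first.
    """
    slice_end = slice_start_ms + slice_len_ms
    last_activity = slice_start_ms
    for t in send_times_ms:
        if t < slice_start_ms:
            continue                          # activity before the slice is ignored
        if t >= slice_end or t - last_activity > hysteresis_ms:
            break                             # idle gap exceeded the hysteresis delay
        last_activity = t
    return min(last_activity + hysteresis_ms, slice_end)


if __name__ == "__main__":
    # 10 ms slice starting at t=0 with a 2 ms hysteresis delay.
    print(slice_return_time([0, 1, 2], 0, 10, 2))         # flow goes idle early
    print(slice_return_time(list(range(10)), 0, 10, 2))   # flow uses the whole slice
```

A longer hysteresis delay tolerates short gaps between packets but wastes more of the unused slice, so the delay trades off responsiveness against work conservation.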
Experimental Setups
Impact of Ethernet Congestion
Performance Evaluation of XCo
Impact of Ethernet Congestion
Performance Evaluation of XCo
Impact of Ethernet Congestion
Performance Evaluation of XCo
Experimental Setup
Impact of Ethernet Congestion
Performance Evaluation of XCo
Live VM Migration
Fairness among V2V Flows
Work Conservation
Reference
• XCo: explicit coordination to prevent network fabric congestion in cloud computing cluster platforms
  Vijay Shankar Rajanna, Smit Shah, Anand Jahagirdar, Christopher Lemoine, and Kartik Gopalan
  Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, June 2010