![Page 1: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/1.jpg)
Network Support for Cloud Services
Lixin Gao, UMass Amherst
![Page 2: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/2.jpg)
Outline
• Data center networking
  – Design issues
  – Resource sharing
• Asynchronous computation model
![Page 3: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/3.jpg)
Conventional Data Center Networks
• Hierarchical tree structure
• High-speed core switches are expensive
• Hard to scale
![Page 4: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/4.jpg)
Data Center Network Design
• Commodity hardware
  – Server
  – Switch
• Scalable
• Fat-tree, DCell, BCube, VL2, ...
![Page 5: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/5.jpg)
DPillar Structure
• Devices
  – All servers are dual-port
  – All switches are n-port
• Server and switch columns
  – k columns
• Server naming
  – (col, label), label $v_{k-1} \dots v_1 v_0$, $v_i \in [0, \log_2 n - 1]$
• Connecting rule
  – Servers in $H_i$ and $H_{i+1}$, their labels differ at only $v_i$
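The connecting rule on this slide can be sketched in Python as a check on two servers. This is only an illustration under assumptions the slide does not state: a server is a hypothetical `(col, label)` tuple in which `label[i]` holds the symbol at position i, columns wrap around so that column k-1 is adjacent to column 0, and two servers whose labels agree everywhere are also treated as connected. The function names are made up for this example.

```python
from typing import Tuple

# Hypothetical representation: (column index, label), where label[i] is symbol i.
Server = Tuple[int, Tuple[int, ...]]


def agree_except_at(label_a, label_b, i):
    """True if the two labels match at every position other than i."""
    if len(label_a) != len(label_b):
        return False
    return all(
        x == y
        for pos, (x, y) in enumerate(zip(label_a, label_b))
        if pos != i
    )


def adjacent_columns_share_switch(a: Server, b: Server, k: int) -> bool:
    """Apply the rule to servers in columns i and i+1 (wrap-around is assumed)."""
    (col_a, label_a), (col_b, label_b) = a, b
    # a sits in column i and b in column i+1: labels may differ only at symbol i.
    if col_b == (col_a + 1) % k and agree_except_at(label_a, label_b, col_a):
        return True
    # Same check with the roles of a and b swapped.
    if col_a == (col_b + 1) % k and agree_except_at(label_a, label_b, col_b):
        return True
    return False
```

For example, with k = 3, servers `(0, (1, 0, 2))` and `(1, (3, 0, 2))` differ only at position 0, so the check passes; changing position 1 instead makes it fail.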
![Page 6: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/6.jpg)
Design Issues
• Inexpensive
• Scale to a large number of servers
• Fault-tolerant routing
• Load balancing
![Page 7: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/7.jpg)
Network Resource Sharing within Data Center
• Virtualization of CPU (Xen), memory (DiffEng), storage (SAN)
• Network resources can become a bottleneck
  – Sorting and shuffling of MapReduce
  – Sync among tasks slows down computation
  – Backup of VMs
• Bandwidth sharing
  – Granularity: point-to-point or group based
  – Fair share: centralized vs. distributed
  – Privacy: public cloud vs. private cloud
![Page 8: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/8.jpg)
MapReduce Model
• Map: generate key-value pairs
• Reduce: aggregate values for a key from multiple sources
• Shuffle and sort
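The three steps listed on this slide can be illustrated with a small single-process sketch in Python. It is not a real MapReduce runtime: the helper names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) are hypothetical, and word counting is simply a convenient example workload.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(records, map_fn):
    """Map: apply map_fn to each record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle_and_sort(pairs):
    """Shuffle and sort: order pairs by key and group the values of each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def reduce_phase(grouped, reduce_fn):
    """Reduce: aggregate the values collected for each key."""
    return {key: reduce_fn(key, values) for key, values in grouped}


def word_count(lines):
    """Example job: count word occurrences across all input lines."""
    pairs = map_phase(lines, lambda line: [(word, 1) for word in line.split()])
    grouped = shuffle_and_sort(pairs)
    return reduce_phase(grouped, lambda key, values: sum(values))
```

In a cluster, the shuffle step moves every value for a key to the machine that reduces it, which is where the network traffic mentioned on the previous slide comes from.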
![Page 9: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/9.jpg)
Iterative Computations
PageRank
Clustering
BFS
YouTube video suggestion
Pattern Recognition
![Page 10: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/10.jpg)
Synchronous Model
• Ease of MapReduce implementation
• However,
  – Overhead of sync operation, sorting
  – Slow convergence, waste of CPU and network resources
  – Many iterative computations can be performed asynchronously
    • PageRank, shortest path, adsorption, link proximity estimation, belief propagation, ...
![Page 11: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/11.jpg)
Shortest Paths
[Figure: weighted example graph; the source node has distance 0 and every other node starts at ∞. Label: MapReduce]
![Page 12: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/12.jpg)
Shortest Paths
[Figure: the same weighted graph with tentative distances updated under parallel execution. Label: MapReduce]
![Page 13: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/13.jpg)
Shortest Paths
[Figure: the same weighted graph with tentative distances under parallel execution (continued). Label: MapReduce]
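The round-by-round execution that these shortest-path slides illustrate can be sketched as follows. This is a minimal single-process illustration, not the speaker's implementation: the graph is a hypothetical adjacency dictionary, and each call to `sssp_round` stands in for one full MapReduce job followed by a global synchronization point.

```python
import math


def sssp_round(graph, dist):
    """One synchronous round: map emits candidate distances, reduce keeps the minimum."""
    emitted = []
    for node, d in dist.items():
        emitted.append((node, d))  # keep the node's current distance
        if d < math.inf:
            for neighbor, weight in graph.get(node, []):
                emitted.append((neighbor, d + weight))  # relax each outgoing edge
    new_dist = {}
    for node, d in emitted:
        new_dist[node] = min(d, new_dist.get(node, math.inf))
    return new_dist


def sssp_synchronous(graph, source):
    """Repeat rounds until no distance changes; return distances and round count."""
    nodes = set(graph) | {n for edges in graph.values() for n, _ in edges}
    dist = {node: math.inf for node in nodes}
    dist[source] = 0
    rounds = 0
    while True:
        new_dist = sssp_round(graph, dist)
        rounds += 1
        if new_dist == dist:
            return dist, rounds
        dist = new_dist
```

Every round touches every node and waits for all of them to finish, even when only a few distances change, which is the overhead the synchronous-model slide points to.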
![Page 14: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/14.jpg)
An Asynchronous Model
• A general framework
  – Eliminate synchronization
  – Scheduling policy
• Prove correctness for a wide range of applications
  – PageRank, Personalized PageRank
  – Link Proximity Estimation
    • Commute time, Katz metric, shortest path
  – Bayesian Inference
• Scheduling policies
  – Top-k query
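As a rough illustration of eliminating synchronization and of why the scheduling policy matters, the sketch below runs shortest paths as a worklist computation with no global rounds. It is a generic label-correcting sketch, not the framework presented in the talk; the `prioritized` flag and the two policies (smallest tentative distance first versus first-in-first-out) are assumptions chosen for the example.

```python
import heapq
import math
from collections import deque


def sssp_async(graph, source, prioritized=True):
    """Asynchronous shortest paths; returns (distances, number of distance updates)."""
    dist = {source: 0}
    updates = 0
    # Pending work items are (tentative distance, node); node names must be comparable.
    pending = [(0, source)] if prioritized else deque([(0, source)])
    while pending:
        if prioritized:
            d, node = heapq.heappop(pending)  # smallest tentative distance first
        else:
            d, node = pending.popleft()  # plain first-in-first-out order
        if d > dist.get(node, math.inf):
            continue  # stale item: a shorter distance was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                updates += 1
                if prioritized:
                    heapq.heappush(pending, (candidate, neighbor))
                else:
                    pending.append((candidate, neighbor))
    return dist, updates
```

Both policies reach the same distances on graphs with non-negative weights, but they can differ in how many updates they perform (and, in a distributed setting, send over the network).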
![Page 15: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/15.jpg)
Shortest Path
[Plots: Facebook dataset; SSSP-m dataset]
![Page 16: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/16.jpg)
PageRank
[Plots: Google webgraph; PageRank-m webgraph]
![Page 17: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/17.jpg)
Conclusions
• Network design within data center
  – Design based on commodity hardware
  – Network resource sharing
• Asynchronous computation framework
  – Reduced bandwidth requirement
  – Efficient computation
![Page 18: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/18.jpg)
An Example of Outage
• planet02.csc.ncsu.edu experiences packet loss on July 30, 2005
![Page 19: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/19.jpg)
Causes of Outages
• Most lost packets are caused by routing outages

| Failure type | Lost packets | Fraction |
| --- | --- | --- |
| Unknown | 14572 | 0.2 |
| Routing dynamics | 58111 | 0.8 |
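The fractions in this table follow directly from the packet counts. A short check, using only the two counts shown on the slide (the variable names are illustrative):

```python
# Lost-packet counts as listed on the slide.
lost_packets = {"unknown": 14572, "routing dynamics": 58111}

total_lost = sum(lost_packets.values())
fractions = {cause: count / total_lost for cause, count in lost_packets.items()}

for cause, fraction in fractions.items():
    # Rounded to one decimal place, matching the precision used on the slide.
    print(f"{cause}: {fraction:.1f}")
```

Unrounded, the shares are roughly 0.2005 and 0.7995 of the 72683 lost packets.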
![Page 20: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/20.jpg)
Towards 5 Nines Reliability
• Exploiting redundancy on Internet paths
  – Multiple routing instances to ensure consistency
• Exploiting multiple sites within a cloud
  – Site selection through route monitoring
  – Deliver through private WAN
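For scale, "five nines" means 99.999% availability. The sketch below converts a number of nines into the downtime it allows; the function name and the 365-day year are assumptions made for the example.

```python
def allowed_downtime_minutes(nines, period_days=365):
    """Downtime budget, in minutes, for an availability of the given number of nines."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return unavailability * period_days * 24 * 60


five_nines_budget = allowed_downtime_minutes(5)
print(f"{five_nines_budget:.3f} minutes per year")
```

Five nines leaves a little over five minutes of downtime per year, so even brief routing outages consume a large share of the budget.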
![Page 21: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/21.jpg)
Packet Loss due to Routing Failures
• Failover events: 76% packets lost
• Recovery events: 26% packets lost
[Plots: Failover; Recovery]
![Page 22: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/22.jpg)
Round-trip Delay
• Failover events have significant impact on packet round-trip delays. In the worst case, packet round-trip delays can be more than 900 ms.
[Plots: Failover; Recovery]
![Page 23: Network Support for Cloud Services Lixin Gao, UMass Amherst](https://reader036.vdocument.in/reader036/viewer/2022062800/56649e005503460f94ae89bf/html5/thumbnails/23.jpg)
Reordering during Failover Events
• The number of reordered packets is small. However, the offset of reordered packets is large.
• Real-time applications need larger buffer sizes.
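The difference between the number of reordered packets and their offset can be made concrete with a small sketch. The definitions here are one possible choice, not necessarily the ones used in the measurements: a packet counts as reordered if some packet with a larger sequence number arrived before it, and its offset is the distance in arrival positions back to the earliest such packet. The function name is hypothetical.

```python
def reordering_stats(arrivals):
    """Return (number of reordered packets, largest offset) for a list of
    sequence numbers given in arrival order."""
    offsets = []
    for i, seq in enumerate(arrivals):
        # Positions of packets that arrived earlier but carry a larger sequence number.
        earlier_larger = [j for j in range(i) if arrivals[j] > seq]
        if earlier_larger:
            offsets.append(i - earlier_larger[0])
    return len(offsets), max(offsets, default=0)
```

For the arrival order `[1, 2, 4, 5, 6, 3, 7]` only packet 3 is reordered, yet it arrives three positions late, so a receiver that restores order has to buffer packets 4, 5 and 6 until it shows up.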