Dr. Multicast for Data Center Communication Scalability
Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman
Cornell University
Yoav Tock
IBM Research Haifa
HotNets, October 5, 2008
IP Multicast in Data Centers
• IPMC is not used in data centers
• Would speed up products that use multicast
IP Multicast in Data Centers
• Why is IP multicast rarely used?
o Limited IPMC scalability on switches/routers and NICs
o Broadcast storms: Loss triggers a horde of NACKs, which triggers more loss, etc.
o Disruptive even to non-IPMC applications.
IP Multicast in Data Centers
• IP multicast has a bad reputation
o Works great up to a point, after which it breaks catastrophically
IP Multicast in Data Centers
• Bottom line:
o Administrators have no control over multicast use ...
o Without control, they opt for never.
Dr. Multicast
Dr. Multicast (MCMD)
• Policy: Permits data center operators to selectively enable and control IPMC
• Transparency: Standard IPMC interface, system calls are overloaded.
• Performance: Uses IPMC when possible, otherwise point-to-point unicast
• Robustness: Distributed, fault-tolerant service
Terminology
• Process: Application that joins logical IPMC groups
• Logical IPMC group: A virtualized abstraction
• Physical IPMC group: As usual
• UDP multi-send: New kernel-level system-call
• Collection: Set of logical IPMC groups with identical membership
Acceptable Use Policy
• Assume a higher-level network management tool compiles policy into primitives
• Explicitly allow a process to use IPMC groups
o allow-join(process,logical IPMC)
o allow-send(process,logical IPMC)
• UDP multi-send always permitted
• Additional restraints
o max-groups(process,limit)
o force-udp(process,logical IPMC)
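One way to picture the four primitives is as a small table that an agent consults before granting a join or choosing a transport for a send. The sketch below is illustrative only and is not the MCMD implementation: the `Policy` class, its method names, and the string return values are hypothetical. It also assumes one particular reading of the slide, namely that a sender lacking allow-send (or subject to force-udp) falls back to UDP multi-send, since that is always permitted.

```python
from collections import defaultdict


class Policy:
    """Hypothetical in-memory table of the slide's policy primitives."""

    def __init__(self):
        self.join_ok = set()      # (process, group) pairs from allow-join
        self.send_ok = set()      # (process, group) pairs from allow-send
        self.forced_udp = set()   # (process, group) pairs from force-udp
        self.limits = {}          # process -> limit from max-groups
        self.joined = defaultdict(set)  # process -> groups currently joined

    def allow_join(self, process, group):
        self.join_ok.add((process, group))

    def allow_send(self, process, group):
        self.send_ok.add((process, group))

    def max_groups(self, process, limit):
        self.limits[process] = limit

    def force_udp(self, process, group):
        self.forced_udp.add((process, group))

    def try_join(self, process, group):
        """Record the join and return True only if policy permits it."""
        if (process, group) not in self.join_ok:
            return False
        limit = self.limits.get(process)
        already_member = group in self.joined[process]
        if (limit is not None and not already_member
                and len(self.joined[process]) >= limit):
            return False
        self.joined[process].add(group)
        return True

    def transport_for_send(self, process, group):
        """Pick 'ipmc' when allowed and not forced to UDP, else 'udp-multisend'.

        Assumption: UDP multi-send is the fallback because the slide says it
        is always permitted.
        """
        allowed = (process, group) in self.send_ok
        forced = (process, group) in self.forced_udp
        return "ipmc" if allowed and not forced else "udp-multisend"
```

With this model, an operator tool could call `max_groups("proc", 1)` to cap a chatty process, and a second `try_join` for a different group would then be refused.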
Overview
• Library module
• Mapping module
• Gossip layer
• Optimization questions
• Results
MCMD Library Module
• Transparent. Overloads the IPMC functions
o setsockopt(), send(), etc.
• Translation. Each logical IPMC group maps to a set of P-IPMC/unicast addresses.
o Two extremes
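The translation step can be sketched as a table lookup followed by one datagram per target. This is a minimal illustration, not the library module's actual code: `TRANSLATION`, `translate`, and `mcmd_sendto` are hypothetical names, the addresses are made up, and passing unmapped addresses through unchanged is an assumption. The two table entries show one plausible reading of the "two extremes" (a single physical IPMC address versus unicast receivers only); the slide itself does not spell them out.

```python
# Hypothetical translation map: logical (ip, port) -> list of physical targets.
# A target may be a physical IPMC address or a plain unicast address.
TRANSLATION = {
    ("239.1.1.1", 5000): [("239.7.7.7", 5000)],                     # one physical group
    ("239.1.1.2", 5000): [("10.0.0.5", 5000), ("10.0.0.9", 5000)],  # unicast only
}


def translate(logical_addr, table=TRANSLATION):
    """Return the physical targets for a logical address.

    Assumption: an address with no entry is passed through unchanged.
    """
    return table.get(logical_addr, [logical_addr])


def mcmd_sendto(sock, data, logical_addr, table=TRANSLATION):
    """Send one datagram per translated target; return how many were sent."""
    targets = translate(logical_addr, table)
    for target in targets:
        sock.sendto(data, target)
    return len(targets)
```

Here `sock` can be any object with a `sendto(data, address)` method, for example a UDP socket created with the standard `socket` module. The application keeps addressing the logical group and never sees the physical targets.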
MCMD Mapping Role
• MCMD Agent runs on each machine
o Contacted by the library modules
o Provides a mapping
• One agent elected to be a leader:
o Allocates IPMC resources according to the current policy
MCMD Mapping Role
• Allocating IPMC resources: An optimization problem
[Figure labels: Procs, L-IPMC; Procs, Collections, L-IPMC; a box marked "This box intentionally left BLACK"]
MCMD Gossip Layer
• Runs system-wide as part of the agent
• Automatic failure detection
• Group membership fully replicated via gossip
o Node reports its own state
o Future: Replicate more selectively
o Leader runs optimization algorithm on data and reports the mapping
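Full replication of membership via gossip can be illustrated with a toy push protocol: every node starts out knowing only its own state and, once per epoch, pushes everything it knows to one random peer. The sketch below is a simulation under stated assumptions, not the MCMD gossip layer. The function names are hypothetical, versions are fixed at 1, there is no failure detection or leader, and at least two nodes are required.

```python
import random


def merge(local, remote):
    """Merge a remote view into a local one, keeping the newest version per node."""
    for node, (version, groups) in remote.items():
        if node not in local or local[node][0] < version:
            local[node] = (version, groups)


def gossip_epoch(views, rng):
    """One push epoch: every node sends its whole view to one random peer.

    Nodes are processed in turn, so state received early in an epoch may be
    forwarded later in the same epoch.
    """
    nodes = list(views)
    for sender in nodes:
        peer = rng.choice([n for n in nodes if n != sender])
        merge(views[peer], views[sender])


def epochs_until_replicated(memberships, seed=0, max_epochs=1000):
    """Count epochs until every node holds every node's reported state."""
    rng = random.Random(seed)
    # Each node initially reports only its own group membership, at version 1.
    views = {n: {n: (1, frozenset(g))} for n, g in memberships.items()}
    for epoch in range(1, max_epochs + 1):
        gossip_epoch(views, rng)
        if all(len(view) == len(memberships) for view in views.values()):
            return epoch, views
    raise RuntimeError("views did not converge within max_epochs")
```

Running `epochs_until_replicated` on a few dozen simulated nodes gives a feel for how many epochs full replication takes, which is the latency cost the deck attributes to gossip.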
MCMD Gossip Layer
• But gossip is slow...
• Implications:
o Slow propagation of group membership
o Slow propagation of new maps
o We assume a low rate of membership churn
• Remedy: Broadcast module
o Leader broadcasts urgent messages
o Bounded bandwidth of urgent channel
o Trade-off between latency and scalability
Optimization Questions
[Figure labels: Procs, L-IPMC; Procs, Collections, L-IPMC; a box marked "BLACK"]
• First step: compress logical IPMC groups
Optimization Questions
• How compressible are subscriptions?
o Multi-objective optimization: minimize number of collections, minimize bandwidth overhead on network
o Thm: The general problem is NP-complete
o Thm: In uniform random allocation, "little" compression opportunity.
o Social preferences
o Lots of duplicates due to replication (e.g. for load balancing)
Optimization Questions
• Which collections get an IPMC address?
o Thm: If collections are ordered by decreasing traffic*size and P-IPMC addresses are assigned greedily, bandwidth is minimized.
• Tiling heuristic:
o Sort L-IPMC by traffic*size
o Greedily collapse identical groups
o Assign IPMC to collections in reverse order of traffic*size, UDP-multisend to the rest
• Building tilings incrementally
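The tiling heuristic can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the function name `tile` and its input format are hypothetical, identical groups are collapsed by exact membership, a collection's traffic is taken to be the sum over its groups, and "size" is taken to be the number of members. Following the theorem on the slide, the collections with the largest traffic*size receive the scarce physical IPMC addresses.

```python
from collections import defaultdict


def tile(groups, num_ipmc_addresses):
    """Collapse identical-membership groups and hand out scarce IPMC addresses.

    groups: dict mapping logical group name -> (members, traffic_rate)
    Returns (ipmc, multisend): two lists of collections, where each
    collection is a (members, group_names, weight) tuple.
    """
    # Collapse logical groups with identical membership into collections.
    names_by_members = defaultdict(list)
    traffic = defaultdict(float)
    for name, (members, rate) in groups.items():
        key = frozenset(members)
        names_by_members[key].append(name)
        traffic[key] += rate

    # Weight each collection by traffic * size.
    collections = [
        (key, sorted(names), traffic[key] * len(key))
        for key, names in names_by_members.items()
    ]
    # Heaviest first; ties are broken by group names for determinism.
    collections.sort(key=lambda c: (-c[2], c[1]))

    # The heaviest collections get physical IPMC; the rest use UDP multi-send.
    return collections[:num_ipmc_addresses], collections[num_ipmc_addresses:]
```

For example, two logical groups with the same three members collapse into one collection, so a single physical address can serve both.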
Experimental Results
Overhead (max. throughput)
• Insignificant overhead when mapping L-IPMC to P-IPMC.
Overhead (CPU utilization)
• Insignificant overhead when mapping L-IPMC to P-IPMC.
Network Overhead
• Gossip Layer uses constant background bandwidth, urgent channel behaves well
Latency
• Latency of propagation of joins/leaves and new maps
Policy control
• A malfunctioning node bombards an existing IPMC group.
• MCMD policy prevents ill-effects
[Figure annotations: "Traffic starts", "New policy"]
Conclusion
• IPMC has been a bad citizen...
• Dr. Multicast has the cure!
• Opportunity for big performance enhancements and policy control.
Thank you!
Overhead
• Linux kernel module increases UDP-multisend throughput by 17% (compared to user-space UDP-multisend)
Latency of events
• Gossip: 99% of nodes aware of change within 9 epochs (now 1 sec)
Conclusions
• Policy: Allows data center operators to enable and control IPMC
• Transparency: Standard IPMC interface, system calls are overloaded.
• Performance: Uses IPMC when possible, otherwise point-to-point UDP
• Robustness: Distributed, fault-tolerant service
Results
• Library Module
o Insignificant slowdown
o Linux Kernel module provides 17% speed-up for UDP multi-send
Optimization questions
[Figure labels: Users, Topics; Users, Groups, Topics; a box marked "This box intentionally left BLACK"]
• Multi-objective:
o Minimize number of groups
o Minimize bandwidth overhead on network
• Thm: This problem is NP-complete
o Reduction to Minimum Normal Set Basis
MCMD Library Layer
• Overloads the IPMC functions
o setsockopt(), send(), etc.
• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy
• Notifies MCMD immediately about joins/leaves
• Learns about new mappings from MCMD
• Keeps statistics about group traffic rates
MCMD Library Layer
• Overloads the IPMC functions
o setsockopt(), send(), etc.
• Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy
• Caches translation maps
• Maintains a connection to MCMD for updates