Free Riding Multicast
Sylvia Ratnasamy (Intel Research)
Andrey Ermolinskiy (U.C. Berkeley)
Scott Shenker (U.C. Berkeley and ICSI)
ACM SIGCOMM 2006
Berkeley SysLunch (10/10/06)
Talk Outline
Introduction
  Overview of the IP Multicast service model
  Challenges of multicast routing
Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation
Internet Routing – a High-Level View
Routing protocols (BGP, OSPF) establish forwarding state in routers
[Figure: clients C1–C4 interconnected by routers]
Each routable entity is assigned an IP address
Internet is a packet-switched network
C1: Send(Packet, C2Addr);
Routers forward packets towards their recipients
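The Send primitive above maps directly onto the familiar sockets API. As a minimal illustration (the address below is a hypothetical placeholder for C2), a unicast send looks like:

```python
import socket

# Minimal unicast send: the sender names a single recipient by IP address,
# and routers forward the packet toward that one destination.
C2_ADDR = ("192.0.2.2", 5000)  # hypothetical address for client C2

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # UDP datagram socket
sock.sendto(b"hello, C2", C2_ADDR)                       # Send(Packet, C2Addr)
sock.close()
```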
Internet Routing – a High-Level View
Traditionally, the Internet routing infrastructure offers a one-to-one (unicast) packet delivery service.
Problem: some applications require one-to-many packet delivery:
  Streaming media delivery
  Digital conferencing
  Online multiplayer games
[Figure: source S delivering to multiple group members G]
IP Multicast Service Model
In 1990, Steve Deering proposed IP Multicast: an extension to the IP service model for efficient one-to-many packet delivery.
Group-based communication:
  Join (IPAddr, GrpAddr);
  Leave (IPAddr, GrpAddr);
  Send (Packet, GrpAddr);
Multicast routing problem: set up a dissemination tree rooted at the source with group members as leaves.
[Figure: source S and group members G joined by a dissemination tree]
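Applications reach this service model through the standard IP multicast socket options. A minimal sketch of the Join/Leave/Send primitives (the group address and port are hypothetical):

```python
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical multicast group address
PORT = 5000

# Send(Packet, GrpAddr): any host may transmit to the group; no join needed to send.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"one-to-many payload", (GROUP, PORT))

# Join(IPAddr, GrpAddr): a receiver subscribes to the group address.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("", PORT))
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# ... packets sent to the group now arrive via recv_sock.recvfrom(1500) ...

# Leave(IPAddr, GrpAddr): unsubscribe when done.
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
```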
IP Multicast Routing
Challenges:
  New members must find the tree
  The tree changes with new members and sources
  The tree changes with network failures
  Administrative boundaries and policies matter
  Forwarding state grows with the number of groups and sources
[Figure: source S, group members G, and a joining member asking "join G?"]
IP Multicast – a Brief History
Extensively researched, limited deployment:
  Implemented in routers, supported by OS vendors
  Some intra-domain/enterprise usage
  Virtually no inter-domain deployment
Why? Too complex? PIM-SM, PIM-DM, MBGP, MSDP, BGMP, IGMP, etc.
FRM goal: make inter-domain multicast simple.
Talk Outline
Introduction
  Overview of the IP Multicast service model
  Challenges of multicast routing
Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation
FRM Overview
Free Riding Multicast: a radical restructuring of inter-domain multicast.
Key design choice: decouple group membership discovery from multicast route construction.
Principal trade-off: avoid distributed route computation at the expense of optimal efficiency.
FRM Approach
Group membership discovery:
  Extension to BGP – augment route advertisements with group membership information
Multicast route construction:
  Centralized computation at the origin border router
  Exploit knowledge of unicast BGP routes
  Eliminate the need for a separate routing algorithm
Group Membership Discovery
Augment BGP with per-prefix group membership information.
[Figure: ASes P, Q, R, T, V, X, Y, Z; X originates a.b.*.*, Y originates c.d.e.*, Z originates f.g.*.*, T originates h.i.*.*]

Domain X joins G1: the border router at X re-advertises its prefix in a BGP UPDATE (Dest = a.b.*.*, AS Path = X), attaching an encoding of its active groups ({G1}) in a new FRM group membership attribute.

Border routers maintain membership information as part of per-prefix state in the BGP RIB, and BGP disseminates each membership change. Domains Y and Z then join G1 as well, and their re-advertisements (c.d.e.* -> {G1}, f.g.*.* -> {G1}) bring V's table to the state shown below:

  Prefix    AS Path    Active Groups
  a.b.*.*   V Q P X    {G1}
  c.d.e.*   V Q P Y    {G1}
  f.g.*.*   V R Z      {G1}
  h.i.*.*   V Q T
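To make the augmented per-prefix state concrete, here is a hedged sketch (names and types invented for illustration, not FRM's actual data structures) of a RIB entry carrying an Active Groups set and the handling of a membership re-advertisement:

```python
from dataclasses import dataclass, field

@dataclass
class RibEntry:
    """Per-prefix BGP RIB state, extended with FRM group membership."""
    prefix: str
    as_path: tuple                                    # e.g. ("V", "Q", "P", "X")
    active_groups: set = field(default_factory=set)

# V's RIB for the four prefixes in the example topology.
rib = {
    "a.b.*.*": RibEntry("a.b.*.*", ("V", "Q", "P", "X")),
    "c.d.e.*": RibEntry("c.d.e.*", ("V", "Q", "P", "Y")),
    "f.g.*.*": RibEntry("f.g.*.*", ("V", "R", "Z")),
    "h.i.*.*": RibEntry("h.i.*.*", ("V", "Q", "T")),
}

def on_membership_update(prefix: str, groups: set) -> None:
    """Apply the active-group set carried in a re-advertisement of `prefix`."""
    rib[prefix].active_groups = set(groups)

# Domain X joins G1; later Y and Z follow.
on_membership_update("a.b.*.*", {"G1"})
on_membership_update("c.d.e.*", {"G1"})
on_membership_update("f.g.*.*", {"G1"})
```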
Packet Forwarding
Domain V: Send(G1, Pkt)
The source border router at V looks up G1 in its BGP RIB and collects the AS paths of all prefixes whose Active Groups contain G1: a.b.*.* (V Q P X), c.d.e.* (V Q P Y), and f.g.*.* (V R Z). The union of these paths is the dissemination tree rooted at V, with edges V-Q, Q-P, P-X, P-Y, V-R, and R-Z.
[Figure: dissemination tree with root V, children Q and R; Q -> P -> {X, Y}; R -> Z]
V forwards the packet to its children on the tree (Q and R), attaching an encoding of the corresponding subtree in a "shim" header: G1 | SubtreeQ on the copy to Q, and G1 | SubtreeR on the copy to R.
Transit routers inspect the FRM header and forward the packet to their children on the tree: Q tests each neighbor against SubtreeQ (T: no; P: yes), P delivers to X and Y, and R delivers to Z.
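A hedged sketch of the source-side computation (the data layout is invented for illustration; header caching and Bloom encoding are elided): scan the RIB for prefixes subscribed to the group and take the union of their AS-path edges as the dissemination tree.

```python
# Per-prefix state from the walkthrough: prefix -> (AS path, active groups).
rib = {
    "a.b.*.*": (("V", "Q", "P", "X"), {"G1"}),
    "c.d.e.*": (("V", "Q", "P", "Y"), {"G1"}),
    "f.g.*.*": (("V", "R", "Z"), {"G1"}),
    "h.i.*.*": (("V", "Q", "T"), set()),
}

def compute_tree_edges(rib: dict, group: str) -> set:
    """Union of AS-path edges toward every prefix subscribed to `group`."""
    edges = set()
    for as_path, groups in rib.values():
        if group in groups:
            # Paths are read local-AS-first; consecutive ASes become tree edges.
            edges |= {(as_path[i], as_path[i + 1]) for i in range(len(as_path) - 1)}
    return edges

print(compute_tree_edges(rib, "G1"))
# -> {('V','Q'), ('Q','P'), ('P','X'), ('P','Y'), ('V','R'), ('R','Z')}
```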
FRM Details
Encoding group membership:
  Simple enumeration is hard to scale
  Border routers encode locally active groups using a Bloom filter
  Transmit the encoding using a new path attribute in the BGP UPDATE message
Encoding the dissemination tree:
  Encode tree edges into a shim header using a Bloom filter
  Tree computation is expensive, so border routers maintain a shim header cache
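The same Bloom-filter construction underlies both encodings. Below is a minimal sketch (the hashing scheme and filter length are illustrative, not the paper's exact parameters) of encoding a subtree's edges into a shim header, and of the membership test a transit router performs:

```python
import hashlib

def bf_positions(item: str, m: int, k: int = 5):
    """Yield k bit positions for `item` in an m-bit filter (illustrative hashing)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        yield int.from_bytes(digest[:4], "big") % m

def bf_insert(bits: bytearray, item: str, m: int) -> None:
    for pos in bf_positions(item, m):
        bits[pos // 8] |= 1 << (pos % 8)

def bf_contains(bits: bytearray, item: str, m: int) -> bool:
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in bf_positions(item, m))

M = 800                    # filter length in bits (illustrative)
shim = bytearray(M // 8)   # the shim header's Bloom filter

# The source encodes the AS-level edges of SubtreeQ into the shim header...
for parent, child in [("Q", "P"), ("P", "X"), ("P", "Y")]:
    bf_insert(shim, f"{parent}->{child}", M)

# ...and transit router Q tests each of its neighbor links against it.
print(bf_contains(shim, "Q->P", M))  # True: forward toward P
print(bf_contains(shim, "Q->T", M))  # False (barring a false positive): prune
```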
Talk Outline
Introduction
Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
    Router storage requirements
    Forwarding bandwidth overhead (in paper)
  Design tradeoffs
  Implementation
FRM Overhead – Router Storage
Origin border router:
  1. Source forwarding state (per-group, line card memory)
  2. Group membership state (per-prefix, BGP RIB)
Transit router:
  3. Transit forwarding state (per-neighbor, line card memory)
[Figure: the example ASes and prefixes, with the three kinds of state annotated at the origin and transit routers]
Forwarding State (Source Border Router)
[Plot: cache size (MB, 0–900) vs. number of groups with active sources (A, 100 to 1M)]
256 MB of line card memory enables fast-path forwarding for ~200,000 active groups.
Assumptions:
  A – number of groups with sources in the local domain
  Zipfian group popularity with a minimum of 8 domains per group
  25 groups have members in every domain (global broadcast)
Group Membership State Requirements
Model:
  Total of A multicast groups
  Domains of prefix length p have 2^(32-p) users
  Each user chooses and joins k distinct groups from the A
  10 false positives per prefix allowed
1M simultaneously active groups and 10 groups per user require ~3 GB of route processor memory (not on the fast path).
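A back-of-envelope sketch of where an estimate of this kind comes from, using the standard Bloom-filter sizing m = -n*ln(p) / (ln 2)^2; the prefix length and lookup-space assumptions below are illustrative, and the paper's exact model differs in detail:

```python
import math

def bloom_bits(n_items: int, fp_rate: float) -> float:
    """Standard Bloom filter size in bits for n items at a target false-positive rate."""
    return -n_items * math.log(fp_rate) / (math.log(2) ** 2)

A = 1_000_000                   # simultaneously active groups
k = 10                          # groups joined per user
prefix_len = 24                 # assume /24 prefixes for illustration
users = 2 ** (32 - prefix_len)
n = min(A, users * k)           # upper bound on distinct groups a prefix encodes
fp = 10 / (A - n)               # rate giving ~10 false positives per prefix

per_prefix_bytes = bloom_bits(n, fp) / 8
total_gb = per_prefix_bytes * 200_000 / 2 ** 30  # ~200K prefixes in the BGP RIB
print(f"~{per_prefix_bytes / 1024:.1f} KiB per prefix, ~{total_gb:.1f} GB total")
# With these toy parameters this prints roughly 7.5 KiB and ~1.4 GB, the same
# order of magnitude as the paper's ~3 GB figure for a realistic prefix mix.
```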
Forwarding State (Transit Router)
Number of forwarding entries = number of neighbor ASes
Independent of the number of groups!
  90% of ASes: 10 forwarding entries
  99% of ASes: 100 forwarding entries
  Worst case: 2400 forwarding entries
[Figure: transit AS Q with neighbor ASes V, P, and T]
Talk Outline
Introduction
Free Riding Multicast (FRM)
  Approach overview
  Overhead evaluation
  Design tradeoffs
  Implementation
FRM Design Tradeoffs
Protocol simplicity:
  Can be implemented as a straightforward extension to BGP
  Centralized route construction (the tree is computed at the source border router from existing unicast routes)
Ease of configuration:
  Management within the familiar BGP framework
  Avoids rendezvous point selection
Enables ISP control over sources/subscribers:
  To block traffic for an undesired group, drop it from the BGP advertisement
  Source control over the dissemination tree facilitates source-based charging [Express]

FRM Design Tradeoffs (costs)
Group membership state maintenance:
  Membership information is disseminated more widely
Nontrivial bandwidth overhead (see paper for results):
  Per-packet shim header
  Redundant packet transmissions
New packet forwarding techniques:
  Full scan of the BGP RIB at the source border router
  Bloom filter lookups at transit routers
FRM Implementation
A proof-of-concept prototype on top of Linux 2.4 and the eXtensible Open Router Platform (http://www.xorp.org).
Functional components:
  FRM kernel module (3.5 KLOC of new Linux kernel code): interfaces with the Linux kernel IP layer and implements the packet forwarding plane
  FRM user-level component (1.9 KLOC of new code): an extension to the XORP BGP daemon; implements tree construction and group membership state dissemination
  Configuration and management tools (1.4 KLOC of new code)
Summary
Free Riding Multicast is a very different approach to inter-domain multicast routing.
FRM makes use of the existing unicast routing infrastructure for group membership discovery and route construction.
It reduces protocol complexity via aggressive use of router resources.
Thank you
Challenges and Future Work
Incremental deployment
Legacy BGP routers rate-limit their path advertisements (30 seconds), delaying the dissemination of group membership state
Large group Bloom filters that exceed the maximum BGP UPDATE message size (4 KB) require fragmentation and reassembly
Explore alternative tree encoding techniques to reduce per-packet bandwidth overhead
Backup Slides
FRM Overhead – Redundant Transmissions
Total number of transmissions required to transfer a single packet to all group members (FRM header size = 100 bytes):
  Ideal Mcast – exactly 1 packet is transmitted along each edge
  Per-AS Unicast – the source unicasts to each member AS individually
For all group sizes, the overall bandwidth consumed by FRM is close to that of Ideal Mcast (within 2.4%).
[Plot: number of packet transmissions (0–45,000) vs. group size (1000 to 10M) for Per-AS Unicast, FRM, and Ideal Mcast]
FRM Overhead – Redundant Transmissions
Number of transmissions per AS-level link required to transfer a single packet to all group members (FRM header size = 100 bytes):
Per-AS Unicast with 10M users: 6% of links see redundant transmissions; worst case: 6950 transmissions per link.
FRM with 10M users: less than 0.5% of links see redundant transmissions; worst case: 157 transmissions per link; worst case with the optimization described in the paper: 2 transmissions per link.
Encoding Group Membership State
Simple enumeration is hard to scale.
Border routers encode the set of locally active groups using a constant-size Bloom filter (GRP_BF) of length L.
[Diagram: {G1, G2, G3, G4, …} mapped by K hash functions into the GRP_BF bit vector 011011011010…]
BGP speakers communicate their GRP_BF state as part of their regular route advertisements (BGP UPDATE messages) using a new path attribute.
Encoding Group Membership State
The use of Bloom filters introduces the possibility of false positives: a domain may on occasion receive traffic for a group it has no interest in.
To deal with unwanted traffic, the recipient domain can install an explicit filter rule in the upstream provider's network.
For a given number of available upstream filters f, the recipient computes the maximum tolerable false positive rate r and chooses its filter length L accordingly:
  r = min(1, f / (A - G))
  A = size of the group address space
  G = number of groups to be encoded
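As a sketch of that choice, combining the slide's rate formula with the same standard Bloom-filter sizing used in the earlier estimate (the values of f, A, and G below are illustrative):

```python
import math

f = 10           # filter rules available at the upstream provider
A = 1_000_000    # size of the group address space (illustrative)
G = 2_560        # number of groups this domain must encode (illustrative)

r = min(1.0, f / (A - G))                   # maximum tolerable false-positive rate
L = -G * math.log(r) / (math.log(2) ** 2)   # filter length in bits for G groups at rate r
print(f"r = {r:.1e}, L = {L / 8 / 1024:.1f} KiB")  # r = 1.0e-05, L = 7.5 KiB
```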
Summary
Free Riding Multicast is a very different approach to inter-domain multicast routing.
FRM makes use of the existing unicast routing infrastructure for group membership discovery and route construction.
It reduces protocol complexity via aggressive use of router resources.
It might be interesting to consider the viability of this approach in a broader context.
Group Membership Bandwidth Overhead
For GRP_BFs with 5 hash functions and bit positions represented by 24-bit values, the payload of a membership update message for a single group join/leave event is approximately 15 bytes.
Assuming 200,000 prefixes in the BGP RIB and 1 group membership event per second per prefix, the aggregate rate of incoming GRP_BF update traffic at a border router is approximately 3 MB/s.
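The arithmetic behind those figures, as a quick check (payload only; BGP message framing overhead is ignored):

```python
K = 5                   # hash functions, i.e. bit positions sent per group event
bytes_per_position = 3  # 24-bit bit-position values
payload = K * bytes_per_position
print(payload)          # 15 bytes per join/leave event

prefixes = 200_000      # entries in the BGP RIB
events_per_second = 1   # membership events per prefix per second
print(prefixes * events_per_second * payload / 1e6)  # 3.0 (MB/s aggregate)
```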
Why IP Multicast?
Technical feasibility aside, now might be a good time to revisit the desirability question:
  Multicast applications are now more widespread (IP-TV, MMORPGs, digital conferencing)
  Better understanding of ISP requirements
Bottom line: a simple multicast design might open the door to more widespread adoption.