CoolStreaming/DONet: A Data-Driven Overlay Network for Efficient Live Media Streaming
Presented by: Michael Iline & Ariel Krinitsa
Advanced Topics in Communication
2/05/2011
Table of Contents
Introduction
Motivation
Related Work
  Tree-based Protocols and Extensions
  Gossip-based Protocols
Design and Optimization of DONet
  Node Join and Membership Management
  Buffer Map Representation and Exchange
  Scheduling Algorithm
  Failure Recovery and Partnership Refinement
Analysis of Overlay Radius
Experimental Results
Introduction
Many multimedia applications, such as NetTV and news broadcast, involve live media streaming from a source to a large population of users.
IP Multicast is probably the most efficient delivery mechanism, but it remains confined due to the lack of incentives to install multicast-capable routers and to carry multicast traffic.
Introduction (Cont.)
In an application-level solution, end hosts act as overlay nodes, and multicast is achieved through data relaying among these nodes.
Most construction algorithms build a tree structure for data delivery.
Motivation
A tree structure works well with dedicated infrastructure routers, as in IP multicast, but it often mismatches an application-level overlay with dynamic nodes: autonomous overlay nodes can easily crash or leave, so a tree is highly vulnerable.
Live streaming also has high bandwidth and stringent continuity demands.
Sophisticated structures like mesh and forest can partially solve the problem, but they are much more complex and often less scalable.
Motivation (Cont.)
A data-centric design of a streaming overlay:
availability of data guides the flow directions, not a specific overlay structure
it is better suited to overlays with highly dynamic nodes than a semi-static structure
All the nodes have strong buffering capabilities and can adaptively and intelligently determine the data forwarding directions
The core operations in DONet are very simple, making it:
easy to implement
efficient
robust and resilient
Motivation (Cont.)
The key design issues of DONet:
How the partnerships are formed
How the data availability information is encoded and exchanged
How the video data are supplied and retrieved among partners
Background: A brief detour to explain base concepts and protocols
Related Work
Numerous overlay multicast systems can be classified into two categories: proxy-assisted and peer-to-peer based.
Proxy-assisted
Servers or application-level proxies are strategically placed.
Peer-to-peer based
Self-organized overlay networks on which the multimedia distribution service is based.
DONet belongs to peer-to-peer based category.
Related Work: Tree-based Protocols and Extensions
Constructing and maintaining an efficient distribution tree among the overlay nodes is a key issue to these systems.
An internal node in a tree has a higher load and its leave or crash often causes buffer underflow in a large population of descendants.
Related Work: Tree-based Protocols and Extensions (Cont.)
DONet has a simpler and more straightforward data-driven design, which neither maintains a complex structure nor relies on an advanced coding scheme.
Related Work: Gossip-based Protocols
DONet implements a gossiping protocol for membership management.
What is membership? The "who knows whom" relation:
A knows C, C knows F
But D does not know C, J does not know B
(Figure: example membership graph over nodes A through J)
Gossip (or epidemic) algorithm:
a node sends a newly generated message to a set of randomly selected nodes; these nodes do the same in the next round.
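The round-based dissemination described above can be sketched as a small simulation; the function and parameter names are illustrative, not from the paper:

```python
import random

def gossip_rounds(num_nodes, source, fanout, rounds, seed=0):
    """Push-style gossip: in every round, each node that already holds the
    message forwards it to `fanout` nodes chosen uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = range(num_nodes)
    informed = {source}
    for _ in range(rounds):
        newly_informed = set()
        for _node in informed:
            for target in rng.sample(nodes, fanout):
                if target not in informed:
                    newly_informed.add(target)
        informed |= newly_informed
    return informed

reached = gossip_rounds(num_nodes=100, source=0, fanout=3, rounds=6)
print(f"{len(reached)} of 100 nodes informed")
```

With a modest fanout, the informed set grows roughly exponentially per round, which is why a logarithmic number of rounds suffices.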
Motivation
Gossip-based dissemination protocols:
Each member forwards the message to randomly chosen group members
Probabilistic reliability guarantee (message delivery with high probability)
Scalable
Resilient to node/link failures
However, traditional gossip-based multicast protocols rely on a non-scalable membership protocol:
Each node has a complete view of the system
High overhead in storage and synchronization
SCAMP (Scalable Membership Protocol)
Addressing the weakness of traditional full-membership protocols, SCAMP proposes a scalable probabilistic membership protocol for gossip-based multicast:
Scalable, fully decentralized
Each node maintains a partial, yet sufficient, system view
Self-reconfigurable: the view size at each member can change when the system size changes
Any isolated node can rejoin the system automatically via the isolation recovery mechanism
SCAMP Operations
Basic membership management:
Subscription
Un-subscription
Isolation recovery (simply solved by heartbeating and re-subscribing)
Graph rebalancing mechanisms:
Indirection
Lease mechanism
Membership List
Each node k maintains two lists of group members:
PartialView: a list containing its gossip targets
InView: a list of the nodes for which k is one of their gossip targets
Basic Protocol - Subscription
Subscription (new node join)
Contact: a new node N joins the group by sending a subscription request to an arbitrary member.
(Figure: new node N sends its subscription request to an existing member)
Subscription (new node join)
New subscription: when a node receives a new subscription request, it forwards the new node-id to all members of its own local view. It also creates c additional copies of the request (to be discussed later).
(Figure: the contacted member forwards N's request to every member of its view)
Subscription (new node join)
Forwarded subscription: when a node k receives a forwarded subscription:
With probability p = 1/(1 + size of PartialView_k), it adds the subscriber to its PartialView
Otherwise, it forwards the subscription to a randomly chosen node from k's PartialView
(Figure: the request is forwarded through the overlay until some node accepts it)
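The subscription steps above can be sketched in code. The class layout, method names, and the tiny ring overlay are my own illustration of the slides' logic; the keep probability is 1/(1 + |PartialView|) as stated:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.partial_view = []  # our gossip targets
        self.in_view = []       # nodes that gossip to us

def handle_new_subscription(contact, newcomer, network, c=1):
    """Contacted member forwards the newcomer's id to its whole view,
    plus c extra copies to randomly chosen view members."""
    targets = list(contact.partial_view)
    targets += [rng.choice(contact.partial_view) for _ in range(c)]
    for t in targets:
        handle_forwarded_subscription(network[t], newcomer, network)

def handle_forwarded_subscription(node, newcomer, network):
    """Keep with probability 1/(1 + |PartialView|); otherwise re-forward."""
    keep_prob = 1.0 / (1 + len(node.partial_view))
    if (node.id != newcomer.id
            and newcomer.id not in node.partial_view
            and rng.random() < keep_prob):
        node.partial_view.append(newcomer.id)
        newcomer.in_view.append(node.id)
    else:
        next_hop = network[rng.choice(node.partial_view)]
        handle_forwarded_subscription(next_hop, newcomer, network)

# Tiny overlay: members 0..4 in a ring; newcomer 5 knows only its contact 0.
network = {i: Node(i) for i in range(6)}
for i in range(5):
    network[i].partial_view = [(i + 1) % 5]
network[5].partial_view = [0]
handle_new_subscription(network[0], network[5], network)
print(network[5].in_view)  # ids of the nodes that kept the subscription
```

Each forwarded copy is eventually kept by exactly one node, so with c = 1 and a contact view of size 1 the newcomer ends up in two nodes' PartialViews.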
Subscription (new node join)
Keeping a subscription: when a node decides to keep the subscription, it integrates the new subscriber into its PartialView and informs the subscriber to update its InView.
(Figure: the accepting node adds N to its PartialView and notifies N)
Properties of Subscription
All forwarded subscriptions are eventually kept by some node.
If the new node subscribes to a node with out-degree d, then d + c + 1 arcs are added.
Let E[M_n] denote the expected number of arcs when the number of nodes is n: E[M_n] ≈ (c+1) n log n.
"In earlier work [17], the following sharper result was shown: if there are n nodes and each node gossips to log(n) + k other nodes on average, then the probability that everyone gets the message converges to e^(-e^(-k))."
Toy example: k = 2 gives ≈ 0.87, k = 4 gives ≈ 0.98.
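Assuming the limit probability in the quoted result is exp(-exp(-k)), the slide's toy values follow directly and can be checked numerically:

```python
import math

def delivery_probability(k):
    """Limit probability that gossip reaches every node when each node
    gossips to log(n) + k others on average: exp(-exp(-k))."""
    return math.exp(-math.exp(-k))

print(round(delivery_probability(2), 2))  # 0.87
print(round(delivery_probability(4), 2))  # 0.98
```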
Basic Protocol - UnSubscription
Unsubscription (node n leaves). Views:
Let PartialView_n = { i(1), i(2), …, i(L) }
Let InView_n = { j(1), j(2), …, j(L') }
Node n informs nodes j(1)…j(L'-c-1) to replace its id with i(1)…i(L'-c-1) (mod L), respectively.
It informs nodes j(L'-c)…j(L') to remove it from their lists.
(Figure: the leaving node redirects most of its in-arcs and has the rest removed)
Properties of UnSubscription
If the leaving node has in-degree d, the total number of arcs decreases by d + c + 1:
d - c - 1 by replacing
(c + 1) * 2 by removing
E[M_(n-1)] ≈ (c+1)(n-1) log(n-1)
Unsubscriptions preserve the desired mean arc degree.
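The replace/remove bookkeeping of SCAMP unsubscription can be sketched as follows; the Node class and the toy network are illustrative assumptions:

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.partial_view = []  # gossip targets
        self.in_view = []       # nodes that gossip to us

def unsubscribe(node, network, c=1):
    """SCAMP unsubscription sketch: the first L'-c-1 in-neighbours replace
    their arc to `node` with an arc to successive PartialView members
    (indices taken mod L); the remaining c+1 in-neighbours drop the arc."""
    pv, iv = node.partial_view, node.in_view
    cutoff = len(iv) - c - 1
    for idx, j in enumerate(iv[:cutoff]):
        view = network[j].partial_view
        view[view.index(node.id)] = pv[idx % len(pv)]
    for j in iv[cutoff:]:
        network[j].partial_view.remove(node.id)

# Leaving node 0 with PartialView [1, 2] and InView [3, 4, 5], c = 1:
network = {i: Node(i) for i in range(6)}
network[0].partial_view = [1, 2]
network[0].in_view = [3, 4, 5]
for j in (3, 4, 5):
    network[j].partial_view = [0]
unsubscribe(network[0], network)
print(network[3].partial_view, network[4].partial_view, network[5].partial_view)
# [1] [] []  (node 3 redirects to node 1; nodes 4 and 5 drop the arc)
```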
SCAMP Operations
Basic membership management:
Subscription
Un-subscription
Isolation recovery (simply solved by heartbeating and re-subscribing)
Graph rebalancing mechanisms:
Indirection
Lease mechanism
Refinements - Indirection mechanisms
How would a newly joined node select a node to contact? Choosing uniformly at random among existing members would require global information.
Solution: the initial contact forwards the newcomer's subscription request to a node chosen approximately at random among all existing nodes, which balances the lists:
The node that receives the subscription request forwards a "token" carrying a counter value
The counter is decremented at every hop
The member at which the token arrives with a zero counter acts as the contact node
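The token mechanism amounts to a short random walk; the sketch below uses an illustrative dict-of-views network, not the paper's data structures:

```python
import random

def pick_contact(network, entry_id, ttl, seed=0):
    """Forward a token with a hop counter along random PartialView edges;
    the member where the counter reaches zero becomes the contact node."""
    rng = random.Random(seed)
    current = entry_id
    for _ in range(ttl):
        current = rng.choice(network[current])  # one hop per decrement
    return current

# Ring overlay 0 -> 1 -> 2 -> 3 -> 0, one PartialView member each:
network = {0: [1], 1: [2], 2: [3], 3: [0]}
print(pick_contact(network, entry_id=0, ttl=6))  # (0 + 6) mod 4 = 2
```

Because every hop is a uniform choice from the local view, after enough hops the landing node approximates a uniform sample without any global membership knowledge.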
Refinements - Lease mechanisms
Each subscription has a finite lifetime. Each node is responsible for re-subscribing to a randomly chosen member from its PartialView:
The subscriber's PartialView remains the same
However, each node's PartialView gradually changes (even if there is no change to the system)
Advantages:
Helps to rebalance the size of partial views across group members
Removes invalid information caused by nodes that leave the group without unsubscribing
Experimental Results
Distribution of partial view size
Resilience to node failures (full view vs. partial view)
Impact of c
Impact of c (cont.)
Impact of the lease mechanism
Conclusions
Pros:
Fully decentralized protocol with O(log n) partial view size
Performance very close to that of a full-membership protocol
Cons:
Why indirection does not improve performance has not been fully explained
Conclusions (Cont.)
SCAMP: a membership management system for gossip-based multicast
O(log n) partial view per member on average:
Scalable
No global knowledge of system size needed
Self-reconfiguring
Usable with O(log n) gossip-based multicast
Achieves load balancing through several techniques:
Indirection (distribution of contact work)
Lease mechanism (is it good for views to change often?)
SplitStream: High-bandwidth multicast in a cooperative environment
Design: So, how does it work?
Design and Optimization of DONet
The system diagram of a DONet node
There are three key modules:
Membership manager
Partnership manager
Scheduler
Design and Optimization of DONet
A DONet node can be a receiver, a supplier, or both.
Nodes periodically exchange data availability information with a set of partners.
The one exception is the origin (source) node, which is always a supplier.
Design and Optimization of DONet: Node Join and Membership Management
Each DONet node has a unique identifier, such as its IP address, and a membership cache (mCache) containing a partial list of the identifiers of active nodes.
A newly joined node first contacts the origin node, which randomly selects a deputy node from its mCache and redirects the new node to the deputy.
DONet uses SCAMP (Scalable Gossiping Membership Protocol) to distribute membership messages among nodes.
Design and Optimization of DONet: Node Join and Membership Management
Two events trigger updates of an mCache entry:
1. The membership message is to be forwarded to other nodes through gossiping
2. The node serves as a deputy and the entry is to be included in the partner candidate list
Design and Optimization of DONet: Buffer Map Representation and Exchange
Illustration of the partnership in DONet (origin node: A)
Design and Optimization of DONet: Buffer Map Representation and Exchange
A sliding window of 120 segments can effectively represent the buffer map (BM) of a node.
A BM thus uses 120 bits, with bit 1 indicating that a segment is available and 0 otherwise.
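The 120-bit buffer map can be sketched with a plain integer bit mask; the helper names are mine, not from the paper:

```python
WINDOW = 120  # number of segments covered by the sliding window

def encode_bm(window_start, available_segments):
    """Set bit (seg - window_start) for every available segment in the window."""
    bm = 0
    for seg in available_segments:
        if window_start <= seg < window_start + WINDOW:
            bm |= 1 << (seg - window_start)
    return bm

def bm_has(bm, window_start, seg):
    """True if the buffer map advertises segment `seg`."""
    offset = seg - window_start
    return 0 <= offset < WINDOW and bool((bm >> offset) & 1)

bm = encode_bm(1000, {1000, 1001, 1119})
print(bm_has(bm, 1000, 1001), bm_has(bm, 1000, 1002))  # True False
```

The whole map fits in 15 bytes, which is why partners can afford to exchange BMs frequently.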
Transmission Scheduling
Problem: from which partner should a node fetch which data segment?
Constraints:
Data availability
Playback deadline
Heterogeneous partner bandwidth
This problem is a variation of the parallel machine scheduling problem, which is NP-hard, and the situation becomes even worse in a highly dynamic environment, so DONet resorts to a simple heuristic with fast response time.
Scheduling Algorithm
A variation of parallel machine scheduling (NP-hard), hence a heuristic.
Messages exchanged:
Window-based buffer map (BM): data availability
Segment request (piggybacked on the BM)
Heuristic: segments with fewer suppliers first; among multiple suppliers, the one with the highest bandwidth within the deadline first.
A simpler algorithm with bounded execution time is used in the current implementation.
Scheduling Algorithm
For each expected segment set i: check availability at the partners; if set i is available at partner j, update deadline[Prtnr_j, Set_i].
(Example: a deadline table for sets Set_k … Set_k+3 across partners Prtnr_a … Prtnr_g; Prtnr_a can supply all four sets with deadlines 11s, 13s, 17s, 14s, while several sets are held by only one partner. The supplier assignment for Set_k … Set_k+3 is still empty.)
Scheduling Algorithm
First pass: for each set i held by exactly one partner, assign that partner as Supplier[i] and update that partner's deadlines for the other expected sets.
(Example continued: Set_k is assigned to Prtnr_a, and Prtnr_a's deadlines for the remaining sets are updated, e.g. 13s to 10s.)
Scheduling Algorithm
For n = 2 up to the maximum number of potential suppliers: for each still-unassigned set with n potential suppliers, choose the supplier with the highest bandwidth that meets the deadline, then update that partner's deadlines for the other expected sets.
(Example continued: Set_k+1 is assigned to Prtnr_d, and Prtnr_d's deadline for Set_k+2 is updated from 15s to 10s.)
Scheduling Algorithm
(Example continued: Set_k+2 is assigned to Prtnr_a, and Prtnr_a's deadline for Set_k+3 is updated from 11s to 5s.)
Scheduling Algorithm
(Example concluded: Set_k+3 is assigned to Prtnr_f, giving the final assignment Set_k → Prtnr_a, Set_k+1 → Prtnr_d, Set_k+2 → Prtnr_a, Set_k+3 → Prtnr_f.)
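The heuristic walked through above (fewest potential suppliers first; among multiple suppliers, the fastest feasible partner) can be sketched as below. The data layout and the toy numbers are illustrative, and the sketch omits the per-assignment deadline updates shown on the slides:

```python
def schedule(deadlines, playback_deadline):
    """deadlines[seg][partner] = estimated seconds to fetch seg from partner.
    Segments with the fewest potential suppliers are assigned first; among
    the partners that meet the playback deadline, the fastest one is chosen
    (a stand-in for 'highest bandwidth within deadline')."""
    assignment = {}
    for seg in sorted(deadlines, key=lambda s: len(deadlines[s])):
        feasible = {p: t for p, t in deadlines[seg].items()
                    if t <= playback_deadline}
        if feasible:
            assignment[seg] = min(feasible, key=feasible.get)
    return assignment

# Toy instance in the spirit of the slides' tables:
deadlines = {
    "set_k":  {"a": 11},
    "set_k1": {"a": 13, "d": 8},
    "set_k2": {"a": 17, "b": 7},
    "set_k3": {"a": 14, "f": 11, "g": 8},
}
print(schedule(deadlines, playback_deadline=20))
```

A single-supplier set such as set_k has no choice and is fixed first; the multi-supplier sets then each pick their fastest feasible partner.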
Design and Optimization of DONet: Failure Recovery and Partnership Refinement
A departure can be easily detected after an idle period of TFRC or BM exchange.
An affected node can quickly react by re-scheduling, using the BM information of the remaining partners.
Operations to further enhance resilience:
Graceful departure: the node issues a departure message when departing
Node failure: a partner that detects the failure issues the departure message on behalf of the failed node
The departure message is gossiped in the same way as membership messages.
Each node periodically establishes new partnerships with nodes randomly selected from its mCache; the partner with the lowest score under a scoring function can be rejected to keep a stable number of partners.
Analysis on DONet
Coverage ratio for distance k (number of partners per node: M, total nodes: N):

  C(k) ≈ 1 - e^(-M(M-1)^k / (2N))

E.g., 95% of nodes are covered within 6 hops when M = 4 and N = 500. The average distance is O(log N).
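Assuming the coverage ratio takes the form C(k) = 1 - e^(-M(M-1)^k / (2N)) (an assumption, chosen because it reproduces the slide's 95% example), it can be checked numerically:

```python
import math

def coverage(M, N, k):
    """Assumed coverage ratio: fraction of the N nodes reached within
    k hops when each node has M partners."""
    return 1 - math.exp(-M * (M - 1) ** k / (2 * N))

print(round(coverage(M=4, N=500, k=6), 2))  # 0.95
```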
Analysis on DONet
DONet vs. a tree-based overlay: much lower outage probability
PlanetLab-based experiment
PlanetLab
An open platform for developing, deploying, and accessing planetary-scale services
Involved 200~300 nodes during the experiment period (May to June, 2004)
Streaming rate: 500 Kbps
*http://www.planet-lab.org/
Result: data continuity
Continuity index: the number of segments that arrive before or on their playback deadlines, over the total number of segments. (Data continuity: 200 nodes, 500 Kbps streaming.)
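The continuity index defined above is straightforward to compute; the short trace below is hypothetical:

```python
def continuity_index(arrivals, deadlines):
    """Fraction of segments arriving on or before their playback deadline;
    None marks a segment that never arrived."""
    on_time = sum(1 for a, d in zip(arrivals, deadlines)
                  if a is not None and a <= d)
    return on_time / len(deadlines)

# Hypothetical 5-segment trace: one late segment, one lost segment.
print(continuity_index([1.0, 2.1, 2.9, None, 5.0],
                       [1.5, 2.0, 3.0, 4.0, 5.0]))  # 0.6
```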
Result: control overhead vs. number of partners for different overlay sizes
Result: Continuity index as a function of streaming rate (size = 200 nodes)
CoolStreaming
A practical DONet implementation:
First version released: May 2004
Supports Real Video and Windows Media formats
Broadcast live sports programs at 450~755 Kbps
Attracted 30,000 users
User distribution across a heterogeneous network environment: LAN, cable, DSL, …
Online statistics (June 21, 2004)
Conclusion
Presented the design of DONet for live media streaming:
Data-driven design
Scalable membership and partnership management algorithm
Heuristic scheduling algorithm
The experimental results on PlanetLab demonstrate that DONet delivers quite good playback quality in highly dynamic networks.
A practical implementation was also released for broadcasting live programs.
Thank You.