CoolStreaming/DONet: A Data-Driven Overlay Network for Efficient Live Media Streaming
Presented by: Michael Iline & Ariel Krinitsa
Advanced Topics in Communication
2/05/2011
Table of Contents
Introduction
Motivation
Related Work
  Tree-based Protocols and Extensions
  Gossip-based Protocols
Design and Optimization of DONet
  Node Join and Membership Management
  Buffer Map Representation and Exchange
  Scheduling Algorithm
  Failure Recovery and Partnership Refinement
Analysis of Overlay Radius
Experimental Results
Introduction
Many multimedia applications, such as NetTV and news broadcast, involve live media streaming from a source to a large population of users.
IP Multicast is probably the most efficient delivery mechanism, but it remains confined due to the lack of incentives to install multicast-capable routers and to carry multicast traffic.
Introduction (Cont.)
In an application-level solution, end hosts act as overlay nodes, and multicast is achieved through data relaying among these nodes.
Most construction algorithms build a tree structure for data delivery.
Motivation
A tree structure works well with dedicated infrastructure routers, as in IP multicast, but it often mismatches an application-level overlay with dynamic nodes: autonomous overlay nodes can easily crash or leave, so a tree is highly vulnerable.
Live streaming also has high bandwidth and stringent continuity demands.
Sophisticated structures like mesh and forest can partially solve the problem, but they are much more complex and often less scalable.
Motivation (Cont.)
A data-centric design of a streaming overlay:
availability of data guides the flow directions, not a specific overlay structure
it is better suited to overlays with highly dynamic nodes than a semi-static structure
All the nodes have strong buffering capabilities and can adaptively and intelligently determine the data forwarding directions
The core operations in DONet are very simple, making it:
easy to implement
efficient
robust and resilient
Motivation (Cont.)
The key design issues of DONet:
How the partnerships are formed
How the data availability information is encoded and exchanged
How the video data are supplied and retrieved among partners
Background: A brief detour to explain base concepts and protocols
Related Work
Numerous overlay multicast systems can be classified into two categories: proxy-assisted and peer-to-peer based.
Proxy-assisted
Servers or application-level proxies are strategically placed.
Peer-to-peer based
Self-organized overlay networks on which the multimedia distribution service is based.
DONet belongs to peer-to-peer based category.
Related Work: Tree-based Protocols and Extensions
Constructing and maintaining an efficient distribution tree among the overlay nodes is a key issue to these systems.
An internal node in a tree has a higher load and its leave or crash often causes buffer underflow in a large population of descendants.
Related Work: Tree-based Protocols and Extensions (Cont.)
DONet has a simpler and more straightforward data-driven design, which neither maintains a complex structure nor relies on an advanced coding scheme.
Related Work: Gossip-based Protocols
DONet implements a gossiping protocol for membership management.
What is membership? The "who knows whom" relation:
A knows C, C knows F
But D does not know C, J does not know B
(Figure: example membership graph over nodes A through J)
Gossip (or epidemic) algorithm:
a node sends a newly generated message to a set of randomly selected nodes; these nodes do the same in the next round.
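The round-based dissemination described above can be sketched as a small simulation; the function and parameter names are illustrative, not from the paper:

```python
import random

def gossip_rounds(num_nodes, source, fanout, rounds, seed=0):
    """Push-style gossip: in every round, each node that already holds the
    message forwards it to `fanout` nodes chosen uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = range(num_nodes)
    informed = {source}
    for _ in range(rounds):
        newly_informed = set()
        for _node in informed:
            for target in rng.sample(nodes, fanout):
                if target not in informed:
                    newly_informed.add(target)
        informed |= newly_informed
    return informed

reached = gossip_rounds(num_nodes=100, source=0, fanout=3, rounds=6)
print(f"{len(reached)} of 100 nodes informed")
```

With a modest fanout, the informed set grows roughly exponentially per round, which is why a logarithmic number of rounds suffices.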
Motivation
Gossip-based dissemination protocols:
Each member forwards the message to randomly chosen group members
Probabilistic reliability guarantee (message delivery with high probability)
Scalable
Resilient to node/link failures
However, traditional gossip-based multicast protocols rely on a non-scalable membership protocol:
Each node has a complete view of the system
High overhead in storage and synchronization
SCAMP (Scalable Membership Protocol)
Addressing the weakness of traditional full-membership protocols, SCAMP proposes a scalable probabilistic membership protocol for gossip-based multicast:
Scalable, fully decentralized
Each node maintains a partial, yet sufficient, system view
Self-reconfigurable: the view size at each member can change when the system size changes
Any isolated node can rejoin the system automatically via the isolation recovery mechanism
SCAMP Operations
Basic membership management:
Subscription
Un-subscription
Isolation recovery (simply solved by heartbeating and re-subscribing)
Graph rebalancing mechanisms:
Indirection
Lease mechanism
Membership List
Each node k maintains two lists of group members:
PartialView: a list containing its gossip targets
InView: a list of the nodes for which k is one of their gossip targets
Basic Protocol - Subscription
Subscription (new node join)
Contact: a new node N joins the group by sending a subscription request to an arbitrary member.
(Figure: new node N sends its subscription request to an existing member)
Subscription (new node join)
New subscription: when a node receives a new subscription request, it forwards the new node-id to all members of its own local view. It also creates c additional copies of the request (to be discussed later).
(Figure: the contacted member forwards N's request to every member of its view)
Subscription (new node join)
Forwarded subscription: when a node k receives a forwarded subscription:
With probability p = 1/(1 + size of PartialView_k), it adds the subscriber to its PartialView
Otherwise, it forwards the subscription to a randomly chosen node from k's PartialView
(Figure: the request is forwarded through the overlay until some node accepts it)
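The subscription steps above can be sketched in code. The class layout, method names, and the tiny ring overlay are my own illustration of the slides' logic; the keep probability is 1/(1 + |PartialView|) as stated:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.partial_view = []  # our gossip targets
        self.in_view = []       # nodes that gossip to us

def handle_new_subscription(contact, newcomer, network, c=1):
    """Contacted member forwards the newcomer's id to its whole view,
    plus c extra copies to randomly chosen view members."""
    targets = list(contact.partial_view)
    targets += [rng.choice(contact.partial_view) for _ in range(c)]
    for t in targets:
        handle_forwarded_subscription(network[t], newcomer, network)

def handle_forwarded_subscription(node, newcomer, network):
    """Keep with probability 1/(1 + |PartialView|); otherwise re-forward."""
    keep_prob = 1.0 / (1 + len(node.partial_view))
    if (node.id != newcomer.id
            and newcomer.id not in node.partial_view
            and rng.random() < keep_prob):
        node.partial_view.append(newcomer.id)
        newcomer.in_view.append(node.id)
    else:
        next_hop = network[rng.choice(node.partial_view)]
        handle_forwarded_subscription(next_hop, newcomer, network)

# Tiny overlay: members 0..4 in a ring; newcomer 5 knows only its contact 0.
network = {i: Node(i) for i in range(6)}
for i in range(5):
    network[i].partial_view = [(i + 1) % 5]
network[5].partial_view = [0]
handle_new_subscription(network[0], network[5], network)
print(network[5].in_view)  # ids of the nodes that kept the subscription
```

Each forwarded copy is eventually kept by exactly one node, so with c = 1 and a contact view of size 1 the newcomer ends up in two nodes' PartialViews.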
Subscription (new node join)
Keeping a subscription: when a node decides to keep the subscription, it integrates the new subscriber into its PartialView and informs the subscriber to update its InView.
(Figure: the accepting node adds N to its PartialView and notifies N)
Properties of Subscription
All forwarded subscriptions are eventually kept by some node.
If the new node subscribes to a node with out-degree d, then d + c + 1 arcs are added.
Let E[M_n] denote the expected number of arcs when the number of nodes is n: E[M_n] ≈ (c+1) n log n.
"In earlier work [17], the following sharper result was shown: if there are n nodes and each node gossips to log(n) + k other nodes on average, then the probability that everyone gets the message converges to e^(-e^(-k))."
Toy example: k = 2 gives ≈ 0.87, k = 4 gives ≈ 0.98.
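Assuming the limit probability in the quoted result is exp(-exp(-k)), the slide's toy values follow directly and can be checked numerically:

```python
import math

def delivery_probability(k):
    """Limit probability that gossip reaches every node when each node
    gossips to log(n) + k others on average: exp(-exp(-k))."""
    return math.exp(-math.exp(-k))

print(round(delivery_probability(2), 2))  # 0.87
print(round(delivery_probability(4), 2))  # 0.98
```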
Basic Protocol - UnSubscription
Unsubscription (node n leaves). Views:
Let PartialView_n = { i(1), i(2), …, i(L) }
Let InView_n = { j(1), j(2), …, j(L') }
Node n informs nodes j(1)…j(L'-c-1) to replace its id with i(1)…i(L'-c-1) (mod L), respectively.
It informs nodes j(L'-c)…j(L') to remove it from their lists.
(Figure: the leaving node redirects most of its in-arcs and has the rest removed)
Properties of UnSubscription
If the leaving node has in-degree d, the total number of arcs decreases by d + c + 1:
d - c - 1 by replacing
(c + 1) * 2 by removing
E[M_(n-1)] ≈ (c+1)(n-1) log(n-1)
Unsubscriptions preserve the desired mean arc degree.
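The replace/remove bookkeeping of SCAMP unsubscription can be sketched as follows; the Node class and the toy network are illustrative assumptions:

```python
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.partial_view = []  # gossip targets
        self.in_view = []       # nodes that gossip to us

def unsubscribe(node, network, c=1):
    """SCAMP unsubscription sketch: the first L'-c-1 in-neighbours replace
    their arc to `node` with an arc to successive PartialView members
    (indices taken mod L); the remaining c+1 in-neighbours drop the arc."""
    pv, iv = node.partial_view, node.in_view
    cutoff = len(iv) - c - 1
    for idx, j in enumerate(iv[:cutoff]):
        view = network[j].partial_view
        view[view.index(node.id)] = pv[idx % len(pv)]
    for j in iv[cutoff:]:
        network[j].partial_view.remove(node.id)

# Leaving node 0 with PartialView [1, 2] and InView [3, 4, 5], c = 1:
network = {i: Node(i) for i in range(6)}
network[0].partial_view = [1, 2]
network[0].in_view = [3, 4, 5]
for j in (3, 4, 5):
    network[j].partial_view = [0]
unsubscribe(network[0], network)
print(network[3].partial_view, network[4].partial_view, network[5].partial_view)
# [1] [] []  (node 3 redirects to node 1; nodes 4 and 5 drop the arc)
```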
SCAMP Operations
Basic membership management:
Subscription
Un-subscription
Isolation recovery (simply solved by heartbeating and re-subscribing)
Graph rebalancing mechanisms:
Indirection
Lease mechanism
Refinements - Indirection mechanisms
How would a newly joined node select a node to contact? Choosing uniformly at random among existing members would require global information.
Solution: the initial contact forwards the newcomer's subscription request to a node chosen approximately at random among all existing nodes, which balances the lists:
The node that receives the subscription request forwards a "token" carrying a counter value
The counter is decremented at every hop
The member at which the token arrives with a zero counter acts as the contact node
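The token mechanism amounts to a short random walk; the sketch below uses an illustrative dict-of-views network, not the paper's data structures:

```python
import random

def pick_contact(network, entry_id, ttl, seed=0):
    """Forward a token with a hop counter along random PartialView edges;
    the member where the counter reaches zero becomes the contact node."""
    rng = random.Random(seed)
    current = entry_id
    for _ in range(ttl):
        current = rng.choice(network[current])  # one hop per decrement
    return current

# Ring overlay 0 -> 1 -> 2 -> 3 -> 0, one PartialView member each:
network = {0: [1], 1: [2], 2: [3], 3: [0]}
print(pick_contact(network, entry_id=0, ttl=6))  # (0 + 6) mod 4 = 2
```

Because every hop is a uniform choice from the local view, after enough hops the landing node approximates a uniform sample without any global membership knowledge.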
Refinements - Lease mechanisms
Each subscription has a finite lifetime. Each node is responsible for re-subscribing to a randomly chosen member from its PartialView:
The subscriber's PartialView remains the same
However, each node's PartialView gradually changes (even if there is no change to the system)
Advantages:
Helps to rebalance the size of partial views across group members
Removes invalid information caused by nodes that leave the group without unsubscribing
Experimental Results
Distribution of partial view size
Resilience to node failures (full view vs. partial view)
Impact of c
Impact of c (cont.)
Impact of the lease mechanism
Conclusions
Pros:
Fully decentralized protocol with O(log n) partial view size
Performance very close to that of a full-membership protocol
Cons:
Why indirection does not improve performance has not been fully explained
Conclusions (Cont.)
SCAMP: a membership management system for gossip-based multicast
O(log n) partial view per member on average:
Scalable
No global knowledge of system size needed
Self-reconfiguring
Usable with O(log n) gossip-based multicast
Achieves load balancing through several techniques:
Indirection (distribution of contact work)
Lease mechanism (is it good for views to change often?)
SplitStream: High-bandwidth multicast in a cooperative environment
Design: So, how does it work?
Design and Optimization of DONet
The system diagram of a DONet node
There are three key modules:
Membership manager
Partnership manager
Scheduler
Design and Optimization of DONet
A DONet node can be a receiver, a supplier, or both.
Nodes periodically exchange data availability information with a set of partners.
The one exception is the origin (source) node, which is always a supplier.
Design and Optimization of DONet: Node Join and Membership Management
Each DONet node has a unique identifier, such as its IP address, and a membership cache (mCache) containing a partial list of the identifiers of active nodes.
A newly joined node first contacts the origin node, which randomly selects a deputy node from its mCache and redirects the new node to the deputy.
DONet uses SCAMP (Scalable Gossiping Membership Protocol) to distribute membership messages among nodes.
Design and Optimization of DONet: Node Join and Membership Management
Two events trigger updates of an mCache entry:
1. The membership message is to be forwarded to other nodes through gossiping
2. The node serves as a deputy and the entry is to be included in the partner candidate list
Design and Optimization of DONet: Buffer Map Representation and Exchange
Illustration of the partnership in DONet (origin node: A)
Design and Optimization of DONet: Buffer Map Representation and Exchange
A sliding window of 120 segments can effectively represent the buffer map (BM) of a node.
A BM thus uses 120 bits, with bit 1 indicating that a segment is available and 0 otherwise.
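The 120-bit buffer map can be sketched with a plain integer bit mask; the helper names are mine, not from the paper:

```python
WINDOW = 120  # number of segments covered by the sliding window

def encode_bm(window_start, available_segments):
    """Set bit (seg - window_start) for every available segment in the window."""
    bm = 0
    for seg in available_segments:
        if window_start <= seg < window_start + WINDOW:
            bm |= 1 << (seg - window_start)
    return bm

def bm_has(bm, window_start, seg):
    """True if the buffer map advertises segment `seg`."""
    offset = seg - window_start
    return 0 <= offset < WINDOW and bool((bm >> offset) & 1)

bm = encode_bm(1000, {1000, 1001, 1119})
print(bm_has(bm, 1000, 1001), bm_has(bm, 1000, 1002))  # True False
```

The whole map fits in 15 bytes, which is why partners can afford to exchange BMs frequently.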
Transmission Scheduling
Problem: from which partner should a node fetch which data segment?
Constraints:
Data availability
Playback deadline
Heterogeneous partner bandwidth
This problem is a variation of the parallel machine scheduling problem, which is NP-hard, and the situation becomes even worse in a highly dynamic environment, so DONet resorts to a simple heuristic with fast response time.
Scheduling Algorithm
A variation of parallel machine scheduling (NP-hard), hence a heuristic.
Messages exchanged:
Window-based buffer map (BM): data availability
Segment request (piggybacked on the BM)
Heuristic: segments with fewer suppliers first; among multiple suppliers, the one with the highest bandwidth within the deadline first.
A simpler algorithm with bounded execution time is used in the current implementation.
Scheduling Algorithm
For each expected segment set i: check availability at the partners; if set i is available at partner j, update deadline[Prtnr_j, Set_i].
(Example: a deadline table for sets Set_k … Set_k+3 across partners Prtnr_a … Prtnr_g; Prtnr_a can supply all four sets with deadlines 11s, 13s, 17s, 14s, while several sets are held by only one partner. The supplier assignment for Set_k … Set_k+3 is still empty.)
Scheduling Algorithm
First pass: for each set i held by exactly one partner, assign that partner as Supplier[i] and update that partner's deadlines for the other expected sets.
(Example continued: Set_k is assigned to Prtnr_a, and Prtnr_a's deadlines for the remaining sets are updated, e.g. 13s to 10s.)
Scheduling Algorithm
For n = 2 up to the maximum number of potential suppliers: for each still-unassigned set with n potential suppliers, choose the supplier with the highest bandwidth that meets the deadline, then update that partner's deadlines for the other expected sets.
(Example continued: Set_k+1 is assigned to Prtnr_d, and Prtnr_d's deadline for Set_k+2 is updated from 15s to 10s.)
Scheduling Algorithm
(Example continued: Set_k+2 is assigned to Prtnr_a, and Prtnr_a's deadline for Set_k+3 is updated from 11s to 5s.)
Scheduling Algorithm
(Example concluded: Set_k+3 is assigned to Prtnr_f, giving the final assignment Set_k → Prtnr_a, Set_k+1 → Prtnr_d, Set_k+2 → Prtnr_a, Set_k+3 → Prtnr_f.)
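The heuristic walked through above (fewest potential suppliers first; among multiple suppliers, the fastest feasible partner) can be sketched as below. The data layout and the toy numbers are illustrative, and the sketch omits the per-assignment deadline updates shown on the slides:

```python
def schedule(deadlines, playback_deadline):
    """deadlines[seg][partner] = estimated seconds to fetch seg from partner.
    Segments with the fewest potential suppliers are assigned first; among
    the partners that meet the playback deadline, the fastest one is chosen
    (a stand-in for 'highest bandwidth within deadline')."""
    assignment = {}
    for seg in sorted(deadlines, key=lambda s: len(deadlines[s])):
        feasible = {p: t for p, t in deadlines[seg].items()
                    if t <= playback_deadline}
        if feasible:
            assignment[seg] = min(feasible, key=feasible.get)
    return assignment

# Toy instance in the spirit of the slides' tables:
deadlines = {
    "set_k":  {"a": 11},
    "set_k1": {"a": 13, "d": 8},
    "set_k2": {"a": 17, "b": 7},
    "set_k3": {"a": 14, "f": 11, "g": 8},
}
print(schedule(deadlines, playback_deadline=20))
```

A single-supplier set such as set_k has no choice and is fixed first; the multi-supplier sets then each pick their fastest feasible partner.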
Design and Optimization of DONet: Failure Recovery and Partnership Refinement
A departure can be easily detected after an idle period of TFRC or BM exchange.
An affected node can quickly react by re-scheduling, using the BM information of the remaining partners.
Operations to further enhance resilience:
Graceful departure: the node issues a departure message when departing
Node failure: a partner that detects the failure issues the departure message on behalf of the failed node
The departure message is gossiped in the same way as membership messages.
Each node periodically establishes new partnerships with nodes randomly selected from its mCache; the partner with the lowest score under a scoring function can be rejected to keep a stable number of partners.
Analysis on DONet
Coverage ratio for distance k (number of partners per node: M, total nodes: N):

  C(k) ≈ 1 - e^(-M(M-1)^k / (2N))

E.g., 95% of nodes are covered within 6 hops when M = 4 and N = 500. The average distance is O(log N).
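Assuming the coverage ratio takes the form C(k) = 1 - e^(-M(M-1)^k / (2N)) (an assumption, chosen because it reproduces the slide's 95% example), it can be checked numerically:

```python
import math

def coverage(M, N, k):
    """Assumed coverage ratio: fraction of the N nodes reached within
    k hops when each node has M partners."""
    return 1 - math.exp(-M * (M - 1) ** k / (2 * N))

print(round(coverage(M=4, N=500, k=6), 2))  # 0.95
```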
Analysis on DONet
DONet vs. a tree-based overlay: much lower outage probability
PlanetLab-based experiment
PlanetLab
An open platform for developing, deploying, and accessing planetary-scale services
Involved 200~300 nodes during the experiment period (May to June, 2004)
Streaming rate: 500 Kbps
*http://www.planet-lab.org/
Result: data continuity
Continuity index: the number of segments that arrive before or on their playback deadlines, over the total number of segments. (Data continuity: 200 nodes, 500 Kbps streaming.)
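The continuity index defined above is straightforward to compute; the short trace below is hypothetical:

```python
def continuity_index(arrivals, deadlines):
    """Fraction of segments arriving on or before their playback deadline;
    None marks a segment that never arrived."""
    on_time = sum(1 for a, d in zip(arrivals, deadlines)
                  if a is not None and a <= d)
    return on_time / len(deadlines)

# Hypothetical 5-segment trace: one late segment, one lost segment.
print(continuity_index([1.0, 2.1, 2.9, None, 5.0],
                       [1.5, 2.0, 3.0, 4.0, 5.0]))  # 0.6
```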
Result: control overhead vs. number of partners for different overlay sizes
Result: Continuity index as a function of streaming rate (size = 200 nodes)
CoolStreaming
A practical DONet implementation:
First version released: May 2004
Supports Real Video and Windows Media formats
Broadcast live sports programs at 450~755 Kbps
Attracted 30,000 users
User distribution across a heterogeneous network environment: LAN, cable, DSL, …
Online statistics (June 21, 2004)
Conclusion
Presented the design of DONet for live media streaming:
Data-driven design
Scalable membership and partnership management algorithm
Heuristic scheduling algorithm
The experimental results on PlanetLab demonstrate that DONet delivers quite good playback quality in highly dynamic networks.
A practical implementation was also released for broadcasting live programs.
Thank You.