cs234 – application layer multicasting tuesdays, thursdays 3:30-4:50p.m. ics 243 prof. nalini...

CS234 – Application Layer Multicasting

Tuesdays, Thursdays 3:30-4:50p.m.ICS 243

Prof. Nalini [email protected]

(with slides adapted from Dr. Animesh Nandi and Dr.

Ayman Abdel-Hamid)

Groupware applicationsCollaborative applications

require 1-many communications Video conference Interactive distributed simulations Stock exchange (NYSE) News dissemination, file updates Online games

Dynamic peer groups Dynamic membership (group size) Non-hierarchical

Multicasting © Dr. Ayman Abdel-Hamid 3

Logical Multipoint Communications

•Two basic logical organizationsRooted: hierarchy (perhaps just two levels) that structures communicationsNon-rooted: peer-to-peer (no distinguished nodes)

•Different structure could apply to control and data “planes”

Control plane determines how multipoint session is createdData plane determines how data is transferred between hosts in the multipoint session


Logical Multipoint CommunicationsControl Plane

•The control plane manages creation of a multipoint sessionRooted control plane

One member of the session is the root, c_rootOther members are the leafs, c_leafsNormally c_root establishes a session

Root connects to one or more c_leafsc_leafs join c_root after session established

Non-rooted control planeAll members are the same (c_leafs)Each leaf adds itself to the session


Logical Multipoint CommunicationsData PlaneThe data plane is concerned with data transfer

•Rooted data planeSpecial root member, d_rootOther members are leafs, d_leafsData transferred between d_leafs and d_roots

d_leaf to d_rootd_root to d_leaf

There is no direct communication between d_leafs•Non-rooted data plane

No special members, all are d_leafsEvery d_leafs communicate with all d_leafs

Multicasting© Dr. Ayman Abdel-Hamid,

CS4254 Spring 2006 6

Multicast Communication•Multicast abstraction is peer-to-peer

Application-level multicastNetwork-level multicast

Requires router support (multicast-enabled routers)Multicast provided at network protocol level IP multicast


Multicast Communication

•Transport mechanism and network layer must support multicast

•Internet multicast limited to UDP (not TCP)

Unreliable: No acknowledgements or other error recovery schemes (perhaps at application level)

Connectionless: No connection setup (although there is routing information provided to multicast-enabled routers)

Datagram: Message-based multicast

IP multicast Using unicast to replicate packets not efficient thus, IP multicast needed Built around the notion of group of hosts:

Senders and receivers need not know about each other Sender simply sends packets to “logical” group address No restriction on number or location of receivers

Applications may impose limits Normal, best-effort delivery semantics of IP

• Unlike broadcast, allows a proper subset of hosts to participate

• Standards: IP Multicast (RFC 1112)


IP Multicast•IP supports multicasting

Uses only UDP, not TCPSpecial IP addresses (Class D) identify multicast groupsInternet Group Management Protocol (IGMP) to provide group routing informationMulticast-enabled routers selectively forward multicast datagramsIP TTL field limits extent of multicast

•Requires underlying network and adapter to support broadcast or, preferably, multicast

Ethernet (IEEE 802.3) supports multicast


IP Multicast: Group Address•How to identify the receivers of a multicast datagram?•How to address a datagram sent to these receivers?

Each multicast datagram to carry the IP addresses of all recipients? Not scalable for large number of recipients Use address indirection

A single identifier used for a group of receivers


IP Multicast: IGMP Protocol•RFC 3376 (IGMP v3): operates between a host and its directly attached router

•host informs its attached router that an application running on the host wants to join or leave a specific multicast group

•another protocol is required to coordinate multicast routers throughout the Internet network layer multicast routing algorithms

•Network layer multicast IGMP and multicast routing protocols

•IGMP enables routers to populate multicast routing tables

•Carried within an IP datagram


CS4254 Spring 2006 12

IP Multicast: IGMP Protocol

•Joining a groupHost sends group report when the first process joins a given groupApplication requests join, service provider (end-host) sends report

•Maintaining table at the routerMulticast router periodically queries for group informationHost (service provider) replies with an IGMP report for each groupHost does not notify router when the last process leaves a group this is discovered through the lack of a report for a query


CS4254 Spring 2006 13

IP Multicast: Multicast Routing•Multicast routers do not maintain a list of individual members of each host group•Multicast routers do associate zero or more host group addresses with each interface


CS4254 Spring 2006 14

IP Multicast: Multicast Routing•Multicast router maintains table of multicast groups that are active on its networks•Datagrams forwarded only to those networks with group members


IP Multicast: Multicast Routing

•How multicast routers route traffic amongst themselves to ensure delivery of group traffic?

Find a tree of links that connects all of the routers that have attached hosts belonging to the multicast group

Group-shared treesSource-based trees

Shared Tree

Source Trees


CS4254 Spring 2006 16

MBONE: Internet Multicast Backbone

•The MBone is a virtual network on top of the Internet (section B.2)

Routers that support IP multicast

IP tunnels between such routers and/or subnets

Issues with IP Multicast IP Multicast islands

ISPs are not always interested in mutual cooperation. Forwarding IP Unicast

traffic is “easier” (stateless) than forwarding IP Multicast. Billing Issues

Violates ISP input-rate-based billing model No incentive for ISPs to enable multicast!

No indication of group size (again needed for billing) Security Issues

ISPs fear the additional traffic induced through IP Multicast (DoS attacks – join high traffic volume Multicast groups).

Multicast islands are interconnected with tunnels: MBONE. Manual setup of tunnels, Administrative overhead.

Result: Usability is reduced

Application Layer Multicast A promising alternative: ALM

Attempts to mimic multicast routing at the application level

Instead of high-speed routers, use programmable end hosts as interior nodes in the ALM system

ALM SystemALM System

ALM Structure

ALM Structure

Dissemination Method

Dissemination Method

Data Dissemination in ALM Concerns

What is target data? Streaming, Large files, Notification

Messages How fast?

Tolerant to delay, emergency case How reliably? How efficiently?

Overhead, Bandwidth

What you expect with ALM

System perspective Utilize all available bandwidth

User perspective Perfect continuity Low startup delay Low lag

Design space of CEMALM

Data Plane Control Plane

Tree

Forest/

Multiple

Tree

Mesh

Hybrid

(Tree with Mesh

)

SAAR

[17]

Random

Network

Data Plane : Tree Loop free path Capacity of each tree link >= the

streaming rate Push based forwarding

Low latency

Unable to utilize the forwarding bandwidth of all participating nodes

Only interior nodes contribute Highly sensitive node failure

Subtrees are significantly affected by a node failure

Require sophisticated recovery techniques

ESM (End-System Multicast)[2], Overcast[3], ZIGZAG[4], NICE[5]

Points of failure Points on an ALM structure which

can disturb message dissemination behind these points Each node of a single tree structure is

a point of failure

Is failure recovery feasible? Tree repair takes time which causes delay

Tens of seconds Detection of failure takes time

Ex) TCP timeover, hearbeat duration Tree reconstruction cost is high

Eager techniques that frequently monitor failure – significant overhead

Goal: Reliable dissemination without failure recovery?

Increasing Reliability Increasing the number of parents

N parents ALM structure SplitStream[2003], K-DAG [2004], Multicast Forest

[2005] N parents ALM endures N-1 failures among

its parents But, each node may get N-1 redundant

messages

Data Plane : Forest/Multiple Tree

K trees K loop free paths Data is separated into multiple (K) chunks

Each chunk is disseminated in one of the trees Capacity of each tree link >= the streaming

rate/K Each node can have more children than a single tree Average depth of trees is lower than a single tree Reduce delay and increase fault resilience

High Reliability Sending same data to K paths Endure K-1 failures

SplitStream[6], CoopNet[7], Chunkyspread[8]

Random Multicast Forest Use multiple multicast tree ( K ) Each node gets k parents Sometimes it can have the same

parents for different trees Cannot endure K-1 failures with some

probability

h

g

i

S

a

b c

d

e f

K-DAG (Directed Acyclic Graph )

Each node selects each parent randomly Select “M” closest candidate nodes ( M > K ) Select K parents randomly among the M

Each node gets K parents Still it can not guarantee message delivery

hg i

S

a b c

d e

f

SplitStream[6] Forest based dissemination Basic idea

Split the stream into K stripes (with MDC coding)

For each stripe create a multicast tree such that the forest

Contains interior-node-disjoint trees Respects nodes’ individual bandwidth constraints

SplitStream : MDC coding Multiple Description coding

Fragments a single media stream into M substreams (M ≥ 2 )

K packets are enough for decoding (K < M)

Less than K packets can be used to approximate content

Useful for multimedia (video, audio) but not for other data

Cf) erasure coding for large data file

SplitStream : Interior-node-disjoint tree

Each node in a set of trees is interior node in at most one tree and leaf node in the other trees.

Each substream is disseminated over subtrees

S

a

b c

d

e f h

g

i

a

b c h

g

i

d

e f

d

e f

a

b c h

g

i

ID =0x… ID =1x… ID =2x…

Data Plane : Mesh

A Random Graph Node’s degree is proportional to the node’s

forwarding bandwidth Minimum degree : ensure the mesh remains connected

in the presence of churn Gossip based forwarding

Buffer (b) : most recently received blocks Every r seconds, nodes exchange the information of

missing blocks and buffered blocks. Randomly request an available block for a missing block

CoolStreaming[9], Chainsaw[10], PRIME[11], PULSE[12], CREW[13]

Pros and Cons of Mesh

Fully utilize bandwidth of all nodes High Failure resilience High overhead Long Delay/start up

Gossip-based Broadcast

Probabilistic Approach with Good Fault Tolerant Properties Choose a destination node, uniformly at random, and send it the message After Log(N) rounds, all nodes will have the message w.h.p. Requires N*Log(N) messages in total Needs a ‘random sampling’ service

Usually implemented as Rebroadcast ‘fanout’ times Using UDP: Fire and Forget

BiModal Multicast (99), Lpbcast (DSN 01), Rodrigues’04 (DSN), Brahami ’04, Verma’06

(ICDCS), Eugster’04 (Computer), Koldehofe’04, Periera’03

Data Plane : Hybrid

Tree/Forest + Mesh Tree/Forest : Push-based

dissemination Low delay

Mesh : Pull-based dissemination Instant failure recovery

Bullet[14], PRM[15], FaReCast[16]

Bullet

Layers a mesh on top of an overlay tree to increase overall bandwidth

Basic Idea Use a tree as a basis In addition, each node continuously

looks for peers to download from In effect, the overlay is a tree combined

with a random network (mesh)

PRM (Probabilistic Resilient Multicasting)

Tree-based Multicasting with a few of random mesh links.

Cross-Link Redundancy Add (random) cross links to the

forwarding topology For Reliability

PRM (Probabilistic Resilient Multicasting)

Recovery Mechanism of PRM Basic assumption

Nodes maintain a set of mesh neighbors Ephemeral forwarding – Reactive

Upon detecting a disconnection, the head of disconnected node tries to obtain the stream from one of the mesh neighbors

Before data plane recovery Randomized forwarding – Proactive

A node forwards the stream to a random mesh neighbors with very small probability (1~3%)

Case Study : FaReCast for Flash Dissemination

Early Warning System Earthquake Early Warning System

Earthquake Event : Rare event, but its spreading speed is very fast

Give the recipients a little more time to prepare for the disaster

A simple alert-message should be delivered to all recipients within very short time duration

41

Catastrophe Resilience Use-Case

Earthquakes may damage infrastructure in unpredictable and multiple ways

Power, Telecommunication, Servers

If earthquake disrupts the major routers/links in LA

Can assume 10-25% of all overlay nodes in California will be “down”

Usually, people have studied node failures on smaller scale: Fault Tolerance

We are looking at scenario where a large fraction of the network is simultaneously affected : Catastrophe Tolerance

Design systems that are resilient to 50% simultaneous node failures

Characteristic of Recipients

The number of Recipients is huge Hundreds of thousands recipients may

be interested in this warning messages Recipients are unreliable

Normal machine administrated by normal users

Free will of join/leave Internal lagging for forwardingNeed to consider the reliable message dissemination to all online

recipients

Need to consider the reliable message dissemination to all online

recipients

( http://neic.usgs.gov/neis/eqlists/eqstats.html )

Event characteristicNumber of Earthquakes in the United States for 2000 - 2009

Located by the US Geological Survey National Earthquake Information Center

Magnitude 200

0200

1200

2200

32004 2005 2006 2007 2008 2009

8.0 to 9.9 0 0 0 0 0 0 0 0 0 0

7.0 to 7.9 0 1 1 2 0 1 0 1 0 0

6.0 to 6.9 6 5 4 7 2 4 7 9 9 0

5.0 to 5.9 63 41 63 54 25 47 51 72 73 5

4.0 to 4.9 281 290 536 541 284 345 346 366 446 49

3.0 to 3.9 917 842 1535130

31364 1475 1213 1137 1483 171

2.0 to 2.9 660 646 1228 704 1336 1738 1145 1173 1570 247

1.0 to 1.9 0 2 2 2 1 2 7 11 14 4

0.1 to 0.9 0 0 0 0 0 0 1 0 0 0

No Magnitude 415 434 507 333 540 73 13 22 20 6

Total234

22261 3876

2946

3552 3685 * 2783 2791 * 3615 * 482

The major earthquake (Magnitude>6) is very rare event.

Need to consider the maintenance overhead of the early warning system.

The major earthquake (Magnitude>6) is very rare event.

Need to consider the maintenance overhead of the early warning system.

44/32

Forest-based M2M ALM structure

Multiple Fan-In – increase reliability with rich path diversityMultiple Fan-Out – for efficient and fast dissemination↑Path Diversity, ↑ reliability, ↑speed

Multiple Fan-In – increase reliability with rich path diversityMultiple Fan-Out – for efficient and fast dissemination↑Path Diversity, ↑ reliability, ↑speed

45/32

Properties of M2M ALM structure

Fan-in constraint Each node has Fi distinct parents (except level 0/1) All Parents of a node are on the previous level

Level : length in overlay from a root node

Root node reliability Loop-Free transmission

No loop in parent-to-children dissemination All paths from root to node have same length – data

reaches from diff. parents closeby in time.

Parent set uniqueness Parent sets of any two nodes at a level (level ≥2) differ

by at least one parent. Increases path diversity, more reliable delivery

46/32

Realizing parent set uniqueness

A

D

Level 0

Level 1

Level 2

Fan-out factor = 2Fan-in = 2

Fan-out factor = 2Fan-in = 2

B C

E

?

F

G H

I

J K

2)( ioL

oL FFFN

NL = 4

NL = 6

Bypassing Problem with Simple Multicasting Method

a

d

b c

g

e

i

f

h

d

i

2 parents ALM structure2 parents ALM structure (1) Bypassing Problem: Type 1- The message may bypass an alive nodes, when its all parents fail- A child (node h) of a bypassed node ( node d ) can receive the message from other parents

(1) Bypassing Problem: Type 1- The message may bypass an alive nodes, when its all parents fail- A child (node h) of a bypassed node ( node d ) can receive the message from other parents(2) Bypassing Problem: Type 2-A child of a bypassed node can not receive the message, when its other parents fail

(2) Bypassing Problem: Type 2-A child of a bypassed node can not receive the message, when its other parents fail

Probability of Bypassing

Probability of bypassing (type 1)- Pf : probability that a node fails- n parents : means there are ‘n’ paths from which the message can come

Probability of bypassing (type 1)- Pf : probability that a node fails- n parents : means there are ‘n’ paths from which the message can come

n

jjPf

1

Probability of bypassing (type 2)- Multiply ‘Pf’ of its parents to the probability of bypassing- It is smaller than type 1, but still can happen

Probability of bypassing (type 2)- Multiply ‘Pf’ of its parents to the probability of bypassing- It is smaller than type 1, but still can happen

Prevent Bypassing

Use Child-to-Parents relationship General dissemination of tree/forest

ALM structures uses only Parent-to-Children relationship

Get additional ‘m’ paths, where ‘m’ is the number of children

m

jj

n

jj PfPf

11

Backup Dissemination

a

d

b c

g

e

i

f

h

d

i

2 parents ALM structure2 parents ALM structure

d

i

Each node can expect the duplicated messages

Each node can expect the duplicated messages

A node does not get duplicated message within some time, it sends the message to its parents which did not send a message

A node does not get duplicated message within some time, it sends the message to its parents which did not send a message

Limitation of Backup Dissemination

a

d

b c

g

e f

h

d

i

2 parents ALM structure2 parents ALM structure Node g, h and i are leaf nodes having no backup

paths from children

Node g, h and i are leaf nodes having no backup

paths from children

j

Node d and h can get the message by the


Node d and h can get the message by the


Still node i can not get the message

Leaf nodes are problem

Still node i can not get the message

Leaf nodes are problem

h

d

Leaf-to-Leaf Dissemination

a

d

b c

g

e f

h

d

i

2 parents ALM structure2 parents ALM structure

j

h

d

d

d

f

Each leaf node has links to other leaf nodes sharing the same

parents

Each leaf node has links to other leaf nodes sharing the same

parents

i

A leaf node does not get duplicated message

within some time from a specific parent, it sends

the message to this parent as well as its

children

A leaf node does not get duplicated message

within some time from a specific parent, it sends

the message to this parent as well as its

children

EXTRA SLIDES

Control Plane : Random Network Bounce Protocol in CREW[13]

A kind of K-DAG Random walk through a tree to get new

parents Construct DAG with random parents

S

a b

c d e f

g

NeighborRequest

Control Plane : SAAR

Isolated control plane to support efficient and flexible selection of data dissemination peers

Multiple overlays share a control overlay Reduce overhead and improve scalability

Anycast primitive used to build different overlays

References[1] Understanding the design tradeoffs for cooperative streaming multicast, In Max

Planck Institute for Software Systems Technical Report MPI-SWS-2009-002.[2] Early Experience with an Internet Broadcast System Based on Overlay Multicast.

In Proceedings of USENIX Annual Technical Conference 2004.[3] Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of OSDI

2000.[4] ZIGZAG: An efficient peer-to-peer scheme for media streaming. In Proceedings of

INFOCOM 2003.[5] Scalable Application Layer Multicast. In Proceedings of SIGCOMM 2002.[6] Splitstream: High-bandwidth multicast in a cooperative environment. In

Proceedings of SOSP 2003.[7] Distributing streaming media content using cooperative networking. In

Proceedings of NOSSDAV 2002.[8] Chunkyspread: Heterogeneous unstructured tree-based peer-to-peer multicast.

In Proceedings of ICNP 2006.[9] Coolstreaming/DONet: A data-driven overlay network for peer-to-peer live

media streaming. In Proceedings of INFOCOM 2005.[10] Chainsaw: Eliminating trees from overlay multicast. In Proceedings of IPTPS 2005.[11] PRIME: Peer-to-peer Receiver-drIven MEsh-based Streaming. In Proceedings of

INFOCOM 2007.[12] PULSE: An adaptive, incentive-based, unstructured p2p live streaming system.

In Proceedings of IEEE Transactions on Multimedia, Vol. 9, Nov. 2007.[13] CREW: A Gossip-based Flash-Dissemination System. In Proceedings of ICDCS

2006.[14] Bullet:High Bandwidth Data Dissemination Using an Overlay Mesh. In

Proceedings of SOSP 2003.[15] Resilient multicast using overlays. In Proceedings of ACM SIGMETRICS 2003.[16] FaReCast: Fast, Reliable Application Layer Multicast for Flash Dissemination. In

Proceedings of Middleware 2010.[17] SAAR: A shared control plane for overlay multicast. In Proceedings of NSDI 2007.

EXTRA SLIDES (Not covered)

Constraint Model for CEM

Continuity

T-C : the proportion of the streamed bits that have arrived at the receiving node within T seconds of their first

transmission by the source

T-C : the proportion of the streamed bits that have arrived at the receiving node within T seconds of their first

transmission by the source

Mesh achieves almost perfect continuity

after T=45

Mesh achieves almost perfect continuity

after T=45Trees are fast, but lose

continuity without recoveries

Trees are fast, but lose continuity without

recoveries

Join Delay

Effect of stream rate

With high stream rate (e.g. Ri=1.2), tree loses continuity substantially.

With high stream rate (e.g. Ri=1.2), tree loses continuity substantially.

Reducing Mesh Lag

By decreasing the streaming block size By decreasing swarming interval (r)

By decreasing the streaming block size By decreasing swarming interval (r)

Overhead!!

Closer view of buffer size and swarming interval

Swarming overhead hampers delivery delay of blocks.

Swarming overhead hampers delivery delay of blocks.

Improving Tree Continuity

Recovery mechanisms improve continuity, but incur delays

Recovery mechanisms improve continuity, but incur delays

Under high churn and packet loss

cs234 – application layer multicasting tuesdays, thursdays 3:30-4:50p.m. ics 243 prof. nalini...

Documents