Distributed Computing (CS 515)
Dr. Ihsan Ullah
Department of Computer Science & IT
University of Balochistan, Quetta
Pakistan
November 11, 2013
P2P Distributed Computing 1/91
Outline
1 Peer-to-Peer (P2P) networks
2 Unstructured P2P networks
3 Structured P2P networks
4 Application Layer Multicast (ALM)
Peer-to-Peer (P2P) networks
Outline
1 Peer-to-Peer (P2P) networks
2 Unstructured P2P networks
3 Structured P2P networks
4 Application Layer Multicast (ALM)
Peer-to-Peer (P2P) networks Introduction
P2P networks
Peer dictionary meaning: a person of the same age, status, or ability as another specified person
A distributed network architecture may be called a Peer-to-Peer network, if the participants share a part of their own hardware resources (processing power, storage capacity, network link capacity, printers, ...). These shared resources are necessary to provide the service and content offered by the network. They are accessible by other peers directly, without passing intermediary entities. The participants of such a network are thus resource providers as well as resource requestors [Sch01]
Peer-to-Peer (P2P) networks Introduction
P2P networks
Peer-to-peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources such as content, CPU cycles, storage and bandwidth, capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without requiring the intermediation or support of a global centralized server or authority [ATS04]
Peer-to-Peer (P2P) networks Introduction
P2P overlay
A virtual network over the physical one
Peer-to-Peer (P2P) networks Introduction
Peer
A computer, an end-user, an application
Depends on the context
Always an end system, but an end system is not always a peer
An end system can be a dedicated video server that is part of a CDN, or a BitTorrent client that is part of a P2P network
Peer-to-Peer (P2P) networks Introduction
Peer
Leecher
A peer that is both client and server
In the context of content delivery
Has a partial copy of the content
Seeder
A peer that is only a server
In the context of content delivery
Has a full copy of the content
Peer-to-Peer (P2P) networks Introduction
P2P paradigm
Why did P2P applications become popular only in mid-2000?
High speed Internet connections
Powerful end hosts
The end-to-end argument
End-to-end argument: a functionality should only be implemented at a lower layer if the achieved performance outweighs the cost of the additional complexity that occurs
Peer-to-Peer (P2P) networks Introduction
P2P Applications
P2P applications capitalize on any resource from anybody
P2P applications can share CPU, bandwidth and storage
BitTorrent, eMule
Skype
SopCast, PPLive
Peer-to-Peer (P2P) networks Introduction
Characteristics of P2P networks
Resource sharing: each peer contributes system resources to the operation of the P2P system. Ideally this resource sharing is proportional to the peer's use of the P2P system, but many systems suffer from the free-rider problem
Networked: all nodes are interconnected with other nodes in the P2P system, and the full set of nodes are members of a connected graph. When the graph is no longer connected, the overlay is said to be partitioned
Decentralization: the behavior of the P2P system is determined by the collective actions of peer nodes, and there is no central control point. May require centralization for management functions such as login
Symmetry: nodes assume equal roles in the operation of the P2P system. In many designs this property is relaxed by the use of special peer roles such as super peers or relay peers
Peer-to-Peer (P2P) networks Introduction
Characteristics of P2P networks
Autonomy: participation of the peer in the P2P system is determined locally, and there is no single administrative context for the P2P system
Self-organization: the organization of the P2P system increases over time using local knowledge and local operations at each peer, and no peer dominates the system
Scalable: a prerequisite for operating P2P systems with millions of simultaneous nodes; means that the resources used at each peer exhibit a growth rate, as a function of overlay size, that is less than linear. It also means that the response time does not grow more than linearly as a function of overlay size
Stability: within a maximum churn rate, the P2P system should be stable, i.e., it should maintain its connected graph and be able to route deterministically within a practical hop-count bound
Peer-to-Peer (P2P) networks Introduction
Classification of P2P networks
Two classes:
Unstructured
The overlay networks organize peers in a random graph, in a flat or hierarchical manner (e.g., a super-peer layer)
Use flooding, random walks or expanding-ring Time-To-Live (TTL) search
Structured
The P2P overlay network topology is tightly controlled and content is placed not at random peers but at specified locations that will make subsequent queries more efficient
Distributed Hash Table (DHT)
Unstructured P2P networks
Outline
1 Peer-to-Peer (P2P) networks
2 Unstructured P2P networks
3 Structured P2P networks
4 Application Layer Multicast (ALM)
Unstructured P2P networks
Introduction
Do not impose any structure on the overlay networks
Usually resilient to peer dynamics, and support complex queries
But they are not efficient for locating unpopular files
Depending on system decentralization level, unstructured P2P networks can be classified into:
Centralized
Distributed
Hybrid
Other approaches
Unstructured P2P networks
Design considerations
Search efficiency and replication cost:
Data are stored in a distributed manner at peers, and each peer holds only limited information about the system
Important to design efficient search mechanisms
With more data replications in the network, a data file can be located more quickly
A tradeoff between storage cost and search time
Scalability:
A P2P system may consist of hundreds of thousands of peers
Often requires a fully distributed system, where peers form a self-organized network and each peer communicates with only a few other peers
Unstructured P2P networks
Design considerations
Resilience:
In P2P systems, a peer may arbitrarily join, leave or fail
A good P2P system should be resilient to such peer dynamics
Load balancing:
Peers often have heterogeneous resources (e.g., bandwidth, computational capability, storage space)
A good system should be able to achieve balanced loads among peers
This can avoid overloading of hot peers, and hence improve system scalability
Security:
Some participating peers may be selfish and unwilling to upload data to others, or some may launch attacks to disrupt the service
A practical P2P system should be well protected against targeted attacks and free-riders
Other considerations include peers' privacy and confidentiality
Unstructured P2P networks Centralized approach: Napster
Napster
Napster started in fall of 1999, created by Shawn Fanning
Napster was the first P2P file sharing application
Only sharing of MP3 files was possible
First lawsuit in December 1999 from several major recording companies
Reached 13.6 million users in February 2001
In July 2001, a judge ordered Napster to shut down
The current Napster is an online music store, not P2P
Unstructured P2P networks Centralized approach: Napster
Napster: communication
1 A new peer registers with the central server and shares a list of files
2 A registered peer sends a query to the central server
3 The central server provides the address of the peer containing the content
4 The content is downloaded from the peer directly
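The four steps can be sketched as a toy central index. The class and peer addresses below are hypothetical illustrations, not the actual Napster protocol; the point is that the server stores only metadata, while the file transfer itself is peer-to-peer.

```python
# Toy sketch of Napster-style central indexing (hypothetical names/addresses).
class CentralServer:
    def __init__(self):
        self.index = {}  # filename -> set of peer addresses sharing it

    def register(self, peer_addr, files):
        # Step 1: a new peer registers and shares its list of files.
        for f in files:
            self.index.setdefault(f, set()).add(peer_addr)

    def query(self, filename):
        # Steps 2-3: the server returns addresses of peers holding the content.
        return self.index.get(filename, set())

server = CentralServer()
server.register("10.0.0.1:6699", ["song.mp3", "talk.mp3"])
server.register("10.0.0.2:6699", ["song.mp3"])
peers = server.query("song.mp3")  # step 4: download from these peers directly
```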
Unstructured P2P networks Centralized approach: Napster
Napster: strengths
Consistent view of the network
Central server always knows who is there and who is not
Fast and efficient searching
Central server always knows all available files
Efficient searching on the central server
Answer guaranteed to be correct
Nothing found means none of the currently online peers in the network has the file
Unstructured P2P networks Centralized approach: Napster
Napster: limitations
Downloading from a single peer only
Central server is a single point of failure
Central server needs enough computation power to handle all queries
Results unreliable
No guarantees about file contents
Server knows nothing about the local situation at peers (load)
Some information, such as user bandwidth, is entered by the user and not guaranteed to be correct (i.e., not measured)
Unstructured P2P networks Distributed Approach
Gnutella
A fully distributed P2P system for file sharing
Publicly available source code
Enables the development of different client software (e.g., LimeWire)
A Gnutella peer is called a servent (server + client)
All peers are equal
Unstructured P2P networks Distributed Approach
Gnutella: network joining
A new peer joining first connects to some public peers (e.g., available on a website)
Then it sends a PING message to any peer it is connected to, announcing the existence of the new peer
A Gnutella peer returns a PONG message and propagates the PING message to its neighbors
A PONG message contains the IP address and port of the responding peer, and information about the files being shared
Peers periodically send PING messages to their neighbors
Unstructured P2P networks Distributed Approach
Gnutella: searching
Search in Gnutella is based on flooding, which is broadcasting in the overlay
A search query is propagated to all neighbors from the original requesting peer
The query is replicated and forwarded by each intermediate peer to all its neighbors
Each intermediate peer also examines its local contents and responds to the query source on a match
A time-to-live (TTL) field reduces the number of query messages
The TTL value is decremented by one at each peer; the message is dropped when TTL = 0
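A minimal sketch of TTL-limited flooding over a toy overlay. The adjacency dict and shared-file map are hypothetical, and the real Gnutella protocol additionally tags queries with message IDs; here duplicate suppression is approximated with a `seen` set.

```python
# Sketch: TTL-limited flooding search on an overlay (toy data structures).
def flood_query(graph, source, filename, shared, ttl):
    """Propagate a query hop by hop; TTL drops by one per hop and the
    query dies at TTL = 0. Returns the peers that reported a match."""
    seen = {source}          # avoid re-forwarding to already-queried peers
    frontier = [source]
    matches = set()
    while frontier and ttl > 0:
        ttl -= 1
        nxt = []
        for peer in frontier:
            for nb in graph[peer]:
                if nb not in seen:
                    seen.add(nb)
                    if filename in shared.get(nb, ()):  # local content check
                        matches.add(nb)
                    nxt.append(nb)
        frontier = nxt
    return matches

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
shared = {"D": {"foo.mp3"}}
hits = flood_query(overlay, "A", "foo.mp3", shared, ttl=2)  # D is 2 hops away
```

With ttl=1 the query dies before reaching D, illustrating why choosing an appropriate TTL is not easy.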
Unstructured P2P networks Distributed Approach
Gnutella: strengths
By contrast to Napster, Gnutella is a dynamic, self-organized network
Each peer independently connects to and communicates with a few other peers in the system
The system can contain an unlimited number of peers if there is no constraint on search efficiency
The system is highly robust to peer dynamics
If a peer leaves the system, its neighbors can connect to other peers through the exchange of PING and PONG messages
Unstructured P2P networks Distributed Approach
Gnutella: limitations
Low search efficiency: the number of query messages increases exponentially with the number of overlay hops
A query may thus generate many messages, especially for unpopular files
The use of TTL can reduce the number of query messages, but choosing an appropriate TTL is not easy
Multiple copies of a query may be sent to a peer by its multiple neighbors
Unstructured P2P networks Hybrid approach
FastTrack/Kazaa
Given the limitations of purely centralized and purely distributed networks, this approach combines the two
In FastTrack, peers with the fastest Internet connections and the most powerful computers are automatically designated as supernodes
Other nodes are called ordinary nodes
A supernode maintains information about some resources as well as connections with other supernodes
Exploitation of the heterogeneity of peers and organization of peers into a hierarchy
Kazaa is based on FastTrack
Unstructured P2P networks Hybrid approach
Kazaa: Searching
A peer wanting to search for some file sends its query to the closest supernode
The supernode either returns some matching peers, or forwards the query to other supernodes
At some stage, a supernode will respond with some matching peers from which the requester can download the file
Compared with purely distributed networks like Gnutella, Kazaa achieves much lower search times
In contrast to Napster, supernodes do not form a single point of failure
Unstructured P2P networks Other Approach: BitTorrent
BitTorrent
BitTorrent is a P2P system that does not belong to any of the mentioned categories
Uses a central location to coordinate data upload and download among peers
To share a file f, a peer first creates a small torrent file, which contains metadata about f, e.g., its length, name and hashing information
BitTorrent cuts a file into pieces of fixed size, typically between 64 KB and 4 MB each
Each piece has a checksum computed with the SHA-1 hash algorithm, which is also recorded in the torrent file
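The piece/checksum scheme can be sketched as follows. The helper names are hypothetical; a real torrent file stores the concatenated raw 20-byte digests in a bencoded dictionary rather than a Python list of hex strings.

```python
import hashlib

PIECE_LEN = 64 * 1024  # 64 KB, the small end of the typical piece-size range

def piece_hashes(data: bytes, piece_len: int = PIECE_LEN):
    """Cut the content into fixed-size pieces and record each piece's
    SHA-1 digest, as a torrent file does."""
    return [hashlib.sha1(data[i:i + piece_len]).hexdigest()
            for i in range(0, len(data), piece_len)]

def verify_piece(piece: bytes, expected_hex: str) -> bool:
    """A downloader recomputes the checksum to detect a corrupted piece."""
    return hashlib.sha1(piece).hexdigest() == expected_hex

content = bytes(200 * 1024)        # 200 KB of demo data (all zero bytes)
hashes = piece_hashes(content)     # 3 full pieces + 1 partial final piece
ok = verify_piece(content[:PIECE_LEN], hashes[0])
```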
Unstructured P2P networks Other Approach: BitTorrent
BitTorrent
The torrent file contains the URL of a tracker, which keeps track of all the peers who have file f (either partially or completely)
A peer that wants to download the file first obtains the corresponding torrent file
Then, it connects to the specified tracker
The tracker responds with a random list of peers that are downloading the same file
The requesting peer then connects to these peers for downloading
Torrent files are often published on large websites
Centralization of the tracker is an issue
Unstructured P2P networks Other Approach: BitTorrent
BitTorrent
The latest BitTorrent clients implement a decentralized tracking mechanism (e.g., µTorrent)
In this mechanism, every peer acts as a mini-tracker
Peers first join a DHT network, which is implemented within the BitTorrent client itself
A torrent is then stored at a certain peer according to the DHT storage method
All peers in the DHT network can search for the torrent through DHT search
Eliminates central trackers from the system
Unstructured P2P networks Other Approach: BitTorrent
BitTorrent: Tit-for-Tat and Chunk Selection
BitTorrent uses a tit-for-tat policy
A peer serves peers that serve it
Encourages cooperation, discourages free-riding
Uses a rarest-first policy when downloading chunks, to maximize availability
For the first chunk, it picks one at random, so that the peer has something to share
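A sketch of rarest-first selection under these two rules. The function is hypothetical; real clients additionally account for in-flight requests and an endgame mode.

```python
import random
from collections import Counter

def pick_chunk(my_chunks, neighbor_chunks, first_chunk=False):
    """Choose the next chunk to request: the one held by the fewest
    neighbors (rarest first), or any random chunk for the very first
    download so the peer quickly has something to share."""
    availability = Counter()
    for chunks in neighbor_chunks:
        availability.update(chunks)
    wanted = [c for c in availability if c not in my_chunks]
    if not wanted:
        return None
    if first_chunk:
        return random.choice(wanted)
    return min(wanted, key=lambda c: availability[c])

# Chunk 3 is held by only one neighbor, so it is requested first.
choice = pick_chunk(set(), [{1, 2}, {1}, {1, 2, 3}])
```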
Structured P2P networks
Outline
1 Peer-to-Peer (P2P) networks
2 Unstructured P2P networks
3 Structured P2P networks
4 Application Layer Multicast (ALM)
Structured P2P networks
Structured overlay
A network overlay that connects nodes using a particular data structure or protocol to ensure that node lookup or data discovery is deterministic
The P2P overlay network topology is tightly controlled and contents are placed not at random peers but at specified locations that will make subsequent queries more efficient
Addressing instead of searching
Network structure determines where peers belong in the network and where objects are stored
Structured P2P networks
Distributed Hash Tables
Hash function:
hash(x) = x mod 10
Insert numbers 0, 1, 4, 9, 16, and 25
Easy to find if a given key is present in the table
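The slide's toy hash table in a few lines of Python:

```python
# hash(x) = x mod 10: each key lands in one of ten buckets.
buckets = {i: [] for i in range(10)}
for x in (0, 1, 4, 9, 16, 25):
    buckets[x % 10].append(x)

def contains(key):
    # Only one bucket needs inspecting, so lookup is O(1) on average.
    return key in buckets[key % 10]
```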
Structured P2P networks
Distributed Hash Tables
Hash tables are fast for lookups
Idea: distribute hash buckets to peers
The result is a Distributed Hash Table (DHT)
Need an efficient mechanism for finding which peer is responsible for which bucket, and for routing between them
Structured P2P networks
Distributed Hash Tables
In a DHT, each node is responsible for one or more hash buckets
As nodes join and leave, the responsibilities change
Nodes communicate among themselves to find the responsible node
Scalable communications make DHTs efficient
DHTs support all the normal hash table operations
Structured P2P networks
DHT summary
Hash buckets distributed over nodes
Nodes form an overlay network
Route messages in the overlay to find the responsible node
The routing scheme in the overlay network is what distinguishes different DHTs
DHT behavior and usage:
Node knows the object name and wants to find it
Unique and known object names assumed
Node routes a message in the overlay to the responsible node
The responsible node replies with the object
Semantics of the object are application defined
Structured P2P networks DHT case studies
DHT examples
Chord
Pastry
Tapestry
CAN
Kademlia
Structured P2P networks DHT case studies
Chord: Introduction
Chord uses the SHA-1 hash function
Results in a 160-bit object/node identifier
Same hash function for objects and nodes
Node ID hashed from the IP address
Object ID hashed from the object name
Object names are somehow assumed to be known by everyone
SHA-1 gives a 160-bit identifier space
Organized in a ring which wraps around
The overlay is often called the Chord ring or Chord circle
Nodes keep track of their predecessor and successor
A node registers objects on the namespace between its predecessor and itself
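The "key goes to the node after it on the ring" rule can be sketched on a toy ring, using the same 3-bit example with nodes 0, 1 and 4 that appears on the following slides:

```python
# Sketch: in Chord, key k is stored at successor(k), the first node ID
# at or after k when moving clockwise around the identifier ring.
def successor(node_ids, key):
    ring = sorted(node_ids)
    for n in ring:
        if n >= key:
            return n
    return ring[0]  # past the largest ID: wrap around to the smallest

nodes = {0, 1, 4}            # the 3-bit example from the joining slides
where = successor(nodes, 2)  # key 2 falls between node 1 and node 4
```

This matches the later storing example, where hash(Foo) = 2 lands on node 4.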
Structured P2P networks DHT case studies
Chord: Joining
A 3-bit identifier space
Existing network with nodes on 0, 1 and 4
New node wants to join
Structured P2P networks DHT case studies
Chord: Joining
New node wants to join
Hash of the new node: 6
Known node in the network: node 1
Contact node 1 (providing its own hash)
Structured P2P networks DHT case studies
Chord: Joining
Messages are represented by arrows
New node 6 contacts node 1
Structured P2P networks DHT case studies
Chord: Joining
Node 1 forwards the message to its successor (node 4)
Structured P2P networks DHT case studies
Chord: Joining
Node 4 forwards the message to its successor (node 0)
Structured P2P networks DHT case studies
Chord: Joining
Joining is successful
The old responsible node transfers the data that should be on the new node
The new node informs node 4 about its new successor
Structured P2P networks DHT case studies
Chord: Joining
After joining and transferring the data
Structured P2P networks DHT case studies
Chord: Storing a value
Node 6 wants to store an object with name "Foo" and value 5
hash(Foo) = 2
Structured P2P networks DHT case studies
Chord: Retrieving a value
Node 1 wants to get the object with name "Foo"
hash(Foo) = 2
Foo is stored on node 4
Structured P2P networks DHT case studies
Chord: Scalable routing
Routing happens by passing message to successor
What happens when there are 1 million nodes?
On average, need to route half-way across the ring
In other words, 0.5 million hops! Complexity O(n)
How to make routing scalable?
Answer: finger tables
Basic Chord keeps track of predecessor and successor
Finger tables keep track of more nodes
Allow faster routing by jumping a long way across the ring
Routing scales well, but needs more state information
Finger tables are not needed for correctness, only as a performance improvement
Structured P2P networks DHT case studies
Chord: Finger tables
In m-bit identifier space, node has up to m fingers
Fingers are stored in the finger table
Row i in the finger table at node v contains the first node s that succeeds v by at least 2^(i-1) on the ring (namespace, not nodes!)
In other words:
finger[i] = u : |u| >= (|v| + 2^(i-1)) mod 2^m
Distance to finger[i] is at least 2^(i-1)
Structured P2P networks DHT case studies
Chord: Scalable routing
Finger intervals increase with distance from node n
Each node only stores information about a small number of nodes
Example has three nodes at 0, 1, and 3
3-bit ID space → 3 rows of fingers
Structured P2P networks DHT case studies
Chord: Finger table example
finger[1].interval = [finger[1].start, finger[2].start)
Row entry: (n + 2^r) mod 2^m  [n = node id, r = row number starting from 0]
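For the slide's three-node example (nodes 0, 1 and 3 in a 3-bit space), the row rule can be checked with a small sketch:

```python
# Sketch: build node n's finger table. Row r (r = 0..m-1) starts at
# (n + 2**r) mod 2**m, and the finger is the first live node at or
# after that start on the ring.
def finger_table(n, node_ids, m):
    ring = sorted(node_ids)
    def succ(k):  # first node >= k, wrapping around past the largest ID
        return next((x for x in ring if x >= k), ring[0])
    return [succ((n + 2**r) % 2**m) for r in range(m)]

fingers = finger_table(0, {0, 1, 3}, m=3)  # starts are 1, 2 and 4
```

Node 0's fingers come out as nodes 1, 3 and 0: start 1 maps to node 1, start 2 to node 3, and start 4 wraps around to node 0.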
Structured P2P networks DHT case studies
Chord: Performance
Search performance of "pure" Chord is O(n)
Number of nodes is n
With finger tables, need O(log n) hops to find the correct node
Fingers are separated by at least 2^(i-1)
With high probability, the distance to the target halves at each step
For state information, "pure" Chord has only successor and predecessor, O(1) state
For finger tables, m entries are needed
Structured P2P networks DHT case studies
Pastry: Introduction
Pastry is a DHT-based P2P object location and routing substrate
Pastry assigns a unique 128-bit identifier to each node that joins the Pastry network
The identifier is chosen by hashing the node's IP address to create a NodeId
Keys are also selected from the same identifier space
The root node for a key is the node whose NodeId is closest to the key among all live nodes
Structured P2P networks DHT case studies
Pastry: Routing
Each Pastry node maintains a routing state which consists of a leaf set, a routing table and a neighbor set
The leaf set is the set of nodes with the L/2 numerically closest larger nodeIds and the L/2 numerically closest smaller nodeIds
The routing table is a matrix with a/b rows and 2^b columns
a is the number of bits in the nodeId and b is a parameter with a typical value of 4
The entry in row r and column c of the routing table contains a nodeId that shares the first r digits with the local node's nodeId, and has the (r + 1)th digit equal to c
If there is no such nodeId, the entry is left empty
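The row/column rule can be sketched by treating nodeIds as digit strings in base 2^b. The helper and the short ids below are hypothetical; with b = 4, digits are hexadecimal and there are 16 columns per row.

```python
# Sketch: check whether a candidate nodeId may occupy (row r, column c)
# of the local node's Pastry routing table. NodeIds are hex strings,
# so each digit covers b = 4 bits.
def fits_slot(local_id: str, candidate: str, r: int, c: int) -> bool:
    return (candidate[:r] == local_id[:r]      # shares the first r digits
            and int(candidate[r], 16) == c)    # digit r equals column c

local = "65a1"
ok = fits_slot(local, "65f2", r=2, c=0xF)  # shares "65", next digit is 'f'
```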
Structured P2P networks DHT case studies
Pastry: Routing
a = 8, b = 2 → L = 2^b = 4 and M = 2^b = 4
Structured P2P networks DHT case studies
Pastry: Node joining
A newly joining node X knows a Pastry node A
X asks A to route a "join" message with key = NodeId(X)
The message targets Z, whose NodeId is numerically closest to NodeId(X)
All nodes along the path A, B, ..., Z send their state tables to X
X initializes its state using this information
X sends its state to the concerned nodes
Application Layer Multicast (ALM)
Outline
1 Peer-to-Peer (P2P) networks
2 Unstructured P2P networks
3 Structured P2P networks
4 Application Layer Multicast (ALM)
Application Layer Multicast (ALM) P2P Video streaming
Video traffic trends
Due to the availability of broadband technologies and more powerful personal computers, video traffic over the Internet has increased enormously
Application Layer Multicast (ALM) P2P Video streaming
Video streaming
Streaming is the transfer of media such as audio or video over a network as a steady and continuous stream [HB05]
Video-on-Demand (VoD)
Receiver-driven
Users can request any video at any time
Extended buffering
Live video streaming
Source-driven
Broadcasts the newly generated content
Limited buffering
Application Layer Multicast (ALM) P2P Video streaming
Video streaming architectures
With router support
IP multicast
Scalability, security and lack of a business model
Telco-managed IPTV uses IP multicast
Limited to private networks and expensive
Without router support
Without end-systems support (centralized C/S, CDN)
Scalability and cost (servers and upload bandwidth)
With end-systems support (P2P)
Application Layer Multicast (ALM) P2P Video streaming
P2P live video streaming
End-hosts (peers) form a self-organizing network
Peers share their upload bandwidth by relaying content to each other
Uses the existing IP infrastructure
Easy to deploy with low cost
Programmable end-hosts
Potentially scalable
Application Layer Multicast (ALM) P2P Video streaming
Group management methods
Isolated-channel systems
Build a separate overlay for each channel
A channel switch means a departure from the overlay
Overlay highly dynamic, low peer population in less popular channels
Cross-channel systems
Allow peers watching different channels to exist in the same overlay
Peers also forward the streams they do not actually watch
Channel switching and low peer population in less popular channels are controlled
Management complexity and load
Application Layer Multicast (ALM) P2P Video streaming
P2P live video delivery methods
Application Layer Multicast (ALM) P2P stream delivery methods
Single tree approach
The simplest form of push-based protocols, which build a separate tree for each group of users
The root node sends the stream to its children, and the process continues down to the leaf nodes
Efficient in terms of timely delivery of the video content
Application Layer Multicast (ALM) P2P stream delivery methods
Single tree approach: challenges
A low capacity peer, if placed at a high level in the tree, will impact the QoS for the downstream users
Loops must be avoided during the tree construction
Churn greatly impacts the performance of single-tree systems, since an abrupt departure or failure of a node disrupts the stream availability to all its offspring peers
A large number of leaf nodes cannot share their upload bandwidth, because they have no child peers
Examples: Scribe, Narada
Application Layer Multicast (ALM) P2P stream delivery methods
Scribe
An application-level multicast system that is built upon Pastry
To create a Scribe group, a random Pastry key known as the Scribe groupId is chosen
Pastry routes a message for creating a multicast group towards the node having the nodeId numerically closest to the groupId
That node becomes the root of the multicast group
Groups are managed in a fully decentralized way
Interested nodes become members of the group by routing a JOIN message towards the root
An intermediate node that receives the JOIN message adds the requesting node as a child for the group and, if it is not currently a member of the group itself, sends a further JOIN request towards the root
Application Layer Multicast (ALM) P2P stream delivery methods
Scribe
Multicast messages are disseminated from the root towards the leaf nodes, forming a tree structure
The routing properties of Pastry ensure that the tree is loop-free
Children receive implicit heartbeat messages from their parents with the multicast data, while they send explicit refresh messages to their parents to indicate their presence
Application Layer Multicast (ALM) P2P stream delivery methods
Multi-tree approach
Forms several trees to disseminate a video stream
The source node splits the video stream into sub-streams and diffuses each of them onto a separate tree
A video coding mechanism such as Multiple Description Coding (MDC) or Layered Coding (LC) is used
Example: SplitStream
Application Layer Multicast (ALM) P2P stream delivery methods
Multi-tree approach
A node can join all the trees to receive a full-quality video, or fewer trees according to its capacity
A leaf node in one tree becomes a forwarding node in another tree
A node that is a non-leaf node in at least one tree will share its upload bandwidth
A low capacity node can contribute up to its potential
The failure or abrupt departure of a node only disrupts the availability of the substream it is forwarding to its descendants
Application Layer Multicast (ALM) P2P stream delivery methods
Multi-tree approach: challenges
Peer dynamics are still an issue: the disruption of a substream impacts the streaming quality
Maintaining multiple trees and the use of coding schemes put an extra overhead on the system
Application Layer Multicast (ALM) P2P stream delivery methods
SplitStream
Builds multiple trees for one multicast group by using Scribe
Trees are interior-node-disjoint: a set of trees in which each node is an interior node in at most one tree and a leaf node in all other trees
To do so, SplitStream creates trees in such a way that the identifier for each tree, called the stripId, differs in the most significant digit
The source node divides the content into smaller parts called strips and sends them on different trees
Peers interested in particular strips join the trees corresponding to those strips
The failure or departure of an upstream peer causes the loss of only one strip for its downstream peers
Application Layer Multicast (ALM) P2P stream delivery methods
Pull-based approach
In contrast to push-based systems, content delivery is controlled by the receiver peers
The video stream is divided into smaller pieces, called chunks, and a peer pulls them from multiple neighbors at the same time
Each node in a pull-based system maintains a set of partners
Each node periodically exchanges its chunk-availability information with its partners
Based on this information, a node decides to download the content from one or more partners through an explicit request
Example: DONet/CoolStreaming
Application Layer Multicast (ALM) P2P stream delivery methods
Pull-based approach
Pros
Provide more resilience against peer dynamics, because one peer receives the content from multiple other peers at the same time
Each peer gets more chances to utilize its upload bandwidth by forwarding the content to other peers
Cons
Advertising chunk availability, explicit requests from the receivers for data, and packet delivery involve three rounds of communication for a group of packets to be delivered
This incurs delays and increases the communication overhead
Before advertising the availability of packets, a peer waits until a number of packets are buffered, which causes further delays
Application Layer Multicast (ALM) P2P stream delivery methods
DONet/CoolStreaming
A data-driven approach, which means that the availability of data decides its flow rather than a fixed structure
Each DONet node maintains a member list to keep a partial view of the network and a partner list for data exchange
To manage these lists and exchange data, each node has three key modules: a membership manager, a partnership manager, and a scheduler
The membership manager enables a node to maintain a partial view of other members
The partnership manager establishes and maintains partnerships with other nodes for data exchange
The scheduler schedules video segments to be fetched from the partners
A newly joining node first contacts the origin node, which chooses a deputy node from its cache
The deputy node provides the list of member nodes
The new node can select its partners from this list
In the case of a graceful departure, the departing node informs its partners of its departure
In the case of a failure, a partner node detects the failure from the absence of buffer map messages and issues a departure message on behalf of the failed node
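The join sequence above can be sketched as a toy exchange. All names here are hypothetical, and real DONet membership is gossip-based rather than driven by a single global list; the sketch only shows the origin-deputy-partners handshake.

```python
import random

# Toy sketch of the DONet join sequence:
# new node -> origin; origin picks a deputy; deputy supplies members.

def join(origin_cache, new_node, num_partners=2, rng=random):
    """origin_cache: list of {'name': ..., 'members': [...]} deputy records.
    Returns the chosen deputy's name and the new node's selected partners."""
    deputy = rng.choice(origin_cache)                 # origin picks a deputy
    candidates = [m for m in deputy["members"] if m != new_node]
    partners = rng.sample(candidates, min(num_partners, len(candidates)))
    return deputy["name"], partners

cache = [{"name": "D1", "members": ["A", "B", "C"]}]
deputy, partners = join(cache, "X")
```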
DONet divides the video stream into segments and maintains a buffer, which can hold 120 segments
Each node continuously exchanges its buffer information, which indicates the availability of segments, with its partner nodes
This information is encoded in 120 bits called the Buffer Map (BM), where a 1 indicates the availability and a 0 the unavailability of a segment
The scheduling algorithm schedules segment retrieval from the partner nodes based on the buffer maps, the bandwidth and response times of the partners, and the playback deadlines of the local node
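A 120-bit Buffer Map can be illustrated with a small bit-field sketch. The 120-segment window and the 1/0 meaning follow the slide; the encoding as a Python integer and the window-offset convention are assumptions for illustration.

```python
# Sketch of a DONet-style Buffer Map: one bit per segment in a sliding
# window of 120 segments; 1 = available, 0 = missing.

BM_SIZE = 120

def encode_bm(available, window_start):
    """Encode availability of segments [window_start, window_start + 120)."""
    bits = 0
    for seg in available:
        offset = seg - window_start
        if 0 <= offset < BM_SIZE:       # segments outside the window are dropped
            bits |= 1 << offset
    return bits

def decode_bm(bits, window_start):
    """Recover the set of available segment numbers from a buffer map."""
    return {window_start + i for i in range(BM_SIZE) if bits >> i & 1}

bm = encode_bm({100, 101, 150}, window_start=100)
assert decode_bm(bm, 100) == {100, 101, 150}
```

Partners exchange only these 120 bits (plus the window offset), which keeps the periodic availability advertisements cheap.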
Hybrid approach
Attempts to combine the resilience of pull-based and the efficiency of push-based protocols
Each peer operates in both pull and push modes
Generally, a peer uses pull mode for an initial period of time after joining and then subscribes to a particular peer that pushes the stream to it
During push operation, pull mode is also used to retrieve missingpackets
The push mode in these systems attempts to ensure the timely delivery of the content, while the pull mode provides resilience against churn
The challenge here is the choice of the node from which to receive the packets through push mode
A stable node with a good upload contribution?
Example: mTreebone, GridMedia
GridMedia
GridMedia is a P2P video streaming system incorporating the hybrid push/pull strategy for content delivery
Each GridMedia node operates in pull mode at startup
Afterwards, a node subscribes to pushed packets from a provider node for a specified interval of time
The choice of the provider node that will push packets is based on the traffic received from each node in the previous interval
The probability that a node is chosen as a provider through the push method equals the percentage of traffic received from that node
Meanwhile, the lost packets are pulled from the neighbors
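The traffic-proportional provider choice can be sketched as a weighted random selection (an illustration of the stated rule, not GridMedia's actual code; the function and variable names are invented):

```python
import random

# Sketch of GridMedia-style provider selection: a neighbor is chosen as
# the push provider with probability equal to its share of the traffic
# received in the previous interval.

def choose_provider(traffic, rng=random):
    """traffic: neighbor -> packets received from it in the last interval."""
    total = sum(traffic.values())
    neighbors = list(traffic)
    weights = [traffic[n] / total for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]

traffic = {"A": 60, "B": 30, "C": 10}
provider = choose_provider(traffic)   # 'A' with probability 0.6, etc.
```

Weighting by delivered traffic biases the push subscription toward neighbors that have proven both capacity and good connectivity to this node.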
To avoid the transmission of duplicate packets through the push and pull modes, both the receiver and the sender use the following mechanism
The receiver records the maximum sequence number received; if a missing packet's sequence number lags behind this recorded maximum by a specified threshold, the receiver pulls it from other neighbors
The sender keeps track of the maximum sequence number it has pushed to a neighbor
If the sender itself receives a packet whose sequence number lags behind this recorded maximum by some specified threshold, it does not push it to the receiver
The threshold at the receiver is kept greater than or equal to the threshold at the sender to control duplication
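The two threshold rules above can be written out directly (the threshold values and function names are assumptions; only the inequalities and the receiver-threshold >= sender-threshold relation come from the text):

```python
# Sketch of the duplicate-suppression rules: the sender stops pushing
# sufficiently old packets, and the receiver pulls packets only once
# they lag far enough that the sender is guaranteed not to push them.

SENDER_THRESHOLD = 8
RECEIVER_THRESHOLD = 10   # kept >= SENDER_THRESHOLD to control duplication

def sender_should_push(seq, max_pushed):
    """Push only packets that do not lag the newest pushed packet too far."""
    return max_pushed - seq < SENDER_THRESHOLD

def receiver_should_pull(seq, max_received):
    """Pull a missing packet once it lags the newest received packet enough."""
    return max_received - seq >= RECEIVER_THRESHOLD

assert sender_should_push(95, 100)       # lags by 5 < 8: still pushed
assert not sender_should_push(90, 100)   # lags by 10 >= 8: sender skips it
assert receiver_should_pull(90, 100)     # lags by 10 >= 10: receiver pulls
```

Because the receiver threshold is at least the sender threshold, any packet old enough to be pulled is already old enough that the sender will not push it, so the two modes never deliver the same packet twice.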
References
Stephanos Androutsellis-Theotokis and Diomidis Spinellis.
A survey of peer-to-peer content distribution technologies. ACM Comput. Surv., 36(4):335-371, 2004.
Markus Hofmann and Leland R. Beaumont.
Content Networking: Architecture, Protocols, and Practice. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.
R. Schollmeier.
A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications. In Proceedings of the First International Conference on Peer-to-Peer Computing, P2P '01, pages 101-102, Washington, DC, USA, 2001. IEEE Computer Society.