Post on 24-Dec-2015
CSE 461 University of Washington 1
Topic
• Peer-to-peer content delivery
  – Runs without dedicated infrastructure
  – BitTorrent as an example
Context
• Delivery with client/server CDNs:
  – Efficient, scales up for popular content
  – Reliable, managed for good service
• … but some disadvantages too:
  – Need for dedicated infrastructure
  – Centralized control/oversight
P2P (Peer-to-Peer)
• Goal is delivery without dedicated infrastructure or centralized control
  – Still efficient at scale, and reliable
• Key idea is to have participants (or peers) help themselves
  – Initially Napster ’99 for music (gone)
  – Now BitTorrent ’01 onwards (popular!)
P2P Challenges
• No servers on which to rely
  – Communication must be peer-to-peer and self-organizing, not client-server
  – Leads to several issues at scale …
P2P Challenges (2)
1. Limited capabilities
  – How can one peer deliver content to all other peers?
2. Participation incentives
  – Why will peers help each other?
3. Decentralization
  – How will peers find content?
Overcoming Limited Capabilities
• Peer can send content to all other peers using a distribution tree
  – Typically done with replicas over time
  – Self-scaling capacity
[Figure: distribution tree growing out from the Source]
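A back-of-the-envelope model of the self-scaling effect, assuming that every peer already holding the content uploads one full copy to one new peer per round (a simplification: real systems replicate pieces, not whole files, and upload rates differ):

```python
import math

def rounds_to_reach(num_peers: int) -> int:
    """Rounds until all num_peers (source included) hold a copy, if each holder serves one new peer per round."""
    holders, rounds = 1, 0                     # the source starts with the only copy
    while holders < num_peers:
        holders = min(2 * holders, num_peers)  # every holder creates one new replica
        rounds += 1
    return rounds

for n in (10, 1000, 1_000_000):
    print(n, rounds_to_reach(n), math.ceil(math.log2(n)))
```

Under this model, 1,000,000 peers are covered in 20 rounds, whereas a lone source uploading one copy per round would need n − 1 rounds.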
Providing Participation Incentives
• Peers play two roles:
  – Download to help themselves, and upload to help others
Providing Participation Incentives (2)
• Couple the two roles:
  – I’ll upload for you if you upload for me
  – Encourages cooperation
Enabling Decentralization
• Peers must learn where to get content
  – Use DHTs (Distributed Hash Tables)
• DHTs are fully decentralized, efficient algorithms for a distributed index
  – Index is spread across all peers
  – Index lists peers to contact for content
  – Any peer can look up the index
  – Started as academic work in 2001
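A toy model of such an index, with hypothetical class and method names: each entry (content key → set of peers holding it) lives on one peer chosen by hashing the key. For brevity it places entries with SHA-1 modulo the peer count, which is *not* what real DHTs do; they use schemes such as consistent hashing so that little of the index moves when peers join or leave.

```python
import hashlib

class ToyDHTIndex:
    """Index spread across peers; each peer stores only its share of the entries."""

    def __init__(self, peer_names):
        self.peers = list(peer_names)
        self.tables = {p: {} for p in self.peers}   # per-peer slice of the index

    def _home(self, key: str) -> str:
        # Simplest placement (an assumption): SHA-1 of the key, modulo the peer count.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
        return self.peers[h % len(self.peers)]

    def publish(self, key: str, holder: str) -> None:
        # Record that `holder` can serve the content named `key`.
        self.tables[self._home(key)].setdefault(key, set()).add(holder)

    def lookup(self, key: str) -> set:
        # Any peer can compute the home peer and ask it for the list of holders.
        return self.tables[self._home(key)].get(key, set())

index = ToyDHTIndex(["peerA", "peerB", "peerC", "peerD"])
index.publish("ubuntu.iso", "peerC")
index.publish("ubuntu.iso", "peerD")
print(sorted(index.lookup("ubuntu.iso")))   # ['peerC', 'peerD']
```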
BitTorrent
• Main P2P system in use today
  – Developed by Cohen in ’01
  – Very rapid growth, large transfers
  – Much of the Internet traffic today!
  – Used for legal and illegal content
• Delivers data using “torrents”:
  – Transfers files in pieces for parallelism
  – Notable for treatment of incentives
  – Tracker or decentralized index (DHT)
[Photo: Bram Cohen (b. 1975). By Jacob Appelbaum, CC-BY-SA-2.0, from Wikimedia Commons]
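A sketch of cutting a file into fixed-size pieces that can be fetched independently. In BitTorrent the torrent description carries a 20-byte SHA-1 hash for each piece, so a peer can check every piece it receives no matter which peer sent it. The function names and the tiny piece size here are illustrative assumptions:

```python
import hashlib

def split_into_pieces(data: bytes, piece_len: int) -> list[bytes]:
    """Cut data into piece_len-byte pieces; the last piece may be shorter."""
    return [data[i:i + piece_len] for i in range(0, len(data), piece_len)]

def piece_hashes(pieces: list[bytes]) -> list[bytes]:
    """20-byte SHA-1 digest of each piece, as listed in the torrent description."""
    return [hashlib.sha1(p).digest() for p in pieces]

def verify(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Accept a downloaded piece only if it matches its published hash."""
    return hashlib.sha1(piece).digest() == hashes[index]

data = b"hello, peer-to-peer world!"
pieces = split_into_pieces(data, piece_len=8)
hashes = piece_hashes(pieces)
print(len(pieces))                       # 4 pieces: 8 + 8 + 8 + 2 bytes
print(verify(1, pieces[1], hashes))      # True
print(verify(1, b"tampered", hashes))    # False
```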
BitTorrent Protocol
• Steps to download a torrent:
1. Start with the torrent description
2. Contact the tracker to join and get a list of peers (with at least one seed peer), or use the DHT index to find peers
3. Trade pieces with different peers
4. Favor peers that upload to you rapidly; “choke” peers that don’t by slowing your upload to them
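Step 4 can be sketched as a periodic choice of whom to keep uploading to. This sketch assumes a fixed number of unchoke slots and ranks peers by how fast they recently uploaded to us; the slot count, peer IDs, and rates are hypothetical, and real clients add further rules (for example, occasionally unchoking a random peer to discover better partners):

```python
def choose_unchoked(recent_rates: dict[str, float], slots: int = 3) -> tuple[set, set]:
    """Unchoke the peers that uploaded to us fastest; choked peers get no upload slot.

    recent_rates maps peer id -> bytes/s that peer recently sent us.
    """
    ranked = sorted(recent_rates, key=recent_rates.get, reverse=True)
    unchoked = set(ranked[:slots])
    choked = set(ranked[slots:])
    return unchoked, choked

rates = {"p1": 120_000, "p2": 5_000, "p3": 80_000, "p4": 0, "p5": 60_000}
unchoked, choked = choose_unchoked(rates)
print(sorted(unchoked))   # ['p1', 'p3', 'p5']
print(sorted(choked))     # ['p2', 'p4']
```

Re-running this choice every few seconds is what couples the two roles: a peer that stops uploading drops out of the top slots and gets choked.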
BitTorrent Protocol (2)
• All peers (except seed) retrieve torrent at the same time
BitTorrent Protocol (3)
• Dividing file into pieces gives parallelism for speed
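A simulated sketch of that parallelism: different pieces are requested from different peers at the same time and then reassembled in order. The peers are hypothetical in-memory stand-ins for network connections, and each piece goes to the least-busy peer that has it (a simple assumption, not BitTorrent's actual piece-selection policy):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical peers: each maps piece index -> piece bytes it can serve.
PEERS = {
    "peerA": {0: b"Peer-", 2: b"peer "},
    "peerB": {1: b"to-", 3: b"delivery"},
    "peerC": {0: b"Peer-", 1: b"to-", 2: b"peer ", 3: b"delivery"},
}

def fetch(peer: str, index: int) -> tuple[int, bytes]:
    """Stand-in for a network request for one piece from one peer."""
    return index, PEERS[peer][index]

def download(num_pieces: int) -> bytes:
    load = {p: 0 for p in PEERS}
    jobs = []
    for i in range(num_pieces):
        holders = [p for p, pieces in PEERS.items() if i in pieces]
        peer = min(holders, key=load.get)      # least-busy peer that has piece i
        load[peer] += 1
        jobs.append((peer, i))
    with ThreadPoolExecutor(max_workers=len(PEERS)) as pool:
        results = dict(pool.map(lambda job: fetch(*job), jobs))   # fetched concurrently
    return b"".join(results[i] for i in range(num_pieces))        # reassemble in order

print(download(4))   # b'Peer-to-peer delivery'
```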
BitTorrent Protocol (4)
• Choking unhelpful peers encourages participation
BitTorrent Protocol (5)
• DHT index (spread over peers) is fully decentralized
P2P Outlook
• Alternative to CDN-style client-server content distribution
  – With potential advantages
• P2P and DHT technologies finding more widespread use over time
  – E.g., part of Skype, Amazon
  – Expect hybrid systems in the future
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Robert Morris, Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan
MIT and Berkeley
A peer-to-peer storage problem
• 1000 scattered music enthusiasts
• Willing to store and serve replicas
• How do you find the data?
The lookup problem
[Figure: nodes N1 through N6 connected over the Internet; a publisher holds Key=“title”, Value=MP3 data…, and a client calls Lookup(“title”) without knowing which node has it]
Centralized lookup (Napster)
[Figure: the publisher at N4 registers SetLoc(“title”, N4) in a central DB; a client sends Lookup(“title”) to the DB to find N4, which holds Key=“title”, Value=MP3 data…]
Simple, but O(N) state and a single point of failure
Flooded queries (Gnutella)
[Figure: the client’s Lookup(“title”) is flooded across nodes N1 through N9 until it reaches the publisher at N4, which holds Key=“title”, Value=MP3 data…]
Robust, but worst case O(N) messages per lookup
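A sketch of why flooding costs O(N) messages: every node forwards the query to all of its neighbors, so with a large enough hop limit each link carries the query once in each direction. The topology is hypothetical (it only reuses the node names N1 through N9 from the figure), and the TTL hop limit reflects how floods are typically bounded rather than any exact Gnutella parameter:

```python
from collections import deque

def flood(graph: dict[str, list[str]], start: str, has_key: set[str], ttl: int):
    """Breadth-first flood; returns (nodes holding the key, messages sent)."""
    seen, found, messages = {start}, set(), 0
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if node in has_key:
            found.add(node)
        if hops_left == 0:
            continue                     # hop limit reached: stop forwarding
        for nbr in graph[node]:
            messages += 1                # every forwarded copy counts, even duplicates
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, hops_left - 1))
    return found, messages

graph = {   # hypothetical undirected topology with 8 links
    "N1": ["N2", "N3"], "N2": ["N1", "N4"], "N3": ["N1", "N6"],
    "N4": ["N2", "N7"], "N6": ["N3", "N7"], "N7": ["N4", "N6", "N8"],
    "N8": ["N7", "N9"], "N9": ["N8"],
}
print(flood(graph, "N1", {"N4"}, ttl=10))   # ({'N4'}, 16): 2 messages per link
```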
Routed queries (Freenet, Chord, etc.)
[Figure: the client’s Lookup(“title”) is routed hop by hop through nodes N1 through N9 to the publisher at N4, which holds Key=“title”, Value=MP3 data…]
Routing challenges
• Define a useful key nearness metric
• Keep the hop count small
• Keep the tables small
• Stay robust despite rapid change
• Freenet: emphasizes anonymity
• Chord: emphasizes efficiency and simplicity
Chord properties
• Efficient: O(log(N)) messages per lookup
  – N is the total number of servers
• Scalable: O(log(N)) state per node
• Robust: survives massive failures
• Proofs are in paper / tech report
  – Assuming no malicious participants
Chord overview
• Provides peer-to-peer hash lookup:
  – Lookup(key) → IP address
  – Chord does not store the data
• How does Chord route lookups?
• How does Chord maintain routing tables?
Chord IDs
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed
• Both exist in the same ID space
• How to map key IDs to node IDs?
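A sketch of the ID computation: keys and node addresses both go through SHA-1, so they land in the same circular space. Truncating the 160-bit digest to a small `m`-bit ring is an assumption made here so the IDs are readable (it matches the 7-bit examples used in these slides; real deployments keep far more bits), and the IP address is a made-up documentation address:

```python
import hashlib

M = 7   # bits in the ID space; a small value chosen only for readability

def chord_id(text: str, m: int = M) -> int:
    """Map a key or an IP address into the circular ID space [0, 2**m)."""
    digest = hashlib.sha1(text.encode()).digest()      # 160-bit SHA-1 hash
    return int.from_bytes(digest, "big") % (2 ** m)

key_id = chord_id("title")          # key identifier = SHA-1(key)
node_id = chord_id("192.0.2.17")    # node identifier = SHA-1(IP address)
print(key_id, node_id)              # two integers in the same range 0..127
```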
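Consistent hashing [Karger 97]
[Figure: circular 7-bit ID space with nodes N32, N90, N105 and keys K5, K20, K80 (“Key 5” is K5, “Node 105” is N105)]
A key is stored at its successor: the node with the next higher (or equal) ID, wrapping around the circle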
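The successor rule can be sketched with a sorted list of node IDs and a binary search that wraps past the top of the ring. The node and key IDs are the ones on this slide (N32, N90, N105; K5, K20, K80); the function name is illustrative:

```python
import bisect

def successor(node_ids: list[int], key_id: int) -> int:
    """First node whose ID is >= key_id, wrapping to the lowest ID past the top."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]     # i == len(ring) means wrap around to ring[0]

nodes = [32, 90, 105]
for key in (5, 20, 80):
    print(f"K{key} -> N{successor(nodes, key)}")
# K5 -> N32
# K20 -> N32
# K80 -> N90
```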
Consistent hashing [Karger 97]
Theorem: For any set of N nodes and K keys, with “high probability”
1) Each node is responsible for at most $(1 + \varepsilon) K/N$ keys
2) When the (N+1)th node joins or leaves the network, responsibility for O(K/N) keys changes hands
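Part 2 can be checked empirically with a small simulation: when a node joins, the only keys that change hands are those whose successor is now the new node, all taken from the node that follows it. The ring size, node and key counts, and random seed below are arbitrary illustration choices:

```python
import bisect
import random

def successor(ring: list[int], key_id: int) -> int:
    """First node ID >= key_id on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]

def keys_moved_on_join(m: int, n_nodes: int, n_keys: int, seed: int = 1):
    rng = random.Random(seed)
    ids = rng.sample(range(2 ** m), n_nodes + 1)       # distinct node IDs
    old_ring, new_node = sorted(ids[:-1]), ids[-1]
    new_ring = sorted(ids)
    keys = [rng.randrange(2 ** m) for _ in range(n_keys)]
    moved = [k for k in keys if successor(old_ring, k) != successor(new_ring, k)]
    # Every key that changed hands now belongs to the newcomer.
    assert all(successor(new_ring, k) == new_node for k in moved)
    return len(moved), n_keys / (n_nodes + 1)

moved, fair_share = keys_moved_on_join(m=32, n_nodes=100, n_keys=100_000)
print(moved, round(fair_share))   # moved is typically on the order of K/N
```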
Basic lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80; the question “Where is key 80?” is answered with “N90 has K80”]
Simple lookup algorithm
Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id   // compared clockwise around the ID circle
    call Lookup(key-id) on node n   // next hop
  else
    return my successor   // done
• Correctness depends only on successors
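A runnable version of the simple algorithm on the ring from the basic-lookup figure (N10, N32, N60, N90, N105, N120). The `between` helper makes the “my-id < n < key-id” test work across the wrap-around point; the `SUCC` dictionary standing in for each node's successor pointer is an assumption of this sketch:

```python
def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise arc from a to b on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b          # arc wraps past 0 (or covers the whole ring if a == b)

# Each node knows only its successor (hypothetical in-memory ring).
SUCC = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}

def lookup(my_id: int, key_id: int, path=None) -> tuple[int, list[int]]:
    """Return (node responsible for key_id, nodes visited along the way)."""
    path = (path or []) + [my_id]
    n = SUCC[my_id]
    if between(n, my_id, key_id):
        return lookup(n, key_id, path)      # next hop
    return n, path                          # done: my successor holds key_id

print(lookup(10, 80))    # (90, [10, 32, 60])
```

Each hop advances only one node, so this version needs O(N) hops; the finger table below is what brings that down.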
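“Finger table” allows log(N)-time lookups
[Figure: N80’s fingers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
Finger i points to successor of $n + 2^i$ (mod $2^m$ in an m-bit ID space)
[Figure: N80’s finger for $80 + 2^5 = 112$ points to N120, the successor of 112]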
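A sketch of building one node's finger table in a 7-bit ring. Only N80 and N120 appear on the slide; the other node IDs are hypothetical, chosen so the table is easy to check by hand:

```python
import bisect

M = 7                                   # 7-bit ID space: IDs 0..127

def successor(ring: list[int], ident: int) -> int:
    """First node ID >= ident on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, ident)
    return ring[i % len(ring)]

def finger_table(n: int, node_ids: list[int], m: int = M) -> list[tuple[int, int]]:
    """(start, node) pairs: finger i points to successor((n + 2**i) mod 2**m)."""
    ring = sorted(node_ids)
    starts = [(n + 2 ** i) % 2 ** m for i in range(m)]
    return [(s, successor(ring, s)) for s in starts]

nodes = [10, 32, 60, 80, 90, 105, 120]  # hypothetical ring containing N80 and N120
for start, node in finger_table(80, nodes):
    print(f"start {start:3d} -> N{node}")
```

The finger starting at 80 + 32 = 112 lands on N120, as in the figure, and the last finger wraps to 80 + 64 = 144 mod 128 = 16, whose successor is N32.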
Lookup with fingers
Lookup(my-id, key-id)
  look in local finger table for
    highest node n such that my-id < n < key-id
  if n exists
    call Lookup(key-id) on node n   // next hop
  else
    return my successor   // done
Lookups take O(log(N)) hops
[Figure: Lookup(K19) routed via fingers across a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110]
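A sketch of finger-based routing on the ring from the Lookup(K19) figure (N5, N10, N20, N32, N60, N80, N99, N110; 7-bit IDs). Each hop forwards to the finger that lies farthest clockwise while still preceding the key, which is how “highest node n” is read here; starting the lookup at N32 is an arbitrary choice:

```python
import bisect

M = 7                                              # 7-bit ID space: IDs 0..127
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])   # node IDs from the figure

def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise arc from a to b."""
    return a < x < b if a < b else (x > a or x < b)

def successor(ident: int) -> int:
    """First node at or after ident, wrapping around the ring."""
    i = bisect.bisect_left(NODES, ident % 2 ** M)
    return NODES[i % len(NODES)]

def fingers(n: int) -> list[int]:
    """Finger i = successor(n + 2**i); finger 0 is n's immediate successor."""
    return [successor(n + 2 ** i) for i in range(M)]

def lookup(start: int, key_id: int) -> tuple[int, list[int]]:
    cur, path = start, [start]
    while True:
        table = fingers(cur)
        # Fingers that fall strictly between the current node and the key.
        preceding = [f for f in table if between(f, cur, key_id)]
        if not preceding:
            return table[0], path                  # done: my successor holds key_id
        here = cur
        cur = max(preceding, key=lambda f: (f - here) % 2 ** M)   # next hop
        path.append(cur)

print(lookup(32, 19))   # (20, [32, 99, 5, 10])
```

Each hop roughly halves the remaining clockwise distance to the key, which is where the O(log(N)) hop bound comes from.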
Failures might cause incorrect lookup
[Figure: Lookup(90) on a ring with nodes N10, N80, N85, N102, N113, N120]
N80 doesn’t know its correct successor, so the lookup is incorrect
Solution: successor lists
• Each node knows r immediate successors
• After failure, will know first live successor
• Correct successors guarantee correct lookups
  – Guarantee is with some probability
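A sketch of how a node with a successor list of length r routes around failures: it skips dead entries and uses the first live one. The liveness check is a hypothetical stand-in for a real ping or timeout, and the failed nodes are an invented scenario that only reuses IDs from the failure figure:

```python
from typing import Callable, Optional

def first_live_successor(successor_list: list[int],
                         is_alive: Callable[[int], bool]) -> Optional[int]:
    """Return the first live node among the r known successors, or None if all failed."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None      # all r successors dead: this node has lost its place in the ring

succ_list = [85, 102, 113]           # N80's r = 3 immediate successors (example)
failed = {85, 102}                   # hypothetical failures
print(first_live_successor(succ_list, lambda n: n not in failed))   # 113
```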
Choosing the successor list length
• Assume 1/2 of nodes fail
• P(successor list all dead) = $(1/2)^r$
  – I.e., P(this node breaks the Chord ring)
  – Depends on independent failure
• P(no broken nodes) = $(1 - (1/2)^r)^N$
  – $r = 2\log_2 N$ makes prob. $\approx 1 - 1/N$
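The arithmetic can be checked numerically. With $r = 2\log_2 N$, one node breaks the ring with probability $(1/2)^r = 1/N^2$, so $(1 - 1/N^2)^N \approx 1 - 1/N$. The sketch assumes, as the slide does, that each node fails independently with probability 1/2; the network sizes are arbitrary:

```python
import math

def p_ring_intact(n_nodes: int, r: int, p_fail: float = 0.5) -> float:
    """P(no node has all r successors dead), assuming independent failures."""
    p_break = p_fail ** r                  # all r successors of one node are dead
    return (1 - p_break) ** n_nodes

for n in (64, 1024, 65536):
    r = 2 * int(math.log2(n))              # r = 2 log2 N
    print(n, r, p_ring_intact(n, r), 1 - 1 / n)
```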