Peer-to-peer systems and Distributed Hash Tables (DHTs)
COS 461: Computer Networks, Spring 2009 (MW 1:30-2:50 in COS 105)
Mike Freedman
Teaching Assistants: Wyatt Lloyd and Jeff Terrace
http://www.cs.princeton.edu/courses/archive/spring09/cos461/
Overlay Networks
• P2P applications need to:
– Track identities & (IP) addresses of peers
  • May be many, and may have significant churn
  • Best not to have n^2 ID references
  • Thus, nodes' "views" << the view in consistent hashing
– Route messages among peers
  • If you don't keep track of all peers, this is "multi-hop"
• Overlay network
– Peers do both naming and routing
– IP becomes "just" the low-level transport
  • All the IP routing is opaque
  • Assumes the network is fully connected (true?)
(Many slides borrowed from Joe Hellerstein’s VLDB keynote)
Many New Challenges
• Relative to other parallel/distributed systems
– Partial failure
– Churn
– Few guarantees on transport, storage, etc.
– Huge optimization space
– Network bottlenecks & other resource constraints
– No administrative organizations
– Trust issues: security, privacy, incentives
• Relative to IP networking
– Much higher function, more flexible
– Much less controllable/predictable
Early P2P
Early P2P I: Client-Server
• Napster
– Client-server search
– "P2P" file xfer
[Animation: a peer sends "xyz.mp3 ?" to the central index server, learns which peer holds it, then fetches xyz.mp3 directly from that peer]
Early P2P II: Flooding on Overlays
• An "unstructured" overlay network
• Flooding: the query "xyz.mp3 ?" is forwarded hop-by-hop to neighbors until a peer holding xyz.mp3 answers
[Animation: the query floods outward across the overlay; the file is then fetched directly]
Early P2P II.v: "Ultra/super peers"
• Ultra-peers can be installed (KaZaA) or self-promoted (Gnutella)
– Also useful for NAT circumvention, e.g., in Skype
Hierarchical Networks (& Queries)
• IP
– Hierarchical name space (www.vldb.org, 141.12.12.51)
– Hierarchical routing: ASes correspond with the name space (not perfectly)
• DNS
– Hierarchical name space ("clients" + a hierarchy of servers)
– Hierarchical routing with aggressive caching
• Traditional pros/cons of hierarchical management
– Works well for things aligned with the hierarchy
  • E.g., physical or administrative locality
– Inflexible
  • No data independence!
Lessons and Limitations
• Client-server performs well
– But not always feasible; performance is often not the key issue!
• Things that flood-based systems do well
– Organic scaling
– Decentralization of visibility and liability
– Finding popular stuff
– Fancy local queries
• Things that flood-based systems do poorly
– Finding unpopular stuff
– Fancy distributed queries
– Vulnerabilities: data poisoning, tracking, etc.
– Guarantees about anything (answer quality, privacy, etc.)
Structured Overlays: Distributed Hash Tables
DHT Outline
• High-level overview
• Fundamentals of structured network topologies
– And examples
• One concrete DHT
– Chord
• Some systems issues
– Heterogeneity
– Storage models & soft state
– Locality
– Churn management
– Underlay network issues
High-Level Idea: Indirection
• Indirection in space
– Logical (content-based) IDs, routing to those IDs
  • "Content-addressable" network
– Tolerant of churn (nodes joining and leaving the network)
• Indirection in time
– Temporally decouple send and receive
– Persistence required; hence the typical solution: soft state
  • A combination of persistence via storage and via retry
  – The "publisher" requests a TTL on storage
  – It republishes as needed
• Metaphor: Distributed Hash Table
[Diagram: messages are routed "to h"; as nodes churn, ID h maps first to node y, later to node z]
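Soft state is simple enough to sketch in code. Below is a minimal Python illustration (the class and names are hypothetical, not from any DHT implementation): stored values expire unless the publisher refreshes them within the TTL.

```python
import time

class SoftStateStore:
    """Minimal soft-state store: entries expire unless republished."""
    def __init__(self):
        self.items = {}                    # key -> (value, expiry timestamp)

    def put(self, key, value, ttl):
        # Publisher requests a TTL; storage is promised only until it lapses.
        self.items[key] = (value, time.time() + ttl)

    def get(self, key):
        value, expiry = self.items.get(key, (None, 0.0))
        return value if time.time() < expiry else None

store = SoftStateStore()
store.put("xyz.mp3", "held by 12.34.56.78", ttl=60.0)
print(store.get("xyz.mp3"))   # publisher re-puts every ~30 s to persist
```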
What is a DHT?
• Hash table
– Data structure that maps "keys" to "values"
– Essential building block in software systems
• Distributed Hash Table (DHT)
– Similar, but spread across the Internet
• Interface
– insert(key, value) or put(key, value)
– lookup(key) or get(key)
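To make the two-call interface concrete, here is a toy single-process sketch in Python (illustrative code, not a real DHT): node names and keys are hashed into one circular ID space, and each key lives at its owner under the [x, succ(x)) rule used later in these slides. A real DHT reaches the owner via O(log n) overlay hops rather than a local table.

```python
import hashlib
from bisect import bisect_right

BITS = 16  # toy ID space; deployed DHTs use 128-160 bits

def ident(s):
    """Hash names and keys into one circular ID space."""
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % 2 ** BITS

class ToyDHT:
    """Single-process sketch of the put/get interface. Routing collapses
    to a local owner lookup; a real DHT needs O(log n) overlay hops."""
    def __init__(self, node_names):
        self.ids = sorted(ident(n) for n in node_names)
        self.store = {i: {} for i in self.ids}

    def owner(self, key_id):
        # Node x is responsible for [x, succ(x)): closest node at/before key.
        return self.ids[bisect_right(self.ids, key_id) - 1]

    def put(self, key, value):
        self.store[self.owner(ident(key))][key] = value

    def get(self, key):
        return self.store[self.owner(ident(key))].get(key)

dht = ToyDHT([f"node{i}" for i in range(16)])
dht.put("xyz.mp3", "held by 12.34.56.78")
print(dht.get("xyz.mp3"))
```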
How?
• Every DHT node supports a single operation:
– Given a key as input, route messages toward the node holding that key
DHT in action
Operation: take a key as input; route messages to the node holding that key
[Diagram: an overlay of nodes, each holding part of the distributed (key, value) table]
DHT in action: put()
put(K1, V1)
Operation: take a key as input; route messages to the node holding that key
[Animation: the put(K1,V1) request is routed across the overlay to the node responsible for K1, which stores (K1, V1)]
DHT in action: get()
get(K1)
Operation: take a key as input; route messages to the node holding that key
[Diagram: the get(K1) request is routed to the node storing (K1, V1), which returns V1]
Iterative vs. Recursive Routing
• Previously showed recursive; another option: iterative
Operation: take a key as input; route messages to the node holding that key
[Diagram: the same get(K1) lookup, with the requester contacting each hop directly]
DHT Design Goals
• An "overlay" network with:
– Flexible mapping of keys to physical nodes
– Small network diameter
– Small degree (fanout)
– Local routing decisions
– Robustness to churn
– Routing flexibility
– Decent locality (low "stretch")
• Different "storage" mechanisms considered:
– Persistence w/ additional mechanisms for fault recovery
– Best-effort caching and maintenance via soft state
DHT Outline
• High-level overview
• Fundamentals of structured network topologies
– And examples
• One concrete DHT
– Chord
• Some systems issues
– Heterogeneity
– Storage models & soft state
– Locality
– Churn management
– Underlay network issues
An Example DHT: Chord
• Assume n = 2^m nodes for a moment
– A "complete" Chord ring
– We'll generalize shortly
An Example DHT: Chord
• Each node has a particular view of the network
– Set of known neighbors
[Diagram: the ring as seen from several different nodes, each with its own neighbor set]
Cayley Graphs
• The Cayley graph (S, E) of a group:
– Vertices corresponding to the underlying set S
– Edges corresponding to the actions of the generators
• (Complete) Chord is a Cayley graph for (Z_n, +)
– S = Z mod n (n = 2^k)
– Generators {1, 2, 4, …, 2^(k-1)}
– That's what the polygons are all about!
• Fact: Most (complete) DHTs are Cayley graphs
– And they didn't even know it!
– Follows from parallel interconnect networks (ICNs)
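The generator view can be checked mechanically. A minimal sketch (illustrative code): applying each generator 2^i to every element of Z mod 2^k yields exactly the finger edges, i.e., the polygons, of a complete Chord ring.

```python
def cayley_edges(k):
    """Edges of the Cayley graph of (Z mod 2^k, +), generators {2^i}."""
    n = 2 ** k
    return sorted((x, (x + 2 ** i) % n) for x in range(n) for i in range(k))

# k = 3: every node x points at x+1, x+2, x+4 (mod 8) -- exactly the
# octagon, squares, and diagonals drawn on a complete 8-node Chord ring.
print(cayley_edges(3))
```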
How Hairy met Cayley
• What do you want in a structured network?
– Uniformity of routing logic
– Efficiency/load-balance of routing and maintenance
– Generality at different scales
• Theorem: All Cayley graphs are vertex symmetric
– I.e., isomorphic under swaps of nodes
– So routing from y to x looks just like routing from (y-x) to 0
  • The routing code at each node is the same
  • Moreover, under a random workload the routing responsibilities (congestion) at each node are the same!
• Cayley graphs tend to have good degree/diameter tradeoffs
– Efficient routing with few neighbors to maintain
• Many Cayley graphs are hierarchical
– Made of smaller Cayley graphs connected by a new generator
  • E.g., a Chord graph on 2^(m+1) nodes looks like 2 interleaved (half-notch rotated) Chord graphs of 2^m nodes with half-notch edges
Pastry/Bamboo
• Based on the Plaxton Mesh
• Names are fixed bit strings
• Topology: prefix hypercube
– For each bit from left to right, pick a neighbor sharing that prefix but with the next bit flipped
– log n degree & diameter
• Plus a ring
– For reliability (with k pred/succ)
• Prefix routing from A to B
– "Fix" bits from left to right
– E.g., 1010 to 0001: 1010 → 0101 → 0010 → 0000 → 0001
[Diagram: example node IDs 1010, 0101, 1100, 1000, 1011 on the hypercube]
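The bit-fixing walk is easy to simulate. A sketch under simplifying assumptions (each hop corrects exactly one more bit, and such a neighbor always exists; real Pastry tables can jump several bits at once and pick among candidates by network proximity):

```python
NODES = ["1010", "0101", "1100", "1000", "1011", "0010", "0000", "0001"]

def shared(a, b):
    """Length of the common prefix of bit strings a and b."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def prefix_route(src, dst):
    """One corrected bit per hop: each next hop shares exactly one more
    prefix bit with dst (real tables may skip ahead)."""
    path, cur = [src], src
    while cur != dst:
        p = shared(cur, dst)
        cur = next(n for n in NODES if shared(n, dst) == p + 1)
        path.append(cur)
    return path

print(prefix_route("1010", "0001"))
# ['1010', '0101', '0010', '0000', '0001'] -- the route on the slide
```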
CAN: Content Addressable Network
• Exploits multiple dimensions
• Each node is assigned a zone
• Nodes are ID'd by their zone boundaries
• Join: choose a random point, split the zone it lands in
[Diagram: the unit square divided into zones such as (0, 0.5, 0.5, 1), (0.5, 0.5, 1, 1), (0, 0, 0.5, 0.5), (0.5, 0.25, 0.75, 0.5), and (0.75, 0, 1, 0.5)]
Routing in 2-dimensions
• Routing is navigating a d-dimensional ID space
– Route to the closest neighbor in the direction of the destination
– Routing table contains O(d) neighbors
• Number of hops is O(d · N^(1/d))
[Diagram: a lookup hops zone-to-zone across the 2-d coordinate space]
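A quick worked example of that bound, using the (d/4)·N^(1/d) average path length reported in the CAN paper (taken on trust here): adding dimensions shortens paths at the cost of a larger routing table.

```python
# Average CAN path length is about (d/4) * N**(1/d) hops (CAN paper).
N = 2 ** 20
for d in (2, 3, 5, 10):
    print(f"d={d:2d}: ~{(d / 4) * N ** (1 / d):5.0f} hops")
# d= 2: ~512 hops; d=10: ~10 hops -- versus log2(N) = 20 for Chord-like DHTs.
```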
Koorde
• De Bruijn graphs
– Link from node x to nodes 2x and 2x+1 (mod the ring size)
– Degree 2, diameter log n
  • Optimal!
• Koorde is Chord-based
– Basically Chord, but with de Bruijn fingers
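De Bruijn routing is just "shift in the destination's bits": each hop follows the 2x or 2x+1 edge, so every lookup finishes in m = log2(n) hops. A sketch (illustrative; it ignores how Koorde emulates the graph when not all 2^m nodes exist):

```python
def debruijn_route(x, y, m):
    """Route on the de Bruijn graph over 2^m nodes (edges x -> 2x, 2x+1):
    shift the destination's bits in from the right, one hop per bit."""
    path, mask = [x], (1 << m) - 1
    for i in range(m - 1, -1, -1):
        bit = (y >> i) & 1                 # next bit of y, MSB first
        x = ((x << 1) | bit) & mask        # follow the 2x or 2x+1 edge
        path.append(x)
    return path

print(debruijn_route(0b1010, 0b0001, 4))
# [10, 4, 8, 0, 1]: always m = log2(n) hops, meeting the diameter bound
```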
Topologies of Other Oft-cited DHTs
• Tapestry
– Very similar to Pastry/Bamboo topology
– No ring
• Kademlia
– Also similar to Pastry/Bamboo
– But the "ring" is ordered by the XOR metric: "bidirectional"
– Used by the eMule / BitTorrent / Azureus (Vuze) systems
• Viceroy
– An emulated butterfly network
• Symphony
– A randomized "small-world" network
Incomplete Graphs: Emulation
• For Chord, we assumed exactly 2^m nodes. What if not?
– Need to "emulate" a complete graph even when incomplete
• DHT-specific schemes used
– In Chord, node x is responsible for the range [x, succ(x))
– The "holes" on the ring should be randomly distributed, due to hashing
Handle node heterogeneity
• Sources of unbalanced load
– Unequal portion of keyspace
– Unequal load per key
• Balancing keyspace
– Consistent hashing: the region owned by a single node is O(1/n (1 + log n))
– What about node heterogeneity?
  • Nodes create a number of "virtual nodes" proportional to their capacity (see the sketch below)
• Load per key
– Assumes many keys per region
Chord in Flux
• Essentially never a "complete" graph
– Maintain a "ring" of successor nodes
– For redundancy, point to k successors
– Point to nodes responsible for IDs at powers of 2
  • Called "fingers" in Chord
  • The 1st finger is the successor
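Putting successors and fingers together, here is a global-view sketch of Chord lookup in Python (illustrative: a real node stores only its own finger table, built via the join protocol on the next slide; ownership follows the [x, succ(x)) rule from earlier):

```python
from bisect import bisect_left, bisect_right

def between(x, a, b, mod):
    """True iff x lies on the clockwise ring arc (a, b]."""
    return 0 < (x - a) % mod <= (b - a) % mod

class ChordSim:
    """Global-view sketch of Chord lookups. A real node holds only its own
    fingers/successors, learned via the join protocol; this class peeks."""

    def __init__(self, node_ids, m):
        self.m, self.mod = m, 2 ** m
        self.ids = sorted({i % self.mod for i in node_ids})

    def owner(self, k):
        """Node x is responsible for keys in [x, succ(x))."""
        return self.ids[bisect_right(self.ids, k % self.mod) - 1]

    def succ(self, j):
        """First node clockwise at or after point j."""
        i = bisect_left(self.ids, j % self.mod)
        return self.ids[i % len(self.ids)]

    def fingers(self, n):
        """finger[i] = succ(n + 2^i); the 1st finger is n's successor."""
        return [self.succ(n + 2 ** i) for i in range(self.m)]

    def lookup(self, start, k):
        """Greedy finger routing toward k's owner: O(log n) hops w.h.p."""
        n, path = start, [start]
        while n != self.owner(k):
            # Hop to the closest finger in (n, k]; never overshoots the key.
            n = max((f for f in self.fingers(n) if between(f, n, k, self.mod)),
                    key=lambda f: (f - n) % self.mod)
            path.append(n)
        return path

ring = ChordSim([1, 8, 14, 21, 32, 38, 42, 48, 51, 56], m=6)
print(ring.lookup(8, 54))   # [8, 42, 51]: each hop ~halves the ID distance
```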
Joining the Chord Ring
• Need the IP of some node
• Pick a random ID
– e.g., SHA-1(IP)
• Send a msg to the current owner of that ID
– That's your predecessor in Chord routing
Joining the Chord Ring
• Update pred/succ links
– Once the ring is in place, all is well!
• Inform the application to move data appropriately
• Search to find "fingers" of varying powers of 2
– Or just copy from pred/succ and check!
• Inbound fingers are fixed lazily
Theorem: If consistency is reached before the network doubles, lookups remain O(log n)
Fingers must be constrained?
• No: Proximity Neighbor Selection (PNS)
Handling Churn
• Churn
– Session time? Lifetime?
– For system resilience, session time is what matters
• Three main issues
– Determining timeouts
  • Significant component of lookup latency under churn
– Recovering from a lost neighbor in the "leaf set"
  • Periodic, not reactive!
  • Reactive recovery causes feedback cycles
  – Esp. when a neighbor is stressed and timing in and out
– Neighbor selection, again
Timeouts
• Recall iterative vs. recursive routing
– Iterative: the originator requests the IP address of each hop
– Recursive: the message is transferred hop-by-hop
• Effect on the timeout mechanism
– Need to track latency of communication channels
– Iterative results in direct n-to-n communication
  • Can't keep timeout stats at that scale
  • Solution: virtual coordinate schemes [Vivaldi, etc.]
– With recursive, can do TCP-like tracking of latency (see the sketch below)
  • Exponentially weighted mean and variance
• Upshot: both work OK up to a point
– TCP-style does somewhat better than virtual coordinates at modest churn rates (23 min. or more mean session time)
– Virtual coordinates begin to fail at higher churn rates
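One way to realize the "exponentially weighted mean and variance" bullet is the TCP retransmission-timer estimator (RFC 6298 style); a sketch:

```python
class RttEstimator:
    """Per-neighbor TCP-style timeout: EWMA of RTT mean and deviation."""
    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha, self.beta = alpha, beta
        self.srtt = self.rttvar = None

    def observe(self, rtt):
        if self.srtt is None:              # first sample initializes both
            self.srtt, self.rttvar = rtt, rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar \
                          + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt

    def timeout(self):
        return self.srtt + 4 * self.rttvar  # mean plus four deviations

est = RttEstimator()
for sample in (0.120, 0.130, 0.300, 0.125):
    est.observe(sample)
print(round(est.timeout(), 3))
```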
Recursive vs. Iterative
Left figure: simulation of 20,000 lookups for random keys; a recursive lookup takes ~0.6 times as long as an iterative one
Right figure: 1,000 lookups on a test-bed; confirms the simulation
Recursive vs. Iterative
• Recursive
– Faster under many conditions
  • Fewer round-trip times
  • Better proximity neighbor selection
  • Can time out individual RPCs more tightly
– Better tolerance to network failure
  • Path between neighbors already known
• Iterative
– Tighter control over the entire lookup
  • Easily supports windowed RPCs for parallelism
  • Easier to time out the entire lookup as failed
– Faster to return data directly than over the recursive path
Storage Models for DHTs
• Up to now we focused on routing
– DHTs as a "content-addressable network"
• Implicit in the name "DHT" is some kind of storage
– Or perhaps a better word is "memory"
– Enables indirection in time
– But it can also be viewed as a place to store things
Storage models
• Store only on the key's immediate successor
– Churn, routing issues, and packet loss make lookup failure more likely
• Store on k successors
– When nodes detect succ/pred failure, they re-replicate
• Cache along the reverse lookup path
– Provided data is immutable
– …and responses travel the recursive path
Storage on successors?
• Erasure coding
– Data block split into l fragments
– Any m distinct fragments suffice to reconstruct the block
– Redundant storage of data
• Replication
– Node stores the entire block
– Special case: m = 1 and l is the number of replicas
– Redundant information spread over fewer nodes
• Comparison of both methods
– r = l/m is the amount of redundancy
• Probability the block is available, if each fragment is independently available with probability p_0:

$$p_{\text{avail}} = \sum_{i=m}^{l} \binom{l}{i} \, p_0^{\,i} \, (1 - p_0)^{\,l-i}$$
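Plugging numbers into this formula shows why coding wins on availability. A small script (the value p_0 = 0.9 and the equal-redundancy comparison at r = 2 are my illustrative choices):

```python
from math import comb

def p_avail(p0, l, m):
    """P(at least m of l fragments survive), each alive with prob. p0."""
    return sum(comb(l, i) * p0 ** i * (1 - p0) ** (l - i)
               for i in range(m, l + 1))

p0 = 0.9
print(p_avail(p0, l=2, m=1))    # 2x replication:        ~0.99
print(p_avail(p0, l=14, m=7))   # DHash++-style coding:  ~0.99998 (same r = 2)
```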
Latency: Erasure-coding vs. replication
Replication: slightly lower latency; erasure coding: higher availability
DHash++ uses erasure coding with m = 7 and l = 14
What about mutable data?
• Ugh!
• Different views
– Ivy: create version trees [Muthitacharoen, OSDI '02]
  • Think "distributed version control" system
• Global agreement?
– Reach consensus among all nodes belonging to a successor group: "distributed agreement"
  • Difficult, especially at scale
An oft-overlooked assumption: the underlay isn't perfect!
• All DHTs have an implicit assumption: full connectivity
• Non-transitive connectivity (NTC) is not uncommon
– B ↔ C and C ↔ A work, but A ↮ B
• A thinks C is its successor!
[Diagram: nodes A, B, C and key k; the A–B link is broken]
Does non-transitivity exist?
• Gerding/Stribling PlanetLab study
– 9% of all node triples exhibit NTC
– Attributed the high extent to Internet2
• Yet NTC is also transient
– One 3-hour PlanetLab all-pair-pings trace
– 2.9% have persistent NTC
– 2.3% have intermittent NTC
– 1.3% fail only for a single 15-minute snapshot
• Example: Level3 ↔ Cogent fails directly, but Level3 ↔ X ↔ Cogent works
• NTC motivates RON and other overlay routing!
NTC problem fundamental?
Traditional routing: S → R via A; A → R via B; B → R delivers
[Diagram: chain S – A – B – C – R]
NTC problem fundamental?
• DHTs implement greedy routing for scalability
• The sender might not use a path even though one exists: greedy id-distance routing gets stuck at local minima
Traditional routing: S → R via A; A → R via B; B → R delivers
Greedy routing: S → R via A; A → R via C; C → R fails (X)
[Diagram: chain S – A – B – C – R; the greedy hop from C fails]
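The failure mode fits in a few lines (hypothetical helper, not from a real DHT implementation): greedy routing forwards only to a reachable neighbor strictly closer to the key, so a single missing link can strand the lookup at a local minimum even though a path exists.

```python
def greedy_next_hop(cur, key, reachable_neighbors, dist):
    """Forward to the reachable neighbor strictly closer to the key;
    None means a local minimum: the greedy lookup halts here."""
    closer = [n for n in reachable_neighbors if dist(n, key) < dist(cur, key)]
    return min(closer, key=lambda n: dist(n, key)) if closer else None

# If every strictly-closer neighbor is unreachable, the lookup stalls
# here, even though a non-greedy path (e.g., via B) exists.
```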
Potential problems?
• Invisible nodes
• Routing loops
• Broken return paths
• Inconsistent roots
Iterative routing: Invisible nodes
• Invisible nodes cause the lookup to halt
• Enable the lookup to continue:
– Tighter timeouts via network coordinates
– Lookup RPCs in parallel
– Unreachable node cache
[Animation: S's iterative lookup for key k stalls at unreachable node C, then skips it (and D) and proceeds toward k]
Inconsistent roots
• Nodes do not agree where the key is assigned: inconsistent views of the root
– Can be caused by membership changes
– Also due to non-transitive connectivity: may persist!
[Diagram: S and S' route lookups for key k to different roots R and R']
Inconsistent roots
• Root replicates (key, value) among its leaf set
– Leafs periodically synchronize
– get() gathers results from multiple leafs
• Not applicable when fast updates are required
[Diagram: root R and leaf-set neighbors M, N hold replicas, so a lookup arriving at R' still finds one]
Longer term solution?
• Route around local minima when possible
• Have nodes maintain link-state of neighbors
– Perform one-hop forwarding if necessary (think RON!)
Traditional routing: S → R via A; A → R via B; B → R delivers
Greedy routing: S → R via A; A → R via C; C → R fails (X)
Summary
• Peer-to-peer systems
– Unstructured systems
  • Finding hay, performing keyword search
– Structured systems (DHTs)
  • Finding needles, exact match
• Distributed hash tables
– Based around consistent hashing with views of O(log n)
– Chord, Pastry, CAN, Koorde, Kademlia, Tapestry, Viceroy, …
• Lots of systems issues
– Heterogeneity, storage models, locality, churn management, underlay issues, …
– DHTs (Kademlia) deployed in the wild: Vuze has 1M+ active users