CS 194: Distributed Systems DHT Applications: What and Why
Scott Shenker and Ion Stoica Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Project Phase III
What: Murali will discuss Phase III of the project
When: Tonight, 6:30pm
Where: 306 Soda
Remaining Lecture Schedule
4/11 DHT applications (start) (Scott)
4/13 Web Services (Ion)
4/18 DHTapps+OpenDHT (Scott)
4/20 Jini (Ion)
4/25 Sensornets (Scott)
4/27 Robust Protocols (Scott)
5/2 Resource Allocation (Ion)
5/4 Game theory (Scott)
5/9 Review (both)
Note about Special Topics
We won’t require additional reading
We will make clear what you need to know for the final
Outline for Today’s Lecture
What is a DHT? (review)
Three classes of DHT applications (with examples):
- rendezvous
- storage
- routing
Why DHTs?
DHTs and Internet Architecture?
A DHT in Operation: Peers
[Figure: a set of peer nodes, each holding its own key/value (K V) table]
A DHT in Operation: Overlay
[Figure: the same nodes and their K V tables, connected by overlay links]
A DHT in Operation: put()
[Figure sequence: one node issues put(K1,V1); the request is routed through the overlay; the pair (K1,V1) is stored in the K V table of the node responsible for K1]
A DHT in Operation: get()
[Figure sequence: one node issues get(K1); the request is routed through the overlay to the node holding K1]
Key Requirement
All puts and gets for a particular key must end up at the same machine
- Even in the presence of failures and new nodes (churn)
This depends on the DHT routing algorithm (last time)
- Must be robust and scalable
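The requirement can be illustrated with a toy key-to-node mapping. The sketch below is only an illustration under stated assumptions: it uses a successor rule on a small hashed ring, and `ring_id`, `Ring`, `owner`, the 16-bit ring size, and the node names are all hypothetical. It is not the routing algorithm covered in the previous lecture.

```python
import bisect
import hashlib


def ring_id(name: str, bits: int = 16) -> int:
    """Hash a string onto a small identifier ring (16 bits is an arbitrary toy size)."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    """Toy view of DHT membership: a sorted list of (id, node name) pairs."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(n), n) for n in node_names)

    def owner(self, key: str) -> str:
        """Return the first node whose ID is >= the key's ID, wrapping around."""
        ids = [node_id for node_id, _ in self.nodes]
        i = bisect.bisect_left(ids, ring_id(key)) % len(self.nodes)
        return self.nodes[i][1]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
put_target = ring.owner("K1")  # where put(K1, V1) would be stored
get_target = ring.owner("K1")  # where get(K1) would look

# After a new node joins, the key either stays put or moves to the newcomer.
bigger_ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
new_target = bigger_ring.owner("K1")
```

Because the mapping depends only on the key and the current membership, every put and get for the same key agrees on the target as long as the participants agree on membership; keeping that agreement under churn is the hard part the routing algorithm has to solve.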
Two Important Distinctions
When talking about DHTs, must be clear whether you mean
- Peers vs Infrastructure
- Library vs Service
Peers or Infrastructure
Peer:
- Application users provide nodes for DHT
- Example: music sharing, cooperative web cache
- Easier to get, less well behaved
Infrastructure:
- Set of managed nodes provide DHT service
- Perhaps serve many applications
- Example: PlanetLab
- Harder to get, but more reliable
Library or Service
Library: DHT code bundled into application
- Runs on each node running application
- Each application requires own routing infrastructure
- Allows customization of interface
- Very flexible, but much duplication
Service: single DHT shared by applications
- Requires common infrastructure
- But eliminates duplicate routing systems
- Harder to get, and much less flexible, but easier on each individual app
Not Covered Today
Making lookup scale under churn
- Better routing algorithms
Manage data under churn
- Efficient algorithms for creating and finding replicas
Network awareness
- Taking advantage of proximity without relying on it
Developing proper analytic tools
- Formalizing systems that are constantly in flux
Not Covered Today (cont’d)
Dealing with adversaries
- Robustness with untrusted participants
Maintaining data integrity
- Cryptographic hashes and Merkle trees
- Consistency
Privacy and anonymity
More general functionality
- Indexing, queries, etc.
Load balancing and heterogeneity
DHTs vs Unstructured P2P
DHTs good at:
- exact match for “rare” items
DHTs bad at:
- keyword search, etc. [can’t construct DHT-based Google]
- tolerating extreme churn
Gnutella etc. good at:
- general search
- finding common objects
- very dynamic environments
Gnutella etc. bad at:
- finding “rare” items
Three Classes of DHT Applications
Rendezvous, Storage, and Routing
Rendezvous Applications
Consider a pairwise application like telephony
If A wants to call B (using the Internet), A can do the following:
- A looks up B’s “phone number” (IP address of current machine)
- A’s phone client contacts B’s phone client
What is needed is a way to “look up” where to contact someone, based on a username or some other global identifier
Using DHT for Rendezvous
Each person has a globally unique key (say 128 bits)
- Can be hash of a unique name, or something else
Each client (telephony, chat, etc.) periodically stores the IP address (and other metadata) describing where they can be contacted
- This is stored using their unique key
When A wants to “call” B, it first does a get on B’s key
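A minimal sketch of the rendezvous pattern described above, assuming a plain Python dict as a stand-in for the DHT. The function names (`user_key`, `register`, `locate`), the record fields, the 300-second freshness window, and the example name and address are all illustrative assumptions, not part of the lecture.

```python
import hashlib
import time

dht = {}  # stand-in for a real DHT; a plain dict keeps the sketch self-contained


def user_key(username: str) -> bytes:
    """Derive a 128-bit key from a unique name (SHA-256 truncated to 16 bytes)."""
    return hashlib.sha256(username.encode()).digest()[:16]


def register(username, ip, port, now=None):
    """Called periodically by a client to publish where it can be reached."""
    stamp = time.time() if now is None else now
    dht[user_key(username)] = {"ip": ip, "port": port, "updated": stamp}


def locate(username, max_age=300.0, now=None):
    """Return (ip, port) for a user, or None if there is no fresh record."""
    record = dht.get(user_key(username))
    now = time.time() if now is None else now
    if record is None or now - record["updated"] > max_age:
        return None
    return record["ip"], record["port"]


# B's client registers; A later looks B up before "calling".
register("bob@example.org", "192.0.2.17", 5060, now=1000.0)
contact = locate("bob@example.org", now=1100.0)
```

The timestamp is one possible way to make the periodic refresh matter: a client that stops refreshing eventually becomes unreachable instead of leaving a stale address behind.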
Key Point
The key (or identifier) is globally unique and static
The DHT infrastructure is used to store the mapping between that static (persistent) identifier and the current location
- DHT functions as a dynamic and flat DNS
This can handle:
- IP mobility
- Chat
- Internet telephony
- DNS
- The Web!
Using DHTs for the Web
Oversimplified:
Name data with key
Store IP address of file server(s) holding data
- replication trivial!
To get data, look up key
If want CDN-like behavior, make sure IP address handed back is close to requester (several ways to do this)
Three Classes of DHT Applications
Rendezvous, Storage, and Routing
Storage Applications
Rendezvous applications use the DHT only to store small pointers (IP addresses, etc.)
What about using DHTs for more serious storage, such as file systems?
Examples of Storage Applications
File Systems
Backup
Archiving
Electronic Mail
Content Distribution Networks
...
Why store data in a DHT?
High storage capacity: many disks
High serving capacity: many access links
High availability by replication
Simple application model
Example: CFS (DHash over Chord)
Goal: serve a read-only file system
Publisher inserts file system into DHT
CFS client looks like an NFS file system:
- /cfs/7ff23bda0092
CFS client fetches data from the DHT
CFS Uses Tree of Blocks
[Figure: tree of blocks. Root points to Directory, which points to File1, Dir2, and File3]
A “pointer”: Root contains DHT key of Directory
Directory block contains filename/blockID pairs
CFS Uses Self-authentication
Immutable block: (Content-Hash Block)
key = CryptographicHash(value)
encourages data sharing!
Mutable block: (Public-key Block)
key = Kpub
value = data + Sign[data]Kpriv
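The content-hash rule can be sketched in a few lines. This is an illustration under assumptions, not CFS code: a dict stands in for the DHT, SHA-256 is an arbitrary choice of cryptographic hash, and `put_content_hash_block` and `get_verified` are hypothetical names. Public-key blocks are left out because the Python standard library has no signature primitives.

```python
import hashlib

store = {}  # stand-in for the DHT


def put_content_hash_block(value: bytes) -> str:
    """Store an immutable block under the hash of its own contents."""
    key = hashlib.sha256(value).hexdigest()
    store[key] = value
    return key


def get_verified(key: str) -> bytes:
    """Fetch a block and check that it really hashes to the requested key."""
    value = store[key]
    if hashlib.sha256(value).hexdigest() != key:
        raise ValueError("block does not match its key")
    return value


k1 = put_content_hash_block(b"shared library, version 1")
k2 = put_content_hash_block(b"shared library, version 1")  # same bytes, same key
block = get_verified(k1)
```

Two publishers who insert identical data end up with the same key, which is one way to read the "encourages data sharing" remark; and since the reader can recompute the hash, it does not need to trust the node that served the block.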
Most Blocks are Immutable
[Figure: the same tree. Root is the mutable block; Directory, File1, Dir2, and File3 are immutable blocks]
• This is a single-writer mutable data structure
Adding a File to a Directory
[Figure: File4 is added. A new immutable block, Directory v2, lists File1, Dir2, File3, and File4; the mutable Root block is updated to point to Directory v2]
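The update on the "Adding a File to a Directory" slide can be sketched as follows. Everything here is an assumption made for illustration: a dict stands in for the DHT, directory blocks are serialized as JSON, SHA-256 is the content hash, and `ROOT_KEY` is a placeholder string where CFS would use the publisher's public key and a signature.

```python
import hashlib
import json

store = {}  # stand-in for the DHT


def put_immutable(value: bytes) -> str:
    """Store a block under the hash of its contents and return that key."""
    key = hashlib.sha256(value).hexdigest()
    store[key] = value
    return key


def put_directory(entries: dict) -> str:
    """Serialize filename -> blockID pairs deterministically and store them."""
    return put_immutable(json.dumps(entries, sort_keys=True).encode())


ROOT_KEY = "Kpub"  # placeholder for the publisher's public key; signing omitted

# Initial file system: one directory with three entries.
entries_v1 = {
    "File1": put_immutable(b"contents of File1"),
    "Dir2": put_directory({}),
    "File3": put_immutable(b"contents of File3"),
}
dir_v1 = put_directory(entries_v1)
store[ROOT_KEY] = dir_v1  # the only block that is ever overwritten

# Add File4: new file block, new directory block, then repoint the root.
entries_v2 = dict(entries_v1, File4=put_immutable(b"contents of File4"))
dir_v2 = put_directory(entries_v2)
store[ROOT_KEY] = dir_v2
```

Only the root is rewritten in place. The old directory block still exists under its old key, and the unchanged file blocks are shared by both directory versions.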
Data Availability via Replication
DHash replicates each key/value pair at the nodes after it on the circle
It’s easy to find replicas
Put(k,v) to all
Get(k) from closest
[Figure: Chord ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; K19 is stored at N20 and at the nodes that follow it (N32, N40)]
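Replica placement on the circle can be sketched with the node IDs from the figure. The function name `replica_nodes` and the replica count of 3 are assumptions for this illustration; the slides do not state how many copies DHash keeps.

```python
import bisect


def replica_nodes(node_ids, key, replicas=3):
    """Return the key's successor followed by the next nodes on the ring."""
    ring = sorted(node_ids)
    start = bisect.bisect_left(ring, key) % len(ring)  # first node ID >= key
    count = min(replicas, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


NODES = [5, 10, 20, 32, 40, 60, 80, 99, 110]

holders = replica_nodes(NODES, 19)                    # nodes that hold K19
survivors = [n for n in NODES if n != 20]             # N20 fails
after_failure = replica_nodes(survivors, 19)          # new holders of K19
```

With these IDs, `holders` is `[20, 32, 40]`. If N20 fails, the first live successor of 19 becomes N32, which already holds a copy, and the replica set shifts to `[32, 40, 60]`, so only N60 needs to receive new data.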
First Live Successor Manages Replicas
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; labels “Block 19” and “Copy of 19” mark where the block and its replica are held]
Usenet over a DHT
Bulletin board (started in 1981)
- Has grown exponentially in volume
- 2004 volume is 1.4 Terabyte/day
Hosting full Usenet has high costs
- Large storage requirement
- Bandwidth required: OC3+ ($30,000/month)
Only 50 sites with full feed
Goal: save Usenet news by reducing needed storage and bandwidth
Posting a Usenet Article
• User posts article to local server
• Server exchanges headers & article with peers
• Headers allow sorting into newsgroups
[Figure: Usenet servers S1, S2, S3, and S4 exchanging articles]
UsenetDHT
Store article in shared DHT
Only “single” copy of Usenet needed
Can scale DHT to handle increased volume
Incentive for ISPs: cut external bandwidth by providing high-quality hosting for local DHT server
Usenet Architecture
• User posts article to local server
• Server writes article to DHT
• Server exchanges headers only
• All servers know about each article
[Figure: servers S1, S2, S3, and S4 around a shared DHT]
UsenetDHT Tradeoff
Distribute headers as before:
- clients have local access to headers
Bodies held in global DHT
- only accessed when read
- greater latency, lower overhead
UsenetDHT: potential savings
Suppose 300 site network
Each site reads 1% of all articles
             Net bandwidth    Storage
Usenet       12 Megabyte/s    10 Terabyte/week
UsenetDHT    120 Kbyte/s      60 Gbyte/week
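The slide does not show how the UsenetDHT row was derived, so the following is only a plausibility check under stated assumptions: bandwidth scales with the fraction of articles a site reads, storage is split evenly across sites, and header traffic is ignored. The variable names are illustrative.

```python
FULL_FEED_BANDWIDTH = 12e6   # bytes/s, from the table (12 Megabyte/s)
FULL_FEED_STORAGE = 10e12    # bytes/week, from the table (10 Terabyte/week)
SITES = 300                  # from the slide
READ_FRACTION = 0.01         # each site reads 1% of all articles

# A site fetches only the article bodies its users read.
dht_bandwidth = FULL_FEED_BANDWIDTH * READ_FRACTION

# If one copy of the feed were spread evenly, each site would store this much.
even_share = FULL_FEED_STORAGE / SITES

# Ratio between the table's 60 Gbyte/week and the even share.
implied_copies = 60e9 / even_share
```

The bandwidth figure matches the table (about 120 Kbyte/s). The even share is about 33 Gbyte/week, so the table's 60 Gbyte/week is roughly 1.8 times that, which would be consistent with keeping a little under two copies of each article; the slide itself does not say where the factor comes from.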
Three Classes of DHT Applications
Rendezvous, Storage, and Routing
“Routing” Applications
Application-layer multicast
Video streaming
Event notification systems
...
DHT-Based Multicast
Application-layer, not IP layer
Single-source, not any-source multicast
Easy to extend to anycast
Tree Formation
Group is associated with key
“root” of group is node that owns key
Any node that wants to join sends message to root, leaving forwarding state along path
Message stops when it hits existing state for group
Data sent from root reaches all nodes
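The tree-formation steps can be sketched as below. This is a toy model under assumptions: `MulticastGroup` is a hypothetical class, and each join is handed an explicit overlay path (joiner first, root last) where a real system would obtain the path by routing toward the group's key.

```python
class MulticastGroup:
    """Forwarding state for one group, keyed by overlay node name."""

    def __init__(self, root):
        self.root = root
        self.children = {root: set()}  # node -> downstream neighbours

    def join(self, path):
        """Process a join routed along `path` (joiner first, root last).

        Returns the number of overlay hops the join message travelled.
        """
        self.children.setdefault(path[0], set())  # joiner now has group state
        hops = 0
        for child, parent in zip(path, path[1:]):
            hops += 1
            on_tree = parent in self.children
            self.children.setdefault(parent, set()).add(child)
            if on_tree:
                break  # hit existing state for the group: stop forwarding
        return hops

    def send(self):
        """Return every node reached when the root sends one data message."""
        reached, seen, frontier = [], {self.root}, [self.root]
        while frontier:
            node = frontier.pop()
            for child in sorted(self.children.get(node, ())):
                if child not in seen:
                    seen.add(child)
                    reached.append(child)
                    frontier.append(child)
        return sorted(reached)


group = MulticastGroup("R")
hops_a = group.join(["A", "X", "R"])       # first join travels to the root
hops_b = group.join(["B", "X", "R"])       # stops at X, already on the tree
hops_c = group.join(["C", "Y", "X", "R"])  # stops at X as well
delivered = group.send()
```

Joins that merge into an existing branch never reach the root, which keeps the root's load from growing with group size. Note that X and Y forward data even though they never joined; balancing that duty is one of the challenges for this design.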
Multicast
[Figure: overlay of nodes with K V tables; Root(k) marks the node that owns key k]
Multicast Join
[Figure sequence: Join(k) messages from several nodes are routed hop by hop toward Root(k)]
Multicast Send
[Figure: data sent from Root(k) travels back along the paths set up by the Join(k) messages]
Challenges
Repairing tree
Balancing duties among peers
Low-latency routing (proximity-based DHT routing)
Internet-Scale Query Processing
Superficial motivation:
- Database joins implemented with hash tables so...
- Distributed joins can be implemented with DHTs
- Scaling: latency O(log n) while computation O(n)
[Figure: rows (K1,A), (K1,B), (K1,C), (K1,D) and (K2,E), (K2,A), (K2,F), (K2,A); the rows containing A are rehashed with Put(A,..) and collected at one node as (K1,A), (K2,A), (K2,A)]
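The idea that matching tuples meet at the node responsible for their join value can be sketched as follows. This is an illustration under assumptions, not PIER's implementation: `node_for` stands in for DHT routing by hashing the join key onto a fixed number of nodes, and `dht_join` and the sample relations are hypothetical.

```python
from collections import defaultdict
import hashlib


def node_for(key, num_nodes):
    """Stand-in for DHT routing: hash the join key to one of num_nodes nodes."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def dht_join(left, right, num_nodes=4):
    """Equi-join two lists of (join_key, payload) tuples by rehashing on join_key."""
    # Phase 1: every tuple is "put" at the node responsible for its join key.
    nodes = defaultdict(
        lambda: {"left": defaultdict(list), "right": defaultdict(list)}
    )
    for side, relation in (("left", left), ("right", right)):
        for key, payload in relation:
            nodes[node_for(key, num_nodes)][side][key].append(payload)

    # Phase 2: each node joins, locally, only the tuples it received.
    results = []
    for node in nodes.values():
        for key, left_payloads in node["left"].items():
            for lp in left_payloads:
                for rp in node["right"].get(key, []):
                    results.append((key, lp, rp))
    return sorted(results)


# Relations shaped like the figure: only the value A appears on both sides.
R = [("A", "r1"), ("B", "r2"), ("C", "r3"), ("D", "r4")]
S = [("E", "s1"), ("A", "s2"), ("F", "s3"), ("A", "s4")]
joined = dht_join(R, S)
```

No node ever sees the whole of either relation; correctness relies only on the DHT property that all puts for the same key end up at the same node.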
PIER
Range of operators
- Joins, aggregation (routing!), recursive, continuous queries
Intended targets:
- Data “in the wild” (filesharing, net monitoring, etc.)
- No need for ACID semantics, just best-effort
Future: more sophisticated queries
- Range searches, etc.
- Prefix Hash Tree
[Figure: PIER architecture. Applications (Network Monitoring, Other User Apps) submit declarative queries; PIER (Query Optimizer, Catalog Manager, Core Relational Execution Engine) turns them into a query plan; the DHT layer (DHT Wrapper, Storage Manager, Overlay Routing) forms the overlay network; the IP Network is the physical network]
What’s the Fuss about DHTs?
Goals, Strategy, Tactics
Distributed Systems Pre-Internet
Connected by LANs (low loss and delay)
Small scale (10s, maybe 100s per server)
PODC literature focused on algorithms to achieve strict semantics in the face of failures
- Two-phase commits
- Synchronization
- Byzantine agreement
- Etc.
Distributed Systems Post-Internet
Very different context:
- Huge scales (thousands if not millions)
- Highly variable connectivity
- Failures common
- Organic growth
Abandoned distributed strict semantics
- Adaptive apps rather than “guaranteed” infrastructure
Adopted pairwise client-server approach
- Server is centralized (even if server farm)
- Relatively primitive approach (no sophisticated distributed algorithms)
- Little support from infrastructure or middleware
Problems with Centralized Server Farms
Weak availability:
- Susceptible to point failures and DoS attacks
Management overhead
- Data often manually partitioned to obtain scale
- Management and maintenance large fraction of cost
Per-application design (e.g., GoogleOS)
- High hurdle for new applications
Don’t leverage the advent of powerful clients
- Limits scalability and availability
The DHT Community’s Goal
Produce a common infrastructure that will help solve these problems by being:
Robust in the face of failures and attacks
- Availability solved
Self-configuring and self-managing
- Management overhead reduced
Usable for a wide variety of applications
- No per-application design
Able to support very large scales, with no assumptions about locality, etc.
- No scaling limits, few restrictive assumptions
The Strategy
Define an interface for this infrastructure that is:
Generally useful for a wide variety of applications
- So many applications can leverage this work
Can be supported by a robust, self-configuring, widely-distributed infrastructure
- Addressing the many problems raised before
Research Plan (Tactics)
Two main research themes:
Above Interface: Investigate the variety of applications that can use this interface
- Many prototypes, trying to stretch limits
- Some exploratory, others more definitive
Below Interface: Investigate techniques for supporting this interface
- Many designs and performance experiments
- Looking at extreme limits (size, churn, etc.)
Hourglass Analogy
[Figure: hourglass with Applications at the top, the Interface at the narrow waist, and Infrastructure Algorithms at the bottom]
Two Crucial Design Decisions
Technology for infrastructure: P2P
- Take advantage of powerful clients
- Decentralized
- Nodes can be desktop machines or server quality
Choice of interface: Lookup and Hash Table
- Lookup(key) returns IP of host that “owns” key
- Put()/Get() standard HT interface
- Some flexibility in interface (no strict layers)
What is a P2P system?
A distributed system architecture:
- No centralized control
- Nodes are symmetric in function
Large number of (perhaps) server-quality nodes
Enabled by technology improvements
[Figure: several nodes connected to one another through the Internet]
P2P as Design Style
Resistant to DoS and failures
- Safety in numbers, no single point of attack or failure
Self-organizing
- Nodes insert themselves into structure
- Need no manual configuration or oversight
Flexible: nodes can be
- Widely distributed or colocated
- Powerful hosts or low-end PCs
- Trusted or unknown peers
But What Interface?
Challenge for P2P systems: finding content
- Many machines, must find one that holds file
Essential task: Lookup(key)
- Given key, find host (IP) that has file with that key
Higher-level interface: Put()/Get()
- Easy to layer on top of lookup()
- Allows application to ignore details of storage
• System looks like one hard disk
- Good for some apps, not for others
DHT Layering
[Figure: three layers running across many nodes]
Distributed application
  put(key, data) / get(key) → data
Distributed hash table
  lookup(key) → node IP address
Lookup service
• Application may be distributed over many nodes
• DHT distributes data storage over many nodes
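The layering can be sketched with two small classes. This is a toy model under assumptions: `Node`, `LookupService`, and `DistributedHashTable` are hypothetical names, `lookup` returns a node object where the slide's interface returns an IP address, and the hash-modulo placement is used only for brevity (a real DHT would not place keys this way, because any membership change would remap almost every key).

```python
import hashlib


class Node:
    """One participating machine with its local key/value table."""

    def __init__(self, address):
        self.address = address
        self.table = {}


class LookupService:
    """Lower layer: maps a key to the node that owns it."""

    def __init__(self, nodes):
        self.nodes = nodes

    def lookup(self, key: str) -> Node:
        digest = hashlib.sha256(key.encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]


class DistributedHashTable:
    """Upper layer: put/get implemented purely in terms of lookup()."""

    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def put(self, key, data):
        self.lookup_service.lookup(key).table[key] = data

    def get(self, key):
        return self.lookup_service.lookup(key).table.get(key)


nodes = [Node("10.0.0.%d" % i) for i in range(1, 5)]
dht = DistributedHashTable(LookupService(nodes))
dht.put("song.mp3", b"audio bytes")
value = dht.get("song.mp3")
```

The application only sees `put` and `get`; which node stores the data is decided entirely by the lookup layer, so that layer can be replaced without touching application code.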
Virtues of DHT Interface
Simple and proven useful
- Hash tables common implementation tool
API supports a wide range of applications
- No structure/meaning imposed on keys
- Scalable, flat name space!
Key/value pairs are persistent and global
- Can store keys in other DHT values
- And thus build complex data structures
Scenarios for DHT Usage
Where might there be a need for another approach?
Scenario #1: Public Infrastructure
Consider CiteSeer or other nonprofit systems:
- Service is very valuable to community
- No source of revenue
How can it expand?
- Not enough support for expanding centralized facility
- But many institutions would donate remote use of their local machines
System problem:
- Coordinating donated distributed infrastructure
The DHT Approach
DHTs are well-suited to such settings
- Inherently distributed with general interface
- Naturally provides rendezvous and data sharing
Developers can focus on how to layer app on top of DHT library
- Resilience, scaling, all taken care of by DHT
Typical assumption for important services:
- Server-like nodes with good network access
Examples
CiteSeer
- Replicate current service (OverCite), but with 10x performance improvement
- Use additional capacity to provide new features (e.g., SmartSeer’s alerts)
Cooperative CDNs
- Coral allows universities to collaboratively handle “slashdot” workloads
- Operational today with many users
UsenetDHT
- Allows cooperative institutions to share bandwidth load
- Operational system with small feed running
Scenario #2: Scaling Enterprise Apps
Enterprises rely on several crucial services
- Email, backup, file storage
These services must be
- Scalable
- Robust
- Easy to deploy
- Easy to manage
- Inexpensive
The DHT approach
Build all services on DHT interface
DHT infrastructure:
- Scalable (just add nodes, need not be local)
- Robust
- Easy to deploy
- Easy to manage
- Exploits inexpensive commodity components
Examples
Email
- ePOST (Rice)
Backup
- MIT
File storage
- OceanStore
Scenario #3: Supporting Tiny Apps
Many apps could use DHT interface, but are too small to deploy one themselves
- Small: user population, importance, etc.
Such an application could use a DHT service
OpenDHT is a public DHT service
- Lecture on this next week...
Scenario #4: Super-Resilience
DHTs are a natural way to build super-resilient services
DHTs would be a natural candidate for the next generation name service, or other such crucial pieces of the infrastructure
Not Just for Applications
DHTs resolve flat names scalably
- We haven’t been able to do this before
How would we redesign the Internet, now that we can resolve flat names?
DHTs and Internet Architecture?
Early Applications Were Host-Centric
Destination part of user’s goal:
- e.g., Telnet
Specified by hostname, not IP address
- DNS translates between the two
DNS built around hierarchy:
- local decentralized control (writing)
- efficient hostname resolution (reading)
Internet Naming is Host-Centric
DNS names and IP addresses are the only global naming systems in Internet
These structures are host-centric:
- IP addresses: network location of host
- DNS names: domain of host
Both are closely tied to an underlying structure:
- IP addresses: network topology
- DNS names: domain structure
The Web is Data-Centric
URLs function as the name of data- Users usually care about content, not location
- www.cnn.com is a brand, not a host
- Tying data to hosts is unnatural
URLs are bad names for data:
- Not persistent (name changes when data moves)
- Can’t handle piecewise replication
- Legal contention over names
![Page 83: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/83.jpg)
83
Larger Lesson
For many objects, we will want persistent names
If a name refers to properties of its referent that can change, the name is necessarily ephemeral.
- IP addresses can’t serve as persistent host names
- URLs can’t serve as persistent data names
Why do names have structure, anyway?
![Page 84: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/84.jpg)
84
Old Implicit Assumption
Internet names must have hierarchical structure in order to be resolvable
Setting up a new naming scheme requires defining a new (globally recognized) hierarchy
Problem: For these names to be persistent, the hierarchy must match the natural structure of the objects they name.
- What is the natural hierarchy of documents?
![Page 85: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/85.jpg)
85
DHTs Enable Flat Names
Flat names are names with no structure
DHTs resolve flat names in logarithmic time
- And often much faster
- This is the same as in a tree
- No longer need hierarchy for resolution speed
But flat names pose other problems (we return to these later)
- Control (used to be locally managed)
- Locality (part of DNS’s success)
- User-friendliness
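The logarithmic-time claim can be illustrated with a small simulation. The following is a minimal sketch of Chord-style finger routing on an identifier ring, not the design of any particular DHT. The 32-bit identifier size, the SHA-1 hashing, the node names, and all function names are assumptions made for this example.

```python
import hashlib
from bisect import bisect_left

M = 32            # identifier bits; an assumed, illustrative size
RING = 2 ** M

def flat_id(name):
    """Hash any string to a flat identifier on the ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def build_ring(names):
    """Sorted, de-duplicated node identifiers."""
    return sorted({flat_id(n) for n in names})

def successor(nodes, ident):
    """First node at or clockwise after ident (wraps around the ring)."""
    return nodes[bisect_left(nodes, ident % RING) % len(nodes)]

def between(x, a, b, inclusive_end=False):
    """True if x lies clockwise in (a, b), or in (a, b] when inclusive_end is set."""
    if inclusive_end and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b

def lookup(nodes, start, key):
    """Route from node `start` to the node responsible for `key`.

    Returns (owner, hops). Each node forwards to the farthest of its
    fingers, successor(current + 2**i), that still precedes the key.
    """
    current, hops = start, 0
    while True:
        succ = successor(nodes, current + 1)
        hops += 1
        if between(key, current, succ, inclusive_end=True):
            return succ, hops              # succ is responsible for key
        nxt = succ
        for i in reversed(range(M)):       # try the longest jump first
            finger = successor(nodes, current + 2 ** i)
            if between(finger, current, key):
                nxt = finger
                break
        current = nxt

if __name__ == "__main__":
    nodes = build_ring(f"node-{i}" for i in range(1024))
    owner, hops = lookup(nodes, nodes[0], flat_id("www.cnn.com/index.html"))
    print(f"{len(nodes)} nodes, owner {owner:#010x}, reached in {hops} hops")
```

Because each forward roughly halves the remaining distance to the key, the hop count grows with the logarithm of the number of nodes rather than with the number of nodes itself, and no hierarchy in the names is needed to get there.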
![Page 86: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/86.jpg)
86
Why Are Flat Names Good?
Flat names impose no structure on the objects they name
- Not true with structured names like DNS names or IP addresses
Flat names can be used to name anything
Once you have a large flat namespace, you never need another naming system
- One namespace
- One resolution infrastructure
![Page 87: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/87.jpg)
87
Semantic-Free Referencing (SFR)
Replace URLs by flat, semantic-free keys
- Persistent
- No contention
Use a DHT to resolve keys to host/path
- “A DNS for data”
- Replication easy: multiple entries
Other design issues:
- Ensure data security and integrity
- Provide fate-sharing and locality
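The key-to-location mapping described on this slide can be sketched as follows. This is an illustration only: a plain in-memory dictionary stands in for the DHT, and the class name, method names, 160-bit random keys, and example hosts are all assumptions rather than part of the SFR proposal.

```python
import secrets

class SfrResolver:
    """In-memory stand-in for a DHT mapping flat keys to (host, path) entries."""

    def __init__(self):
        self._table = {}

    def new_key(self):
        # 160 random bits: the key says nothing about owner, host, or content.
        return secrets.token_hex(20)

    def put(self, key, host, path):
        """Add one location for the key; several entries model replication."""
        self._table.setdefault(key, []).append((host, path))

    def remove(self, key, host, path):
        """Drop one location, e.g. after the data has moved elsewhere."""
        self._table[key].remove((host, path))

    def resolve(self, key):
        """Return every known location for the key (empty list if unknown)."""
        return list(self._table.get(key, []))

if __name__ == "__main__":
    dht = SfrResolver()
    key = dht.new_key()
    dht.put(key, "host-a.example", "/docs/report.html")
    dht.put(key, "host-b.example", "/mirror/report.html")   # a replica
    # The data moves off host-a; the key that readers hold does not change.
    dht.remove(key, "host-a.example", "/docs/report.html")
    dht.put(key, "host-c.example", "/archive/report.html")
    print(key, dht.resolve(key))
```

The point of the sketch is that the name stays fixed while the entries behind it change, which is what makes the key persistent where a URL is not.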
![Page 88: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/88.jpg)
88
Elegant but Unusable?
How to get the keys you want?
- Third-party services will provide mapping between user-level names and keys (think: Google)
- Competitive market outside infrastructure
Do you have the key you wanted?
- Metadata includes signed “testimonials” (3rd party)
Who is going to supply the resolution service?
- Competitive market much like tier-1 ISPs?
- Each access or store is by or for customers
![Page 89: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/89.jpg)
89
Why Stop with the Web?
DHTs enable use of flat names
Names should not impose structure on referents
- Flat names can name anything
Why not a single name resolution infrastructure?
- A generalized DNS
New architecture proposed to support:
- endpoint identifiers
- service identifiers
![Page 90: CS 194: Distributed Systems DHT Applications: What and Why](https://reader035.vdocument.in/reader035/viewer/2022062723/56813bf0550346895da52512/html5/thumbnails/90.jpg)
90
Layered Naming for the Internet
Software should use names at the proper level of abstraction
Application: uses service identifiers (SIDs)
- SID resolution maps an SID to an EID
Transport protocol: uses endpoint identifiers (EIDs)
- EID resolution maps an EID to an IP address
IP: uses IP addresses
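The two resolution steps can be sketched as a pair of lookups. This is a minimal illustration, assuming each step is a flat key-value lookup (in the proposed architecture, a DHT); the class name, identifier strings, and addresses are hypothetical, and the addresses come from the reserved documentation range.

```python
class LayeredNames:
    """Two flat lookup tables standing in for SID and EID resolution."""

    def __init__(self):
        self.sid_to_eid = {}   # service identifier  -> endpoint identifier
        self.eid_to_ip = {}    # endpoint identifier -> current IP address

    def resolve_sid(self, sid):
        """Application layer: which endpoint currently offers this service?"""
        return self.sid_to_eid[sid]

    def resolve_eid(self, eid):
        """Transport layer: where is this endpoint attached right now?"""
        return self.eid_to_ip[eid]

    def locate(self, sid):
        """Follow both steps and return (eid, ip) for a service."""
        eid = self.resolve_sid(sid)
        return eid, self.resolve_eid(eid)

if __name__ == "__main__":
    names = LayeredNames()
    names.sid_to_eid["sid:mail-service"] = "eid:host-1"
    names.eid_to_ip["eid:host-1"] = "192.0.2.10"
    names.eid_to_ip["eid:host-2"] = "192.0.2.20"
    print(names.locate("sid:mail-service"))

    # The host is renumbered: only the EID mapping changes.
    names.eid_to_ip["eid:host-1"] = "198.51.100.7"
    print(names.locate("sid:mail-service"))

    # The service migrates to another host: only the SID mapping changes.
    names.sid_to_eid["sid:mail-service"] = "eid:host-2"
    print(names.locate("sid:mail-service"))
```

Each layer binds to a name at its own level of abstraction, so a change at a lower layer does not invalidate the names held by the layer above.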