1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie
2. Complex Queries in DHT-based Peer-to-Peer Networks
Lintao Liu
5/21/2002
Existing Lookup Algorithms
• Broadcast-like search
– Gnutella, FreeNet
• DHT (Distributed Hash Table)
– CAN, Chord, Pastry, Tapestry
• Comparison
– Maintenance cost
– Efficiency, scalability?
Why introduce a Distributed Trie?
• Each peer holds partitions of the key space according to some assignment policy
• Relaxing the consistency criteria of partition assignment can reduce the maintenance costs
– Stale replicas?
– Piggyback the updates
– Reconcile conflicting updates
System Model
• Interfaces
– Lookup(key), Insert(key, value), join()
• Trie Structure
– Each internal trie node consists of 2^m routing tables
– Each table consists of l entries
– Each entry: (peer address, timestamp)
– Structure (refer to the figure in the paper)
Trie lookup structure
• Leaf node:
– An entry in its routing table indicates the address of a peer holding a value.
• Intermediate node:
– An entry with peer address a in its i-th routing table indicates that peer a held a replica of the i-th child of the node.
• Ancestor invariant: the path from the root
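A minimal sketch of the node layout described on these two slides, assuming "2m" on the slide means 2^m (one routing table per child, with m key bits consumed per level). The names Entry, TrieNode, and add_entry are hypothetical, and the eviction rule (keep the l most recent entries) is an assumption, not something the slides state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    peer: str        # peer address
    timestamp: int   # when this entry was recorded


@dataclass
class TrieNode:
    m: int   # key bits consumed per trie level (assumed meaning of m)
    l: int   # maximum number of entries per routing table
    tables: List[List[Entry]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One routing table per child: 2**m tables in total.
        if not self.tables:
            self.tables = [[] for _ in range(2 ** self.m)]

    def add_entry(self, i: int, peer: str, timestamp: int) -> None:
        """Record that `peer` held a replica of child i at `timestamp`."""
        newest = {e.peer: e for e in self.tables[i]}
        old = newest.get(peer)
        if old is None or old.timestamp < timestamp:
            newest[peer] = Entry(peer, timestamp)
        # Assumed policy: keep only the l most recent entries.
        ranked = sorted(newest.values(), key=lambda e: e.timestamp, reverse=True)
        self.tables[i] = ranked[: self.l]
```

Usage: TrieNode(m=2, l=2) creates four empty routing tables; repeated add_entry calls on one table never leave more than two entries in it.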
Algorithm
• Join
– Introducer, join request, bootstrap
– Node ID? How is the routing table constructed?
• Insertion
– Performed locally by inserting the key/value pair
– ?????
• Lookup
– Try local data
– Go deeper on the trie
– Backtrack
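The three lookup steps above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: keys are bit strings, the local trie is flattened into a hypothetical dict `routing` from key prefix to peer addresses, and `network` is a stand-in dict that plays the role of remote peers' local stores.

```python
from typing import Dict, List, Optional


def lookup(key: str,
           local_store: Dict[str, str],
           routing: Dict[str, List[str]],
           network: Dict[str, Dict[str, str]]) -> Optional[str]:
    # Step 1: try local data.
    if key in local_store:
        return local_store[key]
    tried = set()
    # Step 2: go deeper on the trie, starting at the longest known prefix.
    # Step 3: backtrack to shorter prefixes when the deeper peers fail.
    for depth in range(len(key), -1, -1):
        prefix = key[:depth]
        for peer in routing.get(prefix, []):
            if peer in tried:
                continue
            tried.add(peer)
            remote_store = network.get(peer, {})  # unreachable peer: empty
            if key in remote_store:
                return remote_store[key]
    return None  # no contacted peer had the key
```

In a real deployment each contacted peer would also return routing tables so the caller can refine its own trie; that part is omitted here.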
Different Modes
• Used for updating trie state
– When to do that? While querying a key, or periodically?
• Bounded mode
– Only return a routing table that is more specific than the current routing table on the caller
• Unbounded mode
– Return the most specific routing table for the key
– Looks like they do this while querying a key
• Full path mode
– Return the full path (including the routing tables) from the root table to the most specific routing table for the key
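One way to read the three modes is as a choice of which routing tables a contacted peer sends back. The sketch below assumes the peer already has its tables for the key as a list `path` ordered from the root (depth 0) to its most specific table; the function name, the mode strings, and the (depth, table) return shape are all hypothetical.

```python
from typing import Any, List, Tuple


def tables_to_return(path: List[Any], caller_depth: int,
                     mode: str) -> List[Tuple[int, Any]]:
    """Pick the routing tables to send back, as (depth, table) pairs."""
    if not path:
        return []
    deepest = len(path) - 1
    if mode == "bounded":
        # Only useful if it is more specific than what the caller has.
        if deepest > caller_depth:
            return [(deepest, path[deepest])]
        return []
    if mode == "unbounded":
        # Always send the most specific table for the key.
        return [(deepest, path[deepest])]
    if mode == "full_path":
        # Send every table from the root down to the most specific one.
        return list(enumerate(path))
    raise ValueError(f"unknown mode: {mode}")
```

Full path mode costs the most bandwidth per reply but lets the caller fill in every level of its trie at once.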
Security Issues
• How much do we believe the routing table from other nodes?
• Conservative mode
– example
• Liberal mode
– Just believe all updates without doubt
Complex queries in DHT-based P2P Networks
• Motivation
– Two problems:
• Scalability
• Query languages
– DHT (Distributed Hash Table)
• Chord, Pastry, Tapestry, CAN
• Improve scalability and “exact match” efficiency
• Cannot do complex queries
Text Retrieval and Hash Indexes
• In order to handle “fuzzy” matches
– Split each string to be indexed into n-grams
– For each such n-gram g_i, the pair (g_i, I) is inserted into the hash index, keyed by g_i
• Lookup
– Split the query string into n-grams, and look up each n-gram in the index
– Return those files whose count of returned copies equals the number of n-grams in the query
• Too many I for some popular g_i?
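The indexing and lookup steps above can be sketched with an in-memory dict standing in for the distributed hash index. NGramIndex and its method names are hypothetical, and I is taken here to be a file identifier, which the slide does not spell out. Matches are counted over distinct query n-grams, so a file qualifies only if it contains every one of them.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def ngrams(s: str, n: int = 3) -> List[str]:
    """All overlapping substrings of length n."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


class NGramIndex:
    def __init__(self, n: int = 3) -> None:
        self.n = n
        # Stand-in for the DHT: n-gram -> set of file identifiers.
        self.table: Dict[str, Set[str]] = defaultdict(set)

    def insert(self, text: str, file_id: str) -> None:
        for g in ngrams(text, self.n):
            self.table[g].add(file_id)   # the pair (g_i, I), keyed by g_i

    def lookup(self, query: str) -> List[str]:
        grams = set(ngrams(query, self.n))
        counts: Counter = Counter()
        for g in grams:
            counts.update(self.table.get(g, set()))
        # Keep files that matched every n-gram of the query.
        return sorted(f for f, c in counts.items() if grams and c == len(grams))
```

Containing every n-gram is necessary but not sufficient for a substring match, so results may include false positives that need a final check against the actual string. A popular n-gram also maps to a very large set, which is the concern raised in the last bullet.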
More arguments
• This is not a P2P database
– There are many reasons why this is not a DB, even though many DB techniques are used here.
Implementation Architecture
• Three layers
– Data Store (Iterator, Accessors to attributes…)
– DHT Layer (handles network routing)
• Put/get, Iterator, Callback newData
– Query processing layer (handles parallel queries)
• Namespace & Multicast
– Organize the flat key space into a hierarchical space
– Multicast queries to all peers in one group
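A rough single-process sketch of what the DHT layer interface might look like, with put/get, an iterator, and a new-data callback, plus a namespace mapped onto the flat key space by hashing. Everything here is assumed for illustration: LocalDHT, flat_key, the method names, and the choice of SHA-1 over "namespace/resource_id" are not from the paper, and multicast to the peers of a group is not modeled.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple


def flat_key(namespace: str, resource_id: str) -> str:
    """Map a (namespace, resource) pair onto the flat key space."""
    return hashlib.sha1(f"{namespace}/{resource_id}".encode()).hexdigest()


class LocalDHT:
    def __init__(self) -> None:
        # flat key -> (namespace, resource_id, value)
        self.store: Dict[str, Tuple[str, str, Any]] = {}
        self.callbacks: Dict[str, List[Callable[[str, Any], None]]] = defaultdict(list)

    def on_new_data(self, namespace: str,
                    callback: Callable[[str, Any], None]) -> None:
        """Register a callback fired when data arrives in `namespace`."""
        self.callbacks[namespace].append(callback)

    def put(self, namespace: str, resource_id: str, value: Any) -> None:
        self.store[flat_key(namespace, resource_id)] = (namespace, resource_id, value)
        for cb in self.callbacks[namespace]:
            cb(resource_id, value)

    def get(self, namespace: str, resource_id: str) -> Optional[Any]:
        item = self.store.get(flat_key(namespace, resource_id))
        return item[2] if item else None

    def iterate(self, namespace: str) -> Iterator[Tuple[str, Any]]:
        """Scan the locally stored items of one namespace."""
        for ns, rid, value in self.store.values():
            if ns == namespace:
                yield rid, value
```

The query processing layer would sit on top of this: it can scan a namespace with iterate and react to arriving tuples through the callback.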
More Issues
• Related to DB techniques
• The general idea is to make parallel queries (not sure)
• Cannot understand very well.
• Discussion…