
1. Efficient Peer-to-Peer Lookup Based on a Distributed Trie

2. Complex Queries in DHT-based Peer-to-Peer Networks

Lintao Liu

5/21/2002

Existing Lookup Algorithms

• Broadcast-like search
– Gnutella, Freenet

• DHT (Distributed Hash Table)
– CAN, Chord, Pastry, Tapestry

• Comparison
– Maintenance cost
– Efficiency, scalability?

Why Introduce a Distributed Trie?

• Each peer holds partitions of the key space according to some assignment policy

• Relaxing the consistency criteria of the partition assignment can reduce maintenance costs
– Stale replicas?
– Piggyback the updates
– Reconcile conflicting updates
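Relaxed consistency means two peers can hold different copies of the same routing table, so they must be reconciled when updates arrive piggybacked on other messages. The sketch below assumes entries are (peer address, timestamp) pairs and resolves conflicts by keeping the most recent timestamp per peer, then the l freshest entries. This last-writer-wins rule and the name `merge_tables` are assumptions for illustration; the paper's actual reconciliation policy may differ.

```python
from typing import Dict, List, Tuple

# A routing-table entry: (peer address, timestamp).
Entry = Tuple[str, int]


def merge_tables(local: List[Entry], remote: List[Entry], l: int) -> List[Entry]:
    """Reconcile two copies of one routing table (hypothetical policy).

    For each peer address keep the most recent timestamp, then keep only
    the l freshest entries. Stale entries are never deleted eagerly; they
    are simply displaced by newer information as updates are piggybacked.
    """
    latest: Dict[str, int] = {}
    for addr, ts in local + remote:
        if addr not in latest or ts > latest[addr]:
            latest[addr] = ts
    # Newest first; ties broken by address so the result is deterministic.
    ordered = sorted(latest.items(), key=lambda e: (-e[1], e[0]))
    return ordered[:l]


if __name__ == "__main__":
    local = [("10.0.0.1", 5), ("10.0.0.2", 3)]
    remote = [("10.0.0.2", 7), ("10.0.0.3", 1)]
    print(merge_tables(local, remote, l=2))  # [('10.0.0.2', 7), ('10.0.0.1', 5)]
```

Because reconciliation only compares timestamps, a peer can apply an update whenever it happens to arrive, which is what makes piggybacking cheap.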

System Model

• Interfaces
– Lookup(key), Insert(key, value), join()

• Trie structure
– Each internal trie node consists of 2^m routing tables (one per child)
– Each table consists of l entries
– Each entry: (peer address, timestamp)
– Structure (refer to the figure in the paper)
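A minimal sketch of one trie node, assuming keys are fixed-width integers consumed m bits per trie level, so each internal node has 2^m children and one routing table per child, each holding at most l (peer address, timestamp) entries. `TrieNode`, `child_index`, `record`, and the keep-the-freshest-l eviction rule are hypothetical names and choices, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[str, int]  # (peer address, timestamp)


def child_index(key: int, depth: int, m: int, key_bits: int) -> int:
    """Return the m-bit digit of `key` that selects the child at `depth`.

    Assumes keys are key_bits-bit integers read most-significant digit
    first, key_bits is a multiple of m, and depth < key_bits // m.
    """
    shift = key_bits - m * (depth + 1)
    return (key >> shift) & ((1 << m) - 1)


@dataclass
class TrieNode:
    m: int  # the node has 2**m children
    l: int  # maximum entries per routing table
    tables: List[List[Entry]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # One routing table per child.
        if not self.tables:
            self.tables = [[] for _ in range(2 ** self.m)]

    def record(self, child: int, peer: str, ts: int) -> None:
        """Note that `peer` held a replica of child `child` at time `ts`."""
        table = [e for e in self.tables[child] if e[0] != peer]
        table.append((peer, ts))
        table.sort(key=lambda e: -e[1])  # freshest first
        self.tables[child] = table[: self.l]
```

For example, with m = 4 and 8-bit keys, key 0b10110010 goes to child 0b1011 (11) at the root and to child 0b0010 (2) one level down.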

Trie lookup structure

• Leaf node
– Each entry in its routing table gives the address of a peer holding a value

• Intermediate node
– An entry for peer a in its ith routing table indicates that peer a held a replica of the node's ith child

• Ancestor invariant: a peer that holds a trie node also holds the path from the root to that node (i.e., all of the node's ancestors)
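The ancestor invariant can be stated as a simple check. This sketch assumes a trie node is identified by its path of child indices from the root (the root is the empty path) and that a peer's holdings are a set of such paths; the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Path = Tuple[int, ...]  # child indices from the root; () is the root itself


def satisfies_ancestor_invariant(held: Iterable[Path]) -> bool:
    """True if every held node's ancestors (all proper prefixes) are also held."""
    held_set: Set[Path] = set(held)
    return all(path[:i] in held_set for path in held_set for i in range(len(path)))


def add_with_ancestors(held: Set[Path], path: Path) -> None:
    """Insert a node together with the whole path from the root down to it."""
    for i in range(len(path) + 1):
        held.add(path[:i])
```

Inserting nodes only through `add_with_ancestors` keeps the invariant true by construction.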

Algorithm

• Join
– Introducer, join request, bootstrap
– Node ID? How is the routing table constructed?

• Insertion
– Performed locally by inserting the key/value pair
– ???

• Lookup
– Try local data first
– Go deeper in the trie
– Backtrack
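The three lookup steps (try local data, go deeper, backtrack) can be sketched over an in-memory simulated network. Assumptions, none taken from the paper: the caller already knows the key's trie path, each peer keeps `tables[path]` listing peers believed to hold the node at that path, a missing peer models a stale entry, and entries are tried freshest first. `Peer` and `lookup` are hypothetical names.

```python
from typing import Dict, List, Optional, Set, Tuple

Path = Tuple[int, ...]   # child indices from the root; () is the root
Entry = Tuple[str, int]  # (peer address, timestamp)


class Peer:
    """Hypothetical peer: local key/value data plus per-trie-node routing tables.

    tables[path] lists peers believed to hold a replica of the node at `path`.
    """

    def __init__(self, data: Optional[Dict[int, str]] = None,
                 tables: Optional[Dict[Path, List[Entry]]] = None) -> None:
        self.data = data or {}
        self.tables = tables or {}


def lookup(key: int, key_path: Path, start: str, network: Dict[str, Peer]) -> Optional[str]:
    visited: Set[str] = set()

    def deepest(peer: Peer) -> int:
        # Depth of the most specific table this peer has on the key's path.
        d = len(key_path)
        while d > 0 and key_path[:d] not in peer.tables:
            d -= 1
        return d

    def visit(name: str) -> Optional[str]:
        if name in visited:  # avoid cycles between peers
            return None
        visited.add(name)
        peer = network.get(name)
        if peer is None:  # stale entry: the peer has left the network
            return None
        if key in peer.data:  # 1. try local data
            return peer.data[key]
        for d in range(deepest(peer), -1, -1):
            entries = sorted(peer.tables.get(key_path[:d], []), key=lambda e: -e[1])
            for addr, _ts in entries:  # freshest entry first
                found = visit(addr)  # 2. go deeper through that peer
                if found is not None:
                    return found
            # 3. every entry at depth d failed: backtrack to the parent's table
        return None

    return visit(start)
```

For instance, if peer A only knows a table for the key's first trie level, it asks a peer listed there, which may know a more specific table that leads to the peer storing the value.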

Different Modes

• Used for updating trie state
– When is this done: while querying a key, or periodically?

• Bounded mode
– Return only the routing table that is more specific than the caller's current routing table

• Unbounded mode
– Return the most specific routing table for the key
– Appears to be used while querying a key

• Full-path mode
– Return the full path (including the routing tables) from the root table to the most specific routing table for the key
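The three modes differ only in which of the responder's routing tables are sent back. A sketch under stated assumptions: tables are keyed by the trie path of their node (root = empty path), the caller reports the depth of its current most specific table for the key, and bounded mode is read here as "the table one level deeper than the caller's". That reading and the function name `tables_to_return` are assumptions, not confirmed by the paper.

```python
from typing import Dict, List, Tuple

Path = Tuple[int, ...]
Entry = Tuple[str, int]  # (peer address, timestamp)
Tables = Dict[Path, List[Entry]]

MODES = ("bounded", "unbounded", "full_path")


def tables_to_return(tables: Tables, key_path: Path, mode: str, caller_depth: int) -> Tables:
    """Select which of the responder's routing tables to send to the caller."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    # Depths along the key's path for which the responder has a table.
    held = [d for d in range(len(key_path) + 1) if key_path[:d] in tables]
    if not held:
        return {}
    deepest = max(held)
    if mode == "bounded":
        wanted = [caller_depth + 1]  # assumed: one level past the caller
    elif mode == "unbounded":
        wanted = [deepest]  # only the most specific table
    else:  # full_path: root table down to the most specific one
        wanted = list(range(deepest + 1))
    return {key_path[:d]: tables[key_path[:d]] for d in wanted if d in held}
```

The trade-off is message size versus how quickly the caller's view converges: full-path replies are largest but refresh every level at once.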

Security Issues

• How much should a peer trust routing tables received from other peers?

• Conservative mode
– Example

• Liberal mode
– Accept all updates without verification

Complex Queries in DHT-based P2P Networks

• Motivation
– Two problems:
• Scalability
• Query languages
– DHT (Distributed Hash Table)
• Chord, Pastry, Tapestry, CAN
• Improve scalability and “exact match” efficiency
• Cannot do complex queries

Text Retrieval and Hash Indexes

• To handle “fuzzy” matches:
– Split each string to be indexed into n-grams
– For each such n-gram g_i, insert the pair (g_i, I) into the hash index, keyed by g_i, where I identifies the item (e.g., the file) containing the string

• Lookup
– Split the query string into n-grams and look up each n-gram in the index
– Return those files whose identifier appears as many times as there are n-grams in the query

• Too many I entries for some popular g_i?
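The n-gram scheme can be sketched with a local dictionary standing in for the DHT-backed hash index (a plain put/get keyed by n-gram). Assumptions: trigrams, lowercase normalization, and each distinct n-gram indexed once per item; `ngrams` and `NGramIndex` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Set


def ngrams(s: str, n: int = 3) -> Set[str]:
    """Distinct n-grams of s (the whole string if it is shorter than n)."""
    s = s.lower()
    if len(s) < n:
        return {s}
    return {s[i : i + n] for i in range(len(s) - n + 1)}


class NGramIndex:
    """Local stand-in for a DHT hash index keyed by n-gram g, storing item ids I."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.table: DefaultDict[str, Set[str]] = defaultdict(set)

    def insert(self, text: str, item_id: str) -> None:
        for g in ngrams(text, self.n):
            self.table[g].add(item_id)  # one (g, I) pair per distinct n-gram

    def lookup(self, query: str) -> List[str]:
        grams = ngrams(query, self.n)
        counts = Counter(i for g in grams for i in self.table.get(g, ()))
        # Keep items that matched every n-gram of the query.
        return sorted(i for i, c in counts.items() if c == len(grams))
```

For example, after indexing "Beatles - Let It Be" as f1 and "Let It Snow" as f2, a lookup for "let it" returns both. Results are candidates: sharing all n-grams does not strictly guarantee a substring match. A very common n-gram such as "the" maps to a long set of items, which is the concern raised about popular g_i.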

More Arguments

• This is not a P2P database
– There are many reasons why it is not a DB, even though many DB techniques are used here

Implementation Architecture

• Three layers
– Data store (iterator, accessors to attributes…)
– DHT layer (handles network routing)
• put/get, iterator, newData callback
– Query processing layer (handles parallel queries)

• Namespace & multicast
– Organize the flat key space into a hierarchical space
– Multicast queries to all peers in one group
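The DHT-layer interface (put/get, an iterator over local data, a newData callback) and namespace-scoped multicast can be sketched in a single process. Assumptions throughout: keys are (namespace, resource id) pairs hashed with SHA-1 onto n simulated nodes, every node belongs to every namespace group, and "multicast" simply visits each node in turn. `Node`, `DHTLayer`, `on_new_data`, and `multicast` are hypothetical names, not the paper's API.

```python
import hashlib
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, Iterator, List, Tuple

Callback = Callable[[str, Any], None]


class Node:
    """One peer's data store: items grouped by namespace, plus newData callbacks."""

    def __init__(self) -> None:
        self.store: DefaultDict[str, Dict[str, Any]] = defaultdict(dict)
        self.callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def scan(self, namespace: str) -> Iterator[Tuple[str, Any]]:
        """Iterator over this node's local items in one namespace."""
        return iter(list(self.store[namespace].items()))


class DHTLayer:
    """Hypothetical DHT layer over n simulated nodes."""

    def __init__(self, n: int) -> None:
        self.nodes = [Node() for _ in range(n)]

    def _owner(self, namespace: str, rid: str) -> Node:
        # The hierarchical name namespace/rid is hashed onto the flat node space.
        digest = hashlib.sha1(f"{namespace}/{rid}".encode()).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, namespace: str, rid: str, item: Any) -> None:
        node = self._owner(namespace, rid)
        node.store[namespace][rid] = item
        for cb in node.callbacks[namespace]:  # fire the newData callback
            cb(rid, item)

    def get(self, namespace: str, rid: str) -> Any:
        return self._owner(namespace, rid).store[namespace].get(rid)

    def on_new_data(self, namespace: str, cb: Callback) -> None:
        for node in self.nodes:
            node.callbacks[namespace].append(cb)

    def multicast(self, namespace: str, pred: Callable[[str, Any], bool]) -> List[Tuple[str, Any]]:
        """Send a query to every node in the namespace group and merge local matches."""
        results: List[Tuple[str, Any]] = []
        for node in self.nodes:  # a real system would contact peers in parallel
            results.extend((rid, it) for rid, it in node.scan(namespace) if pred(rid, it))
        return sorted(results, key=lambda r: r[0])
```

A query-processing layer built on top would multicast the query, let each node evaluate it over its local iterator, and merge the partial results, which is one way to read "handles parallel queries".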

More Issues

• Related to DB techniques

• The general idea seems to be running queries in parallel (not sure)

• Not fully clear to me

• Discussion…

