count / top-k continuous queries on p2p networks 01/11/2006

Count / Top-k Continuous Queries on P2P Networks

01/11/2006

Outline

Problem Definition
P2P Architecture
Count
Top-K
Experiment Setup
Future Work

Streaming Data in P2P

P2P: dynamically changing topology, large scale, …

Streaming data: continuous, unbounded, rapid, time-varying, noisy

P2P + streaming data: dynamic in both data and topology

Objective and Goal

Objective: issue a continuous query to estimate count and top-K

Goal
Lower the communication cost
Lightweight maintenance
Approximate answers
An adaptive and progressive approach

Naïve approach

Flood the overlay continuously

Pros: closer to the exact answer

Cons: network congestion; still not real time

The State-of-the-Art

Count: focus on one-time answers in P2P, or deal with streaming data only

Top-K: P2P environment without streaming data, or distributed environment that is not P2P

P2P architecture

Assumption

Hierarchical P2P (focused)
Super-peer hierarchical structure
Query issuer is a super-peer
Super-peers connect with other super-peers
Each peer belongs to only one super-peer

Pure unstructured P2P

Big picture

Accumulate information within a group based on the constraint and statistics

Set Constraint

Report changes

Approximated answer

Group in hierarchical P2P

[Figure: super-peer overlay showing the issuer and a coordinator]

Group in hierarchical P2P: after partition

[Figure: the overlay partitioned into Group1, Group2, and Group3]

Assume we have N objects and K groups after partition

$C_{i,0}$, $i = 1, \ldots, N$

$j = 1, \ldots, K$

$C_{i,j}$: count at each peer

User-specified Epsilon

[Figure: Group1, Group2, Group3; user-specified ε (precision)]

Consider a group

[Figure: a coordinator, the nodes in its group, and their objects]

Each node maintains the distribution information of the objects it owns

At initialization: polling

[Figure: coordinator and nodes during polling]

Information at coordinator after polling

Statistics information

#    P1     P2     P3     P4     Δ
O1   1/1    6/6    10/10  5/5    22
O2   11/11  13/13  5/5    9/7    36
O3   15/15  6/6    3/3    9/9    33
R    0.3    0.2    -0.05  0.6
T    15     15     17     13

Cell a/b: latest real value / estimated value
Δ: change value for each object
R: maximum changing rate (+/-) of objects in each peer
T: updated time stamp
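The coordinator's table can be pictured as a small in-memory structure. The sketch below is only an illustration: the names `stats`, `rate`, `updated_at`, and `group_totals` are hypothetical, and it assumes each cell reads as latest real value / estimated value. Under that reading, summing the estimated values per object reproduces the Δ column of the table (22, 36, 33).

```python
# Each cell holds (latest_real_value, estimated_value).
# Assumption: this a/b reading of the cells follows the slide legend.
stats = {
    "O1": {"P1": (1, 1), "P2": (6, 6), "P3": (10, 10), "P4": (5, 5)},
    "O2": {"P1": (11, 11), "P2": (13, 13), "P3": (5, 5), "P4": (9, 7)},
    "O3": {"P1": (15, 15), "P2": (6, 6), "P3": (3, 3), "P4": (9, 9)},
}
rate = {"P1": 0.3, "P2": 0.2, "P3": -0.05, "P4": 0.6}  # row R
updated_at = {"P1": 15, "P2": 15, "P3": 17, "P4": 13}   # row T


def group_totals(stats):
    """Sum the estimated values over all peers, per object."""
    return {
        obj: sum(estimated for _real, estimated in peers.values())
        for obj, peers in stats.items()
    }


totals = group_totals(stats)
print(totals)  # {'O1': 22, 'O2': 36, 'O3': 33}
```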

Update to Coordinator

(Δ11, Δ21, Δ31)

(Δ12, Δ22, Δ32)

(Δ13, Δ23, Δ33)

Calculate Count

$C_{i,0}^{(l+1)} = C_{i,0}^{(l)} + \sum_{j=1}^{K} \Delta_{i,j}$
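A minimal sketch of the count update, assuming the rule is that the issuer's count for each object in the next round equals its current count plus the changes reported by the K groups. The function name `update_counts` and all numbers are hypothetical.

```python
def update_counts(counts, deltas_by_group):
    """Add every group's reported change to the issuer's count, per object.

    Assumed rule: C_i0(l+1) = C_i0(l) + sum over groups j of Delta_ij.
    """
    new_counts = dict(counts)
    for group_deltas in deltas_by_group:
        for obj, delta in group_deltas.items():
            new_counts[obj] = new_counts.get(obj, 0) + delta
    return new_counts


# Hypothetical round: three groups report their changes.
counts = {"O1": 22, "O2": 36, "O3": 33}
deltas = [
    {"O1": 3, "O2": -2, "O3": 3},
    {"O1": -1, "O2": 0, "O3": 2},
    {"O2": 4},  # a group may report only the objects that changed
]
updated = update_counts(counts, deltas)
print(updated)  # {'O1': 24, 'O2': 38, 'O3': 38}
```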

Redistribute Epsilon

$\delta_i = f(\varepsilon, \Delta, C_{i,0})$

$w_i = \max(\Delta_i) / C_{x,0}$, where $x$ is the $i$-index of $\max(\Delta_i)$

$\delta_i = w_i \, \varepsilon \, C_{x,0} / \sum w_i$
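The weighting rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `redistribute_epsilon` is hypothetical, the largest change is taken by absolute value (the slides do not say how negative changes are handled), and the global counts are assumed to be positive.

```python
def redistribute_epsilon(epsilon, group_deltas, global_counts):
    """Split the precision budget among groups in proportion to w_i.

    w_i = max(Delta_i) / C_x0, with x the object holding the largest change
    in group i; delta_i = w_i * epsilon * C_x0 / sum(w).
    Assumptions: largest change by absolute value; all counts are positive.
    """
    picks = []
    for deltas in group_deltas:
        x = max(deltas, key=lambda obj: abs(deltas[obj]))
        picks.append((abs(deltas[x]), global_counts[x]))
    weights = [change / count for change, count in picks]
    total = sum(weights)
    if total == 0:
        raise ValueError("no group reported a change")
    return [w * epsilon * count / total
            for w, (_change, count) in zip(weights, picks)]


# Hypothetical numbers for three groups.
global_counts = {"O1": 20, "O2": 40, "O3": 50}
group_deltas = [
    {"O1": 2, "O2": 1, "O3": 0},
    {"O1": 0, "O2": -4, "O3": 1},
    {"O1": 1, "O2": 0, "O3": 5},
]
budgets = redistribute_epsilon(0.1, group_deltas, global_counts)
```

In this example every group has the same weight (0.1), so each group's budget is proportional to the global count of its most volatile object.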

Visiting sequence

Pick the peers that would violate δ
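The slides do not state how a violation is predicted, so the sketch below swaps in a simple drift bound as an assumption: a peer's possible drift is its maximum changing rate R times the time elapsed since its last update T, and a peer is visited when that bound exceeds δ. The function name `visiting_sequence` and the numbers are hypothetical.

```python
def visiting_sequence(rate, updated_at, now, delta):
    """Return the peers whose assumed drift bound exceeds delta, largest first.

    Assumed drift bound: |R| * (now - T). This is not taken from the slides.
    """
    drift = {p: abs(rate[p]) * (now - updated_at[p]) for p in rate}
    suspects = [p for p in drift if drift[p] > delta]
    return sorted(suspects, key=lambda p: drift[p], reverse=True)


# Hypothetical peers: A drifts fast, B barely moves, C is long out of date.
rate = {"A": 0.5, "B": 0.1, "C": -0.25}
updated_at = {"A": 10, "B": 10, "C": 2}
to_visit = visiting_sequence(rate, updated_at, now=20, delta=2.0)
print(to_visit)  # ['A', 'C']
```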

Update information

#    P1     P2     P3     P4     Δ
O1   1/1    6/6    10/10  8/8    -
O2   11/11  11/11  5/5    6/6    -
O3   15/15  5/5    3/3    11/11  -
R    0.3    0.4    -0.05  0.2
T    15     30     17     33

For the nodes that were not visited

#    P1     P2     P3     P4     Δ
O1   1/2    6/6    10/9   8/8    25
O2   11/13  11/11  5/4    6/6    34
O3   15/18  5/5    3/2    11/11  36
R    0.3    0.4    -0.05  0.2
T    15     30     17     33

Un-notified Leave

P1 is dead

Remove P1’s information
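Dropping a departed peer amounts to deleting its column and re-summing. The sketch below uses the estimated values from the last table; how the coordinator detects that P1 is dead is not covered by the slides, and the names `estimates` and `remove_peer` are hypothetical.

```python
def remove_peer(estimates, peer):
    """Return a copy of the per-object estimates without the given peer."""
    return {
        obj: {p: value for p, value in peers.items() if p != peer}
        for obj, peers in estimates.items()
    }


# Estimated values per peer, taken from the last table.
estimates = {
    "O1": {"P1": 2, "P2": 6, "P3": 9, "P4": 8},
    "O2": {"P1": 13, "P2": 11, "P3": 4, "P4": 6},
    "O3": {"P1": 18, "P2": 5, "P3": 2, "P4": 11},
}
remaining = remove_peer(estimates, "P1")
totals_after_leave = {obj: sum(peers.values()) for obj, peers in remaining.items()}
print(totals_after_leave)  # {'O1': 23, 'O2': 21, 'O3': 18}
```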

Experiment Setup

Generate synthetic data sets from statistical distributions for:
Streaming data
Lifetime of peers

Metrics
Message size
Communication cost
Response latency
Result accuracy

Use regression to predict the trend of changes

Once an updated result is required, the super-peer only needs to ask the doubtful peers about the doubtful objects

Update its counting list, and return the top-k objects
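As a rough illustration of the regression step, the sketch below fits a least-squares line to each object's past counts, extrapolates to the current time, and returns the k largest predictions. The slides do not specify the regression model or how doubtful peers are chosen, so the linear fit, the names `linear_fit` and `predict_top_k`, and the sample history are all assumptions.

```python
import heapq


def linear_fit(points):
    """Least-squares line through (time, count) points: (slope, intercept)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:  # a single time stamp: no trend can be estimated
        return 0.0, mean_v
    slope = sum((t - mean_t) * (v - mean_v) for t, v in points) / var_t
    return slope, mean_v - slope * mean_t


def predict_top_k(history, now, k):
    """Extrapolate each object's count to `now` and pick the k largest."""
    predicted = {}
    for obj, points in history.items():
        slope, intercept = linear_fit(points)
        predicted[obj] = slope * now + intercept
    return heapq.nlargest(k, predicted, key=predicted.get), predicted


# Hypothetical (time, count) history kept by the super-peer.
history = {
    "O1": [(0, 10), (10, 12), (20, 14)],
    "O2": [(0, 30), (10, 25), (20, 20)],
    "O3": [(0, 5), (10, 15), (20, 25)],
}
top2, predicted = predict_top_k(history, now=30, k=2)
print(top2)  # ['O3', 'O1']
```

Here O2 holds the largest count in the recorded history, but its downward trend drops it out of the predicted top 2.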

Future Work

Connect and recommend latent good friends for each user

Good friends: the ones with the same interests (behaviors)

Exploiting current connecting peers to discover good friends bit by bit

Design a system that forms clusters reflecting the current interests of individual peers and connects them based on their similarity, using each user's social network

Advantages

Reduce search time and diminish query traffic by using friends list

By exploiting the differing strengths of their arcs/edges/ties (the degree of friendship), social networks outperform random-walk networks in quickly finding target objects

Example

[Figure: friends at Level 1 and Level 2; one edge has larger weight than another]

Score(Ni)

$score(N_i) = sim(N_i, N_{i-1}) \cdot score(N_{i-1})$
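One way to read the score rule is as a product: a node's score is its similarity to the previous node on the path times that node's score. The sketch below follows that reading, which is an assumption because the formula on the slide is garbled. Cosine similarity over interest vectors is swapped in as a stand-in for `sim`, and the names `cosine` and `path_scores` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def path_scores(profiles, start_score=1.0):
    """Scores along a path of peers, level by level.

    Assumed rule: score(N_i) = sim(N_i, N_{i-1}) * score(N_{i-1}).
    """
    scores = [start_score]
    for prev, cur in zip(profiles, profiles[1:]):
        scores.append(cosine(cur, prev) * scores[-1])
    return scores


# Hypothetical interest vectors: the user, a Level 1 friend, a Level 2 friend.
profiles = [[1, 0], [1, 1], [0, 1]]
scores = path_scores(profiles)
```

With these vectors the score decays with each level (about 0.71 at Level 1 and 0.5 at Level 2), so closer and more similar peers rank higher.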

Similarity
