Lecture 4 Query Processing in P2P

Advanced Distributed Computing 1 Lecture 4 Query Processing in P2P Part B Zhou Shuigeng April 6, 2007

Page 1: Lecture 4 Query Processing in P2P

Advanced Distributed Computing 1

Lecture 4

Query Processing in P2P

Part B

Zhou Shuigeng

April 6, 2007

Page 2

Outline

Papers:
• R. Huebsch, J. M. Hellerstein, N. Lanham, et al. Querying the Internet with PIER. VLDB 2003.
• M. Bawa, R. J. Bayardo Jr., S. Rajagopalan, et al. Make it fresh, make it quick – searching a network of personal Webservers. WWW 2003.
• M. Arenas, V. Kantere, et al. The Hyperion Project: From Data Integration to Data Coordination. SIGMOD Record, 2003, 32(3):53-58.
• P. Kalnis, W. S. Ng, B. C. Ooi and K. L. Tan. Answering similarity queries in P2P networks. Information Systems, 2005.

Page 3

Querying the Internet with PIER

(From VLDB2003)

Ryan Huebsch, Joseph M. Hellerstein, Ion Stoica, Nick Lanham, Boon Thau Loo, Scott Shenker

EECS, UC Berkeley

Page 4

Outline
• Introduction
• What is PIER?
• Design Principles
• Implementation
  • DHT
  • Query Processor
• Performance
• Summary

Page 5

Introduction

Databases:
• powerful query facilities
• declarative interface
• can scale up to a few hundred computers

What about the Internet? We want a well-distributed system that has:
• query facilities (SQL)
• fault tolerance
• flexibility

PIER is a query engine that scales up to thousands of participating nodes and can work on various kinds of data.


Page 7

What is PIER?
• PIER: Peer-to-Peer Information Exchange and Retrieval
• A query engine that runs on top of a P2P network
  • a step toward distributed query processing at a much larger scale
  • a path to massive distribution: querying heterogeneous data
• Its architecture combines traditional database query processing with recent peer-to-peer technologies

Page 8

Design Principles
• Relaxed Consistency
  • relaxes consistency in favor of the availability of the system
• Organic Scaling
  • no need for a priori allocation of a data center
• Natural Habitats for Data
  • no DB schemas: data stays where it lives, e.g. in a file system or perhaps a live feed
• Standard Schemas via Grassroots Software
  • widespread programs provide de facto standard schemas

Page 9

Outline
• Introduction
• What is PIER?
• Design Principles
• Implementation
  • DHT
  • Query Engine
• Scalability
• Summary

Page 10

Implementation – DHT

The DHT is based on CAN. DHT structure:
• routing layer
• storage manager
• provider

[Figure: PIER architecture. Apps layer: various user applications. PIER layer: core relational execution engine, catalog manager, query optimizer. DHT layer: provider, storage manager, overlay routing.]

Page 11

DHT – Routing & Storage

Routing layer: maps a key into the IP address of the node currently responsible for that key. It provides exact lookups and calls back higher levels when the set of keys the node is responsible for has changed.

Routing layer API:
• lookup(key) → ipaddr
• join(landmarkNode)
• leave()
• locationMapChange

Storage Manager: stores and retrieves records, which consist of key/value pairs. Keys are used to locate items and can be any supported data type or structure.

Storage Manager API:
• store(key, item)
• retrieve(key) → item
• remove(key)
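To make the two APIs above concrete, here is a minimal Python sketch. It is not PIER's code: consistent hashing on a ring stands in for CAN's coordinate-space routing, `join` takes the joining node's own address instead of a landmark node, and there is no `locationMapChange` callback. The class names `RoutingLayer` and `StorageManager` are illustrative.

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Hash onto a 64-bit identifier ring (SHA-1 is an arbitrary choice here).
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class RoutingLayer:
    """Toy routing layer: consistent hashing stands in for CAN."""

    def __init__(self):
        self.positions = []  # sorted ring positions of live nodes
        self.ip_at = {}      # ring position -> node IP address

    def join(self, ipaddr: str) -> None:
        pos = ring_position(ipaddr)
        bisect.insort(self.positions, pos)
        self.ip_at[pos] = ipaddr

    def leave(self, ipaddr: str) -> None:
        pos = ring_position(ipaddr)
        self.positions.remove(pos)
        del self.ip_at[pos]

    def lookup(self, key: str) -> str:
        # The responsible node is the first one at or after the key's position,
        # wrapping around the end of the ring.
        i = bisect.bisect_left(self.positions, ring_position(key)) % len(self.positions)
        return self.ip_at[self.positions[i]]

class StorageManager:
    """Per-node key/value store behind store/retrieve/remove."""

    def __init__(self):
        self.records = {}

    def store(self, key, item):
        self.records[key] = item

    def retrieve(self, key):
        return self.records.get(key)

    def remove(self, key):
        self.records.pop(key, None)

routing = RoutingLayer()
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    routing.join(ip)
storage = {ip: StorageManager() for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
storage[routing.lookup("R:42")].store("R:42", {"id": 42})
```

A useful property of this placement is that when a node leaves, only the keys it owned move to another node; every other key keeps its owner.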

Page 12

DHT – Provider (1)

• The provider ties the routing and storage manager layers together and provides an interface to the layers above.
• Each object in the DHT has a namespace, a resourceID and an instanceID.
• DHT key = hash(namespace, resourceID)
  • namespace: the application or group the object belongs to, e.g. a table
  • resourceID: identifies the object, e.g. its primary key or any other attribute
  • instanceID: an integer that separates items with the same namespace and resourceID
• CAN's mapping of resourceID to object is equivalent to an index.

[Figure: the provider layered on top of the storage manager and overlay routing.]

Page 13

DHT – Provider (2)

Provider API:
• get(namespace, resourceID) → item
• put(namespace, resourceID, item, lifetime)
• renew(namespace, resourceID, instanceID, lifetime) → bool
• multicast(namespace, resourceID, item)
• lscan(namespace) → items
• newData(namespace, item)

[Figure: table R (a namespace) partitioned across nodes: tuples 1..n are stored on node R1 and tuples n+1..m on node R2, each item stored under its resourceID (rID1, rID2, rID3, ...).]
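A minimal sketch of how the provider derives the DHT key from namespace and resourceID, assuming SHA-1 as the hash and a trivial `key % number_of_nodes` placement in place of the routing layer's `lookup`. Lifetimes, `renew`, `multicast` and `newData` are omitted, and `Provider` and `dht_key` are hypothetical names.

```python
import hashlib
from collections import defaultdict

def dht_key(namespace: str, resource_id) -> int:
    # DHT key = hash(namespace, resourceID). The instanceID is NOT hashed,
    # so every instance of a resource lands on the same node.
    digest = hashlib.sha1(f"{namespace}|{resource_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class Provider:
    """Toy provider over a fixed set of nodes (no lifetimes, renew or multicast)."""

    def __init__(self, node_names):
        self.nodes = {name: defaultdict(list) for name in node_names}
        self.order = sorted(node_names)

    def _owner(self, key: int) -> str:
        # Stand-in for the routing layer's lookup(key).
        return self.order[key % len(self.order)]

    def put(self, namespace, resource_id, item, instance_id=0):
        key = dht_key(namespace, resource_id)
        self.nodes[self._owner(key)][key].append((namespace, resource_id, instance_id, item))

    def get(self, namespace, resource_id):
        key = dht_key(namespace, resource_id)
        entries = self.nodes[self._owner(key)].get(key, [])
        return [item for ns, rid, _, item in entries if (ns, rid) == (namespace, resource_id)]

    def lscan(self, node_name, namespace):
        # Local scan: only the items of this namespace stored on one node.
        return [item for entries in self.nodes[node_name].values()
                for ns, _, _, item in entries if ns == namespace]

p = Provider(["R1", "R2"])
p.put("R", 1, {"id": 1, "v": "a"})
p.put("R", 1, {"id": 1, "v": "a2"}, instance_id=1)  # same resource, second instance
p.put("R", 2, {"id": 2, "v": "b"})
p.put("S", 1, {"id": 1, "w": "x"})                  # same resourceID, other namespace
```

Scanning a whole table means running `lscan` at every node, which is exactly what the join algorithms below do at each site.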

Page 14

Implementation – Query Engine

The query processor (QP) consists of:
• core engine
• query optimizer
• catalog manager

[Figure: PIER architecture. Apps layer: various user applications. PIER layer: core relational execution engine, catalog manager, query optimizer. DHT layer: provider, storage manager, overlay routing.]

It implements operators for selection, projection, distributed joins, grouping, and aggregation.

Page 15

Query Processor

How does it work?
• performs selection, projection, joins, grouping and aggregation
• executes multiple operators simultaneously, pipelined together
• results are produced and queued as quickly as possible

How does it modify data?
• inserts, updates and deletes items via the DHT interface

How does it select the data to process?
• dilated-reachable snapshot: the data published by reachable nodes at the time the query arrives
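Pipelined execution can be illustrated with Python generators: each operator pulls one tuple at a time from its child, so the first result leaves the plan before the input is exhausted. This is a single-process sketch with hypothetical operator names, not PIER's distributed dataflow.

```python
import itertools

def scan(source):
    # Stand-in for lscan: yields tuples one at a time instead of materializing them.
    for tup in source:
        yield tup

def select(rows, predicate):
    for tup in rows:
        if predicate(tup):
            yield tup

def project(rows, columns):
    for tup in rows:
        yield {c: tup[c] for c in columns}

# Operators are chained; each result flows out as soon as it is produced,
# even though the (here infinite) input is never fully read.
source = ({"id": i, "v": i * i} for i in itertools.count())
plan = project(select(scan(source), lambda t: t["v"] > 10), ["id"])
first = next(plan)  # {'id': 4}
```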

Page 16

Query Processor – Joins (1)

Symmetric hash join (SHJ) for Join(R, S, R.sid = S.id). At each site:
• (Scan) lscan NR and NS
• (Rehash) put into NQ a copy of each eligible tuple
• (Listen) use newData to see the rehashed tuples in NQ
• (Compute) join the tuples as they arrive in NQ

*Basic, but uses a lot of network resources.

[Figure: the query is multicast to all nodes; each node runs lscan(NR) and lscan(NS) and puts its R and S tuples into NQ with put(Rtup) and put(Stup); the nodes owning NQ receive them via newData.]
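A single-process Python sketch of the SHJ steps, assuming the join value itself picks the NQ partition and that `new_data` plays the role of the newData callback. The tuple field names (`rid`, `sid`, `id`) are made up for the example.

```python
import random
from collections import defaultdict

class NQPartition:
    """Join state kept by the node that owns one slice of the temporary namespace NQ."""

    def __init__(self):
        self.r_seen = defaultdict(list)  # join value -> R tuples received so far
        self.s_seen = defaultdict(list)  # join value -> S tuples received so far

    def new_data(self, side, tup):
        # Insert the arriving tuple into its own hash table, then probe the other one;
        # each matching pair is emitted exactly once, whichever tuple arrives last.
        if side == "R":
            self.r_seen[tup["sid"]].append(tup)
            return [(tup, s) for s in self.s_seen[tup["sid"]]]
        self.s_seen[tup["id"]].append(tup)
        return [(r, tup) for r in self.r_seen[tup["id"]]]

def symmetric_hash_join(r_tuples, s_tuples, n_partitions=4, seed=0):
    """Join(R, S, R.sid = S.id) by rehashing both inputs into NQ."""
    nq = [NQPartition() for _ in range(n_partitions)]
    # (Scan + Rehash) every eligible tuple is put into NQ under its join value.
    arrivals = [("R", r, r["sid"]) for r in r_tuples] + [("S", s, s["id"]) for s in s_tuples]
    random.Random(seed).shuffle(arrivals)  # network delivery order is arbitrary
    results = []
    for side, tup, join_value in arrivals:
        # (Listen + Compute) the owning partition joins tuples as they arrive.
        results.extend(nq[hash(join_value) % n_partitions].new_data(side, tup))
    return results

R = [{"rid": 1, "sid": 10}, {"rid": 2, "sid": 20}, {"rid": 3, "sid": 10}]
S = [{"id": 10, "name": "a"}, {"id": 30, "name": "c"}]
pairs = symmetric_hash_join(R, S)
```

Every tuple of both relations crosses the network once, even those that never match, which is why the slide calls SHJ expensive.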

Page 17

Query Processor – Joins (2)

Fetch matches (FM) for Join(R, S, R.sid = S.id). At each site:
• (Scan) lscan(NR)
• (Get) for each suitable R tuple, get the matching S tuple
• when the S tuples arrive at R, join them
• pass on the results

*Retrieves only the tuples that match.

[Figure: S is already hashed on its join attribute; the nodes holding R tuples issue get(rID) and receive the matching S tuples.]
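A sketch of fetch matches under the same assumptions: a dictionary keyed by S.id stands in for `get` on a relation that is already hashed on its join attribute, and each lookup is counted as one get message.

```python
def fetch_matches(r_fragments, s_by_join_value, predicate=lambda r: True):
    """Join(R, S, R.sid = S.id) when S is already hashed on S.id.

    r_fragments: per-node lists of R tuples, i.e. what lscan(NR) returns at each node.
    s_by_join_value: stands in for get() on NS, keyed by S.id.
    """
    results, gets = [], 0
    for fragment in r_fragments:          # runs independently at every node holding R
        for r in fragment:                # (Scan) lscan(NR)
            if not predicate(r):          # only suitable R tuples trigger a get
                continue
            gets += 1                     # (Get) one get per eligible R tuple
            for s in s_by_join_value.get(r["sid"], []):
                results.append((r, s))    # join when the S tuples arrive
    return results, gets

frags = [[{"rid": 1, "sid": 10}, {"rid": 2, "sid": 20}], [{"rid": 3, "sid": 10}]]
s_index = {10: [{"id": 10, "name": "a"}], 30: [{"id": 30, "name": "c"}]}
matched, n_gets = fetch_matches(frags, s_index)
```

S tuples travel only when they match, but every eligible R tuple still costs one get, so FM pays off when the predicate on R is selective.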

Page 18

Performance: Join Algorithms

[Figure: average network traffic (y axis, 0 to 16000) vs. selectivity of the predicate on relation S (x axis, 0 to 100%) for SHJ and FM. Setup: R + S = 25 GB, n = m = 1024, inbound capacity = 10 Mbps, hop latency = 100 ms.]

Page 19

Query Processor – Join rewriting

Symmetric semi-join (SSJ):
• (Project) project both R and S to their resourceIDs and join keys
• (Small rehash) perform an SHJ on the two projections
• (Compute) send the results into an FM join for each of the tables

*Minimizes initial communication.
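The symmetric semi-join can be sketched as follows. Dictionaries keyed by resourceID stand in for the namespaces NR and NS, and for brevity a build/probe hash join replaces the SHJ on the projections.

```python
from collections import defaultdict

def symmetric_semi_join(nr, ns):
    """nr / ns: dict resourceID -> full tuple, standing in for namespaces NR and NS."""
    # (Project) keep only (resourceID, join key): much smaller than the full tuples.
    r_proj = [(rid, tup["sid"]) for rid, tup in nr.items()]
    s_proj = [(sid, tup["id"]) for sid, tup in ns.items()]
    # (Small rehash) join the projections on the join key.
    by_key = defaultdict(list)
    for sid, key in s_proj:
        by_key[key].append(sid)
    matches = [(rid, sid) for rid, key in r_proj for sid in by_key[key]]
    # (Compute) fetch matches: retrieve full tuples only for matching resourceIDs.
    return [(nr[rid], ns[sid]) for rid, sid in matches]

nr = {"r1": {"sid": 10, "pad": "x" * 100}, "r2": {"sid": 20, "pad": "y"}, "r3": {"sid": 10, "pad": "z"}}
ns = {"s1": {"id": 10, "name": "a"}, "s2": {"id": 30, "name": "c"}}
joined = symmetric_semi_join(nr, ns)
```

The wide `pad` column never moves for the non-matching tuple r2; only its small (resourceID, join key) projection does.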

Bloom joins (BF):
• (Scan) create a Bloom filter for each fragment of a relation
• (Put) publish the filters for R and S
• (Multicast) distribute the filters
• (Rehash) rehash only the tuples that match the filter
• (Compute) run an SHJ

*Reduces rehashing.
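A sketch of the Bloom join. The filter size (1024 bits), the number of hash functions (3) and the SHA-256-based hashing are arbitrary choices, and ORing the per-fragment filters models publishing and multicasting them. Bloom filters can give false positives but never false negatives, so a few non-matching tuples may still be rehashed, but no matching tuple is dropped.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    def __init__(self, m_bits=1024, k=3):
        self.m, self.k, self.bits = m_bits, k, 0  # a Python int serves as the bit array

    def _positions(self, value):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for p in self._positions(value):
            self.bits |= 1 << p

    def might_contain(self, value):
        return all((self.bits >> p) & 1 for p in self._positions(value))

def combined_filter(fragments, attr):
    # (Scan) one filter per fragment; (Put + Multicast) modelled as ORing them together.
    combined = BloomFilter()
    for fragment in fragments:
        f = BloomFilter()
        for tup in fragment:
            f.add(tup[attr])
        combined.bits |= f.bits
    return combined

def bloom_join(r_fragments, s_fragments):
    """Join(R, S, R.sid = S.id); returns (results, number of tuples rehashed)."""
    r_filter = combined_filter(r_fragments, "sid")
    s_filter = combined_filter(s_fragments, "id")
    # (Rehash) only tuples that may have a partner on the other side are shipped.
    r_sent = [r for frag in r_fragments for r in frag if s_filter.might_contain(r["sid"])]
    s_sent = [s for frag in s_fragments for s in frag if r_filter.might_contain(s["id"])]
    # (Compute) join the survivors; a build/probe hash join stands in for SHJ here.
    by_id = defaultdict(list)
    for s in s_sent:
        by_id[s["id"]].append(s)
    results = [(r, s) for r in r_sent for s in by_id[r["sid"]]]
    return results, len(r_sent) + len(s_sent)

R_frags = [[{"rid": i, "sid": i} for i in range(0, 50)], [{"rid": i, "sid": i} for i in range(50, 100)]]
S_frags = [[{"id": i} for i in range(90, 120)]]
bf_results, shipped = bloom_join(R_frags, S_frags)
```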

Page 20

Performance: Join Algorithms

[Figure: average network traffic (y axis, 0 to 16000) vs. selectivity of the predicate on relation S (x axis, 0 to 100%) for SHJ, FM, SSJ and BF. Setup: R + S = 25 GB, n = m = 1024, inbound capacity = 10 Mbps, hop latency = 100 ms.]

Page 21

Outline
• Introduction
• What is PIER?
• Design Principles
• Implementation
  • DHT
  • Query Processor
• Scalability
• Summary

Page 22

Scalability Simulation

Conditions:
• |R| = 10 |S|
• the constants produce a selectivity of 50%

Query:
    SELECT R.key, S.key, R.pad
    FROM R, S
    WHERE R.n1 = S.key
      AND R.n2 > const1
      AND S.n2 > const2
      AND f(R.n3, S.n3) > const3

Page 23

Experimental Results

Equipment:
• cluster of 64 PCs
• 1 Gbps network

Result: the time to receive the 30th result tuple remains practically unchanged as both the system size and the load are scaled up.

Page 24

Summary
• PIER is a structured query system intended to run at a very large scale
• PIER queries data that preexists in the wide area
• The DHT is the core scalability mechanism for indexing, routing and query state management
• A broad frontier of future work:
  • caching
  • query optimization
  • security
  • …

Page 25

Piazza: Data Management Infrastructure for Semantic Web Applications

(From WWW2003)

Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov

University of Washington

Page 26

Outline
• Introduction
• Semantic Web
• PIAZZA: system overview
• Implementation details
  • Mapping language
  • Query answering algorithm
• Conclusions

Page 27

Introduction
• Goal: data integration and knowledge management
• Problem: Web data lacks machine-understandable semantics
• Solution: the Semantic Web?

Page 28

The Semantic Web*
• Web sites include structural annotations, so you can pose meaningful queries on them
• Ontologies provide the semantic glue
• The internal implementation of web sites is left open
• Agents perform tasks:
  • query one or more web sites
  • perform updates (e.g., set schedules)
  • coordinate actions
  • trust each other (or not)
• I.e., agents operating on a gigantic, heterogeneous, distributed database

(*View by A. Halevy)

Page 29

General requirements
• Robust infrastructure for querying
  • peer data management systems
• Facilitate mapping between different structures. Need tools for:
  • locating relevant structures
  • easily joining the semantic web
• Get data into structured form
  • Should we worry about the legacy web?

Page 30

Using views for specifying mappings
• Local-As-View (LAV): data sources are described as views over the mediated schema
• Global-As-View (GAV): the mediated schema is described as a set of views over the data sources

[Figure: the LAV and GAV settings, each with a mediated schema over sites A, B and C.]

Page 31

Mapping

A mapping "AB" specifies how structured data in the schema of node A is represented in the schema of node B.

[Figure: sites A, B and C and a mediated schema (MS), with pairwise mappings "AB", "BA", "BC" and "CB" between sites and mappings "A-MS", "MS-A", "C-MS" and "MS-C" between sites and the mediated schema.]

Page 32

Piazza: Peer Data-Management System

Goal: large-scale autonomous sharing of structured data

Peer data management system (PDMS):
• autonomous peers export data in their own schemas
• pair-wise mappings between peers
• a generalization of a data integration system
• NOT a P2P file-sharing system

Page 33

Relationship of PDMS to…

P2P overlay networks (the “Structured World”)

Data integration systems (no central logical mediated schema)

Federated databases (scale, ad-hoc nature)

Distributed databases (no central administration)

Page 34

Representing Data

A spectrum of possibilities:
• Relational tables, with some integrity constraints
• XML: can encode relational and hierarchical data
  • XQuery: the emerging standard query language (SQL for XML)
• RDF: “XML on drugs”
  • sees only the logic; ignores other aspects
• DAML+OIL
  • a full-blown knowledge representation language

They all have semantics, just different expressive powers. We keep the data simple; the mappings between data at different peers are more complex.

Page 35

Peer Data Management

Mappings are query expressions, e.g.:
    DbResearcher(x) ⊇ Researcher(x), Area(x, DB)
    DbResearcher(x), Office(x, DBLab) = DbLabMember(x)

[Figure: a network of peers (DB Projects, MIT, UW, UCB, Stanford), each exporting its own schema. The schemas shown are:]
• Area(areaID, name, descr), Project(projID, name, sponsor), ProjArea(projID, areaID), Pubs(pubID, projName, title, venue, year), Author(pubID, author), Member(projName, member)
• Project(projID, name, descr), Student(studID, name, status), Faculty(facID, name, rank, office), Advisor(facID, studID), ProjMember(projID, memberID), Paper(papID, title, forum, year), Author(authorID, paperID)
• Area(areaID, name, descr), Project(projID, areaID, name), Pub(pubID, title, venue, year), PubAuthor(pubID, authorID), PubProj(pubID, projID), Member(memID, projID, name, pos), Alumn(name, year, thesis)
• Members(memID, name), Projects(projID, name, startDate), ProjFaculty(projID, facID), ProjStudents(projID, studID), …
• Direction(dirID, name), Project(pID, dirID, name), …

Page 36

Piazza mapping language (1): XML/XML example

Target schema:
    pubs
      book*
        title
        author*
          name
        publisher*
          name

Source schema:
    authors
      author*
        full-name
        publication*
          title
          pub-type

Mapping:
    <pubs>
      <book>
      {: $a IN document("source.xml")/authors/author,
         $t IN $a/publication/title,
         $typ IN $a/publication/pub-type
         WHERE $typ = "book" :}
        <title> { $t } </title>
        <author>
          <name> {: $a/full-name :} </name>
        </author>
      </book>
    </pubs>

Page 37

Piazza mapping language (2)

(Target and source schemas as on the previous slide.)

Adding piazza:id attributes:
    <pubs>
      <book piazza:id={$t}>
      {: $a IN document("source.xml")/authors/author,
         $t IN $a/publication/title,
         $typ IN $a/publication/pub-type
         WHERE $typ = "book" :}
        <title piazza:id={$t}> { $t } </title>
        <author piazza:id={$t}>
          <name> {: $a/full-name :} </name>
        </author>
      </book>
    </pubs>

Page 38

Piazza mapping language (3)

(Target and source schemas as on slide 36.)

Partial mapping:
    <pubs>
      <book piazza:id={$t}>
      {: $a IN document("source.xml")/authors/author,
         $t IN $a/publication/title,
         $typ IN $a/publication/pub-type
         WHERE $typ = "book"
         PROPERTY $t >= 'A' AND $t < 'B' :}
        [: <publisher>
             <name> {: PROPERTY $this IN {"PrintersInc", "PubsInc"} :} </name>
           </publisher> :]
      </book>
    </pubs>

Page 39

Query Answering Algorithm

Problem: evaluate query Q at P1, given a network of mappings.
• Reformulate the query over all relevant peers
• Chain mappings using a combination of query composition and query rewriting

    QP1(x) :- DbResearcher(x)

Query composition:
    M: DbResearcher(x) ⊇ Researcher(x), Area(x, DB)
    QP2(x) ⊇ Researcher(x), Area(x, DB)

Query rewriting:
    M: DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
    QP3(x) ⊇ DbLabMember(x)
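The effect of the two reformulations can be checked on toy data. The relation contents below are invented for illustration; the sketch simply evaluates QP2 and QP3 at the peers holding those relations and unions their answers at P1. It is not Piazza's reformulation algorithm.

```python
# Hypothetical data at peer P2 (Researcher, Area) and peer P3 (DbLabMember).
researcher = {"alice", "bob", "carol"}
area = {("alice", "DB"), ("bob", "AI"), ("carol", "DB")}
db_lab_member = {"alice", "dave"}

def qp2():
    # Composition through DbResearcher(x) ⊇ Researcher(x), Area(x, DB)
    return {x for x in researcher if (x, "DB") in area}

def qp3():
    # Rewriting with DbResearcher(x), Office(x, DBLab) = DbLabMember(x):
    # every DbLabMember is a DbResearcher, so each one is an answer to QP1.
    return set(db_lab_member)

def qp1():
    # Answers to QP1(x) :- DbResearcher(x) gathered at P1 from all reformulations.
    return qp2() | qp3()

print(sorted(qp1()))  # ['alice', 'carol', 'dave']
```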

Page 40

Query Reformulation (1)

Mapping:
    <S2>
      <people> {: $people = /S1/people :}
        <faculty> {: $name = $people/faculty/name/text() :} { $name } </faculty>
        <student> {: $student = $people/student/text() :}
          <name> { $student } </name>
          <advisor> {: $faculty = $people/faculty,
                       $name = $faculty/name/text(),
                       $advisee = $faculty/advisee/text()
                       where $advisee = $student :}
            { $name }
          </advisor>
        </student>
      </people>
    </S2>

Query:
    <result> { for $faculty in /S1/people/faculty,
                   $name in $faculty/name/text(),
                   $advisee in $faculty/advisee/text()
               where $name = "Ullman"
               return <student> { $advisee } </student> }
    </result>

Page 41

Query Reformulation (2)

Query:
    <result> { for $faculty in /S1/people/faculty,
                   $name in $faculty/name/text(),
                   $advisee in $faculty/advisee/text()
               where $name = "Ullman"
               return <student> { $advisee } </student> }
    </result>

Query tree pattern: S1/people/faculty, with children name (where $name = "Ullman") and advisee; each match produces <result><student> {$advisee}.

Mapping tree pattern: S1/people maps to <S2><people>; faculty/name maps to <faculty> {$name}; student maps to <student><name> {$student}; faculty with name and advisee, where $advisee = $student, maps to <advisor> {$name}.

Page 42

Query Reformulation (3)

[Figure: the query tree pattern matched against the mapping tree pattern, as on the previous slide.]

Reformulated query over S2:
    <result> { for $student in /S2/people/student,
                   $advisor in $student/advisor/text(),
                   $name in $student/name/text()
               where $advisor = "Ullman"
               return <student> { $name } </student> }
    </result>
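To see that the reformulation preserves the answers, here is a Python sketch using `xml.etree.ElementTree`. It builds a small, made-up S1 document, hand-translates the Query Reformulation (1) mapping into `apply_mapping`, and checks that the original query over S1 and the reformulated query over S2 return the same students. This checks one example only; it is not Piazza's reformulation algorithm.

```python
import xml.etree.ElementTree as ET

# A small, made-up S1 instance.
S1 = ET.fromstring(
    "<S1><people>"
    "<faculty><name>Ullman</name><advisee>Ann</advisee><advisee>Bo</advisee></faculty>"
    "<faculty><name>Widom</name><advisee>Cy</advisee></faculty>"
    "<student>Ann</student><student>Bo</student><student>Cy</student>"
    "</people></S1>"
)

def apply_mapping(s1):
    """Hand translation of the S1 -> S2 mapping from Query Reformulation (1)."""
    people = s1.find("people")
    s2 = ET.Element("S2")
    out = ET.SubElement(s2, "people")
    for name in people.findall("faculty/name"):
        ET.SubElement(out, "faculty").text = name.text
    for student in people.findall("student"):
        st = ET.SubElement(out, "student")
        ET.SubElement(st, "name").text = student.text
        for fac in people.findall("faculty"):
            # <advisor> holds the name of every faculty member advising this student
            if any(a.text == student.text for a in fac.findall("advisee")):
                ET.SubElement(st, "advisor").text = fac.findtext("name")
    return s2

def query_s1(s1):
    # Original query: advisees of the faculty member named Ullman.
    return [a.text for f in s1.findall("people/faculty")
            if f.findtext("name") == "Ullman"
            for a in f.findall("advisee")]

def reformulated_query_s2(s2):
    # Reformulated query: students whose advisor is Ullman.
    return [s.findtext("name") for s in s2.findall("people/student")
            for adv in s.findall("advisor") if adv.text == "Ullman"]

print(query_s1(S1), reformulated_query_s2(apply_mapping(S1)))  # ['Ann', 'Bo'] ['Ann', 'Bo']
```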

Page 43

Reformulation times

Table 1: The test queries and their respective running times.

    Query  Description                                       Reformulation time  # of reformulations
    Q1     XML-related projects                              0.5 sec             12
    Q2     Co-authors who reviewed each other's work         0.9 sec             25
    Q3     PC members with a paper at the same conference    0.2 sec             3
    Q4     PC chairs of recent conferences + their projects  0.5 sec             24
    Q5     Conflicts-of-interest of PC members               0.7 sec             36

Page 44

Current and the Future

Current status:
• demo scenario using XML
• looking at real domains (Bio dbs, NASA dbs)

Future work:
• a more efficient reformulation algorithm
• semantic network analysis: eliminate redundant and inconsistent mappings
• query caching to speed up query evaluation

Page 45

Conclusions
• A mapping language for mapping between sets of XML source nodes with different document structures
• An architecture that uses the transitive closure of mappings to answer queries
• An algorithm for query answering over this transitive closure of mappings, which is able to follow mappings in both forward and reverse directions

Page 46

The Hyperion Project: From Data Integration to Data Coordination

Marcelo Arenas, Vasiliki Kantere, Anastasios Kementsietsidis, Iluju Kiringa, Renee J. Miller, John Mylopoulos

University of Toronto, University of Ottawa

Page 47

The Hyperion Project: Objectives

To investigate the data management issues raised by the P2P data-sharing paradigm:
• definition of a peer-to-peer data management architecture
• study of viable data integration, exchange, and mapping mechanisms
• development of algorithms for the efficient search, retrieval and exchange of data among peers

Page 48

Contributions
• Introduce an architecture for a data-sharing data management system
• Propose mechanisms called mapping tables and coordination rules to support data-sharing services in the system
• Propose a query mechanism for data-sharing environments

Page 49

Architecture

Basic (architectural) design principles:
• no global (common) schemas/vocabularies/…
• each peer source manages its own data
• each peer source manages its own acquaintances

Page 50

Mapping Tables and Expressions

Mapping tables and mapping expressions are the basic tools for exchanging information between peers.

[Figure: instances for the two airline databases.]
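The airline instances are not recoverable from the slide, so here is a hypothetical illustration of how a mapping table might be used, assuming it is simply a set of associated value pairs: it links a city name at one peer with the airport codes used at another and translates a selection constant across peers. All data and names are invented.

```python
# Hypothetical mapping table: pairs (city at peer A, airport code at peer B).
city_airport = {("Toronto", "YYZ"), ("Toronto", "YTZ"), ("Ottawa", "YOW")}

# Hypothetical flights stored at peer B: (flight number, origin airport code).
flights_b = [("AC100", "YYZ"), ("AC200", "YOW"), ("WS300", "YTZ")]

def translate(value, table):
    """All peer-B values associated with a peer-A value by the mapping table."""
    return {b for a, b in table if a == value}

def flights_from(city):
    # A query posed at A with the constant `city` is rewritten for B
    # by replacing the constant with its associated values.
    airports = translate(city, city_airport)
    return sorted(flight for flight, origin in flights_b if origin in airports)
```

Because one value may map to several values (Toronto has two airports here), a single constant at one peer can become a set of constants at the other.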

Page 51

P2P Coordination
• Express active functionality with ECA (event-condition-action) rules that reside in the P2P layer of a PDBMS and involve a peer and some of its acquaintances
• Rules use mapping tables and expressions to coordinate data and meta-data (for example, the mapping tables themselves) between peers
• Extends the distributed ECA rule language of Kantere

Page 52

Answering Similarity Queries in Peer-to-Peer Networks

(Information Systems Journal, 2005)

Kalnis P., Ng W.S., Ooi B.C., Tan K.L.

National University of Singapore

Page 53

Background (1)
• Given an example image, search for the top-k most similar images in a data-sharing P2P system
• The naive approach:
  • access each peer within the query horizon, retrieve its local top-k similar images, and send them to the query peer
  • at the query peer, select the global top-k similar images as the final result
  • drawback: too many useless results are transmitted to the query peer
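A sketch of the naive approach, assuming each peer's images are already reduced to feature vectors and that similarity is measured as Euclidean distance (smaller is more similar). The peer contents are invented; the returned count shows how many candidates reach the query peer.

```python
import heapq
import math

def local_top_k(images, query, k):
    # Each peer ranks its own images; (distance, id) pairs sort by distance first.
    return heapq.nsmallest(k, ((math.dist(query, vec), img_id) for img_id, vec in images.items()))

def naive_top_k(peers, query, k):
    transmitted = []
    for images in peers:                 # every peer within the query horizon answers
        transmitted.extend(local_top_k(images, query, k))
    # The query peer keeps the global top-k; everything else was sent for nothing.
    return heapq.nsmallest(k, transmitted), len(transmitted)

peers = [{"a": (0, 0), "b": (5, 5)}, {"c": (1, 0), "d": (9, 9)}, {"e": (0, 2), "f": (7, 7)}]
top, sent = naive_top_k(peers, (0, 0), k=2)   # top ids: a, c; 6 results were transmitted
```

Here 4 of the 6 transmitted results are discarded, which is the drawback the slide points out.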

Page 54

Background (2)
• An alternative is to set a global threshold: only images whose similarity to the sample is not less than the threshold are sent to the query peer
• The challenge: how should the threshold be set? A good value depends on the query
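The threshold alternative, under the same assumptions as a distance-based sketch: with a distance measure, "similarity not less than the threshold" becomes "distance not greater than the threshold". The example data are invented and show why a good threshold is query-dependent: too tight a value returns fewer than k answers.

```python
import heapq
import math

def threshold_top_k(peers, query, k, threshold):
    """Peers send only images within `threshold` distance of the query vector."""
    transmitted = []
    for images in peers:
        for img_id, vec in images.items():
            d = math.dist(query, vec)
            if d <= threshold:          # similarity >= threshold  <=>  distance <= threshold
                transmitted.append((d, img_id))
    return heapq.nsmallest(k, transmitted), len(transmitted)

peers = [{"a": (0, 0), "b": (5, 5)}, {"c": (1, 0), "d": (9, 9)}, {"e": (0, 2), "f": (7, 7)}]
good, good_sent = threshold_top_k(peers, (0, 0), k=2, threshold=1.5)    # a, c; 2 messages
tight, tight_sent = threshold_top_k(peers, (0, 0), k=2, threshold=0.5)  # only a
```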

Page 55

Background (3)

Observations that may benefit query processing:
• If two queries are similar, the top-k answers for the first one are likely (with high probability) to contain some of the answers for the second query
• In P2P networks, each peer can examine the messages that pass through it

Page 56

Basic Ideas of the Paper
• The major goal is to reduce the number of query messages in the network
• Some queries are stopped (i.e., they are not propagated further) and stay resident inside a set of peers
• These frozen queries are answered by the streams of results that pass through those peers, which were initiated by the queries that are still running

Page 57

Contributions of the Paper
• The proposal of the FuzzyPeer architecture for similarity searching in P2P networks
• The development of optimizing techniques for dynamically deciding the set of frozen queries and assigning them to the appropriate result streams
• An extensive experimental evaluation of the proposed approach

Page 58

System Architecture
1) A maximum number of hops d is set
2) The query processing period is also limited to MaxWaitTime

[Figure: a typical FuzzyPeer network.]

Page 59

Query Processing (1)

[Figure: query q issued in a typical FuzzyPeer network.]

Page 60

Query Processing (2)

[Figure: query q propagating through a typical FuzzyPeer network.]

Page 61

Query Processing (3)

[Figure: query q propagating further through a typical FuzzyPeer network.]

Page 62

Query Processing (4)

[Figure: query q propagating further through a typical FuzzyPeer network.]

Page 63

Query Processing (5)

[Figure: query q propagating further through a typical FuzzyPeer network.]

Page 64

Query Freezing (1)

[Figure: two similar queries, q and q' (q ≈ q'), issued in a typical FuzzyPeer network.]

Page 65

Query Freezing (2)

[Figure: q and q' (q ≈ q') propagating through a typical FuzzyPeer network.]

Page 66

Query Freezing (3)

q' is frozen at P2.

[Figure: q' stops at P2 and is answered by the stream of results of q that passes through P2, in a typical FuzzyPeer network.]

Page 67

Prototype Implementation

[Figure: peer architecture.]

Page 68

Query Processing (1): Preprocessing
• A sample image S is transformed into an m-dimensional feature space by a wavelet transformation: {f1, f2, …, fm}
• For a query q, the similarity between an image S and q is Sim(q, S)
• The similarity between two queries q and q' is Sim(q, q')
• Here Sim(·, ·) may be the Euclidean distance (a smaller value means more similar)
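A sketch of the preprocessing step on a 1-D signal (for example, a flattened grayscale image whose length is a power of two), assuming an unnormalized Haar transform and keeping the first m coefficients as features. The paper's exact wavelet and feature selection may differ; Euclidean distance is used as Sim.

```python
import math

def haar(signal):
    """Full 1-D Haar decomposition using pairwise averages and half-differences."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = [float(x) for x in signal], []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details   # finer details go last
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details   # [overall average, coarsest detail, ..., finest details]

def features(signal, m):
    # Keep the m coarsest coefficients as the feature vector {f1, ..., fm}.
    return haar(signal)[:m]

def sim(u, v):
    # Euclidean distance: smaller values mean more similar.
    return math.dist(u, v)

print(haar([4, 2, 6, 8]))  # [5.0, -2.0, 1.0, -1.0]
```

Keeping the coarse coefficients preserves the overall shape of the signal, so small local changes move the feature vector only slightly.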

Page 69

Query Processing (2)
• When a peer receives a query q, it returns its top-k most similar local images img1, img2, …, imgk, in the form {id_i, Sim(q, img_i)}
• Message processing

[Figure: message propagation model.]

Page 70

Static Query Freezing (SQF)
• For a query, two parameters determine its freezing:
  • the freezing probability
  • the number of hops it must travel before it can be frozen
• How to select a beneficial stream:
  • the lifetime of the candidate stream
  • the similarity between the query in question and the query corresponding to the stream

Page 71

Static Query Freezing (SQF)

Page 72

Adaptive Query Freezing (AQF)
• The network workload decides when query freezing happens
• Query freezing is independent of the query itself
• The freezing condition

Page 73

Adaptive Query Freezing (AQF)

Page 74

Similarity Query Freezing (simQF)
• Differs from AQF in the query freezing condition
• A query q is frozen at a peer P if there is an answer stream whose distance to q is less than a threshold ρ
• Drawback: the threshold ρ must be provided
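The simQF freezing test can be sketched as follows, assuming each passing result stream is identified by the feature vector of the query that drives it. `find_stream` and the stream identifiers are hypothetical.

```python
import math

def find_stream(query_vec, streams, rho):
    """Return the id of a passing answer stream close enough to freeze q, else None.

    streams: dict stream_id -> feature vector of the query that initiated the stream.
    """
    if not streams:
        return None
    stream_id, vec = min(streams.items(), key=lambda item: math.dist(query_vec, item[1]))
    # simQF condition: freeze q at this peer if some stream is within distance rho.
    return stream_id if math.dist(query_vec, vec) < rho else None

streams_at_p = {"q1": (0.0, 0.0), "q2": (5.0, 5.0)}
frozen_on = find_stream((0.5, 0.0), streams_at_p, rho=1.0)  # 'q1': freeze and attach to q1's stream
forwarded = find_stream((3.0, 3.0), streams_at_p, rho=1.0)  # None: keep forwarding the query
```

The choice of ρ plays the same role as the global threshold in Background (2): too small and few queries freeze, too large and frozen queries receive poor answers.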

Page 75

Experiments Setting

Two implementations:
• a Java-based prototype
• a simulator running on a 2-CPU Ultra-SPARC III server with 4 GB RAM

Two network topologies are simulated: uniform and power law.