Distributed Query Processing and Catalogs for Peer-to-Peer Systems
Professor: Iluju Kiringa
Students: Fan Yang, Libin Cai
Agenda
• About P2P
• Mutant Query Plan
• Distributed Catalog
• Intentional Statements
• Security and Privacy
• Conclusions
About P2P
• Advantages:
– Ease of deployment
– Ease of use
– Fault tolerance
– Scalability
• Limitations:
– Weak query capabilities
– No infrastructure for distributed queries
– Limitations in index scalability and result quality
A query example
User Bob wants to see a movie tonight.
Bob visits his favorite portal, BobsPortal.com.
Bob uses a GUI front-end to compose an XML query over three XML documents: film reviews, preferences, and film showings.
FOR $r in document("film_reviews")//review,
    $g in document("preferences")//genre,
    $s in document("film_showings")/showing[date = "15 March 2002"]
WHERE $r/genre = $g AND $r/title = $s/title
RETURN <film> { $r/title } { $r/rating } { $s/theater } </film>
[2]
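The join that this query expresses can be sketched in Python over small in-memory documents. This is only an illustration: the sample reviews, preferences, and showings are invented, and only the element names (review, genre, title, rating, showing, date, theater) come from the query itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; only the element names follow the query.
reviews = ET.fromstring(
    "<reviews>"
    "<review><title>Alpha</title><genre>comedy</genre><rating>4</rating></review>"
    "<review><title>Beta</title><genre>horror</genre><rating>5</rating></review>"
    "</reviews>")
preferences = ET.fromstring("<preferences><genre>comedy</genre></preferences>")
showings = ET.fromstring(
    "<showings>"
    "<showing><title>Alpha</title><date>15 March 2002</date><theater>Roxy</theater></showing>"
    "<showing><title>Beta</title><date>15 March 2002</date><theater>Bijou</theater></showing>"
    "</showings>")


def films_for(date):
    """Join reviews with preferred genres and with showings on the given date."""
    liked = {g.text for g in preferences.iter("genre")}
    results = []
    for r in reviews.iter("review"):
        if r.findtext("genre") not in liked:
            continue  # WHERE $r/genre = $g
        for s in showings.iter("showing"):
            same_title = s.findtext("title") == r.findtext("title")
            if s.findtext("date") == date and same_title:
                results.append(
                    (r.findtext("title"), r.findtext("rating"), s.findtext("theater")))
    return results


tonight = films_for("15 March 2002")
```

With this sample data, only the comedy that is showing on the requested date survives both join conditions.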
A query example (cont’)
The logical query plan has three kinds of elements:
• Regular query operators: select, join
• Pseudo-operators: document, display
• References to XML fragments
Query processing: logical query plan → physical query plan → executed by the query processing algorithm
[2]
Advent of Mutant Query Plan
• Why MQP?
– Can cope with incomplete metadata
– Can decentralize query optimization and execution
– Respects the autonomy and local policies of sites
– Adapts to server and network conditions even while being evaluated
• What is an MQP?
– An algebraic query plan graph, encoded in XML, that may also include:
• References to resource locations (URLs)
• References to abstract resource names (URNs)
• Verbatim XML fragments
– Each MQP is tagged with a target, the destination for the result once the MQP is fully evaluated.
Mutant Query Processing
[1]
Mutant Query Plan Example
Garage Sale example:
Query: CDs for $10 or less in the Portland area.
MQP:
Regular query operators: select, join
Pseudo-operator: display
Constant piece of XML
URNs
[1]
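A minimal sketch of how such a plan might be encoded in XML, built with Python's standard library. The element and attribute names (mqp, target, display, join, select, urn, data), the target address, and the content of the constant XML fragment are all hypothetical; [1] defines its own encoding, which this does not attempt to reproduce.

```python
import xml.etree.ElementTree as ET


def build_garage_sale_mqp():
    """Build a toy MQP shaped like display(join(select(URN), constant XML))."""
    # Hypothetical target address for the fully evaluated result.
    mqp = ET.Element("mqp", target="http://client.example/results")
    display = ET.SubElement(mqp, "display")            # pseudo-operator
    join = ET.SubElement(display, "join", on="title")  # regular operator
    select = ET.SubElement(join, "select", predicate="price <= 10")
    # Abstract resource name, left for some later server to resolve.
    ET.SubElement(select, "urn").text = "urn:ForSale:Portland-CDs"
    # Constant piece of XML carried inside the plan (invented content).
    data = ET.SubElement(join, "data")
    ET.SubElement(data, "wanted").text = "Kind of Blue"
    return mqp


plan = build_garage_sale_mqp()
xml_text = ET.tostring(plan, encoding="unicode")
```

Because the plan is ordinary XML, it can be serialized, shipped to another peer, and parsed again without any shared state between servers.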
Mutant Query Plan Example (cont’)
(a) Resolution and rewriting (b) reduction
[1]
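The two steps named above, resolution with rewriting and then reduction, can be sketched as a single pass that each server applies to the plan before forwarding it. This is a simplified illustration, not the algorithm from [1]: the plan is modeled as nested Python tuples instead of XML, and the seller URLs and CD records are invented.

```python
def process(plan, catalog, local_data):
    """One server's pass over an MQP: resolve what it can, then reduce."""
    kind = plan[0]
    if kind == "urn":
        # Resolution: rewrite a known URN into a union of URLs.
        urls = catalog.get(plan[1])
        if urls is None:
            return plan
        return process(("union", [("url", u) for u in urls]), catalog, local_data)
    if kind == "url":
        # A URL served by this peer is replaced by its data.
        rows = local_data.get(plan[1])
        return plan if rows is None else ("data", rows)
    if kind == "union":
        parts = [process(p, catalog, local_data) for p in plan[1]]
        if all(p[0] == "data" for p in parts):
            return ("data", [row for p in parts for row in p[1]])
        return ("union", parts)
    if kind == "select":
        child = process(plan[2], catalog, local_data)
        if child[0] == "data":
            # Reduction: evaluate the operator and splice in the result.
            return ("data", [row for row in child[1] if plan[1](row)])
        return ("select", plan[1], child)
    return plan  # "data" nodes and unknown nodes are left untouched


def cheap(cd):
    return cd["price"] <= 10


URL1, URL2 = "http://seller1.example/", "http://seller2.example/"  # hypothetical
plan = ("select", cheap, ("urn", "urn:ForSale:Portland-CDs"))

# An index server knows the URN but holds no data.
after_index = process(plan, {"urn:ForSale:Portland-CDs": [URL1, URL2]}, {})
# Each seller holds its own data but knows no URNs.
after_seller1 = process(after_index, {},
                        {URL1: [{"title": "A", "price": 8}, {"title": "B", "price": 15}]})
final = process(after_seller1, {}, {URL2: [{"title": "C", "price": 5}]})
```

The plan mutates at every hop: the URN becomes URLs at the index server, one URL becomes data at the first seller, and the whole plan collapses to a constant result at the second.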
Comparisons between Pipelined plan and Mutant plan
(a) Pipelined plan (b) mutant plan
[2]
Distributed Catalogs
• Question: How do peers find out what resources are available at other peers?
• Answer: Build distributed catalogs to route queries efficiently.
• Procedures:
– Peers use multi-hierarchic namespaces to categorize data;
– Data providers use multi-hierarchic namespaces to describe the data they serve;
– Data consumers use them to formulate queries.
Multi-hierarchic Namespaces
Multi-hierarchic namespace: the set of categorization hierarchies relevant to an application domain. [1]
Interest area:
Second-hand armchairs in the Portland area:
[USA/OR/Portland, Furniture/Chairs]
A multi-hierarchic namespace with two categorization dimensions and two highlighted interest areas: (a) Vancouver-Portland furniture, (b) items in Portland
[1]
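A minimal sketch of how interest areas might be compared, assuming each area is a single cell with one slash-separated category path per dimension and that "*" stands for the top of a hierarchy. The function names are hypothetical, and an area such as (a), which spans two cities, would need a set of such cells.

```python
def is_ancestor_or_same(general, specific):
    """True if category path `general` equals or contains `specific`."""
    if general == "*":
        return True
    g, s = general.split("/"), specific.split("/")
    return s[:len(g)] == g


def covers(area, other):
    """True if `area` contains `other` in every dimension."""
    return all(is_ancestor_or_same(a, b) for a, b in zip(area, other))


def overlaps(area, other):
    """True if the two areas can share data items."""
    return all(is_ancestor_or_same(a, b) or is_ancestor_or_same(b, a)
               for a, b in zip(area, other))


armchairs = ("USA/OR/Portland", "Furniture/Chairs")
portland_items = ("USA/OR/Portland", "*")
oregon_music = ("USA/OR", "Music")
```

Here `portland_items` covers `armchairs`, while `oregon_music` does not even overlap it, because Music and Furniture are disjoint branches of the second hierarchy.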
Peer Roles
Resource Resolution
• Authoritative Server
– Strives to know about all base servers within its interest area.
– An authoritative index or meta-index server can be used to find the known base servers in a particular interest area.
• Resource Resolution
1. Seek an authoritative index or meta-index server
2. Recursively follow the index references
3. Find all the relevant base servers and data items
4. Resolve the URN
Example of Resource Resolution
• URN: urn:ForSale:Portland-CDs
• URLs: http://10.1.2.3:9020/, http://10.2.3.4:9020/
• Interest area: [USA/OR/Portland, Music/CDs]
• Authoritative meta-index server A: [USA, *]
• Index server B: [USA, Music]
• Index server C: [USA/OR, Music]
• Index server G: replaces the URN with the URLs
Query plan → A → B → C → … → G → http://10.1.2.3:9020/, http://10.2.3.4:9020/
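The recursive lookup in this example can be sketched as follows. The catalog contents are assumptions made for illustration: the servers between C and G are collapsed into a single hop, servers D and F are invented dead ends, and the base-server URLs are written in host:port form, which is assumed to be what the slide intends.

```python
def within(general, specific):
    """True if category path `general` equals or contains `specific`."""
    if general == "*":
        return True
    g, s = general.split("/"), specific.split("/")
    return s[:len(g)] == g


def overlaps(a, b):
    return all(within(x, y) or within(y, x) for x, y in zip(a, b))


# Hypothetical catalog: server -> list of (interest area, reference).
# A reference is another server or, at the last hop, a base-server URL.
CATALOG = {
    "A": [(("USA", "Music"), "B"), (("USA", "Furniture"), "F")],
    "B": [(("USA/OR", "Music"), "C"), (("USA/WA", "Music"), "D")],
    "C": [(("USA/OR/Portland", "Music/CDs"), "G")],
    "D": [],
    "F": [],
    "G": [(("USA/OR/Portland", "Music/CDs"), "http://10.1.2.3:9020/"),
          (("USA/OR/Portland", "Music/CDs"), "http://10.2.3.4:9020/")],
}


def resolve(server, area):
    """Follow index references that overlap `area`; return (urls, visited)."""
    urls, visited = [], [server]
    for ref_area, ref in CATALOG[server]:
        if not overlaps(ref_area, area):
            continue  # this branch of the catalog cannot hold relevant data
        if ref.startswith("http://"):
            urls.append(ref)
        else:
            found, seen = resolve(ref, area)
            urls.extend(found)
            visited.extend(seen)
    return urls, visited


urls, visited = resolve("A", ("USA/OR/Portland", "Music/CDs"))
```

Only references whose interest area overlaps the requested one are followed, so the furniture and Washington branches are never contacted.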
Intentional Statements
• Purposes:
– How can index and meta-index servers convey the relationships between the data they cover?
– How can mutant queries use this information to make intelligent choices about completeness, currency, and latency tradeoffs?
• Intentional statements:
– Describe relationships between index and meta-index servers, and can be expressed using coordination formulas.
• Examples:
– Server R replicates everything from server S for the Portland category of the Location hierarchy:
base[Portland, *]@R = base[Portland, *]@S
– The only Oregon sporting goods information that R holds is the Portland and Eugene golf club data from S:
base[Oregon, Sporting Goods]@R = base[Portland, Golf Clubs]@S ∪ base[Eugene, Golf Clubs]@S
– R indexes several base servers:
index[Oregon, Golf Clubs]@R = base[Oregon, Golf Clubs]@S ∪ base[Oregon, Golf Clubs]@T ∪ base[Oregon, Golf Clubs]@U
Utilizing Intentional Statements
• Processes:
– Whenever a server registers an interest area with the meta-index server, it also provides intentional statements.
– Servers can then use this information in binding and routing MQPs.
Assumptions:
Meta-index server M knows about servers R and S.
Interest areas: R [Portland, Recreation], S [Oregon, Sporting Goods]
M receives an MQP that contains the resource name [Portland, Golf Clubs].
Then the name could be bound to: base[Portland, Golf Clubs]@R ∪ base[Portland, Golf Clubs]@S
If M knows the intentional statement base[Portland, Sporting Goods]@R = base[Portland, Sporting Goods]@S,
then it could instead bind to: base[Portland, Golf Clubs]@R | base[Portland, Golf Clubs]@S
Conclusion: the MQP could be routed to either R or S, but it need not go to both.
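The binding choice made by M can be sketched as grouping candidate servers: servers known to hold equal data for the resource become alternatives (joined by "|"), and the remaining groups must all be visited (joined by "∪"). This is an illustrative sketch only. It assumes categories are written as full paths such as "Sporting Goods/Golf Clubs" so that containment can be checked, and the function names are hypothetical.

```python
def within(general, specific):
    """True if category path `general` equals or contains `specific`."""
    if general == "*":
        return True
    g, s = general.split("/"), specific.split("/")
    return s[:len(g)] == g


def covers(area, resource):
    return all(within(a, r) for a, r in zip(area, resource))


def bind(resource, servers, equalities):
    """Group servers; each equality (area, a, b) states base[area]@a = base[area]@b."""
    groups = [[s] for s in servers]
    for area, a, b in equalities:
        if a not in servers or b not in servers or not covers(area, resource):
            continue  # the statement says nothing about this resource
        ga = next(g for g in groups if a in g)
        gb = next(g for g in groups if b in g)
        if ga is not gb:
            ga.extend(gb)
            groups = [g for g in groups if g is not gb]
    return groups


def render(resource, groups):
    name = "base[{}]".format(", ".join(resource))
    return " ∪ ".join(" | ".join(f"{name}@{s}" for s in g) for g in groups)


golf = ("Portland", "Sporting Goods/Golf Clubs")
statement = (("Portland", "Sporting Goods"), "R", "S")

without_statement = bind(golf, ["R", "S"], [])
with_statement = bind(golf, ["R", "S"], [statement])
```

Without the statement, M must send the plan to both servers. With it, R and S fall into one group and either one suffices.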
Utilizing Intentional Statements (cont’)
For queries that do not run instantly:
Suppose:
– Server R replicates everything for Portland at S, and possibly keeps additional data about Portland.
– R polls S every 30 minutes to update the data it replicates, so its copy can be up to 30 minutes out of date.
Intentional statement: base[Portland, *]@R ≥ base[Portland, *]@S{30}
A binding for resource [Portland, CDs] might then be:
base[Portland, CDs]@R{30} | (base[Portland, CDs]@R ∪ base[Portland, CDs]@S){0}
Explanations:
– One can get an answer quickly by routing the MQP to R alone, but that answer could be up to 30 minutes out of date.
– By routing the MQP to both R and S, one can get a complete and current answer.
Conclusions:
– Impossible to guarantee that queries run instantly
– Compromises among latency, completeness, and currency are needed
– Replication can’t be both scalable and instantaneous
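The tradeoff in this example can be sketched as a choice among the alternatives of a binding. The sketch assumes that each alternative is annotated with its worst-case staleness in minutes, as in the {30} and {0} annotations, and it uses the number of servers to visit as a stand-in for latency. That latency proxy and all names below are assumptions for illustration.

```python
from typing import NamedTuple


class Alternative(NamedTuple):
    servers: tuple   # servers the MQP must visit
    staleness: int   # worst-case minutes out of date


# base[Portland, CDs]@R{30} | (base[Portland, CDs]@R ∪ base[Portland, CDs]@S){0}
BINDING = [Alternative(("R",), 30), Alternative(("R", "S"), 0)]


def choose(binding, max_staleness):
    """Pick the alternative with the fewest hops that is current enough."""
    ok = [alt for alt in binding if alt.staleness <= max_staleness]
    if not ok:
        raise ValueError("no alternative is current enough")
    return min(ok, key=lambda alt: len(alt.servers))


relaxed = choose(BINDING, max_staleness=30)
strict = choose(BINDING, max_staleness=5)
```

A query that tolerates half an hour of staleness is routed to R alone, while a stricter query pays for the extra hop to S.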
What else could be in MQPs
• Accumulating catalog and statistics information
• Maintaining provenance
– Rewards system
– Meta-index updating
– Detection of spoofing
Security and Privacy
• Issues:
– With MQPs, partial results may be divulged to undesirable servers.
• Solutions:
– MQPs need to incorporate ordering and transfer policies.
– Data or data elements can be encrypted with a public key.
– MQPs can still obtain answers while complying with the security policies of each server.
Conclusions
• MQPs and distributed catalogs enable peers to independently optimize and partially evaluate queries without global knowledge and with a minimum of coordination overhead.
References
• [1] Vassilis Papadimos, David Maier, and Kristin Tufte. Distributed Query Processing and Catalogs for Peer-to-Peer Systems. OGI School of Science & Engineering, Oregon Health & Science University.
• [2] V. Papadimos and D. Maier. Distributed Queries without Distributed State. In Proc. of WebDB 2002, pages 95-100.
Thanks!
Questions?...