Mariposa: a wide-area distributed database system
Kumar Ramdurgkar
CIS 661
Mariposa Distributed Database Management System
Principal Investigator: Prof. Michael Stonebraker
SECTION 1: Introduction to Mariposa
LAN vs. WAN databases
LAN database management is common, most often in industries where the data is local to the installation.
A LAN database has a single RDBMS source.
A LAN database is maintained under a well-defined set of rules, data types, and services.
The difference?
WAN databases
Many databases are interconnected over a WAN.
In a WAN, many sites participate in the DBMS.
The sites have different administrators, different data types and extensions, and different service-handling times.
How do we interconnect them? What are the issues?
Issues and problems
Network connections and traffic.
Different load-handling capabilities and service times.
Different data types and extensions.
A single program acting as a query optimizer will NOT work.
(continued)
Issues and problems (continued)
Cost-based optimization does not respond well to site-specific type extensions and access constraints, charging algorithms, or time-of-day constraints.
LAN algorithms do not scale properly to suit a WAN DBMS.
The solution…
An excellent idea! MARIPOSA: UBID! Have you been there?
Mariposa is a distributed DBMS built on the economic paradigm of bidding.
Mariposa was proposed by: Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu
Proposed: Nov. 1994
Accepted: Sept. 1995
Mariposa: vision
A standard approach for distributed data.
A set of standard guidelines for WAN databases.
Query storage and optimization viewed from a different perspective.
Scalability and handling of data explosion.
A query optimizer for the WWW?
The vision needs to be formalized.
WAN guidelines for Mariposa
Scalability to a large number of cooperating sites.
Data mobility.
No global synchronization of data.
Total local autonomy and complete control.
Easily configurable policies for changing the behavior of Mariposa.
Mariposa system architecture
Microeconomic mechanisms.
All Mariposa clients and servers have an account with a network bank.
A user allocates a budget, in the currency of this bank, to each query.
The goal of the query processing system is to solve the query within the allotted budget by contracting with various Mariposa sites to process portions of it.
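The bank-and-budget idea can be made concrete with a minimal Python sketch of a network bank as an in-memory ledger. The class name `NetworkBank`, its methods, and the rule that a transfer may not overdraw an account are assumptions for illustration only. The slides do not describe the bank's actual interface.

```python
from typing import Dict


class NetworkBank:
    """Hypothetical in-memory stand-in for Mariposa's network bank."""

    def __init__(self) -> None:
        self.balances: Dict[str, float] = {}

    def open_account(self, owner: str, deposit: float = 0.0) -> None:
        # Every client and server gets an account in the bank's currency.
        self.balances[owner] = deposit

    def transfer(self, payer: str, payee: str, amount: float) -> None:
        # Refuse negative amounts and payments the payer cannot cover.
        if amount < 0 or self.balances[payer] < amount:
            raise ValueError("invalid or unaffordable transfer")
        self.balances[payer] -= amount
        self.balances[payee] += amount


# A client funds a query and pays a site for processing part of it.
bank = NetworkBank()
bank.open_account("client", 100.0)
bank.open_account("site_a")
bank.transfer("client", "site_a", 30.0)
print(bank.balances)  # {'client': 70.0, 'site_a': 30.0}
```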
Mariposa broker mechanism
The broker obtains bids for pieces of a query from sites.
It uses a distributed advertising system instead of the usual metadata mechanisms used in LAN databases.
The server that has advertised the best time for the given query wins.
Scalability
A site can join Mariposa by buying objects and advertising services.
A site can leave Mariposa by selling its objects and ceasing to bid.
Hence the system is highly scalable.
In fact, the success of Mariposa depends on a large number of sites participating in the system.
Storage decisions
Objects have no notion of a home site.
All secondary indices are moved with the objects.
The economic paradigm simplifies the avoidance of global synchronization.
Mariposa fosters data mobility and free trade of objects ("object" here means data).
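The "no home site" rule can be sketched as a sale that moves an object, together with its secondary indices, from one site to another. `StoredObject`, `Site`, and `sell` are hypothetical names. Payment for the object is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    name: str
    rows: List[tuple]
    # Secondary indices are part of the object, so they travel with it.
    secondary_indices: List[str] = field(default_factory=list)


@dataclass
class Site:
    name: str
    objects: Dict[str, StoredObject] = field(default_factory=dict)


def sell(obj_name: str, seller: Site, buyer: Site) -> None:
    # There is no home site to update: the object simply leaves the seller
    # and is stored at the buyer, indices included.
    obj = seller.objects.pop(obj_name)
    buyer.objects[obj_name] = obj


ucb = Site("ucb", {"R1": StoredObject("R1", [(1, "a")], ["R1_u1_idx"])})
ucsd = Site("ucsd")
sell("R1", ucb, ucsd)
print(sorted(ucb.objects), sorted(ucsd.objects))  # [] ['R1']
```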
Total local control
Since each Mariposa site is free to bid on any business of interest, it has total local autonomy.
Each site is expected to maximize its individual profit per unit of operating time and to bid on those queries that it believes will accomplish this goal.
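The bid-or-not decision can be sketched as follows. This minimal version assumes a site tracks its running profit per unit of operating time and bids only on jobs that would not lower that rate. The `BiddingSite` class and its fields are hypothetical. The talk says bidding policy is written as Rush rules, so this Python is only an analogue.

```python
from dataclasses import dataclass


@dataclass
class BiddingSite:
    earned: float = 0.0     # total profit earned so far
    busy_time: float = 0.0  # total operating time spent on won work

    def profit_rate(self) -> float:
        return self.earned / self.busy_time if self.busy_time > 0 else 0.0

    def should_bid(self, price: float, local_cost: float, run_time: float) -> bool:
        # Bid only on profitable jobs whose profit per unit of time is at least
        # the site's current average, so winning them cannot lower that average.
        job_rate = (price - local_cost) / run_time
        return job_rate > 0 and job_rate >= self.profit_rate()

    def record(self, price: float, local_cost: float, run_time: float) -> None:
        self.earned += price - local_cost
        self.busy_time += run_time


site = BiddingSite()
print(site.should_bid(10.0, 4.0, 2.0))  # True: rate 3.0 beats the initial 0.0
site.record(10.0, 4.0, 2.0)
print(site.should_bid(10.0, 8.0, 1.0))  # False: rate 2.0 is below the current 3.0
```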
Sounds good… any drawbacks?
Some queries may not be solvable, either because nobody will bid on them or because the minimum bid exceeds what the client is willing to pay.
A site can refuse to give up objects.
A site may not find buyers for objects that it wants to sell.
SECTION 2: Mariposa architecture
Mariposa architectural details
Hardware
Flow chart
Processes (bidding, bid protocols, acceptance, finding bidders, subquery bidding, network bidding, splitting and combining)
Code languages (Rush)
Mariposa experiments and results
Conclusions
Client queries are written in SQL3.
The middleware consists of several modules, including a query separator and a query broker.
The broker and bidder are coded in Rush.
Execution is local, at the site that wins the bid.
Details…
Architecture overview
Architecture details
Processes: bidding
Each query Q has a budget B(t) that can be used to solve the query.
The budget is set by the user: B(t) is the amount the user is willing to pay for an answer delivered after a delay t.
The broker receives a query plan for Q and tries to solve each fragment by using either the expensive bid protocol or the cheaper purchase-order protocol.
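The sketch below shows how a budget function might gate a bid. It assumes a linearly decreasing B(t), which pays the full price for an immediate answer and nothing at a deadline. A bid with cost C and delay D is accepted only when C ≤ B(D). The linear shape and the names `linear_budget`, `Bid`, and `acceptable` are assumptions. The slides only say the user supplies the budget.

```python
from dataclasses import dataclass
from typing import Callable


def linear_budget(max_price: float, deadline: float) -> Callable[[float], float]:
    """Hypothetical budget curve B(t): max_price at t = 0, falling linearly to 0 at the deadline."""
    def B(t: float) -> float:
        if t >= deadline:
            return 0.0
        return max_price * (1.0 - t / deadline)
    return B


@dataclass
class Bid:
    cost: float   # C: what the bidder charges
    delay: float  # D: how long the bidder needs


def acceptable(bid: Bid, B: Callable[[float], float]) -> bool:
    # The user pays at most B(D) for an answer delivered after delay D.
    return bid.cost <= B(bid.delay)


B = linear_budget(max_price=100.0, deadline=10.0)
print(B(5.0))                         # 50.0
print(acceptable(Bid(40.0, 5.0), B))  # True
print(acceptable(Bid(60.0, 5.0), B))  # False
```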
Processes: bidding (continued)
Brokers split each query into subqueries and solicit bids for each subquery.
Subqueries are executed in a set sequence.
The broker selects the winning bids with a greedy algorithm.
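The slides do not spell out the broker's greedy algorithm. The sketch below is one simple greedy heuristic and is not necessarily the procedure Mariposa uses. It assumes each subquery has a list of (cost, delay) bids, that subqueries run in sequence so delays and costs both add up, and that the broker maximizes the user's surplus B(T) − C. It starts from the fastest bid for every subquery and repeatedly applies the single bid swap that most improves the surplus. It returns `None` when no affordable plan is found, which is the "unsolvable query" drawback mentioned earlier. `surplus` and `greedy_select` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Bid = Tuple[float, float]  # (cost C, delay D) offered for one subquery


def surplus(choice: List[Bid], B: Callable[[float], float]) -> float:
    # Subqueries execute in a fixed sequence, so delays add up; so do costs.
    total_cost = sum(c for c, _ in choice)
    total_delay = sum(d for _, d in choice)
    return B(total_delay) - total_cost


def greedy_select(bids: List[List[Bid]], B: Callable[[float], float]) -> Optional[List[Bid]]:
    # Start from the fastest bid for every subquery.
    choice = [min(options, key=lambda b: b[1]) for options in bids]
    while True:
        current = surplus(choice, B)
        best_gain, best_move = 0.0, None
        # Try every single-bid substitution and keep the one that helps most.
        for i, options in enumerate(bids):
            for b in options:
                trial = choice[:i] + [b] + choice[i + 1:]
                gain = surplus(trial, B) - current
                if gain > best_gain:
                    best_gain, best_move = gain, (i, b)
        if best_move is None:
            break
        i, b = best_move
        choice[i] = b
    # No affordable plan: total cost exceeds what the budget allows at that delay.
    return choice if surplus(choice, B) >= 0 else None


bids = [[(30.0, 2.0), (10.0, 6.0)],  # subquery 1: fast and pricey, or slow and cheap
        [(20.0, 1.0), (15.0, 2.0)]]  # subquery 2
print(greedy_select(bids, lambda t: 100.0 - 2.0 * t))  # [(10.0, 6.0), (15.0, 2.0)]
```

Each accepted swap strictly increases the surplus, so the loop cannot revisit a plan and always terminates.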
Processes: bid protocols
The expensive bid protocol has two phases:
Phase 1: The broker sends out requests for bids, and each bidder replies with a triple (Ci, Di, Ei) for subquery Qi: it will process Qi at cost Ci with delay Di, and the bid expires at time Ei.
Phase 2: The broker notifies the winners (and the losers).
The purchase-order protocol is faster: the broker simply sends the query to the site most likely to process it. The risk is that the query might not be processed within the given time.
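The second phase of the expensive bid protocol can be sketched as follows. The sketch assumes bids arrive as (Ci, Di, Ei) triples tagged with the bidding site. It also assumes the broker awards a fragment to the unexpired, affordable bid that leaves the most budget unspent, that is, the bid maximizing B(Di) − Ci. That award rule and the names `Bid` and `award` are assumptions, not the documented Mariposa policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Bid:
    site: str
    cost: float        # Ci
    delay: float       # Di
    expires_at: float  # Ei


def award(bids: List[Bid], now: float,
          B: Callable[[float], float]) -> Tuple[Optional[Bid], List[Bid]]:
    # Phase 2: ignore expired bids and bids the budget cannot cover,
    # then pick the one that leaves the most budget unspent.
    live = [b for b in bids if b.expires_at > now and b.cost <= B(b.delay)]
    if not live:
        return None, list(bids)  # nobody wins; every bidder is notified as a loser
    winner = max(live, key=lambda b: B(b.delay) - b.cost)
    losers = [b for b in bids if b is not winner]
    return winner, losers


def budget(t: float) -> float:
    return 100.0 - 10.0 * t


bids = [Bid("a", 50.0, 2.0, 5.0), Bid("b", 20.0, 3.0, 1.0), Bid("c", 40.0, 1.0, 10.0)]
winner, losers = award(bids, now=2.0, B=budget)
print(winner.site, [b.site for b in losers])  # c ['a', 'b']
```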
Finding bidders
Brokers examine ad tables to find the servers that are willing to perform the task at hand.
A server posts its bids as records in an ad table.
Ad tables typically hold bidding information for the sample query structures run on that server.
Sample ad table design (not all fields need be used)
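The slide's table layout is not reproduced here, so the fields in `Ad` are an illustrative assumption: a query template, a server, a price, a delay, and an expiration time. The sketch shows how a broker might scan such records for servers willing to run a given kind of query. `find_bidders` and the template strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ad:
    query_template: str    # the kind of query the server offers to run
    server_id: str
    price: float
    delay: float
    expiration_time: float


def find_bidders(ads: List[Ad], template: str, now: float) -> List[str]:
    # Keep unexpired ads for this query template and list their servers, cheapest first.
    live = [a for a in ads if a.query_template == template and a.expiration_time > now]
    return [a.server_id for a in sorted(live, key=lambda a: a.price)]


ads = [
    Ad("join R1 R2", "ucsb", 30.0, 4.0, 100.0),
    Ad("join R1 R2", "ucsd", 20.0, 6.0, 100.0),
    Ad("join R1 R2", "stale", 5.0, 1.0, 10.0),  # already expired at now=50
    Ad("scan R3", "ucb", 1.0, 1.0, 100.0),
]
print(find_bidders(ads, "join R1 R2", now=50.0))  # ['ucsd', 'ucsb']
```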
Bidding strategies
Bulk purchase contracts allowing lower-than-normal (wholesale) bids.
Coupons.
Sales.
Broker intelligence (remember the history of successful bids and try that site/query combination again).
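The "broker intelligence" strategy can be sketched as a small memory that records which site last won each kind of query and tries that site first next time. `BrokerMemory`, its methods, and keying by a query-template string are hypothetical choices made for this illustration.

```python
from typing import Dict, List


class BrokerMemory:
    """Hypothetical broker history: remembers which site last won each kind of query."""

    def __init__(self) -> None:
        self.last_winner: Dict[str, str] = {}

    def record_success(self, template: str, site: str) -> None:
        self.last_winner[template] = site

    def candidates(self, template: str, advertised: List[str]) -> List[str]:
        # Try the previously successful site first, then the other advertisers in order.
        remembered = self.last_winner.get(template)
        if remembered is None:
            return list(advertised)
        return [remembered] + [s for s in advertised if s != remembered]


memory = BrokerMemory()
memory.record_success("join R1 R2", "ucsd")
print(memory.candidates("join R1 R2", ["ucb", "ucsd", "ucsb"]))  # ['ucsd', 'ucb', 'ucsb']
print(memory.candidates("scan R3", ["ucb", "ucsb"]))             # ['ucb', 'ucsb']
```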
Processes: network bidding
Network bandwidth must be accounted for, so data size enters into consideration.
The minimum available bandwidth is calculated from node to node.
This bandwidth must be reserved to achieve the desired performance.
Mariposa uses the Tenet protocols RTIP and RCAP for network bidding.
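Here is a back-of-the-envelope version of the network calculation. It assumes the link bandwidths along a fixed path are known in Mbit/s. The usable bandwidth is the minimum over the links (the bottleneck). A transfer of S megabytes then takes about 8S divided by the bottleneck rate, in seconds. A reservation succeeds only if the bottleneck can carry the requested rate. The function names are hypothetical, and the sketch does not model RTIP or RCAP themselves.

```python
from typing import List


def bottleneck_bandwidth(link_mbps: List[float]) -> float:
    # The path can carry no more than its slowest link.
    return min(link_mbps)


def transfer_seconds(size_megabytes: float, link_mbps: List[float]) -> float:
    # Megabytes -> megabits, divided by the bottleneck rate in Mbit/s.
    return size_megabytes * 8 / bottleneck_bandwidth(link_mbps)


def can_reserve(link_mbps: List[float], needed_mbps: float) -> bool:
    # A reservation succeeds only if the bottleneck can sustain the requested rate.
    return bottleneck_bandwidth(link_mbps) >= needed_mbps


path = [100.0, 10.0, 45.0]         # hypothetical per-link bandwidths in Mbit/s
print(transfer_seconds(10, path))  # 8.0
print(can_reserve(path, 12.0))     # False
```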
Coding (Rush language)
Mariposa provides a low-level, very efficient embedded scripting language and rule system called Rush.
Using Rush, it is straightforward to change policy decisions: one simply modifies the rules by which these modules are implemented.
The Mariposa architecture is primarily coded in Rush.
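Rush syntax is not shown in this talk, so the following Python sketch only mimics the idea behind it. A policy is an ordered list of condition/action rules, and changing the policy means editing the list, not the engine. The request fields (`load`, `size`) and the pricing rules are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Request = dict
Rule = Tuple[Callable[[Request], bool], Callable[[Request], Optional[float]]]


def decide_bid(request: Request, rules: List[Rule]) -> Optional[float]:
    # Rules are tried in order; the first matching condition fires its action.
    # A result of None means the site declines to bid.
    for condition, action in rules:
        if condition(request):
            return action(request)
    return None


policy: List[Rule] = [
    (lambda r: r["load"] > 0.9, lambda r: None),  # too busy: do not bid
    (lambda r: r["size"] < 100, lambda r: 5.0),   # flat price for small jobs
    (lambda r: True, lambda r: r["size"] / 10),   # otherwise price by size
]

print(decide_bid({"load": 0.95, "size": 10}, policy))   # None
print(decide_bid({"load": 0.2, "size": 1000}, policy))  # 100.0
```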
SECTION 3: Mariposa experiments and results
Operational system
Mariposa is operational on Digital Equipment Corp. Alpha AXP workstations (UC Berkeley).
The basic server engine is that of POSTGRES.
Implementing the Rush language itself has required careful design and performance engineering.
A multithreaded network communication package is required.
Experiment setup
The workstations are connected by 10 Mbit/s Ethernet.
The WAN experiments were conducted at night.
The benchmark database consists of three tables, R1, R2, and R3.
The workload query is an equijoin of all three tables:
SELECT * FROM R1, R2, R3
WHERE R1.u1 = R2.u1
AND R2.u1 = R3.u1
In the wide-area case, the query originates at Berkeley and performs the join over the WAN connecting UC Berkeley, UC Santa Barbara, and UC San Diego.
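One way a broker could fragment the benchmark query is into two pairwise equijoins: first R1 with R2 on u1, then that result with R3. The Python sketch below does this with a simple hash join over tiny in-memory tables. The two-column schema (u1 plus a payload) and the sample rows are assumptions, since the slides give only the join column. Because R1.u1 = R2.u1, the first column of each intermediate row can also serve as the key for the second join.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple  # hypothetical schema: (u1, payload)


def hash_join(left: List[Row], right: List[Row]) -> List[Row]:
    # Build a hash table on the right input keyed by its u1 column (position 0),
    # then probe it with position 0 of each left row and concatenate matches.
    index: Dict[object, List[Row]] = defaultdict(list)
    for right_row in right:
        index[right_row[0]].append(right_row)
    return [left_row + right_row
            for left_row in left
            for right_row in index.get(left_row[0], [])]


# Tiny stand-ins for the tables at each campus (made-up rows).
r1 = [(1, "a"), (2, "b"), (3, "c")]  # Berkeley
r2 = [(2, "x"), (3, "y"), (3, "z")]  # Santa Barbara
r3 = [(3, "q"), (4, "r")]            # San Diego

# R1.u1 = R2.u1 first; position 0 of each result row is R1.u1 (= R2.u1),
# so it also serves as the key for R2.u1 = R3.u1.
result = hash_join(hash_join(r1, r2), r3)
print(result)  # [(3, 'c', 3, 'y', 3, 'q'), (3, 'c', 3, 'z', 3, 'q')]
```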
Timing Results
Conclusions
Mariposa is a prototype data management system that unifies the best features of distributed operating system research and distributed database management system research.
Distributed query optimization has been identified as an area that will receive strong emphasis; the Mariposa team also plans to examine how to build a system that has a rule system at its core.
Conclusions (continued)
Future work remains in the areas of system robustness, distributed failure recovery, and performance assessment.
Thank you.