A Decentralized Relational Information Service for Large Scale Distributed Computing
Thesis Proposal
April 2nd, 2004
Dong Lu
Committee
Peter A. Dinda (Chair), Fabian E. Bustamante, Yan Chen, Ian Foster (UC and ANL)
What is an information service?
• An information service stores information about the resources and services in a distributed computing environment and answers queries about it.
• A Grid information service (GIS) is an information service for Grid computing. MDS2 is an example of a GIS.
What is RGIS?
• RGIS: Relational Grid Information Service
• RGIS is a decentralized relational information service that is being built on top of a distributed and replicated relational data model
Why RGIS?
1. RGIS can answer complex compositional queries
• Relational algebra (SQL)
• Joins
• Difficult in a hierarchical model (directory service)
2. Other reasons
• Indexes separate from data model
• Schema evolution
• Transactional insert/update/delete
• Consistency
Example Queries and Updates
• “Find me four Xeon machines with a total of 8 GB of RAM within 5 seconds”
• “Inform all my friends that the machine dualsword now has 2 GB of RAM within 500 seconds”
Thesis Statement
• A centralized relational information server, such as our current RGIS system, can’t scale with the distributed computing environment. How can we build a scalable distributed relational information service with query and update constraints?
We have addressed query constraints by developing query techniques on individual servers to trade off the query time with the size of the result set. We have developed infrastructure for RGIS to support replication through update push.
I propose to address update constraints, namely bounds on replica staleness. This will be built on the basis of predictive techniques for statistical quality of service (QoS) for single and parallel end-to-end TCP transfers.
Related work
• MDS2: OpenLDAP-based information service, part of the Globus Toolkit
• R-GMA: another relational data model for GIS that focuses on dynamic properties of resources
• MatchMaker: classified advertisements (classads), part of the Condor system
• Redline: a language that enables the definition of Constraint Satisfaction Problems (CSPs) and then applies heuristics to solve the NP-hard CSPs
• etc.
Outline
• Motivation
• Challenges
• System Architecture
• GridG for query evaluation
• Query techniques on a single server
• Providing statistical QoS to data transfers on the Internet
• Update Consistency Constraint
• Schedule
Motivation to build a distributed information service
1. A centralized server can’t scale with a distributed system and the number of users
• CPU, memory and disk can easily become performance bottlenecks
2. Even if we host the service on a high performance cluster, the outgoing bandwidth can easily become a performance bottleneck
Thesis Challenges
1. Complex queries sometimes take a long time to finish
• We have proposed and implemented scoped, approximate and nondeterministic query techniques to address this challenge
• We have evaluated them using GridG
• This challenge has been addressed
2. How to maintain proper consistency among the replicated databases?
• I am proposing to maintain soft real-time bounded weak consistency among the servers
Thesis Challenges
3. How to bound the weak consistency with real time?
• I am proposing to monitor the overlay links to provide soft QoS to data transfer and then send updates to other replicas so that the consistency can be time bounded
4. How to provide soft statistical QoS to data transfer on the Internet? (TCP)
• I am proposing to develop a novel TCP throughput benchmarking technique and then build statistical QoS on the basis of prediction
RGIS Model of a Grid
• Annotated network topology graph
• Annotation examples
– Hosts: memory, disk, OS, NICs, etc.
– Router/Switch: backplane bandwidth, ports
– Link: latency and bandwidth
• Virtualization, Futures, Leases
– Virtual machines
[Figure: layered object model. Layers: Software, Network, Data link, Physical. Objects: module, endpoint, host, router, iplink, macswitch, maclink, connectorswitch, connectorlink]
[SC03-1]
RGIS Architecture
[Figure: RGIS architecture, with the following labeled components]
• Users and Applications
• Web Interface, SOAP Interface, Authenticated Direct Interface
• Content Delivery Network Interface: for loose consistency, site-to-site (tentative)
• Query Manager and Rewriter
• Update Manager
• Oracle 9i Front End: transactional inserts and updates using stored procedures, queries using select statements (uses database’s access control)
• Schema, type hierarchy, indices, PL/SQL stored procedures for each object
• Oracle 9i Back End: Windows, Linux, Parallel Server, etc.
• RDBMS: use of Oracle is not a requirement of the approach
• Updates are encrypted using asymmetric cryptography on the network. Only those with appropriate keys have access
[SC03-1]
Developers: Lu, Dinda, Weinrich, Lange
RGIS Design (Intersite)
[Figure: three site RGIS servers (A, B, C) connected by "Update Push To Friend Site" arrows]
• Site RGIS server pushes local updates to friend sites
• Site RGIS server consolidates updates from site and friend sites
• Site RGIS server answers all queries originating from its site
GridG: A Synthetic Grid Generator
• Why GridG?
– Evaluation of RGIS query performance, distributed systems simulation, etc.
• Output: network topology annotated with the hardware and software on each node and link
– Layer 3 network: hosts, routers, links
– Hosts: memory, architecture, number of CPUs, disk, operating system, vendor, clock rate
– Routers: switching capacity
– Links: bandwidth and latency
[SC03-2, SIGMETRICS PER]
Related work: current graph generators
• Random: Waxman, etc.
• Hierarchical : Tiers, Transit-Stub, etc.
• Degree-based: Inet, Brite, etc.
1. GridG is the first topology generator that has a clear three-level hierarchy and also follows the power laws of Internet topology
2. GridG is the first generator that can annotate the hosts, routers and links with reasonable properties
Quick review of the power laws of Internet topology
Power law: expression
• Rank exponent: $d_v \propto r_v^{R}$
• Outdegree exponent: $f_d \propto d^{O}$
• Eigen exponent: $\lambda_i \propto i^{\mathcal{E}}$
• Hop-plot exponent: $P(h) \propto h^{H}$
GridG Example
Router (switching capacity)
Host (arch, numcpu, clock rate, osvendor, mem, disk)
Link (bw, latency)
Requirements for GridG
• Realistic topologies
– Connected
– Hierarchical structure
– Power laws of Internet topology
• Realistic annotations
– Distributions of attributes
– Correlations of attributes: intra-host and inter-host
GridG architecture
• A sequence of transformations on a text-based representation of an annotated graph.
[Figure: pipeline. Base Topology Generator (Tiers) → Structured Topology → Translation To Common Format → GridG Power Law Enforcer → Structured Topology that obeys power laws → GridG Annotator → Grid. Consumers of the Grid: GIS Simulator, DOT Visualization, RGIS Database, Other Tools. Also shown: other transformations on common format (Cluster maker, etc.)]
Topology generation (published in ACM SIGMETRICS Performance Evaluation Review)
• GridG follows the power laws and has a clear three-level hierarchical structure
• We propose the following as the relationships among Internet topology power laws
[Figure: relationships among the new rank law, the outdegree power law, and the eigenvalue law]
Graph annotation (complete GridG paper published at SC’03)
[Figure: assumed dependence tree]
The dependence tree is transformed into conditional probabilities in the implementation of GridG
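As a rough illustration of turning a dependence tree into conditional probabilities, the sketch below samples host attributes root-first, so each attribute is drawn conditioned on its parent. The tree shape (arch → numcpu → mem_mb), the value sets, and every probability are invented for this example and are not taken from GridG.

```python
import random

# Hypothetical dependence tree: arch -> numcpu -> mem_mb.
# All values and probabilities are made up for illustration only.
ARCH = {"x86": 0.8, "ia64": 0.2}
NUMCPU_GIVEN_ARCH = {
    "x86": {1: 0.6, 2: 0.3, 4: 0.1},
    "ia64": {2: 0.5, 4: 0.5},
}
MEM_GIVEN_NUMCPU = {
    1: {256: 0.5, 512: 0.5},
    2: {512: 0.4, 1024: 0.6},
    4: {1024: 0.3, 2048: 0.7},
}


def draw(dist, rng):
    """Draw one value from a {value: probability} table."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def annotate_host(rng):
    """Sample attributes root-first so each child is conditioned on its parent."""
    arch = draw(ARCH, rng)
    numcpu = draw(NUMCPU_GIVEN_ARCH[arch], rng)
    mem_mb = draw(MEM_GIVEN_NUMCPU[numcpu], rng)
    return {"arch": arch, "numcpu": numcpu, "mem_mb": mem_mb}


rng = random.Random(0)
hosts = [annotate_host(rng) for _ in range(5)]
```

Sampling parents before children is what produces correlated attributes within a host, for example more memory on hosts with more CPUs.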
GridG V1.0 release
• http://www.cs.northwestern.edu/urgis/GridG
SQL Example of Cluster Finder Query
SELECT [scoped-approx] h1.distip, h2.distip
FROM hosts h1, hosts h2, iplinks l1, iplinks l2, routers r
WHERE h1.mem_mb + h2.mem_mb >= 1024
  AND h1.os = 'linux' AND h2.os = 'linux'
  AND ((l1.src = r.distip AND l2.src = r.distip AND l1.dest = h1.distip AND l2.dest = h2.distip)
    OR (l1.dest = r.distip AND l2.dest = r.distip AND l1.src = h1.distip AND l2.src = h2.distip))
  AND h1.distip <> h2.distip
  AND l1.bw_mbs >= 100 AND l2.bw_mbs >= 100
[SCOPED BY r.distip = X]
WITHIN 100 seconds;
Scoped and Approximate query techniques (published at the 4th International Workshop on Grid Computing)
• Scoped query: all the joins are limited to a neighborhood in the network, exploiting the network topology captured in the RGIS system.
• Approximate query: the number of joins is reduced by replacing them with constraints on individual objects and the simplified query is run against the entire network.
Nondeterministic query technique (published at SC’03)
• Nondeterministic query: a random subset of the network objects is chosen to conduct joins
• Another mechanism to trade off query time against the size of the query result set
• All three techniques, namely nondeterministic, scoped and approximate queries, can be time bounded
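A minimal sketch of the nondeterministic idea, written in Python rather than inside the database: join a random subset of hosts with itself and stop when a time bound is reached, returning whatever pairs were found. The function name, the host record layout, and the memory constraint are assumptions for illustration, not the RGIS implementation.

```python
import itertools
import random
import time


def nondeterministic_pairs(hosts, min_total_mb, sample_size, time_bound_s, seed=None):
    """Self-join a random subset of hosts, stopping at the time bound."""
    rng = random.Random(seed)
    subset = rng.sample(hosts, min(sample_size, len(hosts)))
    deadline = time.monotonic() + time_bound_s
    results = []
    for h1, h2 in itertools.combinations(subset, 2):
        if time.monotonic() > deadline:
            break  # return the partial result set rather than overrun
        if h1["mem_mb"] + h2["mem_mb"] >= min_total_mb:
            results.append((h1["ip"], h2["ip"]))
    return results


# Hypothetical host table with 100 entries.
hosts = [{"ip": f"10.0.0.{i}", "mem_mb": 256 * (1 + i % 4)} for i in range(100)]
pairs = nondeterministic_pairs(hosts, 1024, sample_size=20, time_bound_s=1.0, seed=1)
```

A smaller `sample_size` shortens the join at the cost of a smaller result set, which is the trade-off the slide describes.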
Summary of query techniques
[Figure: result sets compared. Labels: all results, scoped results, nondeterministic results, approximate results]
Outline
• Motivation
• Challenges
• System Architecture
• GridG for query evaluation
• Query techniques on a single server
• Providing statistical QoS to data transfers on the Internet
• Update Consistency Constraint
• Schedule
Legend: Proposed work, Finished work
Update Problem
• “Inform my friends that the machine dualsword now has 2 GB of RAM within 500 seconds”
• Update Push
• How do I make the whole push operation run within the time bound given dynamic network conditions?
Proposed work
Why do we need statistical QoS for data transfers on the Internet?
• To bound the data propagation time among the RGIS servers, we need soft deadlines for data transfers
– But reservations are typically unavailable
– Adapt to the changing network: Parallel TCP, Overlay Multicast
– Inform the user when a request is impossible
• Statistical QoS is a soft guarantee: meet the deadline with a specified high probability. It is prediction based
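To make "meet the deadline with a specified high probability" concrete, here is a small sketch that assumes the predicted throughput of a transfer is normally distributed with a known mean and standard deviation. The function names, units, and the 0.95 target are assumptions for illustration.

```python
from statistics import NormalDist


def on_time_probability(size_mb, deadline_s, mean_mb_per_s, std_mb_per_s):
    """P(transfer finishes by the deadline) under a normal throughput model.

    Assumes size in megabytes and throughput in megabytes per second.
    """
    required = size_mb / deadline_s  # throughput needed to just meet the deadline
    return 1.0 - NormalDist(mean_mb_per_s, std_mb_per_s).cdf(required)


def admit(size_mb, deadline_s, mean_mb_per_s, std_mb_per_s, target=0.95):
    """Accept the request only if the soft guarantee can be met."""
    return on_time_probability(size_mb, deadline_s, mean_mb_per_s, std_mb_per_s) >= target


p = on_time_probability(100, 20, 10, 2)   # 5 MB/s needed, 10 MB/s predicted
feasible = admit(100, 20, 10, 2)
too_tight = admit(100, 10, 10, 2)         # needs the mean throughput itself
```

When `admit` returns `False`, the service can tell the user that the request is impossible under current predictions instead of silently missing the deadline.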
Related work
• Available bandwidth estimation: packet pair, cprobe, pathchar/pchar, nettimer, pathload, NCS, pathrate, spruce, Remos, etc.
However, available bandwidth differs significantly from the TCP throughput that applications can achieve.
• TCP benchmarking: NWS, etc. can provide real-time TCP throughput prediction
However, recent research by Sudharshan et al. showed that simple TCP benchmarking can’t predict large file transfers well.
Related work
• Resource ReSerVation Protocol (RSVP): needs cooperation from routers. However, routers on an End-to-End path belong to different ISPs, so it is hard to use in practice
• Network reservation based QoS: GARA is one example.
• Service Level Agreement (SLA): it is hard to make SLAs for End-to-End paths
Prediction based statistical QoS for data transfers on the Internet
• Main idea: predict TCP data transfer time with a confidence interval
• Challenges:
– Simple TCP benchmarking techniques failed to predict TCP throughput for large file transfers.
– The Internet is dynamically changing. How can we capture the dynamics on the End-to-End path?
Proposed work
Observations
File Size and TCP Throughput are strongly correlated
[Figure label: steady state throughput]
Probe Pair: a new TCP benchmarking technique
Proposed work
Why does simple TCP benchmarking fail?
Experimental methodology
• Purpose: To study correlation between TCP throughput and flow size, and evaluate proposed TCP benchmark mechanism
• Testbed: 40 PlanetLab nodes in North America, Europe, Asia, and Australia. Random pairing repeated 3 times, 60 distinct paths in total. 2,430,000 TCP transfers
• TCP flow size: 100 KB, 200 KB, 400 KB, 600 KB, 800 KB, 1 MB, 2 MB, 4 MB, 10 MB (up to 1 GB in other experiments)
Verification of Probe Pair (CDF of prediction error)
Internet stability
• Routing stability (fundamental):
– Paxson’s work shows that Internet paths are heavily dominated by a single route
• Spatial locality and temporal locality of End-to-End TCP throughput:
– Balakrishnan et al. showed that nearby Internet hosts often have almost identical distributions of observed throughput to a remote web server
– Balakrishnan et al. also showed that End-to-End TCP throughput is stationary on the scale of tens of minutes, and that a lognormal distribution can be used to model End-to-End TCP throughput
Capturing transient Internet stability
• Given the strong correlation between TCP flow size and throughput, what could be the proper model for End-to-End steady state TCP throughput?
– Lognormal is a good model for aggregated TCP throughput on a given path, namely, throughput with different TCP flow sizes
– What is the proper model for steady state TCP throughput distribution?
Capturing transient Internet stability
• We define the Statistical Stable Region (SSR) as the length of a period of time during which the ratio between the maximum and minimum estimated steady state TCP throughput is less than a constant factor
• Through an extensive Internet measurement study, we found that a normal distribution can be used to model TCP throughput within each SSR
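One way to read the SSR definition operationally is to split a series of throughput estimates into maximal runs whose max/min ratio stays below the chosen factor. The greedy left-to-right segmentation below is an assumption made for illustration (the slides do not specify how regions are delimited), and it expects strictly positive samples.

```python
def stable_regions(samples, factor):
    """Greedily split samples into runs whose max/min ratio stays below factor.

    Returns a list of (start_index, end_index_exclusive) pairs.
    Assumes every sample is a positive throughput estimate.
    """
    regions = []
    start = 0
    lo = hi = None
    for i, x in enumerate(samples):
        if lo is None:
            lo = hi = x
            continue
        new_lo, new_hi = min(lo, x), max(hi, x)
        if new_hi / new_lo < factor:
            lo, hi = new_lo, new_hi      # sample extends the current region
        else:
            regions.append((start, i))   # close the region before this sample
            start, lo, hi = i, x, x
    if samples:
        regions.append((start, len(samples)))
    return regions


# Hypothetical throughput estimates (MB/s) with two level shifts.
series = [10, 11, 10.5, 20, 21, 19.5, 5]
regions = stable_regions(series, factor=1.2)
```

With equally spaced samples, the length of each index range times the sampling interval gives the duration of that region.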
Capturing transient Internet stability
dualPats: predicting TCP throughput with small probe pairs
We build a “dynamic sampling rate adjustment algorithm” to capture the End-to-End TCP throughput dynamics and thereby minimize probing overhead in dualPats.
Proposed work
Parallel TCP throughput prediction
• Parallel TCP is widely used in distributed computing; GridFTP is one example
• How can we predict parallel TCP throughput without being intrusive to the network?
Proposed work
Prediction Example
Related work
• Strong consistency: a correctness criterion for traditional replicated transactional databases
• Weak consistency: examples of distributed systems that greatly favor performance over consistency include Coda, Bayou, etc. There is no bound on inconsistency in such systems
• TACT is a distributed system with adjustable consistency bounds among the replicas. But the TACT system focused on logical time bounds
Consistency Constraints
• Strong consistency is hard, if possible at all, for distributed systems
• Weak consistency with time bound is required for RGIS: any local update will be propagated to all friendly RGIS servers within time T
Proposed work
Proposed approach
• Monitor the overlay links to predict the data transfer time
• Finish data propagation within time T with high probability
• Use application level multicast to enhance efficiency
• Evaluation: synthetic updates will be used for evaluation. One possible way is to use the GIS benchmark proposed at Indiana University.
Proposed work
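A back-of-the-envelope sketch of what "within time T with high probability" implies when an update goes to several friend sites. It assumes direct, independent pushes to each friend (no multicast tree), which is a simplification made only for this example; the function names are hypothetical.

```python
import math


def all_on_time_probability(per_friend_probs):
    """P(every friend site receives the update by T), assuming independent transfers."""
    return math.prod(per_friend_probs)


def per_friend_target(overall_target, n_friends):
    """On-time probability each transfer needs so that all n together meet the target."""
    return overall_target ** (1.0 / n_friends)


joint = all_on_time_probability([0.99, 0.98, 0.97])
needed = per_friend_target(0.95, 3)
```

Under these assumptions the per-link requirement tightens as the number of friend sites grows, which is one reason to look at the slowest predicted link first.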
Alternative approach
• For better scalability, history-based prediction can be combined with overlay monitoring.
Proposed work
Proposed schedule
• Statistical QoS for data transfer on the Internet (proposed completion date: April 2004)
• Consistency constraints (proposed completion date: October 2004)
• Integrate RGIS system and evaluation (proposed completion date: January 2005)
• Finish writing dissertation (proposed completion date: May 2005)
Acknowledgement
• Jack Lange, Yi Qiao, Jason Skicewicz, Andrew Weinrich.