BCI 2003 Aristotle University of Thessaloniki 1
November 22, 2003
Updating Web views distributed over wide area networks
Sidiropoulos Antonis, Katsaros Dimitrios
Aristotle Univ. of Thessaloniki, Greece
Presentation by: Katsaros Dimitrios
Content Distribution Networks
[Figure: a Web client, the origin Web server and the CDN cache servers connected through the Internet; arrows numbered 1-4]
Content Distribution Networks
• Advantages
– prevention of the flash crowd problem
– avoidance of network congestion
– reduction of user-perceived latency
• e.g., Akamai
– launched in early 1999
– 12,000 servers
– in 1,000 networks
Outline
• Related work & Motivation
• Proposed method
• Preliminary performance evaluation
• Conclusions & Future work
Best-effort cache coherency
• Lack of bandwidth to disseminate all updates
• Many caches
• Single point of updates generation
Related work
• Static Web object caching/prefetching (Katsaros & Manolopoulos, ACM SAC’04; Nanopoulos, Katsaros & Manolopoulos, IEEE TKDE’03)
• Dynamic Web object caching/prefetching
– cache plays the central role, i.e., prefetching (Cho & Garcia-Molina, SIGMOD’00; Gal & Eckstein, J.ACM’01)
– minimizing the bandwidth consumption and query latency in the presence of constraints on the age or accuracy of cached objects (Bright & Raschid, VLDB’02; Cohen & Kaplan, Computer Networks’02; Olston & Widom, SIGMOD’01)
– strong cache coherence maintenance (Challenger, Iyengar & Dantzig, INFOCOM’99)
– update dissemination, best-effort but with a single cache (Labrinidis & Roussopoulos, VLDB’01)
– caches and sources cooperate, best-effort caching (Olston & Widom, SIGMOD’02)
– optimal transmission of updates, but with fixed assumptions about update rates and transmission capabilities (Wang, Evans & Kwok, Information Systems Frontiers’03)
Web object freshness
• Freshness of object O over period [ti,tj]
• Freshness of database D with N objects
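A sketch of how these two quantities are commonly defined, assuming the freshness metric of Cho & Garcia-Molina (SIGMOD’00), which is cited under related work. The notation F(O, t) and the index k are assumptions and are not taken from the slide:

```latex
F(O, t) =
\begin{cases}
1 & \text{if the cached copy of } O \text{ is up to date at time } t \\
0 & \text{otherwise}
\end{cases}

F(O, [t_i, t_j]) = \frac{1}{t_j - t_i} \int_{t_i}^{t_j} F(O, t)\, dt

F(D, [t_i, t_j]) = \frac{1}{N} \sum_{k=1}^{N} F(O_k, [t_i, t_j])
```

Under this reading, every object counts equally toward the freshness of the database.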
Weighted Web object freshness
• The access pattern of Web objects is skewed
• Objects with higher access rates contribute more to what is perceived as database freshness
• For a database with N objects Oi, each with popularity fOi, the freshness is defined as a popularity-weighted sum of the per-object freshness
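A minimal sketch of the weighted measure at a single instant, assuming it is the sum of fOi · F(Oi, t) with the popularities normalized to sum to 1. The normalization, the function name and the sample values are assumptions made for illustration:

```python
from typing import Sequence


def weighted_freshness(popularity: Sequence[float], fresh: Sequence[bool]) -> float:
    """Popularity-weighted share of the objects that are fresh right now.

    Assumption: popularities are normalized here so that they sum to 1.
    """
    if len(popularity) != len(fresh):
        raise ValueError("popularity and fresh must have the same length")
    total = sum(popularity)
    if total <= 0:
        raise ValueError("total popularity must be positive")
    return sum(p for p, is_fresh in zip(popularity, fresh) if is_fresh) / total


# Hypothetical database: three objects with skewed popularity,
# the second one is stale.
popularity = [5, 4, 2]
fresh = [True, False, True]

weighted = weighted_freshness(popularity, fresh)   # (5 + 2) / 11
unweighted = sum(fresh) / len(fresh)               # 2 / 3
```

A stale popular object lowers the weighted value more than a stale unpopular one, which the unweighted average cannot express.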
Maintain best-effort coherency
• Devise a sequence of update disseminations so as to maximize F(D,T)
• Hence: The “best-effort” cache coherence maintenance is a nonpreemptive scheduling problem
FIFO scheduling
• Assume that there are sufficient
– network resources
– processing resources
• Use FIFO scheduling (First-Come-First-Served)
• Visualize our scheduling problem with 2-dimensional Gantt charts (Goemans & Williamson, SIAM Journal on Discrete Mathematics’00)
Example of updates
• We have three pending refreshes in the server's queue, i.e., Refresh1, Refresh2 and Refresh3, which arrived in that order
Refresh   Total cost  Popularity
Refresh1  4           5
Refresh2  3           4
Refresh3  1           2
2-D Gantt chart for FIFO
[Figure: 2-D Gantt chart of the FIFO schedule; x-axis: cost, y-axis: popularity]
Divergence = 1 - Freshness = Area under the thick polygonal line = 64
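The value 64 can be checked by slicing the area into one rectangle per refresh: while a refresh is being disseminated, the popularity of every refresh not yet completed remains stale. A worked sketch for the FIFO order Refresh1, Refresh2, Refresh3, using the costs and popularities of the example (slicing into rectangles is one way to read the chart and is an assumption):

```latex
\text{Area}_{\text{FIFO}}
  = \underbrace{(5 + 4 + 2) \cdot 4}_{\text{Refresh1}}
  + \underbrace{(4 + 2) \cdot 3}_{\text{Refresh2}}
  + \underbrace{2 \cdot 1}_{\text{Refresh3}}
  = 44 + 18 + 2 = 64
```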
Can we do better?
Yes! Schedule the max(pop/cost)
[Figure: 2-D Gantt chart for the max(pop/cost) schedule; x-axis: cost, y-axis: popularity]
Divergence = 1 - Freshness = Area under the thick polygonal line = 58 (a gain of about 10%, even for this small example)
Refresh   pop/cost
Refresh1  5/4 = 1.25
Refresh2  4/3 = 1.33
Refresh3  2/1 = 2
Largest Slope Rule scheduling
• Select for dissemination the update with the largest popularity/cost ratio
• It can be proved that this rule is optimal
• No longer optimal in the presence of dependencies
• Very efficient heuristic even when there exist dependencies
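The rule can be sketched in a few lines for the case without dependencies. The sketch below reuses the three refreshes of the example and measures divergence as the area under the 2-D Gantt chart, i.e. the sum of popularity times completion time. The names `Refresh`, `divergence` and `largest_slope_first` are hypothetical and chosen only for illustration:

```python
from typing import List, NamedTuple


class Refresh(NamedTuple):
    name: str
    cost: int        # time needed to disseminate the update
    popularity: int  # access weight of the refreshed object


def divergence(schedule: List[Refresh]) -> int:
    """Area under the 2-D Gantt chart: each refresh stays stale until it completes."""
    elapsed = 0
    area = 0
    for refresh in schedule:
        elapsed += refresh.cost
        area += refresh.popularity * elapsed
    return area


def largest_slope_first(pending: List[Refresh]) -> List[Refresh]:
    """Order by decreasing popularity/cost ratio; ties keep their arrival order."""
    return sorted(pending, key=lambda r: r.popularity / r.cost, reverse=True)


# Queue in arrival (FIFO) order, taken from the example of updates.
pending = [
    Refresh("Refresh1", cost=4, popularity=5),
    Refresh("Refresh2", cost=3, popularity=4),
    Refresh("Refresh3", cost=1, popularity=2),
]

fifo_area = divergence(pending)      # 5*4 + 4*7 + 2*8
lsr_order = largest_slope_first(pending)
lsr_area = divergence(lsr_order)     # 2*1 + 4*4 + 5*8

print([r.name for r in lsr_order], fifo_area, lsr_area)
```

On this input the sketch yields 64 for FIFO and 58 for the largest-slope order, matching the two charts. Handling dependencies between updates would need additional logic that is not shown here.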
Simulated System Hardware
[Figure: the MasterCDN node connected through routers/gateways to CDN server 1, CDN server 2, ..., CDN server n; CPUs labeled CPU:0, CPU:1, CPU:2; legend: Parasol Node, Parasol CPU, Parasol Network Link]
Simulated System Model
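[Figure: Master CDN with its components: DBMS, Dispatcher, Scheduler algorithm, ViewUpdater and the CDN1, CDN2, ..., CDNn updaters, which feed the edge servers CDN1, CDN2, ..., CDNn; arrows labeled "Relation updates", "DB updates" and "Request for view update"; steps numbered 1-6]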
masterCDN components
[Figure: components of the node MasterCDN: DBMS; Dispatcher (CPU:0) with the relation queue that receives relation updates; ViewUpdater (CPU:1) with the pool of views to be updated and the Scheduler algorithm; CDN1, CDN2, ..., CDNn updaters, each with its own pool of views to transmit; a further CPU labeled CPU:2]
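A minimal sketch of one "pool of views to transmit", assuming that an updater drains its pool in largest popularity/cost order. The slides do not state which data structure the simulator uses, so the class `TransmitPool`, its methods and the view names are hypothetical:

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class TransmitPool:
    """Hypothetical pool of views waiting to be pushed to one edge server."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrival first

    def add(self, view: str, popularity: float, cost: float) -> None:
        # heapq is a min-heap, so the ratio is negated to pop the largest first.
        heapq.heappush(self._heap, (-popularity / cost, next(self._arrival), view))

    def next_view(self) -> Optional[str]:
        """Remove and return the view with the largest popularity/cost ratio."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


pool = TransmitPool()
pool.add("view_a", popularity=5, cost=4)
pool.add("view_b", popularity=4, cost=3)
pool.add("view_c", popularity=2, cost=1)

order = [pool.next_view() for _ in range(3)]
print(order)
```

A heap keeps each selection logarithmic in the pool size, which matters when views keep arriving while earlier ones are still being transmitted.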
Methodology
• Synthetic (sample CDN with 10 edge servers)
– Synthetic data generator
  • Modeling network nodes, network bandwidth, size of documents, relations, views, view derivation hierarchy, update rates, popularity
• Examine the impact of:
– update rate
– number of relations
Freshness vs. (#Rel, dep_density)
[Figure: four plots. Top: 100 Rels; Bottom: 500 Rels; Left: sparse dependencies; Right: dense dependencies]
Conclusions & Future work
• Conclusions
– we proposed a best-effort cache coherence maintenance scheme for the edge servers of a CDN
– it is a pure push-based dissemination method
– the scheme is based on the LSR scheduling algorithm
– we presented preliminary results to justify its efficiency
• Future work
– organize the edge servers into a (possibly) deep hierarchy, so as to parallelize the update dissemination
References
1. L. Bright and L. Raschid, Using Latency-Recency Profiles for Data Delivery on the Web, Proc. of the VLDB, pp. 550-561, 2002.
2. J. Challenger, A. Iyengar, and P. Dantzig, A Scalable System for Consistently Caching Dynamic Web Data, Proc. of the IEEE INFOCOM, 1999.
3. J. Cho and H. Garcia-Molina, Synchronizing a Database to Improve Freshness, Proc. of the ACM SIGMOD, pp. 117-128, 2000.
4. E. Cohen and H. Kaplan, Refreshment Policies for Web Content Caches, Computer Networks, 38(6), pp. 795-808, 2002.
5. A. Gal and J. Eckstein, Managing Periodically Updated Data in Relational Databases: A Stochastic Modeling Approach, Journal of the ACM, 48(6), pp. 1141-1183, 2001.
6. M.X. Goemans and D.P. Williamson, Two-Dimensional Gantt Charts and a Scheduling Algorithm of Lawler, SIAM Journal on Discrete Mathematics, 13(3), pp. 281-294, 2000.
7. D. Katsaros and Y. Manolopoulos, Caching in Web Memory Hierarchies, Proc. of the ACM SAC, 2004.
8. A. Labrinidis and N. Roussopoulos, Update Propagation Strategies for Improving the Quality of Data on the Web, Proc. of the VLDB, 2001.
9. A. Nanopoulos, D. Katsaros and Y. Manolopoulos, A Data Mining Algorithm for Generalized Web Prefetching, IEEE Trans. on Knowledge and Data Engineering, 15(5), pp. 1155-1169, 2003.
10. C. Olston and J. Widom, Adaptive Precision Setting for Cached Approximate Values, Proc. of the ACM SIGMOD, pp. 355-366, 2001.
11. C. Olston and J. Widom, Best-Effort Cache Synchronization with Source Cooperation, Proc. of the ACM SIGMOD, pp. 73-84, 2002.
12. J.W. Wang, D. Evans and M. Kwok, On Staleness and the Delivery of Web Pages, Information Systems Frontiers, 5(2), pp. 129-136, 2003.
Contact information
Sidiropoulos Antonis
Dept. of Informatics, Aristotle University
Thessaloniki, 54124
[email protected]
users.auth.gr/~asidirop
Katsaros Dimitrios
Dept. of Informatics, Aristotle University
Thessaloniki, 54124
[email protected]
skyblue.csd.auth.gr