1
Objective-Optimal Algorithms for Long-term Web Prefetching
Bin Wu & Ajay Kshemkalyani
Dept. of Computer Science, Univ. of Illinois at Chicago
2
Outline
• Problem definition and background
• Web prefetching algorithms
• Performance metrics
• Objective-Greedy algorithms (O(n) time)
– Hit rate greedy (also hit rate optimal)
– Bandwidth greedy (also bandwidth optimal)
– H/B greedy
• H/B-Optimal algorithm (expected O(n) time)
• Simulation results
• Conclusions
3
Introduction
• Web caching reduces user-perceived latency
– Client-server mode
– Bottleneck occurs at the server side
– Means of improving performance: local cache, proxy server, server farm, etc.
– Cache management: LRU, Greedy Dual-Size, etc.
• On-demand caching vs. (long-term) prefetching
– Prefetching is effective in dynamic environments.
– Clients subscribe to web objects
– Server “pushes” fresh copies into web caches
– Selection of prefetched objects is based on long-term statistical characteristics, maintained by content distribution servers (CDS)
4
Introduction
• Web prefetching
– Caches web objects in advance
– Cached copies are updated by the web server
– Reduces retrieval latency and user access time
– Requires more bandwidth and increases traffic
• Performance metrics
– Hit rate
– Bandwidth usage
– Balance of the two
5
Object Selection Criteria
• Popularity (access frequency)
• Lifetime
• Good Fetch
• APL
6
Web Object Characteristics
• Access frequency
– A Zipf-like request model is used in web traffic modeling.
– The relationship between the access frequency $p_i$ and the popularity rank $i$ of a web object:
$$p_i = \frac{k}{i^{\alpha}}, \quad \text{where } k = \frac{1}{\sum_{i} 1/i^{\alpha}}$$
7
Web Object Characteristics
The generalized Zipf-like distribution of web requests is calculated as:
$$p_i = \frac{k}{i^{\alpha}}, \quad \text{where } k = \frac{1}{\sum_{i} 1/i^{\alpha}}$$
k is a normalization constant, i is the object ID (popularity rank), and α is the Zipf parameter. Reported values of α:
– 0.986 (Cunha et al.)
– 0.75 (Nishikawa et al.)
– 0.64 (Breslau et al.)
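The Zipf-like model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: the function name `zipf_probabilities` and the choice of n = 1000 objects are assumptions made here for the example.

```python
def zipf_probabilities(n, alpha):
    """Return [p_1, ..., p_n] with p_i = k / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(weights)  # normalization constant k from the slide
    return [k * w for w in weights]


# Example: 1,000 objects with alpha = 0.75 (the value attributed to Nishikawa et al.)
p = zipf_probabilities(1000, 0.75)
print(p[0], p[-1])  # access frequency of the most and least popular object
```

A smaller α flattens the distribution, so less of the request mass sits on the top-ranked objects.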
8
Web Object Characteristics
• Size of objects
– Average object size: 10–15 KB.
– No strong correlation between object size and access frequency.
• Lifetime of web objects
– Average time interval between updates.
– Weak correlation between access frequency and lifetime.
9
Caching Architecture
• Prefetching selection algorithms take these global statistics as input:
– Estimates of object reference frequencies
– Estimates of object lifetimes
• Content distribution servers cooperate to maintain these statistics.
• When an object is updated on the origin server, the new version is sent to every cache that has subscribed to it.
10
Solution space for web prefetching
• Two extreme cases:
– Passive caches (non-prefetching): least network bandwidth and lowest cache hit rate
– Prefetching all objects: 100% cache hit rate, but a huge amount of unnecessary bandwidth
• Existing algorithms use different object-selection criteria and fetch the objects whose criterion value exceeds a threshold.
11
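Steady State Properties
• The steady-state hit rate for object i under on-demand caching is defined as the freshness factor, f(i) (Venkataramani et al.):
$$f(i) = \frac{a p_i l_i}{a p_i l_i + 1}$$
• Steady-state hit rate for object i:
$$h_i = \begin{cases} 1 & \text{if object } i \text{ is prefetched} \\ \dfrac{a p_i l_i}{a p_i l_i + 1} & \text{if object } i \text{ is not prefetched} \end{cases}$$
• Overall hit rate:
$$H = \sum_i p_i h_i$$
• In particular, for on-demand caching:
$$H_{demand} = \sum_i p_i f(i)$$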
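A minimal Python sketch of the steady-state hit rate, assuming that `a` is the aggregate request rate, `p[i]` the access frequency, and `l[i]` the lifetime of object i (the slides do not define `a` explicitly, so that reading is an assumption). The function names are illustrative only.

```python
def freshness(a, p_i, l_i):
    """Freshness factor f(i) = a*p_i*l_i / (a*p_i*l_i + 1)."""
    x = a * p_i * l_i
    return x / (x + 1.0)


def overall_hit_rate(a, p, l, prefetched=frozenset()):
    """H = sum_i p_i * h_i, with h_i = 1 for prefetched objects and f(i) otherwise."""
    total = 0.0
    for i, (p_i, l_i) in enumerate(zip(p, l)):
        h_i = 1.0 if i in prefetched else freshness(a, p_i, l_i)
        total += p_i * h_i
    return total


# Hypothetical three-object system, used only for illustration.
a, p, l = 10.0, [0.5, 0.3, 0.2], [1.0, 2.0, 4.0]
print(overall_hit_rate(a, p, l))                  # on-demand hit rate H_demand
print(overall_hit_rate(a, p, l, prefetched={0}))  # hit rate after prefetching object 0
```

Calling the function with an empty `prefetched` set gives the on-demand case; prefetching any object can only raise H, because h_i moves from f(i) < 1 to 1.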
12
Steady State Properties
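• Steady-state bandwidth for object i:
$$b_i = \begin{cases} \dfrac{s_i}{l_i} & \text{if object } i \text{ is prefetched} \\ a p_i (1 - f(i)) s_i & \text{if object } i \text{ is not prefetched} \end{cases}$$
• Total bandwidth:
$$BW = \sum_i b_i$$
• In particular, for on-demand caching:
$$BW_{demand} = \sum_i a p_i (1 - f(i)) s_i$$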
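The bandwidth model can be sketched the same way. The code below is an illustration under the same assumptions about `a`, `p`, and `l` as in the hit rate formulas, with `s[i]` the object size; the function names are hypothetical. It also checks numerically that a·p_i·(1 − f(i)) equals f(i)/l_i, which follows from the definition of f(i) and gives the form (s_i/l_i)·f(i) for the on-demand bandwidth used in the H/B derivation.

```python
def freshness(a, p_i, l_i):
    """Freshness factor f(i) = a*p_i*l_i / (a*p_i*l_i + 1)."""
    x = a * p_i * l_i
    return x / (x + 1.0)


def object_bandwidth(a, p_i, s_i, l_i, prefetched):
    """b_i = s_i/l_i if prefetched, else a*p_i*(1 - f(i))*s_i."""
    if prefetched:
        return s_i / l_i
    return a * p_i * (1.0 - freshness(a, p_i, l_i)) * s_i


def total_bandwidth(a, p, s, l, prefetched=frozenset()):
    """BW = sum_i b_i over all objects."""
    return sum(
        object_bandwidth(a, p_i, s_i, l_i, i in prefetched)
        for i, (p_i, s_i, l_i) in enumerate(zip(p, s, l))
    )


# Hypothetical three-object system, used only for illustration.
a, p, s, l = 10.0, [0.5, 0.3, 0.2], [10.0, 10.0, 10.0], [1.0, 2.0, 4.0]
bw_demand = total_bandwidth(a, p, s, l)
alt_form = sum(s_i / l_i * freshness(a, p_i, l_i) for p_i, s_i, l_i in zip(p, s, l))
print(bw_demand, alt_form)  # the two forms agree up to rounding
```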
13
Objective Metrics
• Hit rate – benefit
• Bandwidth – cost
• H/B model – balance of benefit and cost (Jiang et al.)
– Basic H/B:
$$H/B = \frac{Hit_{Prefetching} / Hit_{Demand}}{BW_{Prefetching} / BW_{Demand}}$$
– Enhanced H/B:
$$H_k/B = \frac{\left(Hit_{Prefetching} / Hit_{Demand}\right)^{k}}{BW_{Prefetching} / BW_{Demand}}$$
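As a small worked example of the H/B metrics, the sketch below evaluates the basic and enhanced forms for made-up hit rate and bandwidth figures. The function name `hb_metric` and all input numbers are assumptions for illustration, not results from the talk.

```python
def hb_metric(hit_pref, hit_demand, bw_pref, bw_demand, k=1.0):
    """(Hit_pref/Hit_demand)**k / (BW_pref/BW_demand); k = 1 gives the basic H/B."""
    return (hit_pref / hit_demand) ** k / (bw_pref / bw_demand)


# Hypothetical numbers: hit rate rises from 0.6 to 0.8 while bandwidth grows by 50%.
print(hb_metric(0.8, 0.6, 150.0, 100.0))       # basic H/B
print(hb_metric(0.8, 0.6, 150.0, 100.0, k=2))  # enhanced H/B, weighting hit rate more
```

With k = 1 the value falls below 1 here, meaning bandwidth grew proportionally more than hit rate; a larger k shifts the balance toward hit rate.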
14
Existing Prefetching Algorithms
• Popularity [Markatos et al.]
– Keep the most popular objects in the system
– Update these objects immediately when they change
– Criterion: the object's popularity
– Expected to achieve a high hit rate
• Lifetime [Jiang et al.]
– Keep the objects with the longest lifetimes
– Mostly considers network resource demands
– Threshold: the expected lifetime of the object
– Expected to minimize bandwidth usage
15
Existing Prefetching Algorithms
• Good Fetch [Venkataramani et al.]
– Computes the probability that an object is accessed before it changes.
– Prefetches objects with a “high probability of being accessed during their average lifetime”.
– Prefetch object i if this probability exceeds a threshold.
– Objects with higher access frequencies and longer update intervals are more likely to be prefetched.
– Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object.
16
Existing Prefetching Algorithms
• APL [Jiang et al.]
– Computes the apl values of web objects.
– The apl of an object represents the “expected number of accesses during its lifetime”.
– Prefetch object i if its apl exceeds a threshold.
– Tends to improve hit rate; attempts to balance benefit (hit rate) against cost (bandwidth).
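A threshold rule of this kind can be sketched as follows. The sketch assumes that the apl of object i is computed as a·p_i·l_i (request rate to the object times its lifetime), which matches the quoted description but is not spelled out on the slide; the function name and the sample numbers are hypothetical.

```python
def apl_select(a, p, l, threshold):
    """Return indices of objects whose apl = a * p_i * l_i exceeds the threshold."""
    return [
        i
        for i, (p_i, l_i) in enumerate(zip(p, l))
        if a * p_i * l_i > threshold
    ]


# Hypothetical three-object system: apl values are roughly 5, 6 and 8.
print(apl_select(10.0, [0.5, 0.3, 0.2], [1.0, 2.0, 4.0], threshold=5.5))
```

Note that the rule looks at each object in isolation, which is the limitation the Objective-Greedy algorithms address.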
17
Existing Prefetching Algorithms
• Enhanced APL
– Criterion: $a \, p_i^{n} \, l_i$
– n > 1 prefers objects with higher popularity (emphasizes hit rate)
– n < 1 prefers objects with longer lifetime (emphasizes network bandwidth)
18
Objective-Greedy Algorithms
• Existing algorithms choose prefetching criteria based on intuitions
• These intuitions are not aimed at any specific performance metrics
• These intuitions consider only individual objects’ characteristics, not the global impact
• None of them gave optimal performance based on any metric
– Simple counter-examples can be shown
19
Objective-Greedy Algorithms
• Objective-Greedy algorithms select criteria to intentionally improve performance based on various metrics.
• E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate and thus reduce the latency of object requests.
20
H/B-Greedy Prefetching
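• Consider the H/B value of on-demand caching:
$$\left(\frac{H}{B}\right)_{demand} = \frac{Hit_{demand}}{BW_{demand}} = \frac{\sum_{i \in S} p_i f(i)}{\sum_{i \in S} \frac{s_i}{l_i} f(i)}$$
• If object j is prefetched, then H/B is updated to:
$$\frac{H}{B} = \frac{\sum_{i \in S} p_i f(i) + p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \frac{s_j}{l_j} (1 - f(j))} = \left(\frac{H}{B}\right)_{demand} \cdot \frac{1 + \dfrac{p_j (1 - f(j))}{\sum_{i \in S} p_i f(i)}}{1 + \dfrac{\frac{s_j}{l_j} (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i)}}$$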
21
H/B-Greedy Prefetching
• We define
$$incr(j) = \frac{1 + \dfrac{p_j (1 - f(j))}{\sum_{i \in S} p_i f(i)}}{1 + \dfrac{\frac{s_j}{l_j} (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i)}}$$
as the increase factor of object j, incr(j).
• incr(j) indicates the amount by which H/B can be increased if object j is selected.
22
H/B-Greedy Prefetching
• H/B-Greedy prefetching prefetches those m objects with greatest increase factors.
• The selection is based on the effect on the hit rate by prefetching individual objects.
• H/B-Greedy is still not an optimal algorithm in terms of H/B value.
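The H/B-Greedy selection can be sketched as below. This is an illustrative reading of the increase factor incr(j), not the authors' implementation: the function names and sample data are assumptions, `a` is taken to be the aggregate request rate, and `heapq.nlargest` is used for brevity (it runs in O(n log m), whereas a linear-time selection algorithm would be needed for the O(n) bound stated in the talk).

```python
import heapq


def freshness(a, p_i, l_i):
    """Freshness factor f(i) = a*p_i*l_i / (a*p_i*l_i + 1)."""
    x = a * p_i * l_i
    return x / (x + 1.0)


def hb_greedy(a, p, s, l, m):
    """Return the indices of the m objects with the largest increase factor incr(j)."""
    f = [freshness(a, p_i, l_i) for p_i, l_i in zip(p, l)]
    hit_demand = sum(p_i * f_i for p_i, f_i in zip(p, f))
    bw_demand = sum(s_i / l_i * f_i for s_i, l_i, f_i in zip(s, l, f))

    def incr(j):
        # Relative gain in hit rate divided by relative gain in bandwidth.
        num = 1.0 + p[j] * (1.0 - f[j]) / hit_demand
        den = 1.0 + (s[j] / l[j]) * (1.0 - f[j]) / bw_demand
        return num / den

    return heapq.nlargest(m, range(len(p)), key=incr)


# Hypothetical three-object system, used only for illustration.
print(hb_greedy(10.0, [0.5, 0.3, 0.2], [10.0, 10.0, 10.0], [1.0, 2.0, 4.0], m=2))
```

Each incr(j) is computed against the on-demand baseline, independently of which other objects are chosen, which is why the result is greedy rather than optimal.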
23
24
Hit Rate-Greedy Prefetching
• To maximize the overall hit rate given the number of objects to prefetch, m, we select the m objects with the greatest hit rate contribution:
$$HR\_Contr(i) = p_i (1 - f(i)) = \frac{p_i}{a p_i l_i + 1}$$
• This algorithm is optimal in terms of hit rate.
25
Bandwidth-Greedy Prefetching
• To minimize the total bandwidth given m, the number of objects to prefetch, we select the m objects with the least bandwidth contribution:
$$BW\_Contr(i) = \frac{s_i}{l_i} (1 - f(i)) = \frac{s_i}{a p_i l_i^{2} + l_i}$$
• Bandwidth-Greedy Prefetching is optimal in terms of bandwidth consumption.
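The Hit Rate-Greedy and Bandwidth-Greedy rules both reduce to ranking objects by a per-object contribution, so they can be sketched together. The function names and sample data below are assumptions for illustration; `heapq.nlargest` and `heapq.nsmallest` stand in for a linear-time selection.

```python
import heapq


def hr_contr(a, p_i, l_i):
    """Hit rate contribution p_i * (1 - f(i)) = p_i / (a*p_i*l_i + 1)."""
    return p_i / (a * p_i * l_i + 1.0)


def bw_contr(a, p_i, s_i, l_i):
    """Bandwidth contribution (s_i/l_i) * (1 - f(i)) = s_i / (a*p_i*l_i**2 + l_i)."""
    return s_i / (a * p_i * l_i ** 2 + l_i)


def hit_rate_greedy(a, p, l, m):
    """Indices of the m objects with the greatest hit rate contribution."""
    return heapq.nlargest(m, range(len(p)), key=lambda i: hr_contr(a, p[i], l[i]))


def bandwidth_greedy(a, p, s, l, m):
    """Indices of the m objects with the least bandwidth contribution."""
    return heapq.nsmallest(m, range(len(p)), key=lambda i: bw_contr(a, p[i], s[i], l[i]))


# Hypothetical three-object system, used only for illustration.
a, p, s, l = 10.0, [0.5, 0.3, 0.2], [10.0, 10.0, 10.0], [1.0, 2.0, 4.0]
print(hit_rate_greedy(a, p, l, 2))      # favors popular, short-lived objects
print(bandwidth_greedy(a, p, s, l, 2))  # favors long-lived objects
```

Because the total hit rate and total bandwidth are plain sums of per-object terms, picking the m best contributions is optimal for each metric on its own; the ratio H/B does not decompose this way.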
26
H/B-Optimal Prefetching
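• The optimal algorithm for the H/B metric is given by a solution to the following selection problem:
$$S' = \arg\max_{S' \subseteq S,\ |S'| = m} \left(\frac{H}{B}\right)_{pref} = \arg\max_{S' \subseteq S,\ |S'| = m} \frac{\sum_{i \in S} p_i f(i) + \sum_{j \in S'} p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \sum_{j \in S'} \frac{s_j}{l_j} (1 - f(j))}$$
• This is equivalent to the maximum weighted average problem with pre-selected items.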
27
Maximum Weighted Average
Maximum Weighted Average Problem:
• n courses in total, with different credit hours and scores
• Select m (m < n) courses
• Maximize the GPA of the m selected courses
Solution:
• If m = 1, select the course with the highest score.
• What if m > 1? A misleading intuition: select the m courses with the highest scores.
28
A Course Selection Problem
• If m = 2:
– If we select the 2 courses with the highest scores, C and B, then GPA = 93.33.
– But if we select C and D, then GPA = 93.57.
• Question: how to select m courses such that the GPA is maximized?
• Answer: Eppstein & Hirschberg solved this.

Courses        A    B    C    D    E    F    G    H
Credit hours   5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores         70   90   95   85   75   60   65   80
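The m = 2 example can be checked by brute force. The sketch below enumerates all pairs from the table and is only meant to confirm the numbers on the slide; it is exponential in general and is not the Eppstein & Hirschberg algorithm. The names `courses` and `gpa` are chosen here for illustration.

```python
from itertools import combinations

# (credit hours, score) for each course, copied from the table above.
courses = {
    "A": (5.0, 70), "B": (3.0, 90), "C": (6.0, 95), "D": (1.0, 85),
    "E": (2.0, 75), "F": (4.0, 60), "G": (3.0, 65), "H": (6.0, 80),
}


def gpa(names):
    """Credit-weighted average score of the named courses."""
    credits = sum(courses[n][0] for n in names)
    points = sum(courses[n][0] * courses[n][1] for n in names)
    return points / credits


best = max(combinations(sorted(courses), 2), key=gpa)
print(best, round(gpa(best), 2))   # best pair found by exhaustive search
print(round(gpa(("B", "C")), 2))   # the two highest-scoring courses
```

The small credit weight of D is what lets it dilute C's score less than B does, even though B scores higher.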
29
With Pre-selected items
Maximum Weighted Average with pre-selected items:
• n courses in total, with different credit hours and scores
• Courses A and E (for example) must be selected, plus:
• Select m additional courses (m is given, m < n) such that the resulting GPA is maximized

Courses        A    B    C    D    E    F    G    H
Credit hours   5.0  3.0  6.0  1.0  2.0  4.0  3.0  6.0
Scores         70   90   95   85   75   60   65   80
30
Pre-selection is not trivial
1) Selection domain B~I, no pre-selection, m = 2
– Optimal subset: {B, C}, GPA: 88.33
2) Selection domain B~I, A is pre-selected, m = 2
– One candidate subset: {A, D, H}, GPA: 75.61
– Better than: {A, B, C}, GPA: 70.625
Conclusion: {B, C} is not contained in the optimal subset for the pre-selected problem.

Course   A    B    C    D     E    F    G    H    I
Credit   5.0  1.0  2.0  10.0  1.5  2.5  2.0  3.0  4.0
Score    60   95   85   83    63   71   80   77   65
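A brute-force check of this example, with and without the pre-selected course A, is sketched below. The helper names are hypothetical and the enumeration is only practical for tiny inputs. For this table the exhaustive search returns {B, D} as the best pair to add to A (GPA 76.5625), which is consistent with the conclusion that {B, C} is not contained in the optimal subset.

```python
from itertools import combinations

# (credit, score) for each course, copied from the table above.
COURSES = {
    "A": (5.0, 60), "B": (1.0, 95), "C": (2.0, 85), "D": (10.0, 83), "E": (1.5, 63),
    "F": (2.5, 71), "G": (2.0, 80), "H": (3.0, 77), "I": (4.0, 65),
}


def gpa(names):
    """Credit-weighted average score of the named courses."""
    credits = sum(COURSES[n][0] for n in names)
    points = sum(COURSES[n][0] * COURSES[n][1] for n in names)
    return points / credits


def best_subset(domain, m, preselected=()):
    """Exhaustively pick the m courses from domain that maximize GPA with preselected."""
    return max(combinations(domain, m), key=lambda c: gpa(tuple(preselected) + c))


domain = "BCDEFGHI"
free = best_subset(domain, 2)
forced = best_subset(domain, 2, preselected=("A",))
print(free, round(gpa(free), 2))
print(forced, gpa(("A",) + forced))
```

Pre-selecting the low-scoring, heavy course A changes the trade-off: a heavy course with a good score (D) now pulls the average up more than a light course with an excellent score (C).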
31
H/B-Optimal vs. Course selection
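• The problem is formulated as:
$$S' = \arg\max_{S' \subseteq S,\ |S'| = m} \frac{v_0 + \sum_{j \in S'} v_j}{w_0 + \sum_{j \in S'} w_j}$$
where $v_0 = 5.0 \times 70 + 2.0 \times 75 = 500$ and $w_0 = 5.0 + 2.0 = 7.0$ in the previous example.
• Equivalent to the H/B-Optimal selection problem:
$$S' = \arg\max_{S' \subseteq S,\ |S'| = m} \frac{\sum_{i \in S} p_i f(i) + \sum_{j \in S'} p_j (1 - f(j))}{\sum_{i \in S} \frac{s_i}{l_i} f(i) + \sum_{j \in S'} \frac{s_j}{l_j} (1 - f(j))}$$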
32
H/B-Optimal vs. Course selection
33
H/B-Optimal algorithm design
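• The selection of m courses is not trivial
• For course i, we define the auxiliary function
$$r_i(x) = \left(v_i + \frac{v_0}{m}\right) - \left(w_i + \frac{w_0}{m}\right) x$$
• And for a given number m, we define the utility function
$$F(x) = \max_{S' \subseteq S,\ |S'| = m} \sum_{i \in S'} r_i(x)$$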
34
H/B-Optimal algorithm
• Lemma 1
Suppose A* is the maximum GPA we are computing. Then for any subset S' ⊆ S with |S'| = m:
1) $\sum_{i \in S'} r_i(A^*) \le 0$;
2) $\sum_{i \in S'} r_i(A^*) = 0$ iff S' is the optimal subset.
Lemma 1 indicates that the optimal subset contains those courses that have the m largest $r_i(A^*)$ values.
35
H/B-Optimal algorithm design
• n = 6, m = 4
• Each line is r_i(x)
• Assume we know A*
• The optimal subset has the 4 courses with the largest r_i(A*) values.
• Dilemma: A* is unknown
36
H/B-Optimal algorithm design
• Lemma 2:
1) F(x) > 0 iff x < A*
2) F(x) = 0 iff x = A*
3) F(x) < 0 iff x > A*
• Lemma 2 narrows the range of A*; (x_l, x_r) is the current A*-range.
37
H/B-Optimal algorithm design
• If F (xl) > 0 and F (xr) < 0, then A* in (xl, xr)
• Compute the value of F((xl+xr)/2)
- if F((xl+xr)/2) > 0, then A* > (xl+xr)/2
- if F((xl+xr)/2) < 0, then A* < (xl+xr)/2
- if F((xl+xr)/2) = 0, then A* = (xl+xr)/2; (Lemma 2)
• Narrow down the range of A* by half
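The bisection idea can be sketched directly from Lemma 2. The code below is a plain numerical bisection on F(x), not the linear-time randomized algorithm described later in the talk: each evaluation of F sorts all r_i(x) values, and the tolerance `tol`, the function names, and the choice of the initial bracket (the smallest and largest individual ratios v_i/w_i, plus v0/w0 when w0 > 0) are assumptions made for this illustration. It assumes positive weights.

```python
def top_m(x, v, w, v0, w0, m):
    """Return (indices of the m largest r_i(x), list of all r_i(x))."""
    r = [(v_i + v0 / m) - (w_i + w0 / m) * x for v_i, w_i in zip(v, w)]
    order = sorted(range(len(v)), key=lambda i: r[i], reverse=True)
    return order[:m], r


def F(x, v, w, v0, w0, m):
    """Utility function: the largest possible sum of m values r_i(x)."""
    idx, r = top_m(x, v, w, v0, w0, m)
    return sum(r[i] for i in idx)


def bisect_max_weighted_average(v, w, v0, w0, m, tol=1e-9):
    """Bisection on F(x) to locate A*, then read off the m largest r_i values."""
    ratios = [v_i / w_i for v_i, w_i in zip(v, w)]
    if w0 > 0:
        ratios.append(v0 / w0)
    xl, xr = min(ratios), max(ratios)  # any weighted average lies in this range
    while xr - xl > tol:
        mid = (xl + xr) / 2.0
        if F(mid, v, w, v0, w0, m) > 0:
            xl = mid  # Lemma 2: F(mid) > 0 means A* > mid
        else:
            xr = mid  # F(mid) <= 0 means A* <= mid
    idx, _ = top_m(xr, v, w, v0, w0, m)
    best = (v0 + sum(v[i] for i in idx)) / (w0 + sum(w[i] for i in idx))
    return sorted(idx), best


# Courses B..I from the pre-selection example, with A (credit 5, score 60) pre-selected.
w = [1.0, 2.0, 10.0, 1.5, 2.5, 2.0, 3.0, 4.0]
scores = [95, 85, 83, 63, 71, 80, 77, 65]
v = [w_i * s_i for w_i, s_i in zip(w, scores)]
print(bisect_max_weighted_average(v, w, v0=300.0, w0=5.0, m=2))
```

On this input the sketch selects indices 0 and 2 (courses B and D). Ties among r_i(A*) values would need extra care in a robust implementation.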
38
H/B-Optimal algorithm design
• Why keep narrowing down the range of A*?
– If the intersection of r_j(x) and r_k(x) falls outside the range, then the ordering of r_j(x) and r_k(x) is fixed within the range, and so is the ordering of r_j(A*) and r_k(A*); it can be determined by comparing their slopes.
– If the range is narrow enough that no two r(x) lines intersect within it, then the total ordering of all r(A*) values is determined.
– The optimization problem is then solved: just select the m candidates with the highest r(A*) values.
• This is the main idea for solving the optimization problem.
39
H/B-Optimal algorithm design
• However, determining the total ordering requires O(n^2) time.
• A randomized approach is used instead. This randomized algorithm:
– Iteratively reduces the problem domain to a smaller one.
– Maintains 4 sets, X, Y, E, and Z, all initially empty.
40
H/B-Optimal algorithm design
In each iteration, the algorithm randomly selects a course i and compares it with each of the other courses, k. There are 4 possibilities:
1) if r_k(A*) > r_i(A*): insert k into set X
2) if r_k(A*) < r_i(A*): insert k into set Y
3) if w_k = w_i and v_k = v_i: insert k into set E
4) if undetermined: insert k into set Z
Now do the following loop:
loop:
    narrow the range of A* by half
    compare r_i(A*) with r_k'(A*) for k' in Z
    if appropriate, move k' to X or Y, accordingly
until |Z| is sufficiently small (i.e., |Z| < |S|/32)
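The four-way comparison can be sketched as a single function. It relies on the fact that r_k(x) − r_i(x) = (v_k − v_i) − (w_k − w_i)·x, so the pre-selected terms v_0/m and w_0/m cancel and the two lines cross at x = (v_k − v_i)/(w_k − w_i). The function name `classify` and the sample values are assumptions for illustration; the sketch also assumes A* lies strictly inside the current range (x_l, x_r).

```python
def classify(i, k, v, w, xl, xr):
    """Classify course k against pivot i given the current A*-range (xl, xr).

    Returns "X" if r_k(A*) > r_i(A*), "Y" if r_k(A*) < r_i(A*),
    "E" if the two lines are identical, and "Z" if still undetermined.
    """
    dv = v[k] - v[i]
    dw = w[k] - w[i]
    if dv == 0 and dw == 0:
        return "E"
    if dw != 0:
        cross = dv / dw  # x-coordinate where the two lines intersect
        if xl < cross < xr:
            return "Z"   # the ordering may flip inside the range
    # The sign of r_k - r_i is constant on (xl, xr); test it at the midpoint.
    mid = (xl + xr) / 2.0
    return "X" if dv - dw * mid > 0 else "Y"


# Hypothetical data: course 0 (w=1, v=95), course 1 (w=10, v=830), course 2 = copy of 0.
v, w = [95.0, 830.0, 95.0], [1.0, 10.0, 1.0]
print(classify(0, 1, v, w, 70.0, 80.0))  # lines cross outside (70, 80): decided
print(classify(0, 1, v, w, 70.0, 90.0))  # lines cross inside (70, 90): undetermined
print(classify(0, 2, v, w, 70.0, 80.0))  # identical lines
```

Each halving of the range can only move courses out of Z, never into it, which is what makes the loop above terminate.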
41
H/B-Optimal algorithm design
• The sets X or Y have enough members.
• Next, examine and compare the sizes of X, Y and E:
42
H/B-Optimal algorithm design
1) If |X| + |E| > m:
At least m courses have r(A*) values greater than the r(A*) value of every course in Y, so all members of Y may be removed. Then: |S| = |S| - |Y|
43
H/B-Optimal algorithm design
2) If |Y| + |E| > |S| - m:
All members of X are among the top m courses, so they must be in the optimal set. Collapse X into a single course (this course is included in the final optimal set). Then:
|S| = |S| - |X| + 1;
m = m - |X| + 1.
44
H/B-Optimal algorithm design
• In either case, the resulting domain has reduced size.
• By iteratively removing or collapsing courses, the problem domain finally has only one course remaining: a course formed by collapsing all courses in the optimal set.
• Complexity (expected time, briefly; assume S_b is the domain before an iteration and S_a the domain after):
1) Each iteration takes expected time O(|S_b|)
2) Expected size |S_a| = (207/256) |S_b|
The recurrence relation of the iteration:
T(n) = O(n) + T[(207/256) n]
This resolves to linear time complexity.
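The step from the recurrence to linear time can be made explicit by unrolling it as a geometric series. The sketch below writes the per-iteration cost as c·n for an assumed constant c and treats the expected shrink factor 207/256 as if it held at every level; it illustrates why the sum converges and is not a full probabilistic proof.

```latex
\begin{align*}
T(n) &\le c\,n + T\!\left(\tfrac{207}{256}\,n\right) \\
     &\le c\,n \sum_{k=0}^{\infty} \left(\tfrac{207}{256}\right)^{k} \\
     &= c\,n \cdot \frac{1}{1 - \tfrac{207}{256}}
      = \tfrac{256}{49}\,c\,n
      = O(n).
\end{align*}
```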
45
H/B-Greedy vs. H/B-Optimal
• H/B-greedy is an approximation to H/B-Optimal
• H/B-greedy achieves a higher H/B metric than any existing algorithm.
• H/B-greedy is easier to implement than H/B-Optimal.
46
Simulation Results
• Evaluation of H/B-Greedy prefetching
– Figure 1: H/B, for total object number = 1,000.
– Figure 2: H/B, for total object number = 10,000.
– Figure 3: H/B, for total object number = 100,000.
– Figure 4: H/B, for total object number = 1,000,000.
• Evaluation of the H-Greedy and B-Greedy algorithms
– Figure 5: H-Greedy algorithm.
– Figure 6: B-Greedy algorithm.
– Figure 7: B-Greedy algorithm, zoomed in.
47
Figure 1: H/B, for total object number=1,000
48
Figure 2: H/B, for total object number=10,000
49
Figure 3: H/B, total object number=100,000
50
Figure 4: H/B, total object number=1,000,000
51
Figure 5: H-Greedy algorithm
52
Figure 6: B-Greedy algorithm
53
Figure 7: B-Greedy, Bandwidth magnified
54
Performance Comparison
Table 1. Performance comparison of different algorithms in terms of various metrics. (Lower values represent better performance.)
55
Conclusions
• Proposed a family of Objective-Greedy prefetching algorithms that are superior to Popularity, Good Fetch, APL, and Lifetime
– Hit rate greedy (this is also optimal)
– Bandwidth greedy (this is also optimal)
– H/B greedy
• All of the above have O(n) complexity
• Proposed an H/B-Optimal algorithm that also runs in O(n) expected time
• Experimental evaluation shows significant gains over existing algorithms
• H/B-greedy is almost as good as H/B-optimal
56