Simulation Evaluation of Web Caching Architectures. Carey Williamson, Mudashiru Busari. Department of Computer Science, University of Saskatchewan

Post on 21-Dec-2015


Page 1

Simulation Evaluation of Web Caching Architectures

Carey Williamson
Mudashiru Busari
Department of Computer Science
University of Saskatchewan

Page 2

Outline

Introduction: Web Caching
Proxy Workload Generator (ProWGen)
Evaluation of Single-Level Caches
Evaluation of Multi-Level Caches
Conclusions and Future Work
Questions?

Page 3

Introduction

“The Web is both a blessing and a curse…”

Blessing:
  Internet available to the masses
  Seamless exchange of information
Curse:
  Internet available to the masses
  Stress on networks, protocols, servers, users

Motivation: techniques to improve the performance and scalability of the Web

Page 4

Why is the Web so slow?

Three main possible reasons:
Client-side bottlenecks (PC, modem)
  Solution: better access technologies (TRLabs)
Server-side bottlenecks (busy Web site)
  Solution: faster, scalable server designs
Network bottlenecks (Internet congestion)
  Solutions: caching, replication; improved protocols for client-server communication

Page 5

What is a Web proxy cache?

Intermediary between Web clients (browsers) and Web servers
Controlled Internet access point for an institution or organization (e.g., firewall)
Natural point for Web document caching
Store local copies of popular documents
Forward requests to servers only if needed

Page 6

Web Caching Proxy

[Figure: Web clients (C) and a proxy inside a region or organization boundary; Web servers reached across the Internet.]

Page 7

Some Technical Issues

Size of cache
Replacement policy when cache is full
Cache coherence (Get-If-Modified)
Some content is uncacheable
Multi-cache coordination, peering (ICP)
Security and privacy; “hit metering”
Other issues...

Page 8

Our Previous Work

Collaborative project with CANARIE, through the Advanced Networks Applications program (July’98-June’99)

Design and evaluation of Web caching strategies for Canada’s CA*net II backbone (National Web Caching Infrastructure)

For more information, see URL http://www.cs.usask.ca/faculty/carey/projects/nwci.html

Page 9

CA*net II Web Caching Hierarchy (Dec 1998)

[Figure: diagram of the caching hierarchy.]

Page 10

CA*net II Web Caching Hierarchy (Dec 1998)

[Figure: diagram of the caching hierarchy with labels USask, CANARIE (Ottawa), and "To NLANR". The labelled sites are selected measurement points for our traffic analyses; 3-6 months of data from each.]

Page 11

Caching Hierarchy Overview

[Figure: clients (C) at the bottom, with proxies arranged in three levels above them.]

Level                      Cache size   Cache hit ratio (empirically observed)
Regional/Univ.             5-10 GB      30-40%
National                   10-20 GB     15-20%
Top-Level/International    20-50 GB     5-10%

Page 12

NWCI Project Contributions

Workload characterization and evaluation of CA*net II Web caching hierarchy (IEEE Network, May/June 2000)

Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching hierarchies

Recommendations for CANARIE NWCI about configuration of future caches

Page 13

Overview of This Talk

Constructed synthetic Web proxy workload generation tool (ProWGen) that captures the salient characteristics of empirical Web proxy workloads

Use ProWGen to evaluate sensitivity of proxy caches to workload characteristics

Use ProWGen to evaluate effectiveness of multi-level Web caching hierarchies (and cache management techniques)

Page 14

Research Methodology

Design, construction, and parameterization of workload models
Validation of ProWGen (statistically, and versus empirical workloads)
Simulation evaluation of single cache
  Sensitivity to workload characteristics
  Different cache sizes, replacement policies
Simulation evaluation of multi-level cache
  Sensitivity to workload characteristics
  Novel (heterogeneous) cache management policies

Page 15

Key Workload Characteristics

“One-timers” (60-70% of documents; useless to cache!)
Zipf-like document referencing popularity
Heavy-tailed file size distribution (i.e., most files small, but most bytes are in big files)
Correlations (if any) between document size and document popularity (debate!)
Temporal locality (temporal correlation between recent past and near future references) [Mahanti et al. 2000]
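
Zipf-like popularity and one-timers can be illustrated with a short Python sketch. It is not ProWGen's code: the function names, the document count, and the seed are assumptions made for the example, and rank r is simply given a weight of r to the power of minus the slope.

```python
import random
from collections import Counter


def zipf_weights(num_docs, slope):
    """Unnormalized Zipf-like weights: the document of rank r gets r ** -slope."""
    return [rank ** -slope for rank in range(1, num_docs + 1)]


def sample_requests(num_docs, num_requests, slope, seed=0):
    """Draw document ids (0 = most popular) with Zipf-like probabilities."""
    rng = random.Random(seed)
    weights = zipf_weights(num_docs, slope)
    return rng.choices(range(num_docs), weights=weights, k=num_requests)


def one_timer_fraction(requests):
    """Fraction of the distinct documents that were requested exactly once."""
    counts = Counter(requests)
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

Calling `one_timer_fraction(sample_requests(10_000, 50_000, 0.8))` gives the share of one-timers in one synthetic request stream; a smaller slope spreads requests more evenly across documents.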

Page 16

ProWGen Conceptual View

[Figure: input parameters (labelled 1, Z, a, c, L) feed the ProWGen software, which produces a synthetic workload.]

Page 17

ProWGen Conceptual View

[Figure: input parameters (labelled 1, Z, a, c, L) feed the ProWGen software, which produces a synthetic workload. Inset: a Zipf plot with axes P and r.]


Page 19

ProWGen Conceptual View

[Figure: input parameters (labelled 1, Z, a, c, L) feed the ProWGen software, which produces a synthetic workload. Insets: a Zipf plot with axes P and r, an LLCD plot with axes F and s, and a correlation scale running from -1 through 0 to +1.]

Page 20

ProWGen: Workload Modeling Details

Modeled workload characteristics
  One-time referencing
  Zipf-like referencing behaviour (Zipf’s Law)
  File size distribution
    Body: lognormal distribution
    Tail: Pareto distribution
  Correlation between file size and popularity
  Temporal locality
    Static probabilities in finite-size LRU stack model
    Dynamic probabilities in finite-size LRU stack model
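
The lognormal body and Pareto tail of the file size model can be sketched as below. This is a hedged illustration, not ProWGen's implementation: the function names are hypothetical, the mean and standard deviation are assumed to describe the file sizes themselves (not their logarithms), and redrawing body samples until they fall below the tail start is a simplification chosen for this example.

```python
import math
import random


def lognormal_params(mean, std):
    """Convert the mean/std of a lognormal variable to the (mu, sigma) of its log."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def sample_file_size(rng, tail_fraction, tail_start, tail_index, mean, std):
    """Hybrid sampler: Pareto tail with probability tail_fraction, else lognormal body."""
    if rng.random() < tail_fraction:
        # Inverse-CDF Pareto sample: P(X > x) = (tail_start / x) ** tail_index.
        u = 1.0 - rng.random()  # in (0, 1], so the division below is safe
        return tail_start / u ** (1.0 / tail_index)
    mu, sigma = lognormal_params(mean, std)
    while True:
        # Assumption: body samples are redrawn until they fall below the tail start.
        size = rng.lognormvariate(mu, sigma)
        if size < tail_start:
            return size
```

Because the body is truncated at the tail start, the mean of the generated body sizes is lower than the `mean` argument; a generator that must match a target mean exactly would need a different way of joining body and tail.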

Page 21

Validation of ProWGen

To establish that the synthetic workloads possess the desired characteristics (quantitative and qualitative), and that the characteristics are similar to those in empirical workloads

Example: analyze 5 million requests from a proxy server trace and parameterize ProWGen to generate a similar workload

Page 22

Workload Synthesis

Parameter                                        Value
Total number of requests                         5,000,000
Unique documents (of total requests)             34%
One-timers (of unique documents)                 72%
Zipf slope                                       0.807
Tail index                                       1.322
Documents in the tail                            22%
Beginning of the tail (bytes)                    10,000
Mean of the lognormal file size distribution     7,000
Standard deviation                               11,000
Correlation between file size and popularity     Zero
LRU stack model for temporal locality            Static and Dynamic
LRU stack size                                   1,000

Page 23

Zipf-like Referencing Behaviour

Empirical Trace Slope = 0.81

Synthetic Trace Slope = 0.83
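
One common way to obtain such a slope is a least-squares line through the log-log plot of reference count against popularity rank. The sketch below assumes that method; the talk does not say how the slopes were fitted, and the function name `zipf_slope` is hypothetical.

```python
import math
from collections import Counter


def zipf_slope(requests):
    """Least-squares slope of log(frequency) versus log(rank), sign-flipped."""
    freqs = sorted(Counter(requests).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # needs at least two distinct documents
```

Running the same function on an empirical trace and on a synthetic one allows a like-for-like comparison of the two slopes.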

Page 24

Transfer Size Distribution

[Figure: two panels, References and Bytes transferred.]

Page 25

Research Questions: Single-Level Caches

In a single-level proxy cache, how sensitive is Web proxy caching performance to certain workload characteristics (one-timers, Zipf-ness, heavy-tail index)?

How does the degree of sensitivity change depending on the cache replacement policy?

Page 26

Simulation Model

[Figure: Web clients, whose aggregate workload goes to a proxy server, and Web servers behind the proxy.]

Page 27

Factors and Levels

Cache size
Cache replacement policy
  Recency-based LRU
  Frequency-based LFU-Aging
  Size-based GD-Size
Workload characteristics
  One-timers, Zipf slope, tail index, correlation, temporal locality model
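
As an illustration of the size-based policy, here is a minimal sketch of GreedyDual-Size with a uniform cost of 1, so that small documents get higher priority. It is not the simulator used in this work: the class name `GDSizeCache` is hypothetical, and the linear scan for the eviction victim is a simplification (a heap would be used at scale).

```python
class GDSizeCache:
    """GreedyDual-Size with uniform cost 1: priority H = L + 1 / size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running value L
        self.entries = {}      # doc id -> (priority H, size in bytes)

    def access(self, doc, size):
        """Return True on a hit; on a miss, insert doc and evict as needed."""
        if doc in self.entries:
            # Assumption: a cached document keeps the size it was stored with.
            stored_size = self.entries[doc][1]
            self.entries[doc] = (self.inflation + 1.0 / stored_size, stored_size)
            return True
        if size > self.capacity:
            return False  # too large to cache at all
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda d: self.entries[d][0])
            priority, victim_size = self.entries.pop(victim)
            self.inflation = priority  # L rises to the evicted priority
            self.used -= victim_size
        self.entries[doc] = (self.inflation + 1.0 / size, size)
        self.used += size
        return False
```

Raising L on every eviction is what ages old entries: a document that is never referenced again eventually has the lowest priority, however small it is.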

Page 28

Performance Metrics

Cache hit ratio
  Percent of requested docs found in cache (HR)
  Percent of requested bytes found in cache (BHR)
User response time
  Estimated analytically using request rates, cache hit ratios, and (relative) cache miss penalties
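
The two hit-ratio metrics can be computed by replaying a trace through a simulated cache. The sketch below uses LRU replacement; the function name `simulate_lru` and the trace format (a non-empty list of document id and size pairs) are assumptions made for this example.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (doc, size) requests through an LRU cache; return (HR, BHR) in percent."""
    cache = OrderedDict()  # doc id -> size, least recently used first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for doc, size in trace:
        total_bytes += size
        if doc in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(doc)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # never cache a document larger than the cache
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[doc] = size
        used += size
    return 100.0 * hits / len(trace), 100.0 * hit_bytes / total_bytes
```

HR and BHR can differ widely on the same trace: a hit on a small document counts fully toward HR but contributes few bytes to BHR.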

Page 29

Simulation Results (Preview)

Cache performance is very sensitive to:
  Slope of Zipf-like doc referencing popularity
  Temporal locality property
  Correlations between size and popularity
Cache performance relatively insensitive to:
  Tail index of heavy-tailed file size distribution
  One-timers

Page 30

Sensitivity to One-timers (LRU)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for workloads with 70% and 60% one-timers.]

Page 31

Sensitivity to Zipf Slope (LRU)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for Zipf slopes of 0.95, 0.75, and 0.60.]

Difference of 0.2 in Zipf slope impacts performance by as much as 10-15% in hit ratio and byte hit ratio

Page 32

Sensitivity to Heavy Tail Index (LRU Replacement Policy)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for tail indices of 1.4, 1.3, and 1.2.]

Page 33

Sensitivity to Heavy Tail Index (GD-Size Replacement Policy)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for tail indices of 1.4, 1.3, and 1.2.]

Difference of 0.2 in heavy tail index impacts performance by less than 3%

Page 34

Sensitivity to Correlation (LRU)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for negative, zero, and positive correlation between file size and popularity.]

Page 35

Sensitivity to Temporal Locality (LRU)

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB), for the static model, the dynamic model, and the empirical trace.]

Page 36

Summary: Single-Level Caches

Cache performance is sensitive to:
  Slope of Zipf-like document referencing popularity
  Temporal locality
  Correlation between size and popularity
Cache performance is insensitive to:
  Tail index of heavy-tailed file size distribution
  One-timers

Page 37

Multi-Level Caching...

Workload characteristics change as you move up the Web caching hierarchy (due to filtering effects, aggregation, etc.)

Idea #1: Try different cache replacement policies at different levels of hierarchy

Idea #2: Limit replication of cache content in overall hierarchy through “partitioning” (size, type, sharing,…)

Page 38

Research Questions: Multi-Level Caches

In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?

In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?

Page 39

Simulation Model

[Figure: Web clients, lower-level (children) proxy servers, an upper-level (parent) proxy server, and Web servers. Three cases are shown for the lower level: Complete Overlap, No Overlap, and Partial Overlap (50%).]
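
A two-level hierarchy of this kind can be sketched in a few lines. The code below is an illustrative assumption, not the simulator used in this work: both levels use LRU, every child miss is forwarded to a single parent, and the names `LRUCache` and `run_hierarchy` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity LRU cache that counts its own requests and hits."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # doc id -> size, least recently used first
        self.requests = 0
        self.hits = 0

    def access(self, doc, size):
        self.requests += 1
        if doc in self.docs:
            self.hits += 1
            self.docs.move_to_end(doc)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                _, evicted = self.docs.popitem(last=False)
                self.used -= evicted
            self.docs[doc] = size
            self.used += size
        return False


def run_hierarchy(trace, num_children, child_bytes, parent_bytes):
    """trace holds (child index, doc, size); child misses go to the parent."""
    children = [LRUCache(child_bytes) for _ in range(num_children)]
    parent = LRUCache(parent_bytes)
    for child_index, doc, size in trace:
        if not children[child_index].access(doc, size):
            parent.access(doc, size)
    return children, parent
```

The parent only ever sees requests that missed at a child, which is the filtering effect: its hit ratio is measured over an already filtered request stream, and it can only hit on documents requested through more than one child or evicted below.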

Page 40

Experiment 1: Different Policies at Different Levels of the hierarchy

[Figure: (a) Hit Ratio and (b) Byte Hit Ratio versus Cache Size (MB). Curves: Children (LRU), Parent (GD-Size), Parent (LFU-Aging), Parent (LRU).]

Page 41

Experiment 2: Shared files at the upper level of the hierarchy

[Figure: Hit Ratio versus Cache Size (MB) in three panels: (a) Complete Overlap, (b) Partial Overlap, (c) No Overlap. Curves in each panel: Children (LRU) and Parent (LRU).]

Page 42

Experiment 3: Size-based Partitioning

Partition files across the two levels based on sizes (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
Three size thresholds
  5,000 bytes
  10,000 bytes
  100,000 bytes
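
The partitioning rule itself is simple to state in code. In this sketch the function names are hypothetical, and treating a file exactly at the threshold as "small" is an assumption; the talk does not specify the boundary case.

```python
def caching_level(size, threshold, small_files_lower=True):
    """Return "lower" or "upper": the only level allowed to store this file."""
    is_small = size <= threshold
    return "lower" if is_small == small_files_lower else "upper"


def bytes_per_level(sizes, threshold, small_files_lower=True):
    """Total bytes that each level would be responsible for caching."""
    totals = {"lower": 0, "upper": 0}
    for size in sizes:
        totals[caching_level(size, threshold, small_files_lower)] += size
    return totals
```

Passing `small_files_lower=False` gives the reversed configuration, with large files at the lower level and small files at the upper level.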

Page 43

Small files at the lower level; large files at the upper level

[Figure: Hit Ratio and Byte Hit Ratio versus Cache Size (MB), for size thresholds of 5,000 bytes and 10,000 bytes. Curves: Children (LRU), Parent (LRU), Parent (LFU-Aging), Parent (GD-Size).]

Page 44

Large files at the lower level; small files at the upper level

[Figure: Hit Ratio and Byte Hit Ratio versus Cache Size (MB), for size thresholds of 5,000 bytes and 10,000 bytes. Curves: Children (LRU), Parent (LRU), Parent (LFU-Aging), Parent (GD-Size).]

Page 45

Summary: Multi-Level Caches

Different policies at different levels
  LRU/LFU-Aging at the lower level + GD-Size at the upper level provided improvement in performance
  GD-Size + GD-Size provided better performance in hit ratio, but with some penalty in byte hit ratio
Sharing-based approach
  no benefit compared to the other cases studied
Size-threshold approach
  small files at the lower level + large files at the upper level provided improvement in performance
  reversing this policy offered no performance advantage

Page 46

Conclusions

ProWGen is a valuable tool for the evaluation of Web proxy caching architectures, using synthetic workloads

Existing multi-level caching hierarchies are not always that effective

“Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance

Page 47

Future Work

Extend the multi-level experiments
  look into configurations where there is communication between the lower level proxies
  investigate configurations involving more levels and more lower level proxies
Extend ProWGen
  model response time
  model file size modifications

Page 48

For More Information...

M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, June 2000
Two papers available soon (under review)
ProWGen tool is available now
Email: [email protected]
http://www.cs.usask.ca/faculty/carey/