
Decision Support Systems 46 (2009) 492–500


Performance evaluation for implementations of a network of proxy caches

Chetan Kumar ⁎
Department of Information Systems and Operations Management, College of Business Administration, California State University San Marcos, 333 South Twin Oaks Valley Road, San Marcos, CA 92096, United States

⁎ Corresponding author. Tel.: +1 760 477 3976. E-mail address: [email protected].

0167-9236/$ – see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.dss.2008.09.002

ARTICLE INFO

Article history: Received 9 January 2008; Received in revised form 13 August 2008; Accepted 4 September 2008; Available online 14 September 2008

Keywords: Caching; Proxy cache network; Collaboration mechanism; Performance evaluation

ABSTRACT

In a network of proxy-level caches, such as IRCache (www.ircache.net), nodes collaborate with one another to satisfy object requests. However, since collaboration in current implementations of proxy cache networks is typically limited to sharing cache contents, there may be unnecessary redundancies in storing objects. It is expected that a mechanism that considers the objects cached at every node in the network would be more effective for reducing user delays. In this study we construct algorithms for different implementations of such a mechanism using the theoretical approach of Tawarmalani et al. [Tawarmalani, M., Karthik, K., and De, P., Allocating Objects in a Network of Caches: Centralized and Decentralized Analyses, (2007) Purdue University Working Paper], which investigates caching policies where nodes do consider objects held by their neighbors. The caching implementations are also compared and contrasted with numerical computations using simulated data. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to increasing congestion on the Internet. Therefore we believe that this study contributes to network caching research that is beneficial for Internet users.

© 2008 Elsevier B.V. All rights reserved.

1. Introduction and problem motivation

Caching involves storing copies of web objects in locations that are relatively close to the end user. As a result, user requests can be served more quickly than if they were served directly from the origin web server [3,9,10,13]. Caching may be performed at different levels in a computer network. These include the browser, proxy, and web-server levels [9,14]. This paper deals with proxy-level caching. Proxy caching is widely utilized by computer network administrators, technology providers, and businesses to reduce user delays and to alleviate Internet congestion (www.web-caching.com). Some well known examples include proxy caching solution providers such as Microsoft (www.microsoft.com/isaserver) and Oracle (www.oracle.com/technology/products/ias/web_cache/), Internet service providers (ISP) such as AT&T (www.att.com), and content delivery network (CDN) firms such as Akamai (www.akamai.com).

Effective proxy caching mechanisms are beneficial for reducing network traffic, load on servers, and the average delays experienced by web users [3,13,19]. Specifically, we focus on a framework of a network of proxy caches. Unlike the case of a single proxy cache dedicated to one network, proxy caches that are connected together as a network collaborate with each other to improve the overall caching performance in terms of user delays. If a particular proxy cache cannot satisfy a request arriving at it, then the object is searched for at its


neighbors in the network of caches. If none of its neighbors is able to satisfy the object request either, then the request is satisfied directly from the origin server. Note that collaboration in this setup is limited to only sharing the cache contents.
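The lookup chain just described can be summarized in a few lines of code. The sketch below is purely illustrative (the function and variable names are our own and are not part of any caching protocol): a request is answered from the local cache if possible, then from any neighbor in the network, and only otherwise from the origin server.

```python
# A minimal sketch of the request-resolution order in a proxy cache network
# (illustrative names and data; not an implementation from the paper).

def resolve(obj, local_cache, neighbor_caches):
    """Return where the requested object would be served from."""
    if obj in local_cache:
        return "local cache"
    if any(obj in neighbor for neighbor in neighbor_caches):
        return "neighbor cache"
    return "origin server"

# Example inspired by Fig. 1: the U.K. node holds nothing, while the U.S. and
# Germany nodes hold the requested pages.
uk, us, de = set(), {"chrysler.com", "ford.com"}, {"mercedes-benz.com"}
print(resolve("ford.com", uk, [us, de]))      # -> neighbor cache
print(resolve("example.org", uk, [us, de]))   # -> origin server
```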

Fig. 1 illustrates how proxy caches may collaborate in a network with nodes at three locations. The demand for web pages chrysler.com, ford.com, and mercedes-benz.com at the U.K. node, which has not cached them, is satisfied from the U.S. and Germany nodes. Therefore the U.K. node need not go to the origin server to satisfy requests for objects it does not hold itself, but which are cached by its neighbors. As networked nodes collaborate and share cache contents, objects have to be fetched from the origin server less frequently, thereby reducing user delays. However, the degree of collaboration among the nodes can vary depending on the cache network under consideration. Proxy cache networks have been implemented both for public usage as well as in private organizations. Examples of public domain implementations include IRCache (www.ircache.net) and DESIRE (www.uninett.no/arkiv/desire/). Private organizations that utilize proxy cache networks include CDN providers such as Akamai (www.akamai.com) and ISPs such as AOL (www.aol.com). Network caching protocols, such as ICP and CARP, are supported by most well-known proxy servers including Squid (note that IRCache utilizes Squid), Microsoft ISA Proxy Server, and Sun Java System Web Proxy Server (www.web-caching.com/proxy-caches.html). Proxy cache networks can significantly reduce user delays. For example, in the IRCache network, objects that are cached at locations close to the user can be served five times faster, in fractions of a second, than in the alternative case. By reducing delays, proxy caching


Fig. 1. A proxy cache network with three nodes (Source: www.ircache.net).


mechanisms benefit both the specific network where they are used and all Internet users in general.

In a typical proxy cache network implementation, such as IRCache, each node in the network makes its own caching decisions based on the request patterns it observes. Current network caching protocols primarily focus on collaboration by sharing cache contents, and the caching decisions do not effectively take into account objects already held by neighboring nodes [17]. Hence multiple copies of the same object may be unnecessarily stored within the cache network. It is expected that a mechanism that considers the objects cached at every node in the network would incur lower user request delays than current cache network implementations. A few studies have investigated such caching policies, where nodes do consider objects held by their neighbors, under different coordination scenarios in the network [2,18]. The network coordination scenarios include centralized and decentralized frameworks. An example of a centralized implementation is when a single firm that owns a number of caches has control over the network caching decisions. In a decentralized framework the caches operate in a competitive environment and do not coordinate their actions. This paper's primary contribution is to use simulated data to perform numerical analyses of the caching policies investigated in Tawarmalani et al. [18]. We develop algorithms for implementing a network of caches under both centralized and decentralized frameworks. The caching implementations are also compared and contrasted using numerical computations. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to increasing congestion on the Internet [3]. Therefore we believe that this study contributes to network caching research that is beneficial for Internet users.

The plan of the rest of this paper is as follows. We first discuss literature related to our topic. We then review the theoretical developments for implementations of cache networks. Next, we perform numerical computations for implementing the proxy cache networks and comparing their performance. Finally, we discuss conclusions and areas for future research.

2. Related literature

Caching has been extensively studied in computer science and other technical areas. In recent times there has been a growing interest in the topic in Information Systems research. Datta et al. [3], and Mookherjee and Tan [14], among others, have noted caching as an important research area because of its usefulness in reducing user delays due to increasing Internet congestion. Podlipnig and Boszormenyi [16], and Datta et al. [3], provide comprehensive surveys of a number of caching techniques. These include widely used cache replacement strategies such as least recently used (LRU), where the least recently requested object is evicted from the cache to make space for a new one; least frequently used (LFU), where the least frequently requested object is removed; and their many extensions. A majority of studies on caching focus on improving performance on metrics such as user latency, which is the delay in serving user requests, and bandwidth reduction. There have been relatively few studies that analytically model the behavior of caches, and consider the economic implications, while providing insights for managing them effectively. Mookherjee and Tan [14] provide a rigorous analytical framework for the LRU cache replacement policy. The framework is utilized to evaluate LRU policy performance under various demand and cache characteristics. Their study models caching at the browser level for individual caches. Hosanagar et al. [10] develop an incentive compatible pricing scheme for caching with multiple levels of Quality of Service. Their model considers the case of a monopolistic proxy caching service provider and multiple content publishers. Hadjiefthymiades et al. [8] model a game theoretic approach for caching, but also for a case of a single proxy cache and multiple users. The study develops a noncooperative game where competing users are allocated proxy cache space such that monopolizing scenarios are avoided. Using this scheme, a pure equilibrium is identified that guarantees similar performance levels for all users. Park and Feigenbaum [15] design a bidding mechanism that provides incentives for users to truthfully reveal their willingness to pay for caching services. The study constructs a computationally tractable algorithm for implementing the mechanism, though for the case of multiple users connected to a single cache. Kumar and Norris [13] develop a model for proxy caching that exploits historical patterns of user requests for caching decisions. Their mechanism, which is shown to perform favorably versus the LRU policy using a web trace dataset, is specific to individual proxy caches.

Though there has recently been a growing interest in the benefits of proxy cache networks, literature in this area is relatively scarce. Most existing caching network protocols, such as ICP that is supported by IRCache, focus on collaboration by sharing cache contents [17]. Since individual caches do not consider the objects held by their neighbors while determining their own holding strategies, the current caching networks can have unnecessary object replications. This independent cache behavior, referred to as an "ad hoc scheme" by Ramaswamy and Liu [17], can lead to suboptimal performance for the network in terms of



user delays. A few studies have considered the problem of determining decisions that are optimal for the proxy cache network. Chun et al. [2] consider optimal decisions, under both centralized coordination (optimal social welfare perspective) and decentralized game (selfish cache behavior perspective) scenarios, where caches have a distance metric for accessing objects within the network. The study models the decision problem without cache capacity restrictions. As a result, the centralized coordination scenario is shown to reduce to the mini-sum facility location problem, and a pure equilibrium is identified for the decentralized game. Hosanagar and Tan [9] develop a model for optimal replication of objects, in a centralized scenario, using a version of the LRU policy. The study considers a framework of two caches whose capacities are partitioned into regions where the level of duplication is controlled. Ercetin and Tassiulas [4] construct a market based mechanism for minimizing latency in a network where CDN proxy caches and content providers behave selfishly in a noncooperative game. In addition to characterizing the equilibrium of this game, the study also investigates the conditions for the existence of a unique equilibrium. However, the centralized coordination scenario is not considered. This study is distinct from prior work as we consider proxy cache networks with cache capacity restrictions, and numerically evaluate the optimal performance under both centralized coordination and decentralized game scenarios. The basic model for the two approaches is taken from the Tawarmalani et al. [18] study. The centralized and decentralized models are implemented using algorithms and their performance is compared with numerical computations. This study expands on preliminary computations of Kumar [12] with a comprehensive performance evaluation of the proposed mechanisms.

3. Model review

In this section we review the theoretical developments for implementations of proxy cache networks from Tawarmalani et al. [18]. The models are reviewed in some detail here so that readers have a descriptive illustration for the algorithms developed next in the numerical computations of Section 4. In addition, we describe the current network caching method, and discuss motivations for other caching implementations in contrast to it. Later, in the numerical computations, we construct some illustrative numerical examples of caching approaches for different problem sizes. In the following subsections we first describe the current implementation and characteristics of a proxy cache network such as IRCache. Next we discuss two frameworks, centralized and decentralized, under which network caching may be implemented [2,18]. In the centralized framework all caching decisions are completely coordinated based on overall network request patterns. Under a decentralized framework the caches interact with one another without the presence of any controlling authority. The motivation for both approaches is further described in their respective subsections.

3.1. Proxy cache network structure

As a first step we consider the current implementation of a proxy cache network such as IRCache. Note that in this implementation a user chooses a particular cache and necessarily goes through that node for accessing any object. Every cache in the network attempts to minimize the waiting time of satisfying its local demand for different objects. We model this behavior for a "snapshot" time period assuming a known demand for every object at each proxy-cache location. In current network cache implementations, such as IRCache (www.ircache.net), nodes with cached objects are regularly measured for network proximity using echo requests. Some proxy caching mechanisms use historical request patterns to predict object demand, though they may allow exceptions for dynamic content requests [3,20]. Analogous to these cases, we assume object demand at nodes is known a priori as a first cut of the model. Subsequent versions may relax this assumption. Let the sets N={1,…,n} and M={1,…,m} represent objects and caches, respectively. The aggregate demand for object i∈N at cache j∈M for any "snapshot" time period is denoted by αij, and αij≥0, ∀i∈N, ∀j∈M. Let the caches j∈M have fixed capacities denoted by K={k1,…,km}, and let kj≤n, ∀kj∈K. For simplicity, we assume that all objects are of unit size, and that there are no communication congestion delays between locations. We associate a cost with the waiting time that any end user faces between requesting an object and actually receiving it. The waiting time costs can also be viewed as the cost of communication that is incurred while the caches and end users are interacting with one another. Let cl, cn, and co represent the unit waiting cost of satisfying an object request from the local cache, a neighbor cache, and directly from the origin web server, respectively. By definition, cl≤cn≤co, as the origin server typically involves the highest waiting time, followed in decreasing order by a neighbor cache and the local cache. Every cache's objective is then to minimize the total cost of waiting. (Note that traditional measures for caching performance, for example maximization of hit-ratio, external bandwidth saving, etc., all aim at reducing user delays.) Of course this objective is constrained by the fact that the cache has a limited capacity. This problem can be easily formulated as a mathematical program for each cache location, shown briefly as follows: min {cl(objects served from local cache) + cn(objects served from neighbor cache) + co(objects served from origin server)}, subject to the constraints that a request is satisfied from any one of the local or neighbor caches, or else the origin server, and that the local cache stores objects restricted by its capacity. An intuitive solution for the cost minimization objective would be for every cache j to rank the objects in decreasing order of demand αij. The cache is then filled to capacity with the most demanded objects. The demand for remaining objects, which could not be accommodated within the cache, is then satisfied by either a neighbor cache or the origin server. Of course the origin server route is used only if none of the neighbor caches has this object.
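As a simple illustration of the intuitive per-cache policy just described, the following sketch ranks objects by their local demand αij and fills the cache to capacity kj. It is a minimal, hypothetical sketch (the names and example demand values are assumptions for illustration), not an implementation taken from the paper.

```python
# A minimal sketch of the "intuitive" per-cache policy: each cache ranks objects by
# its own local demand and fills to capacity with the most demanded objects.

def greedy_fill(alpha, capacity):
    """alpha: dict {object_id: local demand}; returns the set of objects the cache holds."""
    ranked = sorted(alpha, key=alpha.get, reverse=True)  # most-demanded objects first
    return set(ranked[:capacity])

# Example: one cache with capacity 3 and known demand for five objects.
local_demand = {1: 50, 2: 20, 3: 10, 4: 5, 5: 3}
print(greedy_fill(local_demand, 3))  # -> {1, 2, 3}
```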

3.2. Centralized mechanism

As noted earlier, in the current implementation of cache networks, because of the lack of coordination among caches there could be redundant copies of objects at multiple locations. We now consider the case where there is a central administrator who coordinates all the caching decisions based on complete information about the object request patterns in the network. An example of this scenario would be a single firm that owns a number of caches across multiple locations and has complete control over the network caching decisions. The firm's objective is to minimize the overall network waiting time by deciding which objects should be stored at each of the caches. The solution to this problem provides the optimal social welfare outcome in terms of the network costs. The performance of the centralized mechanism may then be used as a benchmark for comparison against other mechanisms that do not assume complete coordination among the caches. The central administrator's decision process is formulated as a 0-1 mathematical program model as follows [18].

(LS)  min  Σ_{j=1}^{m} [ (cl − cn) Σ_{i=1}^{n} αij xij + cn Σ_{i=1}^{n} αij + (co − cn) Σ_{i=1}^{n} αij yi ]    (1)

s.t.  Σ_{i=1}^{n} xij ≤ kj,  ∀j ∈ M    (2)

yi ≥ 1 − Σ_{j=1}^{m} xij,  ∀i ∈ N    (3)

yi ≥ 0,  ∀i ∈ N    (4)

xij ∈ {0,1},  ∀i ∈ N, ∀j ∈ M    (5)

yi ∈ {0,1},  ∀i ∈ N,    (6)

where xij is 1 if object i is held in cache j and 0 otherwise, and yi is 1 if object i is procured from the origin server and 0 otherwise.



In Problem (LS), the variables xij provide the optimal solution for the mathematical program model, i.e., the locations at which objects should be cached. The objective function (1) captures the central planner's goal of minimizing network costs. Constraints (2) provide the cache capacity restrictions. Constraints (3) and (4), along with the non-negative objective coefficient of yi and the minimization objective function (1), ensure that (a) any object held by a neighbor cache is only obtained when it is not present in the local cache, and (b) an object is obtained from the origin server only when no cache within the network has that object. The optimal solution to (LS) minimizes redundancy of objects cached in the network. Tawarmalani et al. [18] show that (LS) can be solved in polynomial time since it reduces to the transportation problem [1,6]. As a result, the optimal social welfare caching decisions can be efficiently determined by computer network administrators under the centralized scenario. The performance of the centralized mechanism may then be used as a benchmark for comparison against other mechanisms that do not assume complete coordination among the caches.
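The study solves (LS) with a commercial solver (GAMS/CPLEX, see Section 4). Purely as an illustration of what the centralized optimum means, the sketch below brute-forces a tiny instance by enumerating every capacity-feasible placement and scoring it with the waiting costs cl, cn, co. This is only workable for very small n and m and is not the transportation-problem reformulation used in the paper; the demand values reuse the symmetric example of Section 4.1.1, for which the centralized cost reported in Table 1 is 238.

```python
# A brute-force sketch of the centralized problem (LS) for a tiny instance:
# enumerate all capacity-feasible placements and keep the one with minimum network cost.
from itertools import combinations, product

def network_cost(placements, alpha, cl, cn, co):
    """placements: list of sets (one per cache); alpha[j][i]: demand for object i at cache j."""
    cost = 0
    for j, held_j in enumerate(placements):
        for i, demand in alpha[j].items():
            if i in held_j:
                cost += cl * demand                      # served from the local cache
            elif any(i in held for held in placements):
                cost += cn * demand                      # served by a neighbor cache
            else:
                cost += co * demand                      # fetched from the origin server
    return cost

def solve_ls(alpha, capacities, cl=1, cn=2, co=3):
    objects = sorted(alpha[0])
    candidates = [list(combinations(objects, k)) for k in capacities]
    best = min(product(*candidates),
               key=lambda p: network_cost([set(s) for s in p], alpha, cl, cn, co))
    placements = [set(s) for s in best]
    return placements, network_cost(placements, alpha, cl, cn, co)

# Symmetric example from Section 4.1.1: five objects, two caches of capacity two.
alpha = [{1: 50, 2: 20, 3: 10, 4: 5, 5: 3}] * 2
print(solve_ls(alpha, [2, 2]))  # optimal social welfare cost is 238
```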

3.3. Decentralized mechanism

In the centralized mechanism we assume that all the caches in the network can be fully coordinated within a single firm. Instead of requiring complete coordination among the caches, we next let the caches interact with one another in a decentralized manner. We now consider a decentralized mechanism where the caches operate in a competitive environment and do not coordinate their actions. In this mechanism every cache behaves selfishly and individually tries to minimize its own costs based on network demand patterns. Note that existing implementations of proxy cache networks such as IRCache can also be viewed as decentralized mechanisms. However, at IRCache the cooperation between caches is primarily limited to sharing their cache contents. In the decentralized mechanism the decisions made by the individual caches are based not only on their own demands, but also on the demands of their neighbors. Hence this approach may significantly reduce caching redundancies compared to current implementations. An example of such an arrangement could be when a number of firms decide to share their cache contents for reducing user delays, but they are interested in getting the best possible delay-reduction performance at their own caches. The key difference here, with respect to the centralized mechanism, is that every cache makes its own caching decisions without the presence of a controlling authority.

We use the network structure outlined in Section 3.1 to discuss the decentralized caching framework. As before, the network demand patterns and cache capacities are assumed to be known a priori. Every cache j determines to hold the set of objects that minimizes its cost, given the holding strategies of other caches j′≠j. The actions of caches in a general setting are modeled using a game-theoretic approach by Tawarmalani et al. [18]. Similar to earlier studies, we focus on the pure strategy Nash equilibrium solutions of this simultaneous move game [2,18]. This is the outcome where no cache benefits by changing its object holding strategy while other nodes keep their strategies unchanged [5]. In the "caching game" each node j solves the following problem, given the strategy of every other cache j′≠j, j′∈M [18].

(GS)  min_{x1j,…,xnj}  (cl − cn) Σ_{i=1}^{n} αij xij + cn Σ_{i=1}^{n} αij + (co − cn) Σ_{i=1}^{n} αij [ ∏_{j′=1, j′≠j}^{m} (1 − xij′) ] (1 − xij)    (7)

s.t.  Σ_{i=1}^{n} xij ≤ kj    (8)

xij, xij′ (j′≠j) ∈ {0,1},  ∀i ∈ N,    (9)

where xij′ is 1 if cache j′≠j, j′∈M holds object i and 0 otherwise. The optimal solution to (GS) provides the best response object holding strategy for any cache j∈M, given the strategies of neighbor caches j′≠j captured by the non-linear expression 1 − ∏_{j′≠j}(1 − xij′) in objective function (7). Constraint (8) provides the cache capacity restrictions. Tawarmalani et al. [18] demonstrate that a pure equilibrium exists for the above caching game, and that there may be multiple pure equilibria that can have different network costs. The authors also construct an integer programming formulation for the caching game (GS), referred to as (BP), by using the Glover and Woolsey method [7] and introducing variables to linearize 1 − ∏_{j′≠j}(1 − xij′). Solving (BP) with different objective functions allows us to identify the multiple pure equilibria that we are interested in for comparing their performance in terms of network costs. A question that follows naturally is: what is the performance gap between the equilibrium that attains the minimum network cost (referred to hereafter as the best social equilibrium, as it maximizes social welfare in the caching game) and that which has the maximum network cost (referred to hereafter as the worst social equilibrium)? In addition, how do both these equilibrium solutions of the decentralized mechanism compare with the optimal social welfare outcome provided by the centralized framework? In the following section, algorithms are developed for implementing both the centralized and decentralized frameworks, and numerical computations are performed to evaluate differences between the two mechanisms.
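For intuition about the caching game, the sketch below iterates best responses: given the other caches' holdings, holding object i locally saves (cn − cl)αij per unit if some neighbor already holds i and (co − cl)αij otherwise, so a cache's best response is to hold the kj objects with the largest savings. This is an illustrative stand-in for, not a reproduction of, the paper's (BP) formulation; which pure equilibrium the iteration reaches depends on tie-breaking and update order, whereas (BP) can be solved with different objectives to enumerate the equilibria explicitly.

```python
# A best-response-dynamics sketch for the caching game (illustrative assumptions).
# If a full pass changes no cache's holdings, the placement is a pure-strategy equilibrium.

def best_response(j, placements, alpha, capacity, cl=1, cn=2, co=3):
    """Objects cache j should hold, given the other caches' current placements."""
    others = [p for jj, p in enumerate(placements) if jj != j]
    def saving(i):
        held_elsewhere = any(i in p for p in others)
        return alpha[j][i] * ((cn - cl) if held_elsewhere else (co - cl))
    ranked = sorted(alpha[j], key=saving, reverse=True)
    return set(ranked[:capacity])

def find_equilibrium(alpha, capacities, max_rounds=100):
    placements = [set() for _ in capacities]
    for _ in range(max_rounds):
        changed = False
        for j, k in enumerate(capacities):
            new = best_response(j, placements, alpha, k)
            if new != placements[j]:
                placements[j], changed = new, True
        if not changed:
            return placements        # no cache wants to deviate
    return placements                # may not have converged within max_rounds

# Symmetric demand from Example 1; the equilibrium reached depends on tie-breaking.
alpha = [{1: 50, 2: 20, 3: 10, 4: 5, 5: 3}] * 2
print(find_equilibrium(alpha, [2, 2]))
```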

4. Numerical computations

We now compare the performance of the centralized and decentralized caching mechanisms in terms of network costs. As mentioned in Section 3.2, the centralized mechanism provides the optimal social welfare outcome. The optimal social welfare solution is obtained by solving Problem (LS) for a given set of parameters. As mentioned in Section 3.3, the decentralized mechanism can have multiple pure equilibria solutions. We specifically consider two pure equilibria outcomes: (a) the best equilibrium for social welfare, which is the equilibrium with the least network cost, and (b) the worst equilibrium for social welfare, which is the equilibrium with the highest network cost. The best and worst social equilibrium solutions are obtained by solving Problem (BP), with minimization and maximization objective functions, respectively, for a given set of parameters. We further discuss the motivation for considering different equilibria outcomes later using numerical example 1.

The performance of the mechanisms is compared for the following set of model parameters: n=15 objects, m=3 caches, kj=3 objects, cl=1, cn=2, co=3, and Σ_{i=1}^{n} αij = 600, ∀j∈M. We are interested in observing model performance with varying object demand patterns, while keeping other parameters constant. The demand patterns are generated such that the total demand for objects at each cache is always 600, though the demand for individual objects at different cache locations may vary (depending on whether or not the caches are symmetric). In addition, the cache capacities are kept constant. In this way we can simulate different aggregate patterns while ensuring that caches vary only in terms of distribution patterns and not in overall network demand and size characteristics. The mechanisms were modeled using GAMS version 21.7 and the corresponding mathematical programs were solved using CPLEX version 8.1.

Note that we do not include any cost of coordination among the caches in the centralized network caching framework. This is because the coordination cost may be considered as a fixed cost, say F, for managing the caching decisions of the entire proxy-cache network. The incurred cost can be regarded as a one-time fixed charge (i.e., F), or as being proportional to the number of caches m in the network (i.e., mF). Irrespective of the type of fixed cost, including it in the minimization objective function does not influence the solution of the mathematical program model. Therefore we ignore coordination costs and use the waiting time performance of the centralized framework as


Table 1. Pure equilibria and optimal social welfare

Demand | Pure equilibria (objects held by cache 1, cache 2, and network cost) | Optimal social welfare (objects held by cache 1, cache 2, and network cost)
[50,20,10,5,3] | {1,2},{1,2}, & 248; {1,2},{1,3} or {1,3},{1,2}, & 238 | {1,2},{1,3} or {1,3},{1,2}, & 238
[50,20,15,10,3] | {1,2},{1,3} or {1,3},{1,2}, & 295 | Identical to pure equilibria
[50,45,40,35,30] | {1,2},{3,4} or {3,4},{1,2}, & 690; {1,3},{2,4} or {2,4},{1,3}, & 690; {1,4},{2,3} or {2,3},{1,4}, & 690 | Identical to pure equilibria
[80,70,30,5,3] | {1,2},{1,2}, & 528 | {1,2},{1,3} or {1,3},{1,2}, & 508

Fig. 2. Symmetric network demand variance — tan θ.


a benchmark for comparison against the decentralized mechanisms. If we had included a (sufficiently high) coordination cost with the optimal solution, the centralized model may not always produce better results than other mechanisms and could not be used as a benchmark. For consistency in comparison, we also do not include any fixed cost of setting up a decentralized mechanism.

4.1. Symmetric network performance

We first consider the performance of our network caching mechanism for a network of symmetric caches (i.e., kj=k, ∀j∈M, and αij=αij′=αi, ∀i∈N, ∀j,j′∈M, j′≠j). This symmetric setup can be used as a benchmark for comparison of the benefits of network proxy caching under a general setting. Prior to discussing both caching implementations for larger problem sizes, we first illustrate some characteristics of the caching game and the centralized model with the following symmetric example using some numerical values for costs and demand.

4.1.1. Example 1

Let us consider a symmetric case (two caches j and j′ are said to be symmetric when kj=kj′=k, and for each object i, αij=αij′=αi) with five objects, two caches, and each cache has a capacity of two objects. Let cl=1, cn=2, and co=3. We now discuss the equilibrium characteristics and optimal social welfare solutions for this caching game, shown in Table 1, with varying demand patterns. Here [α1,…,α5] represents the network object demand and {i,i′}, i′≠i, are the objects held by any cache. When the network object demand is [50,20,10,5,3] there is a pure equilibrium solution, where cache 1 holds objects {1,2} and cache 2 holds objects {1,3} (or vice versa), which is identical to the optimal social welfare outcome with a cost of 238. This is the best social equilibrium. In addition, there exists another pure equilibrium where both caches hold objects {1,2}, with a higher network cost of 248. This is the worst social equilibrium. Note that the number of objects cached in the network can be different across the equilibria outcomes, as well as between some equilibria and social welfare solutions. Network demands of [50,20,15,10,3] and [50,45,40,35,30] yield two and six pure equilibria, respectively, that are identical to their optimal social welfare solutions. In these cases the best and worst social equilibria are also identical. When the demand is [80,70,30,5,3], there is a unique pure equilibrium where both caches hold objects {1,2} at a network cost of 528. This equilibrium indicates identical best and worst social equilibria. Note that this pure equilibrium differs from the optimal social welfare solution, where one cache holds objects {1,2} and the other holds objects {1,3}, with a cost of 508. From these examples we observe that the caching game has the following characteristics: (a) a pure equilibrium solution exists though it may not be unique, (b) there may be multiple equilibria solutions (e.g., best and worst social equilibrium) that differ in terms of network costs and caching decisions, and (c) it may be that no equilibrium solution attains the optimal centralized framework cost, which is the least possible for the network.
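The network costs in the first row of Table 1 can be checked with a few lines of arithmetic. The sketch below (illustrative only) evaluates the cost of a given placement under cl=1, cn=2, co=3 and reproduces the 248 and 238 figures for the two pure equilibria of the [50,20,10,5,3] demand pattern.

```python
# A small check of Table 1, row 1: cost of the worst and best social equilibria.

def network_cost(placements, demand, cl=1, cn=2, co=3):
    """placements: list of sets of object ids; demand: symmetric per-cache demand list."""
    cost = 0
    for held_j in placements:
        for i, d in enumerate(demand, start=1):
            if i in held_j:
                cost += cl * d                                # local hit
            elif any(i in held for held in placements):
                cost += cn * d                                # neighbor hit
            else:
                cost += co * d                                # origin server
    return cost

demand = [50, 20, 10, 5, 3]
print(network_cost([{1, 2}, {1, 2}], demand))  # worst social equilibrium: 248
print(network_cost([{1, 2}, {1, 3}], demand))  # best social equilibrium / optimum: 238
```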

From the above examples we know that the caching game can have multiple pure equilibria, including the best and worst social equilibria. It can be argued that if the cache administrators had the opportunity to communicate prior to making caching decisions, the best equilibrium may emerge as a "focal point" as the network costs are lower in that outcome. The theory of focal points suggests that in some real-life situations players may be able to coordinate on a particular equilibrium by using information that is abstracted away by the strategic form [5]. However, in our case this necessitates some sort of prior communication among the cache operators. For example, in Table 1, when the demand is [50,20,10,5,3], then {1,2},{1,3} is the best social equilibrium with a cost of 238, which also happens to be the optimal social welfare. Arguably, cache operators may coordinate on that, rather than the {1,2},{1,2} equilibrium with a higher cost of 248. However, this assumes some communication among the cache operators prior to decision making. In the present form of the caching game we do not consider such situations. In subsequent versions we can consider how the best social, or any other, equilibrium may emerge as a focal point in the cache network under realistic scenarios. Further illustrations and characteristics of both centralized and decentralized caching implementations are discussed next for larger problem sizes using algorithms.

Let the network demand for objects in the symmetric setting be denoted by Di={α1,…,αi,…,αn}. We define the base case of the symmetric network demand pattern to be Ci={c,…,c}, where all objects have the same demand c (i.e., αi=c, ∀i∈N). We are interested in observing how model performance for both centralized and decentralized mechanisms varies as the demand patterns deviate from Ci to any generated Di. Let θ be the angle formed between demand vectors Di and Ci in an n-dimensional space (refer to Fig. 2). The measure for demand variance in the symmetric network setting is defined as tan θ [11]. Note that as Di diverges from Ci, there is a corresponding increase in tan θ, as the orthogonal distance h between the demand vectors increases as well.

The performance of the centralized and decentralized mechanisms under the symmetric setting can now be compared using the following algorithm.

Algorithm Symmetric Network Performance

Input: n=15, m=3, kj=3, cl=1, cn=2, co=3, and Σ_{i=1}^{n} αi = 600.
Output: xij, ∀i∈N, ∀j∈M; performance of the symmetric network.
Steps:

1. Let c = Σ_{i=1}^{n} αi / n; Ci = {c,…,c}; z = 15
2. Let di be uniformly distributed between (−z, +z)
3. Scale di to di′ such that Σ_{i=1}^{n} di′ = 0
4. wmin = −c / max(di′); wmax = c / max(−di′)
5. w = wmin
6. while w < wmax
7.   Di = Ci + w di′
8.   tan θ = w / (c√n)
9.   Solve Problem (LS) and Problem (BP), using Di and the other parameter values, for obtaining the centralized and decentralized mechanism performances, respectively
10.  increment w
11. end while
12. Repeat Steps 2 through 11 for z = 20 and 25.



Fig. 3. Symmetric Network Performance. (a) Symmetric demand deviation di range = (−15, +15). (b) Symmetric demand deviation di range = (−20, +20). (c) Symmetric demand deviation di range = (−25, +25).

The objective function values of the optimal solutions of (LS) and (BP) provide the mechanisms' performance in terms of network costs. The mechanism performance and tan θ were recorded for different values of the demand deviation vector di. Fig. 3 plots network cost versus tan θ for three cases (a), (b), and (c), where di varies between (−15, +15), (−20, +20), and (−25, +25), respectively. As can be observed, in all three cases the network performance of the centralized social welfare mechanism monotonically increases with higher absolute values of tan θ (note that lower network costs translate to better network performance). This is because as |tan θ| increases, and there is greater variance in the object demands, the caches benefit to a greater degree by holding and sharing different objects. Since greater demand variance leads to better network performance, it can be expected that with an increasing range of di the centralized mechanism performance should improve. This can be verified by observing that the slope of the centralized mechanism line progressively increases across



cases (a), (b), and (c) in Fig. 3. Since the centralized mechanism identifies solutions that result in minimum network costs, its performance is always better than or equal to that of the decentralized mechanisms. The decentralized mechanism performance for both the best and worst equilibria also generally tends to improve with increasing |tan θ| in all three cases. However, a key difference is that in the decentralized mechanisms there can be deviations at some points where higher |tan θ| actually leads to worse performance. An example of this is in case (a) of Fig. 3, where the best equilibrium network cost increases from 3544 to 3567 when tan θ increases from 0.0121 to 0.0128. In case (b) of Fig. 3, the worst equilibrium cost increases from 3611 to 3648 when tan θ increases from 0.0076 to 0.0081. Another example is case (c) of Fig. 3, where with a tan θ increase from 0.007 to 0.0075 the best equilibrium cost increases from 3552 to 3578. These deviations occur because in the decentralized caching game the nodes behave selfishly. Therefore, in some cases the best response moves by the caches result in pure equilibria outcomes where some of the nodes are better off than others at the expense of overall network performance. Since individual nodes act selfishly, the network costs for the decentralized mechanism may increase with increasing variance. Other than the deviations mentioned above, the mechanisms generally tend to perform better with increasing demand variance in the symmetric network.

4.2. Asymmetric network performance

We now consider the performance of our network caching mechanisms in a general setting, referred to as an asymmetric network, where the caches j can have different object demands and capacities. This scenario is similar to a real world setting where proxy nodes in the network may have differing demand patterns. Let Aj=(α1j,…,αnj) denote the aggregate demand faced by each cache j, and K={k1,…,km} denote the cache capacities. For comparison purposes, Aj, ∀j∈M, in the asymmetric setting is generated such that the demands diverge from the corresponding Di in the symmetric setting. As before, the total object demand for every cache is set at 600. These network demand characteristics are generated by using the transportation problem framework, a classical combinatorial optimization problem, described as follows [1,6]: find the minimum cost to fulfill the product demand of n demand locations using the available capacities of m supply locations, given the variable costs of transporting product units from supply to demand locations. We utilize this framework by setting constraints for object demand as Σ_{i=1}^{n} αij = 600, ∀j∈M, and Σ_{j=1}^{m} αij = Di, ∀i∈N. In addition, on the supply side, kj=3, ∀j∈M, to ensure cache sizes are the same across both settings. The relative costs of satisfying object requests from neighbor caches or the origin server are also identical for the two settings. In this way we ensure that the asymmetric and symmetric network patterns have a basis for comparison. Let the total sample variance, which is an instance of generalized variance, be the measure for demand variance in the asymmetric network setting [11]. The total sample variance, var, is defined as the sum of the lengths of the deviations between Aj, ∀j∈M, and the mean demand vector Ci. The performance of the centralized and decentralized mechanisms under the asymmetric setting can now be compared using the following algorithm.

Algorithm Asymmetric Network Performance

Input: n=15, m=3, kj=3, cl=1, cn=2, co=3, Σ_{i=1}^{n} αij = 600, ∀j∈M, and Di, ∀i∈N.
Output: xij, ∀i∈N, ∀j∈M; performance of the asymmetric network.
Steps:

1. Let c = Σ_{i=1}^{n} αij / n; Ci = {c,…,c}
2. Generate Aj, ∀j∈M, using the transportation problem framework such that Σ_{i=1}^{n} αij = 600, ∀j∈M, and Σ_{j=1}^{m} αij = Di, ∀i∈N
3. vi = √( Σ_{j=1}^{m} (αij − Ci)² )
4. var = Σ_{i=1}^{n} vi
5. Solve Problem (LS) and Problem (BP), using Aj, ∀j∈M, and the other parameter values, for obtaining the centralized and decentralized mechanism performances in the asymmetric network setting, respectively.
6. Repeat Steps 2 through 5 for different values of Aj, ∀j∈M, that diverge from specific symmetric Di values.
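The sketch below illustrates Steps 2 through 4 under simplifying assumptions: instead of the transportation-problem generation used in the paper, it uses iterative proportional fitting as a simple stand-in to produce a nonnegative demand matrix whose column sums are 600 per cache and whose row sums match a chosen Di, and it then computes the total sample variance var. The row totals are drawn at random here purely for illustration rather than taken from the symmetric runs.

```python
# Asymmetric demand generation with fixed margins, plus the total sample variance 'var'
# (iterative proportional fitting is an illustrative substitute for the paper's
# transportation-problem formulation).
import math
import random

n, m = 15, 3
col_sum = 600.0                                 # total demand at each cache
row_sums = [random.uniform(60, 180) for _ in range(n)]
scale = m * col_sum / sum(row_sums)
row_sums = [r * scale for r in row_sums]        # row totals must sum to m * 600

A = [[random.uniform(1, 10) for _ in range(m)] for _ in range(n)]
for _ in range(200):                            # alternately rescale rows and columns
    for i in range(n):
        s = sum(A[i])
        A[i] = [a * row_sums[i] / s for a in A[i]]
    for j in range(m):
        s = sum(A[i][j] for i in range(n))
        for i in range(n):
            A[i][j] *= col_sum / s

c = col_sum / n                                 # mean demand vector Ci = c
v = [math.sqrt(sum((A[i][j] - c) ** 2 for j in range(m))) for i in range(n)]
var = sum(v)                                    # total sample variance
print(round(var, 1))
```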

The network costs of the optimal solutions of (LS) and (BP), and the corresponding var, were recorded for different values of Aj, ∀j∈M, to measure the mechanisms' performance under the asymmetric setting. Fig. 4 plots network costs versus var for three asymmetric cases (a), (b), and (c) that correspond to symmetric Di with θ values 24.08, 28.91, and 31.31, respectively. As can be observed, in all three cases there is a general trend of improved performance with increasing var for the centralized mechanism. However, a key difference in the asymmetric setting is that the centralized mechanism performance does not increase monotonically with var. This is because, unlike tan θ in the symmetric setting, var is an approximate measure of demand variance. The trend of better performance with increasing var is less pronounced for the decentralized mechanisms, and there are more deviations, because the caches behave selfishly. As before, the centralized mechanism performs no worse than the decentralized mechanisms. However, compared to the symmetric setting, there is a greater divergence between the performance of the decentralized worst equilibrium mechanism and that of the other two. Examples can be observed in case (a) and case (c) of Fig. 4, where the worst equilibrium almost never achieves the costs of the other mechanisms. This occurs due to the greater variance in object demands in the asymmetric setting, which leads to increased benefits of sharing objects in the network. Since the centralized and best equilibrium mechanisms allow caches to share objects more effectively, they perform much better than the worst equilibrium. It is also for the same reason that the centralized mechanism outperforms the best equilibrium mechanism to a greater extent here. Therefore we can conclude that, due to the increased demand variance in the asymmetric setting, the mechanisms benefit to a greater degree by sharing objects. This implies that in practical scenarios, where there may be differing object demand patterns at nodes, there are increased benefits in deploying proxy cache networks. Note that we use the integer programming model (BP) for the decentralized mechanism in both Algorithm Symmetric Network Performance and Algorithm Asymmetric Network Performance. Problem (BP) is among the class of NP-Hard problems [6]. This means that optimal solutions to the model cannot be computed efficiently for very large problem sizes. However, in cases where the problem size is very large, alternative solution approaches that aim to find good results quickly can be developed. Examples include heuristic procedures and dynamic programming [1,6].

An effective proxy caching mechanism is beneficial for all Internet users, due to reduced network traffic, load on web servers, and user delays [3,13]. The difference can be immediately apparent to an end user, where a cached website may seem to load instantaneously compared to a delay of several seconds in the alternative case. In addition, Internet companies can conserve investment in server farms around the world for replicating web content to improve load speeds (www.web-caching.com). A network of proxy caches can further significantly reduce user delays, as illustrated by IRCache (www.ircache.net). If a requested object is cached at a node in the network close to the user, then the waiting time, in fractions of a second, can be five times less than in the alternative scenario. At the aggregate level, the reduction in waiting times for all user requests in a network of proxy caches can be quite significant. We have shown with our numerical computations, under different demand and network characteristics, that the performance of proxy caching networks can be improved when nodes also consider objects held by their neighbors. Given our results, we believe that there is significant


Fig. 4. Asymmetric Network Performance. (a) Asymmetric demand diverging from symmetric θ=24.08. (b) Asymmetric demand diverging from symmetric θ=28.91. (c) Asymmetric demand diverging from symmetric θ=31.31.


potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to congestion on the Internet.

5. Conclusions

Proxy caching is widely used by computer network administrators, businesses, and technology providers to reduce user delays on the increasingly congested Internet (www.web-caching.com). Effective proxy caching mechanisms are useful for reducing network traffic, load on servers, and the average delays experienced by web users [3,13,19]. Specifically, we focus on a framework of a network of proxy caches. In a typical proxy cache network implementation, such as IRCache (www.ircache.net), each node in the network makes its own caching decisions based on the request patterns it observes. Current network caching protocols primarily focus on collaboration by sharing cache contents, and the caching decisions do not effectively take into



account objects already held by neighboring nodes [17]. A few studies have investigated caching policies where nodes do consider objects held by their neighbors under different coordination scenarios in the network [2,18]. The network coordination scenarios include centralized and decentralized frameworks. An example of a centralized implementation is when a single firm that owns a number of caches has control over the network caching decisions. In a decentralized framework the caches operate in a competitive environment and do not coordinate their actions. This paper's primary contribution is to use simulated data to perform numerical analyses of the caching policies investigated in Tawarmalani et al. [18]. We develop algorithms for implementing a network of caches under both centralized and decentralized frameworks. The caching implementations are also compared and contrasted using numerical computations. The results demonstrate that the performance of proxy caching networks can be improved when nodes also consider objects held by their neighbors. We show that the centralized mechanism always performs no worse than the decentralized mechanisms, and that it can serve as a benchmark for other caching approaches. We also demonstrate that the mechanisms improve performance with greater object demand variance among the proxy caches. This implies that in practical scenarios, where there may be differing object demand patterns at nodes, there are increased benefits in deploying proxy cache networks. The performance results should provide useful directions for computer network administrators to develop proxy caching implementations that are suited for differing network and demand characteristics. There are a number of interesting areas for future research. Thus far we have assumed that demand patterns are known a priori in our mechanisms. An area of future research could be to relax this assumption and develop models where demand patterns are not known, including cases involving requests for dynamic content. It would also be interesting to determine what could be realistic scenarios for the best social equilibrium to emerge as a focal point among multiple equilibria for the caching game. Another research area could be to compare the performance of our mechanisms against that of existing network cache implementations such as IRCache, as well as caching policies such as LRU, using actual proxy trace datasets. Finally, we can also investigate alternative solution approaches for our caching models, such as dynamic programming and heuristic procedures, which aim to find good solutions quickly [1,6].

To the best of our knowledge, our research is the first to evaluate the performance of implementations of capacitated proxy cache networks. There is a significant potential for deploying proxy cache networks in order to reduce the delays experienced by web users due to congestion on the Internet. Therefore we believe that this study contributes to network caching research that is beneficial for Internet users.

Acknowledgements

We thank Mohit Tawarmalani, Prabuddha De, Karthik Kannan, seminar participants at Purdue University, and the 2004 International Conference on Information Systems (ICIS) Doctoral Consortium for valuable comments and contributions to this study.

References

[1] R.K. Ahuja, T.L. Magnanti, J.B. Orlin, Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Englewood Cliffs, NJ, 1993.

[2] B.G. Chun, H. Chaudhuri, H. Wee, M. Barreno, C.H. Papadimitriou, J. Kubiatowicz, Selfish caching in distributed systems: a game-theoretic analysis, Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Distributed Computing, 2004, pp. 21–30.

[3] A. Datta, K. Dutta, H. Thomas, D. VanderMeer, World wide wait: a study of Internet scalability and cache-based approaches to alleviate it, Management Science 49 (10) (2003) 1425–1444.

[4] O. Ercetin, L. Tassiulas, Market-based resource allocation for content delivery in the Internet, IEEE Transactions on Computers 52 (12) (2003) 1573–1585.

[5] D. Fudenberg, J. Tirole, Game Theory, MIT Press, Boston, 1991.

[6] M.R. Garey, D.S. Johnson, Computers and Intractability, W.H. Freeman, New York, 1979.

[7] F. Glover, E. Woolsey, Converting a 0–1 polynomial programming problem to a 0–1 linear program, Operations Research 22 (1974) 180–182.

[8] S. Hadjiefthymiades, Y. Georgiadis, L. Merakos, A game theoretic approach to web caching, NETWORKING 2004: Proceedings of the Third International IFIP-TC6 Networking Conference, 2004.

[9] K. Hosanagar, Y. Tan, Optimal duplication in cooperative web caching, Proceedings of the 13th Workshop on Information Technology and Systems (WITS), 2004.

[10] K. Hosanagar, R. Krishnan, J. Chuang, V. Choudhary, Pricing and resource allocation in caching services with multiple levels of QoS, Management Science 51 (12) (2005) 1844–1859.

[11] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, Englewood Cliffs, NJ, 2002.

[12] C. Kumar, Implementation and Evaluation of Proxy Cache Networks, California State University, San Marcos, 2007, Working Paper.

[13] C. Kumar, J.B. Norris, A new approach for a proxy-level web caching mechanism, Decision Support Systems (2008). doi:10.1016/j.dss.2008.05.001.

[14] V.S. Mookherjee, Y. Tan, Analysis of a least recently used cache management policy for web browsers, Operations Research 50 (2) (2002) 345–357.

[15] C. Park, J. Feigenbaum, Incentive Compatible Web Caching, Yale University, 2001, Working Paper.

[16] S. Podlipnig, L. Boszormenyi, A survey of web cache replacement strategies, ACM Computing Surveys 35 (4) (2003) 374–398.

[17] L. Ramaswamy, L. Liu, An expiration age-based document placement scheme for cooperative web caching, IEEE Transactions on Knowledge and Data Engineering 16 (2004) 585–600.

[18] M. Tawarmalani, K. Karthik, P. De, Allocating Objects in a Network of Caches: Centralized and Decentralized Analyses, Purdue University, 2007, Working Paper.

[19] E.F. Watson, Y. Shi, Y. Chen, A user-access model-driven approach to proxy cache performance analysis, Decision Support Systems 25 (1999) 309–338.

[20] D. Zeng, F. Wang, M. Liu, Efficient web content delivery using proxy caching techniques, IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews 34 (3) (2004) 270–280.

Chetan Kumar is an Assistant Professor in the Department of Information Systems and Operations Management at the College of Business Administration, California State University San Marcos. He received his PhD from the Krannert School of Management, Purdue University. His research interests include pricing and optimization mechanisms for managing computer networks, caching mechanisms, peer-to-peer networks, ecommerce mechanisms, web analytics, and IS strategy for firms. He has presented his research at INFORMS, WEB, WISE, ICIS Doctoral Consortium, and AMCIS Doctoral Consortium conferences. His research has been published in the DSS journal, and he has served as a reviewer for the EJOR, JMIS, DSS, and JECR journals.