
Analyzing the Performance of Cloudflare's Anycast CDN, a Case Study

Emiel Bos
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]


ABSTRACT
Anycast is a network routing mechanism which is supposed to route traffic to the topologically nearest server out of a group of potential servers announcing the same IP address, and is widely used for its DDoS mitigation and its increase of service availability and network performance. It is an operationally simpler substitute for the relatively complex approach of DNS-based redirection, but is far less predictable in terms of routing. The consequences of its use of the Border Gateway Protocol (BGP) and best practices with regard to its configuration have been studied to a certain extent, but because anycast is very situational and unpredictable, anycast deployments still need to be analyzed independently. This paper serves as a case study of anycast and focuses on Cloudflare, a US company which redistributes its clients' websites and data via its content delivery network (CDN) and uses anycast to route requests to the nearest server site in this network. This makes anycast an essential part of Cloudflare's business and raises the importance of its analysis. In this paper, Cloudflare's anycast performance is analyzed, and a judgment on the optimality of its anycast deployment is based on this analysis. The RIPE Atlas network measurement platform is used to gather latency measurements, which are then processed using the geolocation algorithm iGreedy, as proposed by Cicalese et al. The resulting Cloudflare datacenter locations are used to determine to which extent requesters are served by the optimal datacenter. The conclusion is positive: anycast consistently manages to route approximately nine out of ten requests to the optimal Cloudflare datacenter.

Keywords
Cloudflare, anycast, RIPE Atlas, CDN, performance

1. INTRODUCTION
Cloudflare is a content delivery network (CDN) used mainly for its DDoS protection and web optimization features. Cloudflare has multiple server sites across the globe [1], which distribute copies of the same data to clients in order to decrease latency and increase availability [16]. Cloudflare utilizes anycast, a routing method residing in the network layer, to route client traffic to a certain Cloudflare-owned anycast site. Anycast, in turn, uses the exterior gateway protocol BGP (Border Gateway Protocol) to map clients to anycast sites, defining each site's anycast catchment. This should result in clients being routed to the closest anycast site [5]. However, in practice it is not uncommon for requests to be redirected to a less than optimal site in terms of latency [8], a problem that worsens as the number of anycast sites increases (and with it the chance of a 'mishit' caused by policy limitations) [12, 6]. These factors aside, BGP aims to select the closest datacenter based on the number of AS hops, meaning the size of the ASes a request transits to reach a certain datacenter greatly influences the eligibility of that datacenter [13]. Anycast performance also depends on the manner in which it is deployed; when deployed in an ad-hoc manner, as opposed to a carefully planned deployment, anycast is known to provide poor latency and poor failover rerouting [6, 7]. An alternative to anycast exists in the form of DNS-based redirection, but similar problems exist for that technique as well [14]. Besides, considerable investments in infrastructure and operations, and a complex global traffic manager, are required if DNS-based redirection is to be used [8].

This paper examines the optimality of anycast's routing of traffic requesting Cloudflare-hosted websites or data. The target audience comprises CDN operators and (anycast) network engineers and analysts, but also website owners comparing CDNs and fellow computer science students. Anycast has already been studied to a reasonable extent [12, 5, 6, 7, 10, 9, 15, 11]. This paper is therefore presented as a practical case study, focused on judging Cloudflare's anycast deployment. It is not the first time an anycast CDN has been analyzed: Calder et al. analyzed Bing's anycast configuration, which performed relatively well, but concluded that a fifth of the requests were directed to a suboptimal server site, in which case performance suffers significantly [8]. However, Calder et al. emphasize that anycast performance is very situational and specific to the CDN at hand. Besides, the current number of Cloudflare's server sites (115) is approximately three times that of Bing at the time of their study ("a few dozen").

For these reasons, combined with the fact that anycast is an essential part of Cloudflare's business, it is important that Cloudflare's anycast performance be analyzed independently. This paper answers the question "How well does Cloudflare's anycast perform in practice?", using the RIPE Atlas network measurement platform to collect latency measurements, which are then processed using Cicalese et al.'s anycast enumeration and geolocation methodology [10], dubbed iGreedy. This, in turn, yields a list of Cloudflare datacenters, on which latency measurements and comparisons can be executed.


In the next section, the research goal of this paper is briefly described and divided into three subquestions. Section 3 gives background and explains the concepts treated in this paper. In section 4, related work is mentioned and contrasted with this research. Section 5 elucidates the employed methodology. Section 6 consecutively answers the three research questions, after which these findings are summarized in section 7. References are found in section 9, followed by appendices.

2. RESEARCH GOAL
In assessing the way Cloudflare has configured its anycast infrastructure, the question "How well does Cloudflare's anycast perform in practice?" will be the main research question. In order to answer this question, I formulate several subquestions. The first subquestion concerns itself with Cloudflare's anycast performance at a specific time: "How well does anycast distribute requests across Cloudflare's servers?" Secondly, multiple measurement sessions will be executed over an extended period of time, in order to establish to what extent this performance changes over time. I formulate this question as "To what extent does Cloudflare's request distribution change over time?" Lastly, these findings are compared to similar CDNs with a third subquestion: "How does Cloudflare's anycast performance compare to similar CDNs?" Once all subquestions are answered, it should be reasonable to formulate a conclusion on the main research question.

3. BACKGROUND
A content delivery network (CDN), sometimes also known as a content distribution network, is a network of proxy server sites geographically distributed in such a way that it can deliver content to requesters with high performance and high availability. This is because server load is distributed among multiple servers, which can moreover be placed closer to the customer. Nowadays, there are numerous CDNs on the market, one of which is Cloudflare. Cloudflare serves thousands of websites by redistributing them via its CDN, which at the time of writing consists of 115 datacenters. A website wanting to utilize Cloudflare has to replace its authoritative DNS servers with DNS servers owned by Cloudflare. Then, instead of a client getting an IP address from its DNS resolver that points to one fixed server on which a website is originally hosted, it receives one of Cloudflare's anycast addresses (as explained in the next paragraph), which points to a Cloudflare datacenter that is likely to be much closer to the customer than the original server. When a Cloudflare datacenter receives a request, it analyzes the request and determines whether it is benign, after which it first checks the local cache. If the requested object is not in the cache, or if the object is out of date, Cloudflare retrieves the object from the origin server. Cloudflare claims to get "premium routes from our data centers back to most places on the internet" because of its scale. As a consequence, the number of hops a request makes is often less than the number of hops the request would have made if it had gone directly to the origin server, even if Cloudflare has to retrieve content from the origin server itself [17].
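The check-cache-then-origin flow described above can be made concrete with a minimal Python sketch. This is not Cloudflare's actual implementation; the cache policy, the TTL value and all names here are hypothetical, chosen only to illustrate the request handling at an edge datacenter.

    import time

    CACHE = {}   # url -> (fetched_at, body); a per-datacenter cache (hypothetical)
    TTL = 300    # assumed freshness window in seconds (hypothetical)

    def handle_request(url, is_benign, fetch_origin):
        # 1. Analyze the request; reject it if it is not benign.
        if not is_benign:
            return "403 Forbidden"
        # 2. Check the local cache for a fresh copy of the object.
        entry = CACHE.get(url)
        if entry is not None and time.time() - entry[0] < TTL:
            return entry[1]
        # 3. Cache miss or stale object: retrieve it from the origin server.
        body = fetch_origin(url)
        CACHE[url] = (time.time(), body)
        return body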

Unicast is the most common routing method on the internet: data is sent to a single destination identified by a unique IP address. Anycast, the subject of this paper, is another routing method, in which multiple destinations can be identified by the same IP address, and in which data should ideally be sent to the topologically nearest one. In this way, server load can be distributed across multiple servers at different locations, and more clients can be served nearby, decreasing the number of hops and the latency. Anycast uses the Border Gateway Protocol (BGP) for the actual routing and choice of destination node. BGP is the standard exterior gateway protocol, used to exchange routing and reachability information between autonomous systems (as opposed to an interior gateway protocol, which is meant for exchanging routing information within an autonomous system). An autonomous system is a network governed by a single entity or organization.

4. RELATED WORK
Schmidt et al. investigated the optimal number of anycast sites for good performance, and concluded that geographical distribution is more important than a large number of sites [12]. Alzoubi et al. re-evaluated IP anycast for CDNs by proposing a load-aware IP anycast CDN architecture [4]. Cicalese et al. provide a first look at the traffic of anycast-enabled content delivery networks and discovered that routes to anycast sites are relatively stable, with few changes over time [9]. Cicalese et al. also developed the anycast site enumeration and geolocation method utilized in this paper [10], upon which they expand with software, datasets and validations in [11]. Flavel et al. propose FastRoute, a scalable load-aware anycast routing architecture for modern CDNs [15], while De Vries et al. propose a method of accurately mapping anycast catchments named Verfploeter [13].

Calder et al. also analyze the performance of anycast in the context of a CDN [8], namely Bing, in the paper that is perhaps the most similar to this one. They state that, "like any such study, our specific conclusions are closely tied to the current front-end deployment of the CDN we measure." Besides, at the time of writing of [8], Calder et al. assessed Bing to operate "a few dozens" of server locations. As of June 24th, 2017, Cloudflare operates 115 datacenters (at the time of writing of [8], it operated a mere 43 locations). This increase in CDN scale, paired with the fact that anycast's behavior is very situational and unpredictable, motivates this study. Cloudflare is increasingly becoming a key player in the CDN market, making an analysis of its particular anycast performance rather essential. This study also takes a different approach: as [8] was written and supported by the company owning the CDN under scrutiny, its authors were able to assign unicast IP addresses to the server sites and compare anycast latencies with the lowest unicast latencies. However, since Cloudflare either has no unicast addresses mapped to its server sites or has not disclosed them, the IP address of the router just before a reached anycast site is used instead for this research.

5. RESEARCH METHODS
Network performance can be defined with respect to different metrics, the most notable being latency and throughput.¹ This paper employs latency as its definition of performance. Anycast is mostly, or perhaps almost exclusively, used for DNS and HTTP services, two latency-sensitive protocols for which throughput matters little; hence the focus on latency.

¹ Depending on the context and how broadly the term 'network performance' is defined, metrics like service availability, hop count, geographical distance and/or security could also be considered.


Figure 1. Methodology workflow by Cicalese et al. Figure reprinted from [10] with permission from the authors.

In answering the first subquestion, "How well does anycast distribute requests across Cloudflare's servers?", two methodologies are employed: Schmidt et al.'s methodology of comparing latencies [12], and iGreedy, an anycast detection, enumeration and geolocation method proposed by Cicalese et al. [10]. The former compares the round-trip time (RTT) to the BGP-assigned anycast site with that of the anycast site with the lowest latency, found by pinging all Cloudflare datacenters and picking the one with the lowest latency. If there is a significant difference, it can be concluded that the optimal anycast site was not chosen. However, Cloudflare's server sites cannot be contacted individually, since they all share one anycast address (or rather, a range of IP addresses [2]), and Cloudflare either has not mapped unicast addresses to its sites at all, or has not disclosed them. For this research, the IP addresses of preceding routers, found by taking the penultimate hop in a traceroute, are used as a substitute. In order to execute these traceroutes, probes whose anycast datacenter is known are required. This is where iGreedy comes into play. Latency measurements from all RIPE Atlas probes are fed to iGreedy, which outputs a conservatively lower-bounded set (avoiding false positives) of Cloudflare datacenter locations. This procedure is summarized below, and is illustrated in figure 1.

A geographical 'disc' is centered around each of the measuring probes, with the radius equaling half the RTT of that probe times the speed of light. This implies that the anycast instance contacted by the probe is located inside the disc. Then, in an iterative manner, these discs are processed by an enumeration stage and subsequently a geolocation stage, after which the results are optionally fed back to the enumeration stage. The more iterations, the more accurate the estimations. The enumeration stage takes all the discs from the measuring stage and treats them as a Maximum Independent Set problem, meaning it picks as many non-overlapping discs as possible. (This problem is NP-hard, meaning a brute-force algorithm takes exponentially longer as the number of latency measurements grows.) In the geolocation stage, the most probable city in which the anycast site is located is estimated with the following formula:

p_i = α · c_i / Σ_j c_j + (1 − α) · (d(p, t) − d(p, A_i)) / Σ_j (d(p, t) − d(p, A_j))

in which the first term is the ratio of a certain disc-based city's population c_i to the total population of all cities in the disc, the second term is the fraction of the distance from that city A_i to the disc border compared to the sum of these distances over all the disc's cities, and α is a fine-tuning parameter. Cicalese et al.'s workflow lower-bounds the total number of located anycast sites, since two overlapping discs can contact different sites, yet the enumeration stage only selects one of the discs for further consideration, because of the modeling after the Maximum Independent Set problem. The iteration stage tries to ameliorate this by moving each disc in which a city has been geolocated to be centered around that city, with a new, arbitrarily small radius. This reduces the chance of overlap, and previously discarded discs can then be reconsidered.
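To make the enumeration and geolocation stages concrete, the sketch below implements them in simplified form. It is not the iGreedy code itself: the greedy selection is only one possible heuristic for the (NP-hard) Maximum Independent Set step, the speed-of-light constant follows the text above rather than iGreedy's actual propagation model, and all names are hypothetical.

    import math

    C_KM_PER_MS = 299.792  # speed of light in vacuum, km per millisecond

    def disc_radius_km(rtt_ms):
        # Half the RTT (the one-way delay) times the speed of light bounds
        # the distance between the probe and the anycast instance it reached.
        return (rtt_ms / 2.0) * C_KM_PER_MS

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two (lat, lon) points in km.
        r = 6371.0
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dp, dl = p2 - p1, math.radians(lon2 - lon1)
        a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def enumerate_discs(discs):
        # Greedy approximation of the Maximum Independent Set: keep a disc
        # only if it overlaps none of the discs kept so far. Each disc is a
        # (lat, lon, radius_km) tuple; smallest discs are considered first,
        # since they carry the most location information.
        kept = []
        for lat, lon, rad in sorted(discs, key=lambda d: d[2]):
            if all(haversine_km(lat, lon, klat, klon) > rad + krad
                   for klat, klon, krad in kept):
                kept.append((lat, lon, rad))
        return kept

    def geolocation_scores(cities, radius_km, alpha=0.5):
        # cities: list of (population, distance_from_probe_km) for every city
        # inside one kept disc. Returns the score p_i from the formula above.
        pop_sum = sum(pop for pop, _ in cities)
        slack_sum = sum(radius_km - d for _, d in cities)
        return [alpha * pop / pop_sum + (1 - alpha) * (radius_km - d) / slack_sum
                for pop, d in cities]

The city with the highest score is taken as the site's location; the iteration stage would then shrink that city's disc to a small radius and rerun the greedy selection.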

This whole methodology is protocol-agnostic, and meant for situations in which nothing is known about either the number or the locations of anycast sites of a certain service. However, Cloudflare discloses the cities in which it operates anycast sites [2]. This allows checking whether sites found by iGreedy correspond to cities in which Cloudflare has a datacenter. Ultimately, iGreedy outputs a set of datacenter locations mapped to the RIPE Atlas probes that managed to find these datacenters. By sending traceroutes from these probes to their respective datacenters, the IP address of the penultimate router can be fetched, which acts as a replacement for a unicast address of that datacenter.
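Extracting the penultimate-hop address from a RIPE Atlas traceroute result is a small exercise in itself. The sketch below follows the RIPE Atlas result format as far as I know it (each hop carries a list of replies, each reply a "from" address); treat the exact field names as an assumption to verify against the API documentation.

    def penultimate_ip(traceroute_result):
        # traceroute_result: one RIPE Atlas traceroute result dict, whose
        # "result" field is the ordered list of hops.
        hops = traceroute_result.get("result", [])
        if len(hops) < 2:
            return None
        # The last hop is the anycast destination itself; inspect the one before it.
        for reply in hops[-2].get("result", []):
            if "from" in reply:  # a reply may also be a timeout ("x": "*")
                return reply["from"]
        return None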

The first subquestion can now be answered by comparing latencies from randomly selected probes both to their default anycast IP address and to the lowest latency among all latencies to the IP addresses of penultimate routers of datacenters.
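The comparison itself then reduces to a few lines. In this sketch, a lag of at most zero means BGP routed the probe to the optimal datacenter; the function and argument names are hypothetical.

    def anycast_lag(anycast_rtt_ms, penultimate_rtts_ms):
        # anycast_rtt_ms: RTT from a probe to the shared anycast address.
        # penultimate_rtts_ms: RTTs from the same probe to the penultimate
        # routers of all identified datacenters; the minimum stands in for
        # the latency to the optimal datacenter.
        optimal_rtt = min(penultimate_rtts_ms)
        return anycast_rtt_ms - optimal_rtt  # <= 0: anycast hit the optimal site

For example, anycast_lag(14.1, [13.9, 22.5, 80.3]) returns 0.2, i.e. anycast was 0.2 ms slower than the best observed alternative.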

By repeating the above measurements at several points in time, I can answer the second subquestion: "To what extent does Cloudflare's request distribution change over time?", and by executing the same methodology for other CDNs, the third subquestion, "How does Cloudflare's anycast performance compare to similar CDNs?", can be answered.

6. RESEARCH RESULTS
6.1 How well does anycast distribute requests across Cloudflare's servers?
For the sake of completeness, two pings are sent to the default anycast datacenter of a measuring probe: one to the anycast IP address itself and one to the IP address of the penultimate router preceding this anycast server.² As the traceroutes originally meant for finding the penultimate IPs of datacenters can originate from any direction relative to the datacenter, the routers corresponding to these penultimate IPs could be anywhere around the datacenter. A measuring probe located on the opposite side of the datacenter from the probe that found the IP address will likely have its ping routed around the datacenter, increasing the latency. Therefore, depending on the angle, centered on the optimal datacenter, between both probes, the results could be skewed either in favour of or against anycast. Were this angle always small, e.g. under 90 degrees, taking the penultimate IP address would lead to the most accurate comparison. This is unlikely to be the case, though.
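This skew could be quantified by computing, for each comparison, the angle under which the datacenter 'sees' the measuring probe and the identifying probe. A possible helper using standard great-circle bearings (all names hypothetical, coordinates as (lat, lon) tuples):

    import math

    def initial_bearing(lat1, lon1, lat2, lon2):
        # Initial great-circle bearing from point 1 to point 2, in degrees.
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dl = math.radians(lon2 - lon1)
        y = math.sin(dl) * math.cos(p2)
        x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
        return math.degrees(math.atan2(y, x)) % 360

    def probe_angle(dc, measuring_probe, identifying_probe):
        # Angle (0-180 degrees) at datacenter dc between the two probes.
        # Small angles suggest the penultimate-IP comparison is accurate;
        # large angles suggest the ping may detour around the datacenter.
        diff = abs(initial_bearing(*dc, *measuring_probe)
                   - initial_bearing(*dc, *identifying_probe))
        return min(diff, 360 - diff)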

Cloudflare operates 115 datacenters at the time of writing and during the execution of the measurements. However, iGreedy never manages to find more than 86 of those datacenters. A reason for this could be RIPE Atlas probes being predominantly located in Europe [3], causing non-European datacenters to be harder to detect. However, each of Cloudflare's datacenters should be contacted by at least one of the over nine thousand employed RIPE Atlas probes, as plotted in figure 2. Another reason could be that not all of Cloudflare's datacenters use anycast [De Vries, personal correspondence].

² As found by taking the IP address from the penultimate hop in a traceroute from the measuring probe to its default anycast datacenter.


Table 1. Key Measurement Data

No.  Date             Target IP    #    % optimal  Avg lag (ms)  Avg diff. (ms)  Avg penalty (ms)  Avg latency (ms)
1    June 21st, 2017  Anycast      101  85         1.36          -13.40          9.13              16.03
                      Penultimate  80   73         1.31          -10.37          4.75              16.08
2    June 23rd, 2017  Anycast      98   91         1.48          -13.80          16.11             13.67
                      Penultimate  72   85         1.78          -10.30          11.65             15.88
3    June 24th, 2017  Anycast      93   83         1.33          -11.62          7.73              19.28
                      Penultimate  66   74         2.50          -8.39           9.69              19.93
4    June 25th, 2017  Anycast      91   84         2.03          -11.46          12.31             18.13
                      Penultimate  74   77         3.71          -8.07           15.98             21.66
5    June 26th, 2017  Anycast      100  88         4.32          -7.33           36.01             19.25
                      Penultimate  72   81         6.44          -3.91           33.11             21.41
6    June 27th, 2017  Anycast      98   91         1.29          -11.13          14.01             14.58
                      Penultimate  75   77         1.44          -8.88           6.37              14.58
7    June 28th, 2017  Anycast      100  83         5.19          -7.07           30.50             26.22
                      Penultimate  76   75         5.38          -5.78           21.51             26.68
µ    June 2017        Anycast      97   86         2.45          -10.80          18.16             18.18
                      Penultimate  74   77         3.21          -7.98           14.64             19.44
                      Compound     86   82         2.78          -9.59           16.64             18.72
EU   June 28th, 2017  Anycast      90   82         5.25          -11.44          29.53             37.97
                      Penultimate  71   77         7.72          -4.62           34.27             43.85
EC   June 26th, 2017  Anycast      101  47         12.19         8.14            22.79             42.96
                      Penultimate  92   74         7.38          -0.03           28.31             33.08

Figure 2. Geographical locations of identified Cloudflare datacenters and the probes used in the comparisons of measurement 1.


Figure 3. Illustrating the detour that the ping of a measuring probe could make. (Legend: measuring probe, anycast IP, penultimate IP, identifying probe.)

Figure 4. Distribution of anycast's lag compared to the latency to the optimal datacenter. (CDF of lag in ms, for pings to anycast IPs and to penultimate IPs, with the average lag to anycast IPs of 1.36 ms marked.)


Figure 4 shows the empirical distribution of anycast lags: the time in ms by which anycast is slower than the optimal datacenter. As can be deduced, Cloudflare's performance for this measurement is good. Out of the 101 comparisons, anycast managed to find the optimal datacenter (when pinging the anycast IP) in 86 of the cases. Taking the penultimate IP as the target for the comparand ping, 58 of the 80 comparisons have anycast reach the optimal datacenter, a decrease of 13%. What's more, the penalty incurred from anycast picking a suboptimal datacenter is a reasonable 9 ms. 95% of comparisons record lags under 10 ms, and no comparison observes a lag over 21 ms, save for one outlier of 39 ms. The average latency to Cloudflare is 16 ms.

Apart from pings to penultimate IPs performing a little worse in the lower end of the spectrum, the two kinds of ping targets perform very similarly, making the choice between them insignificant in retrospect. From this observation, it can be deduced that the latency between the last two hops to Cloudflare is negligible. This could be because the penultimate router is likely to be situated in a Cloudflare datacenter as well [De Vries, personal correspondence]. However, some datacenters share the same penultimate router, even if these routers are geographically very distant. Traceroutes to two European datacenters (Vienna, Austria and Zurich, Switzerland) both had 193.203.0.195 as penultimate IP address. Even more remarkably, the same is true for Colombo, Sri Lanka and Frankfurt, Germany (sharing 80.81.194.180).

6.2 To what extent does Cloudflare's request distribution change over time?

Table 1 lists key data summarizing subsequent measurements. For every measurement and for both of the target IP addresses (the actual anycast address and the preceding unicast address), the number of successful measurements is recorded, together with the percentage of those measurements where the anycast latency is less than the latency to the optimal datacenter.

Figure 5. Distributions of measurement repetitions. (CDFs of lag in ms across all measurements, for pings to anycast IPs and to penultimate IPs, with the average lag to anycast IPs of 2.45 ms marked.)

The average difference is calculated as latency_anycast − latency_optimal, meaning a negative difference implies anycast managed to find the optimal datacenter. The average penalty incurred from a probe mishitting the optimal datacenter is averaged over all lags greater than zero. Average latency is the overall latency to Cloudflare.
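Table 1's summary statistics follow directly from these definitions. A minimal sketch computing them from a list of per-comparison differences (the dictionary keys mirror the table's columns; the function name is hypothetical):

    def summarize(diffs):
        # diffs[i] = anycast RTT minus optimal RTT for comparison i, in ms.
        n = len(diffs)
        lags = [max(0.0, d) for d in diffs]  # lag: time anycast is slower than optimal
        misses = [l for l in lags if l > 0]  # comparisons where anycast mishit
        return {
            "% optimal": 100.0 * (n - len(misses)) / n,
            "avg lag (ms)": sum(lags) / n,
            "avg diff. (ms)": sum(diffs) / n,
            "avg penalty (ms)": sum(misses) / len(misses) if misses else 0.0,
        }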

The table shows consistently good anycast performance, apart from spikes in average lag and average penalty during measurements 5 and 7. Nevertheless, the percentage of comparisons in which anycast finds the optimal datacenter remains above 80% throughout, averaging 86%. When targeting the penultimate IP, this figure is somewhat lower, with anycast beating the optimal datacenter in 77% of the cases. It can be concluded that, during measurements 5 and 7, the penalty for mishitting the optimal datacenter worsened, while other factors remained the same. A reason for this could be a relatively high rate of server outages, causing pings to travel further once they miss their optimal datacenter.

Absolute performance, i.e. average anycast latency, can also be considered good, although this is obviously subjective and dependent on the frame of reference. Compared to a unicast setup, in which internet traffic is frequently intercontinental and where latencies incurred from transiting the Atlantic Ocean are often over 80 ms [De Vries, personal correspondence], Cloudflare's average of 18.18 ms is very good. Even when mishitting, the latency of a request approximately doubles, which can still be considered reasonable. Besides, this only happens in around 14% of the cases.

Figure 5 plots the cumulative distributions of all measurements. It is meant as an illustration of the variance of the lags across all measurements. Individual CDF graphs are included in Appendix B. As can be seen, temporal differences in Cloudflare's optimality are moderate. All measurements observe pings to the penultimate IP performing either a little worse than or similar to pings to the anycast IP, reinforcing the observation that the choice between them is rather arbitrary.

Figure 2 shows that the majority of measuring probes are located in Europe, which causes Europe's Cloudflare performance to have the most influence on the results. This bias towards Europe stems from the fact that the vast majority of RIPE Atlas probes are located in Europe, thus decreasing the chance for non-European probes to be selected. RIPE NCC, and with it the Atlas platform, is based in Amsterdam, which would explain the relatively high popularity of probes in Europe.


Figure 6. Distributions of the Edgecast measurement compared to Cloudflare. (CDFs of lag in ms, for pings to Cloudflare anycast IPs, Cloudflare penultimate IPs, Edgecast anycast IPs and Edgecast penultimate IPs.)

In order to counter this, a measurement comprising only measuring probes outside of Europe was performed and is listed as 'EU' in Table 1. Figure 14 in Appendix B shows its cumulative distribution function. Both show results similar to the other measurements, although slightly worse. This measurement showing more or less similar results significantly reduces the concern that the European bias would produce unrepresentative results, as it shows that Cloudflare's performance is approximately equal across the globe.

6.3 How does Cloudflare's anycast performance compare to similar CDNs?

Cloudflare is compared to one other CDN³: Edgecast, which is now owned by Verizon. The same methodology, as described in section 5, was employed in measuring Edgecast.

³ Edgecast was unfortunately the only CDN available for comparison. Bing hides its penultimate IP addresses. Akamai owns more datacenters than RIPE Atlas has probes, making it impossible to identify all its datacenters and to accurately measure its performance. Due to unfortunate circumstances and time constraints, measurements directed at Fastly did not succeed. I was unable to find IP addresses or targets of other CDNs.

Figure 6 displays the empirical distribution functions of both the results of the measurement of Edgecast and measurement 4 of Cloudflare, the latter of which is the most representative of Cloudflare's performance. When targeting anycast IPs, Edgecast performs significantly worse than Cloudflare, hitting its optimal datacenter only 47 out of 101 times. However, strangely enough, Edgecast performs as well as Cloudflare when the penultimate IPs are considered. The gap in performance between targeting penultimate IPs and anycast IPs is significantly larger than the same difference as observed in the measurements of Cloudflare. A potential reason for this is Edgecast not hosting penultimate routers in the same datacenters as its anycast servers, or at least not as close together as Cloudflare does. It should be noted that Edgecast's other performance metrics are only slightly worse than those of Cloudflare, deviating strongly only in hit/mishit ratio and average lag.

7. CONCLUSIONS
The aim of this work was to answer the question "How well does Cloudflare's anycast perform in practice?". This was done using three subquestions. The first subquestion, "How well does anycast distribute requests across Cloudflare's servers?", concerned itself with how well anycast routes traffic to nearby Cloudflare datacenters. Measurements show that 86% of requests directed at Cloudflare land at an optimal datacenter. The absolute latencies are mostly below 25 ms, and any penalties resulting from a mishit are manageable, making Cloudflare markedly faster than a typical unicast setup. The second subquestion asked "To what extent does Cloudflare's request distribution change over time?", to which it can be concluded that performance remains more or less consistent, especially the hit/mishit ratio, both over time and between continents. The third subquestion, "How does Cloudflare's anycast performance compare to similar CDNs?", was used to compare Cloudflare to Edgecast. Edgecast proved to perform significantly worse with respect to anycast optimality and average lag. However, when targeting penultimate IPs, performance was almost equal.

In conclusion, Cloudflare seems to be well-versed in anycast, as its anycast performance is arguably more than adequate in multiple respects. This is to be expected; after all, anycast is at the core of its business and essential for its financial position. The value of its service is proportional to the difference between its anycast performance and unicast performance.


8. ACKNOWLEDGMENTS
I would like to thank Wouter B. de Vries of the University of Twente for financing this rather expensive research in RIPE Atlas credits.


9. REFERENCES
[1] The Cloudflare global anycast network. Cloudflare. https://www.cloudflare.com/network/. Accessed: 2017-06-02.
[2] Cloudflare IP ranges. Cloudflare. https://www.cloudflare.com/ips/. Accessed: 2017-06-02.
[3] Global RIPE Atlas network coverage. RIPE NCC. https://atlas.ripe.net/results/maps/network-coverage/. Accessed: 2017-06-22.
[4] H. A. Alzoubi, S. Lee, M. Rabinovich, O. Spatscheck, and J. Van Der Merwe. A practical architecture for an anycast CDN. ACM Trans. Web, 5(4):17:1–17:29, October 2011.
[5] H. Ballani and P. Francis. Towards a global IP anycast service. SIGCOMM Comput. Commun. Rev., 35(4):301–312, August 2005.
[6] H. Ballani, P. Francis, and S. Ratnasamy. A measurement-based deployment proposal for IP anycast. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, IMC '06, pages 231–244, New York, NY, USA, 2006. ACM.
[7] R. Bellis. Researching F-root anycast placement using RIPE Atlas. RIPE NCC. https://labs.ripe.net/Members/ray_bellis/researching-f-root-anycast-placement-using-ripe-atlas, October 2015. Accessed: 2017-06-02.
[8] M. Calder, A. Flavel, E. Katz-Bassett, R. Mahajan, and J. Padhye. Analyzing the performance of an anycast CDN. In Proceedings of the 2015 Internet Measurement Conference, IMC '15, pages 531–537, New York, NY, USA, 2015. ACM.
[9] D. Cicalese, D. Giordano, A. Finamore, M. Mellia, M. M. Munafo, D. Rossi, and D. Joumblatt. A first look at anycast CDN traffic. CoRR, 1505.00946, 2015.
[10] D. Cicalese, D. Joumblatt, D. Rossi, M. O. Buob, J. Auge, and T. Friedman. A fistful of pings: Accurate and lightweight anycast enumeration and geolocation. In 2015 IEEE Conference on Computer Communications (INFOCOM), pages 2776–2784, April 2015.
[11] D. Cicalese, D. Z. Joumblatt, D. Rossi, M. O. Buob, J. Auge, and T. Friedman. Latency-based anycast geolocation: Algorithms, software, and data sets. IEEE Journal on Selected Areas in Communications, 34(6):1889–1903, June 2016.
[12] R. de Oliveira Schmidt, J. Heidemann, and J. H. Kuipers. Anycast Latency: How Many Sites Are Enough?, pages 188–200. Springer International Publishing, Cham, 2017.
[13] W. B. de Vries, R. de O. Schmidt, W. Haraker, J. Heidemann, P.-T. de Boer, and A. Pras. Verfploeter: Broad and load-aware anycast mapping. Technical Report ISI-TR-719, USC/Information Sciences Institute, May 2017.
[14] X. Fan, E. Katz-Bassett, and J. Heidemann. Assessing Affinity Between Users and CDN Sites, pages 95–110. Springer International Publishing, Cham, 2015.
[15] A. Flavel, P. Mani, D. Maltz, N. Holt, J. Liu, Y. Chen, and O. Surmachev. FastRoute: A scalable load-aware anycast routing architecture for modern CDNs. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pages 381–394, Oakland, CA, 2015. USENIX Association.
[16] M. Prince. A brief primer on anycast. Cloudflare Blog. https://blog.cloudflare.com/a-brief-anycast-primer/, October 2011. Accessed: 2017-06-02.
[17] M. Prince. How does Cloudflare work? Quora. https://www.quora.com/How-does-Cloudflare-work, September 2012. Accessed: 2017-06-02.


APPENDIX
A. REPRODUCTION AND DATASETS
A.1 Code
For reproduction purposes, the Python script used has been published.⁴ The script has to be placed inside the iGreedy installation, and consists of five functions, to be executed consecutively. The first function, getAtlasMeasurement(), collects latency measurements from all currently connected RIPE Atlas probes, and stores the JSON result in the datasets/measurement/ folder. buildMeasurement(file) then takes this file and converts it to a measurement file, which can be used as input to iGreedy. iGreedy will output another JSON file, consisting of a dictionary of probe-datacenter pairs. Feeding this to the function buildProbedict(file) will generate what I call a 'probedict': a dictionary of datacenters, as found by iGreedy, and the probes which can be used to contact each datacenter. This probedict also includes coordinates and reverse-geocoded city-level locations for both the identifying probe and its datacenter, and the IP address of the penultimate router from a traceroute between the respective subjects. This probedict is copied and used for every comparison in buildComparisons(file) (99 by default). For each such comparison, a random probe from RIPE Atlas is picked and used to send pings to all penultimate IPs in the probedict, and to both its default anycast IP and a penultimate IP as taken from another traceroute by the same measuring probe. The function outputs a comparisons.json file, which includes all measurement results that were taken and all data necessary for analysis. My idea of such an analysis is implemented in the function buildEnhancedComparisons(file). This function takes the original comparisons.json file, deletes all non-positive latencies, generates meta-information for all comparisons and the top-level measurement, and outputs the definitive dataset, as explained in the next section.
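As a taste of the first of these steps, fetching the results of an existing RIPE Atlas measurement over the v2 REST API looks roughly like the following. The endpoint is a real part of the RIPE Atlas API, but the function name is a placeholder and this is a sketch rather than the published script.

    import requests

    ATLAS_API = "https://atlas.ripe.net/api/v2"

    def get_measurement_results(msm_id, api_key=None):
        # Download all results of measurement msm_id as a list of dicts,
        # one per responding probe (each carries "prb_id" and timing data).
        params = {"key": api_key} if api_key else {}
        resp = requests.get(f"{ATLAS_API}/measurements/{msm_id}/results/",
                            params=params, timeout=60)
        resp.raise_for_status()
        return resp.json()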

A.2 Datasets
The datasets collected are also published.⁵ Each measurement is a dictionary of comparisons, keyed by the ID of the probe responsible for the comparison. Each such comparison, in turn, is a dictionary of latencies (and a lot of other information) from this probe to all found penultimate IP addresses, its default anycast IP address and the penultimate IP address of its anycast datacenter. These latencies are keyed by the ID of the probe which originally found the penultimate IP to which the ping has been sent (which seems a little illogical, but which facilitated my programming), except the pings to the default datacenter, which are keyed by "defaultAnycast" and "defaultPenultimate", respectively. Both the top-layer dictionary and each of the nested dictionaries contain a dictionary of metadata, keyed by "meta". These contain summarizing data about the dictionary they belong to, like counts and averages. Coordinates and reverse-geocoded locations (at city level) are collected for datacenters, identifying probes and measuring probes, for reference.
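Schematically, one measurement therefore nests as follows, shown as a Python-style dict with comments. The keys "defaultAnycast", "defaultPenultimate" and "meta" are quoted from the description above; every ID, latency value and metadata field is invented for illustration.

    comparison_example = {
        "12345": {                                 # measuring probe ID (invented)
            "6789": {"latency": 17.2},             # ping to the penultimate IP found by probe 6789
            "7001": {"latency": 41.8},
            "defaultAnycast": {"latency": 14.1},   # ping to the probe's default anycast address
            "defaultPenultimate": {"latency": 14.3},  # ping to its own penultimate router
            "meta": {"avgLatency": 21.9},          # summarizing data (field names invented)
        },
        "meta": {"comparisons": 1},                # top-level metadata (field names invented)
    }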

B. INDIVIDUAL CDF GRAPHS
Included in this section are the individual cumulative distribution functions.

⁴ https://pastebin.com/raw/fyZSQ30n
⁵ https://drive.google.com/drive/folders/0B7nl7RpG2ehWdGotS0JiTmU0cWc?usp=sharing

Each figure below plots the CDF of lag (ms) for pings to anycast IPs and to penultimate IPs, with the average lags marked.

Figure 7. Distributions of measurement 1 (average lag to anycast IPs: 1.36 ms).

Figure 8. Distributions of measurement 2 (average lag to anycast IPs: 1.48 ms).

Figure 9. Distributions of measurement 3 (average lag to anycast IPs: 1.33 ms).

Figure 10. Distributions of measurement 4 (average lag to anycast IPs: 2.03 ms).

Figure 11. Distributions of measurement 5 (average lag to anycast IPs: 4.32 ms).

Figure 12. Distributions of measurement 6 (average lag to anycast IPs: 1.29 ms).

Figure 13. Distributions of measurement 7 (average lag to anycast IPs: 5.19 ms).

Figure 14. Distributions of the non-European measurement (average lag to anycast IPs: 5.25 ms).
