An Analysis of Internet Content Delivery Systems
19th November, 2007
Youngsub Kwon @ CSE, SNU
2
Contents
  Introduction
  Overview of Content Delivery Systems
  Methodology
  High-Level Data Characteristics
  Detailed Content Delivery Characteristics
  The Potential Role of Caching in CDNs and P2P
  Conclusion
3
Introduction
  This paper examines content delivery from the point of view of four content delivery systems
    HTTP web traffic
    Akamai content delivery network
    Kazaa and Gnutella P2P file sharing traffic
  Results
    Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks
    Characterize the behavior of these systems from the perspectives of clients, objects, and servers
    Derive implications for caching in these systems
4
Overview of Content Delivery Systems
  WWW
    Uses the HTTP protocol (consistency management)
    Simple architecture (server/client)
    Most web objects are small (5~10 KB)
    Objects are accessed with a Zipf popularity distribution
    The number of web objects is enormous and rapidly growing
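To make the Zipf popularity point concrete, the sketch below computes what share of requests goes to the most popular objects when the object at popularity rank r is requested in proportion to 1/r^alpha. The function name, the exponent alpha = 1.0, and the catalogue size are illustrative assumptions, not values from the paper.

```python
def zipf_top_share(num_objects: int, top_k: int, alpha: float = 1.0) -> float:
    """Fraction of requests that hit the top_k most popular objects.

    Assumes a pure Zipf law: the object at rank r has weight 1 / r**alpha.
    alpha and num_objects are hypothetical inputs, not measured values.
    """
    weights = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Example: share of requests absorbed by the 100 most popular of 10,000 objects.
share = zipf_top_share(num_objects=10_000, top_k=100)
print(f"top-100 share: {share:.3f}")
```

Under this kind of skew, a small set of popular objects absorbs a large fraction of requests, which is why popularity matters for the caching slides later in the deck.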
5
Overview of Content Delivery Systems
  Content Delivery Networks (CDNs)
    Collections of servers located strategically across the wide-area Internet
    Content is replicated across the wide area
    High availability
    CDNs have servers in ISP points of presence
    Clients can access topologically nearby replicas with low latency
    CDNs reduce average download response times, but DNS redirection causes overhead
  Peer-to-Peer Systems (P2P)
    Peers collaborate to form a distributed system for the purpose of exchanging content
    Most content-serving hosts are run by end users
    Low availability, low-capacity network connections
6
Methodology
  Use passive network monitoring to collect traces of traffic
  Network Composition
    UW (University of Washington) connects to its ISPs via two border routers (inbound and outbound traffic)
    The two routers are fully connected to four switches
    Each switch has a monitoring port that is used to copy packets to the monitoring host
  Tracing Infrastructure
    Software - 26,000 lines of code
    Hardware - dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs, running FreeBSD 4.5
7
Methodology
  Distinguishing Traffic Types
    Two types of traffic - HTTP traffic and non-HTTP traffic
      HTTP traffic - WWW, Akamai, Kazaa, Gnutella
      Non-HTTP traffic - Kazaa and Gnutella search traffic
    Akamai - ports 80, 8080, or 443, served by an Akamai server
    WWW - ports 80, 8080, or 443, not served by an Akamai server
    Gnutella - ports 6346 or 6347; includes file transfers but excludes search and control traffic
    Kazaa - port 1214; includes file transfers but excludes search and control traffic
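The port rules on this slide can be sketched as a small classifier. This is a minimal illustration only: the function name, the "other" label, and the idea of passing in a set of known Akamai server addresses are assumptions, and the sketch does not separate file transfers from search and control traffic as the actual traces do.

```python
from typing import Set

WEB_PORTS = {80, 8080, 443}      # ports listed on the slide for WWW and Akamai
GNUTELLA_PORTS = {6346, 6347}    # ports listed on the slide for Gnutella
KAZAA_PORT = 1214                # port listed on the slide for Kazaa


def classify(server_ip: str, server_port: int, akamai_servers: Set[str]) -> str:
    """Label one TCP connection by content delivery system.

    akamai_servers is a hypothetical, externally supplied set of addresses
    known to belong to Akamai; how that set is built is not covered here.
    """
    if server_port in WEB_PORTS:
        return "Akamai" if server_ip in akamai_servers else "WWW"
    if server_port in GNUTELLA_PORTS:
        return "Gnutella"
    if server_port == KAZAA_PORT:
        return "Kazaa"
    return "other"


# Example with documentation-range addresses (not real Akamai servers).
known_akamai = {"192.0.2.10"}
print(classify("192.0.2.10", 80, known_akamai))    # Akamai
print(classify("198.51.100.7", 1214, known_akamai))  # Kazaa
```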
8
High-Level Data Characteristics
  TCP Bandwidth
    All systems show a typical diurnal cycle
    Akamai - 0.2%
    Gnutella - 6.04%
    WWW traffic - 14.3% of TCP traffic
    Kazaa - 36.99% of TCP bytes
9
High-Level Data Characteristics
  UW client and server TCP bandwidth
    Figure (a) - Inbound data bandwidth
      WWW peaks in the middle of the day
      Kazaa peaks late at night
    Figure (b) - Outbound data bandwidth
      Peak Kazaa bandwidth dominates WWW by a factor of 3
10
High-Level Data Characteristics
  Content types downloaded by UW clients
    GIF and JPEG images account for 42% of downloads but only 16.3% of the bytes transferred
    Compared with measurements from a 1999 study
      HTML traffic: -43%, GIF and JPG traffic: -59%
      AVI and MPG traffic: +400%, MP3 traffic: +300%
11
High-Level Data Characteristics
  Summary
    The balance of HTTP traffic has changed dramatically over the last several years, with P2P traffic overtaking WWW traffic as the largest contributor to HTTP bytes transferred
    Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data
    The mixture of object types downloaded by UW clients has changed
12
Detailed Content Delivery Characteristics
  Objects
    Object size: P2P > WWW and Akamai
    Top bandwidth-consuming objects
      For Gnutella, a relatively large number of objects account for a large portion of the transferred bytes
13
Detailed Content Delivery Characteristics
  Objects - Top 10 bandwidth-consuming objects
    WWW - the top 10 objects are a mix that includes extremely small objects
    Akamai - 8 out of the top 10 objects are large and unpopular
    Kazaa - export objects are larger than import objects
14
Detailed Content Delivery Characteristics
  Objects - Downloaded bytes by object type
15
Detailed Content Delivery Characteristics
  Clients - Top UW bandwidth-consuming clients
    Figure (a) - Top Bandwidth Consuming UW Clients
      WWW - the top 200 clients (0.5%) account for 13% of WWW traffic
      Kazaa - the top 200 clients (4%) account for 50% of Kazaa traffic
    Figure (b) - Top Bandwidth Consuming UW Servers
      Kazaa: the top 200 clients account for 20% of the total HTTP bytes downloaded (worst offender)
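Figures such as "the top 200 clients account for 50% of Kazaa traffic" are top-N shares of a per-client byte count. The sketch below shows one way such a share could be computed from a trace summary; the function name and the sample byte counts are made up for illustration and are not data from the study.

```python
from typing import Mapping


def top_n_share(bytes_by_client: Mapping[str, int], n: int) -> float:
    """Fraction of all bytes attributable to the n heaviest clients."""
    total = sum(bytes_by_client.values())
    if total == 0:
        return 0.0
    heaviest = sorted(bytes_by_client.values(), reverse=True)[:n]
    return sum(heaviest) / total


# Hypothetical per-client byte counts, not measurements from the paper.
sample = {"client-a": 50, "client-b": 30, "client-c": 15, "client-d": 5}
print(top_n_share(sample, 1))  # share of the single heaviest client
```

The same function applies unchanged to servers if the mapping is keyed by server instead of client.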
16
Detailed Content Delivery Characteristics
  Clients - Request rates over time
17
Detailed Content Delivery Characteristics
  Servers - Top UW-internal bandwidth-producing servers
    Figure (a) - Top Bandwidth Consuming UW Servers
      Gnutella: all of the bytes come from the first 10 servers
      WWW: steep curve
      Kazaa: 80% of the bytes come from the top 334 servers
    Figure (b)
      WWW: 20 servers account for 20% of all HTTP bytes output
      Kazaa: 170 servers account for 50% of all HTTP bytes output
18
Detailed Content Delivery Characteristics
  Servers - The UW-external bandwidth-producing servers
    Figure (a)
      WWW: 938 external servers account for 50% of the bytes
      Kazaa: 600 external servers account for 26% of the bytes
    Figure (b)
      Kazaa: the top 500 external Kazaa peers account for 10% of the bytes
      WWW: the top 500 servers account for 22% of the bytes
19
Detailed Content Delivery Characteristics
  Servers
    The response codes returned by external servers in each content delivery system
    Figure (a) - Akamai and the WWW: 70% success; P2P: less than 20% success
    Figure (b) - Nearly all HTTP bytes are for useful content
      The overhead of rejected requests is small compared to the amount of useful data transferred
20
Detailed Content Delivery Characteristics
  Scalability of P2P Systems
    Can P2P systems like Kazaa scale in environments such as the university?
    Every peer in a P2P system consumes bandwidth in both directions
    Each new P2P client added becomes a server for the entire P2P structure
    Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth
    The bandwidth cost of each P2P peer is 90 times that of a web client!
    It seems questionable whether any organization can support a service with these characteristics
21
Detailed Content Delivery Characteristics
  Summary
    Peer-to-peer now accounts for over three quarters of HTTP traffic
    A small number of P2P users consume a disproportionately high fraction of bandwidth
    While the P2P request rate is quite low, the transfers last a long time
    While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers take on the majority of the burden
22
The Potential Role of Caching in CDNs
  Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%)
    Our analysis shows that Akamai requests are more skewed towards the most popular documents than are WWW requests
    We know that most bytes fetched from Akamai are from images and videos
    This implies that much of Akamai's content is in fact static and could be cached
  We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
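The gap between an ideal and a practical hit rate can be illustrated by replaying a request trace through a simulated cache. The sketch below assumes that "ideal" means an unbounded cache in which every object is cacheable, and that "practical" additionally refuses to cache objects marked uncacheable; this reading, the function name, and the per-request cacheable flag are assumptions made for illustration, and the slide does not spell out the paper's exact definitions.

```python
from typing import Iterable, Tuple


def hit_rate(requests: Iterable[Tuple[str, bool]], honor_cacheability: bool) -> float:
    """Request hit rate of an unbounded cache over a trace.

    Each request is (object_id, cacheable). With honor_cacheability=False
    every object is treated as cacheable (the assumed "ideal" case); with
    True, uncacheable objects always miss (the assumed "practical" case).
    """
    cached = set()
    hits = 0
    total = 0
    for object_id, cacheable in requests:
        total += 1
        if honor_cacheability and not cacheable:
            continue  # never stored, so every request for it is a miss
        if object_id in cached:
            hits += 1
        else:
            cached.add(object_id)  # cold miss: store for later requests
    return hits / total if total else 0.0


# Hypothetical five-request trace; "b" is marked uncacheable.
trace = [("a", True), ("b", False), ("a", True), ("b", False), ("c", True)]
print(hit_rate(trace, honor_cacheability=False))  # ideal
print(hit_rate(trace, honor_cacheability=True))   # practical
```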
23
The Potential Role of Caching in P2P
  The potential impact of caching in P2P systems may exceed the benefits seen in the web
  Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85%
  Hit rate increases with client population size for outbound traffic (1,000 clients - 40%, 500,000 clients - 85%)
  A reverse P2P cache saves the most bandwidth
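The P2P figures above are byte hit rates, which weight each hit by the size of the object served, so a few repeated transfers of very large objects can dominate the result. A minimal sketch, assuming an unbounded cache and whole-object transfers (both simplifications; the function name and sample sizes are illustrative):

```python
from typing import Iterable, Tuple


def byte_hit_rate(transfers: Iterable[Tuple[str, int]]) -> float:
    """Fraction of transferred bytes that an unbounded cache could serve.

    Each transfer is (object_id, size_in_bytes); the first transfer of an
    object is a miss, every later transfer of it is a hit.
    """
    seen = set()
    hit_bytes = 0
    total_bytes = 0
    for object_id, size in transfers:
        total_bytes += size
        if object_id in seen:
            hit_bytes += size
        else:
            seen.add(object_id)
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical trace: one large object fetched twice, one small object once.
print(byte_hit_rate([("movie", 700), ("page", 5), ("movie", 700)]))
```

In this made-up trace only one of three requests is a hit, yet roughly half of the bytes are served from cache, which is the effect that makes caching attractive for large P2P objects.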
24
Conclusion
  P2P traffic now accounts for the majority of HTTP bytes transferred
  P2P documents are three orders of magnitude larger than web objects
  A small number of extremely large objects account for an enormous fraction of observed P2P traffic
  A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems
  Each P2P client creates a significant bandwidth load in both directions