An Analysis of Internet Content Delivery Systems
19th November, 2007
Youngsub Kwon @ CSE, SNU

Contents
  Introduction
  Overview of Content Delivery Systems
  Methodology
  High-Level Data Characteristics
  Detailed Content Delivery Characteristics
  The Potential Role of Caching in CDNs and P2P
  Conclusion

Introduction
This paper examines content delivery from the point of view of four content delivery systems
  HTTP web traffic
  Akamai content delivery network
  Kazaa and Gnutella P2P file sharing traffic
Results
  Quantify the rapidly increasing importance of new content delivery systems, particularly peer-to-peer networks
  Characterize the behavior of these systems from the perspectives of clients, objects, and servers
  Derive implications for caching in these systems

Overview of Content Delivery Systems
WWW
  Uses the HTTP protocol (consistency management)
  Simple architecture (server/client)
  Most web objects are small (5~10 KB)
  Objects are accessed with a Zipf popularity distribution
  The number of web objects is enormous and rapidly growing
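As a rough illustration of the Zipf popularity distribution mentioned above, the sketch below computes how concentrated requests are when the object at popularity rank r is requested with probability proportional to 1/r^alpha. The catalogue size (10,000 objects) and the exponent (alpha = 1.0) are illustrative assumptions, not values taken from the slides.

```python
def zipf_probabilities(num_objects, alpha=1.0):
    """Request probability per popularity rank (rank 1 is the most popular)."""
    # Weight of rank r is 1 / r**alpha; normalise so the weights sum to 1.
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def top_k_share(probabilities, k):
    """Fraction of all requests that go to the k most popular objects."""
    return sum(probabilities[:k])


# Hypothetical catalogue of 10,000 objects with alpha = 1.0 (assumed values).
probs = zipf_probabilities(10_000)
share = top_k_share(probs, 100)
print(f"top 1% of objects receive {share:.1%} of requests")
```

Under these assumed parameters a small set of popular objects attracts a large share of requests, which is the property that makes caching attractive for web traffic.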

Overview of Content Delivery Systems
Content Delivery Networks (CDNs)
  Collections of servers located strategically across the wide-area Internet
  Content is replicated across the wide area, giving high availability
  CDNs have servers in ISP points of presence
  Clients can access topologically nearby replicas with low latency
  CDNs reduce average download response times, but DNS redirection causes overhead
Peer-to-Peer Systems (P2P)
  Peers collaborate to form a distributed system for the purpose of exchanging content
  Most content-serving hosts are run by end users
  Low availability, low-capacity network connections

Methodology
Use passive network monitoring to collect traces of traffic
Network Composition
  UW (University of Washington) connects to its ISPs via two border routers, one for inbound and one for outbound traffic
  The two routers are fully connected to four switches
  Each switch has a monitoring port that is used to copy packets to the monitoring host
Tracing Infrastructure
  Software: 26,000 lines of code
  Hardware: dual-processor Dell Precision Workstation 530 with 2.0 GHz Pentium III Xeon CPUs, running FreeBSD 4.5

Methodology
Distinguishing Traffic Types
  Two types of traffic: HTTP traffic and non-HTTP traffic
    HTTP traffic: WWW, Akamai, Kazaa, Gnutella
    Non-HTTP traffic: Kazaa and Gnutella search traffic
  Akamai: ports 80, 8080, 443, traffic that is served by an Akamai server
  WWW: ports 80, 8080, 443, traffic that is not served by an Akamai server
  Gnutella: ports 6346 or 6347; includes file transfers, but excludes search and control traffic
  Kazaa: port 1214; includes file transfers, but excludes search and control traffic
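The port-based rules above can be sketched as a small classifier. This is only an illustration of the rules as listed on the slide: the function name and the `served_by_akamai` flag are hypothetical, the slides do not say how Akamai servers are identified, and the exclusion of search and control traffic is not modelled here.

```python
WEB_PORTS = {80, 8080, 443}       # ports shared by WWW and Akamai traffic
GNUTELLA_PORTS = {6346, 6347}
KAZAA_PORT = 1214


def classify(server_port, served_by_akamai=False):
    """Map one HTTP transfer to a content delivery system by server port.

    served_by_akamai is an assumed input: the caller must already know
    whether the responding server belongs to Akamai.
    """
    if server_port in WEB_PORTS:
        return "Akamai" if served_by_akamai else "WWW"
    if server_port in GNUTELLA_PORTS:
        return "Gnutella"
    if server_port == KAZAA_PORT:
        return "Kazaa"
    return "other"


for port, akamai in [(80, False), (8080, True), (6346, False), (1214, False)]:
    print(port, classify(port, akamai))
```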

High-Level Data Characteristics
TCP Bandwidth
  All systems show a typical diurnal cycle
  Share of TCP bytes
    Akamai: 0.2%
    Gnutella: 6.04%
    WWW: 14.3%
    Kazaa: 36.99%
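A short piece of arithmetic on the percentages above shows how the two P2P systems together compare with WWW traffic. The dictionary simply restates the slide's figures; the variable names are illustrative.

```python
# Share of TCP bytes per system, in percent, as listed on the slide.
tcp_share = {"Akamai": 0.2, "Gnutella": 6.04, "WWW": 14.3, "Kazaa": 36.99}

# Combined share of the two peer-to-peer systems.
p2p_share = tcp_share["Kazaa"] + tcp_share["Gnutella"]

# How many times larger the P2P share is than the WWW share.
p2p_to_www = p2p_share / tcp_share["WWW"]

print(f"P2P: {p2p_share:.2f}% of TCP bytes, {p2p_to_www:.1f}x WWW")
# prints "P2P: 43.03% of TCP bytes, 3.0x WWW"
```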

High-Level Data Characteristics
UW client and server TCP bandwidth
  Figure (a): inbound data bandwidth
    WWW peaks in the middle of the day
    Kazaa peaks late at night
  Figure (b): outbound data bandwidth
    Peak Kazaa bandwidth dominates WWW by a factor of 3

High-Level Data Characteristics
Content types downloaded by UW clients
  GIF and JPEG images account for 42% of downloads, but only 16.3% of the bytes transferred
  Compared with measurements from a 1999 study
    HTML traffic: -43%; GIF and JPG traffic: -59%
    AVI and MPG traffic: +400%; MP3 traffic: +300%

High-Level Data Characteristics
Summary
  The balance of HTTP traffic has changed dramatically over the last several years
  P2P traffic is overtaking WWW traffic as the largest contributor to HTTP bytes transferred
  Although UW is a large publisher of web documents, P2P traffic makes the University an even larger exporter of data
  The mixture of object types downloaded by UW clients has changed

Detailed Content Delivery Characteristics
Objects
  Object size: P2P > WWW and Akamai
  Top bandwidth-consuming objects
    For Gnutella, we see that a relatively large number of objects account for a large portion of the transferred bytes

Detailed Content Delivery Characteristics
Objects: top 10 bandwidth-consuming objects
  WWW: the top 10 objects are a mix of extremely small objects
  Akamai: 8 out of the top 10 objects are larger and unpopular
  Kazaa: export objects are larger than import objects

Detailed Content Delivery Characteristics
Objects: downloaded bytes by object type

Detailed Content Delivery Characteristics
Clients: top UW bandwidth-consuming clients
  Figure (a): Top Bandwidth Consuming UW Clients
    WWW: the top 200 clients (0.5%) account for 13% of WWW traffic
    Kazaa: the top 200 clients (4%) account for 50% of Kazaa traffic
  Figure (b): Top Bandwidth Consuming UW Servers
    Kazaa: 200 clients account for 20% of the total HTTP bytes downloaded (worst offender)
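Statements of the form "the top 200 clients account for 50% of the traffic" come from ranking clients by bytes transferred and summing the largest contributors. The sketch below shows that calculation on made-up numbers; the function name and the example byte counts are illustrative and not taken from the trace.

```python
def top_k_byte_fraction(bytes_per_client, k):
    """Fraction of all bytes attributable to the k heaviest clients."""
    ranked = sorted(bytes_per_client.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    return sum(ranked[:k]) / total


# Hypothetical per-client byte counts (not data from the study).
example = {"a": 700, "b": 200, "c": 50, "d": 30, "e": 20}

for k in (1, 2, 5):
    print(k, top_k_byte_fraction(example, k))
```

Evaluating the function for every k from 1 to the number of clients gives the cumulative curve that the figures on this slide plot.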

Detailed Content Delivery Characteristics
Clients: request rates over time

Detailed Content Delivery Characteristics
Servers: top UW-internal bandwidth-producing servers
  Figure (a): Top Bandwidth Consuming UW Servers
    Gnutella: all of the bytes come from the first 10 servers
    WWW: steep curve
    Kazaa: 80% of the bytes come from the top 334 servers
  Figure (b)
    WWW: 20 servers account for 20% of all HTTP bytes output
    Kazaa: 170 servers account for 50% of all HTTP bytes output

Detailed Content Delivery Characteristics
Servers: the UW-external bandwidth-producing servers
  Figure (a)
    WWW: 938 external servers account for 50% of the bytes
    Kazaa: 600 external servers account for 26% of the bytes
  Figure (b)
    Kazaa: the top 500 external Kazaa peers account for 10% of the bytes
    WWW: the top 500 servers account for 22% of the bytes

Detailed Content Delivery Characteristics
Servers
  The response codes returned by external servers in each content delivery system
  Figure (a): Akamai and the WWW: 70% success; P2P: less than 20% success
  Figure (b) shows that nearly all HTTP bytes are for useful content. The overhead of rejected requests is small compared to the amount of useful data transferred.

Detailed Content Delivery Characteristics
Scalability of P2P Systems
  Can P2P systems like Kazaa scale in environments such as the university?
  Every peer in a P2P system consumes bandwidth in both directions
  Each new P2P client added becomes a server for the entire P2P structure
  Kazaa objects are huge, so a small number of peers can consume an enormous amount of total network bandwidth
  The bandwidth cost of each P2P peer is 90 times that of a web client!
  It seems questionable whether any organization can support a service with these characteristics

Detailed Content Delivery Characteristics
Summary
  Peer-to-peer now accounts for over three quarters of HTTP traffic
  A small number of P2P users consume a disproportionately high fraction of bandwidth
  While the P2P request rate is quite low, the transfers last a long time
  While the design of P2P overlay structures focuses on spreading the workload for scalability, our measurements show that a small number of servers are taking the majority of the burden

The Potential Role of Caching in CDNs
  Akamai requests achieve an 88% ideal hit rate and a 50% practical hit rate, noticeably higher than WWW requests (77% and 36%)
    Our analysis shows that Akamai requests are more skewed towards the most popular documents than are WWW requests
  We know that most bytes fetched from Akamai are from images and videos
    This implies that much of Akamai's content is in fact static and could be cached
  We would expect that widely deployed proxy caches would significantly reduce the need for a separate content delivery network
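The slides do not define the two hit rates. The sketch below assumes a common reading: the ideal hit rate treats every object as cacheable in an infinitely large cache, while the practical hit rate counts a hit only when the object is also marked cacheable. The trace format, the `cacheable` flag, and the example requests are hypothetical.

```python
def hit_rates(requests):
    """Return (ideal, practical) request hit rates for an infinite cache.

    requests: iterable of (object_id, cacheable) pairs in arrival order.
    Assumed definitions: an ideal hit is any repeat request; a practical
    hit is a repeat request for an object that is marked cacheable.
    """
    seen = set()
    total = ideal_hits = practical_hits = 0
    for object_id, cacheable in requests:
        total += 1
        if object_id in seen:
            ideal_hits += 1
            if cacheable:
                practical_hits += 1
        seen.add(object_id)
    if total == 0:
        return 0.0, 0.0
    return ideal_hits / total, practical_hits / total


# Hypothetical trace: "b" is requested twice but is not cacheable.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]
ideal, practical = hit_rates(trace)
print(f"ideal {ideal:.0%}, practical {practical:.0%}")
```

The gap between the two numbers reflects requests that repeat but could not be served from a real cache, which is the gap the slide reports for both Akamai and WWW requests.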

The Potential Role of Caching in P2P
  The potential impact of caching in P2P systems may exceed the benefits seen in the web
  Inbound cache byte hit rate = 35%, outbound cache byte hit rate = 85%
  Hit rate increases with client population size for outbound traffic (1,000 clients: 40%; 500,000 clients: 85%)
  A reverse P2P cache saves the most bandwidth
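A byte hit rate weights each cache hit by the size of the object, so a few repeated large transfers dominate the result. The sketch below assumes an infinitely large cache and whole-object transfers; the function name and the example transfers are illustrative and not taken from the trace.

```python
def byte_hit_rate(transfers):
    """Fraction of transferred bytes that an infinite cache could serve.

    transfers: iterable of (object_id, size_bytes) pairs in arrival order.
    The first transfer of an object is a miss; later ones are hits.
    """
    cached = set()
    total_bytes = hit_bytes = 0
    for object_id, size in transfers:
        total_bytes += size
        if object_id in cached:
            hit_bytes += size
        else:
            cached.add(object_id)
    return hit_bytes / total_bytes if total_bytes else 0.0


# Hypothetical transfers with sizes in megabytes.
transfers = [("movie", 700), ("song", 5), ("movie", 700),
             ("song", 5), ("clip", 90)]
print(f"byte hit rate: {byte_hit_rate(transfers):.0%}")
```

In this made-up trace two of the five requests are hits, yet they cover almost half of the bytes, because one of them is the largest object.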

Conclusion
  P2P traffic now accounts for the majority of HTTP bytes transferred
  P2P documents are three orders of magnitude larger than web objects
  A small number of extremely large objects account for an enormous fraction of observed P2P traffic
  A small number of clients and servers are responsible for the majority of the traffic we saw in the P2P systems
  Each P2P client creates a significant bandwidth load in both directions