p2p networking and content distribution march 28, 2013 2: application layer1
2: Application Layer 1
P2P Networking and Content Distribution
March 28, 2013
Announcements
H/W due today (Calendar, Packet pair)
  Calendar app: 1-week extension is possible (but with a 10% point deduction)
Meeting with project mentors by Monday
Project plan presentation
  Introduction/background
  Problem definition (or research questions)
  Related work (no need to be complete)
  Approach (+ supporting materials)
  Plans (including refining research questions + experimenting with ideas)
Reviews
Network app: client-server, P2P, hybrid
Programming: socket
Addressing issues
Transport layer vs. service requirements
TCP vs. UDP (differences)
HTTP: persistent vs. non-persistent
HTTP: cookies
DNS: distributed, hierarchical DB
DNS name hierarchy vs. Internet's topology
DNS resolution: iterative vs. recursive
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
Pure P2P architecture
no always-on server
arbitrary end systems directly communicate
peers are intermittently connected and change IP addresses
Three topics:
  File distribution
  Searching for information
  Case study: Skype
[Figure: end systems communicating directly, peer-peer]
File Distribution: Server-Client vs. P2P
Question: How much time does it take to distribute a file from one server to N peers?
[Figure: a server and N peers attached to a network with abundant bandwidth; file of size F]
$u_s$: server upload bandwidth
$u_i$: peer i upload bandwidth
$d_i$: peer i download bandwidth
File distribution time: server-client
[Figure: a server and N peers attached to a network with abundant bandwidth; file of size F]
server sequentially sends N copies: $NF/u_s$ time
client i takes $F/d_i$ time to download
Time to distribute F to N clients using the client/server approach:
$d_{cs} = \max\{NF/u_s,\ F/\min_i d_i\}$
increases linearly w.r.t. N (for large N)
File distribution time: P2P
[Figure: a server and N peers attached to a network with abundant bandwidth; file of size F]
server must send one copy: $F/u_s$ time
client i takes $F/d_i$ time to download
NF bits must be downloaded (aggregate)
fastest possible upload rate: $u_s + \sum_i u_i$
$d_{P2P} = \max\{F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i)\}$
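The two distribution-time formulas can be evaluated directly. The following is a minimal Python sketch; the function names and the sample values are assumptions chosen for illustration, with rates in file-sizes per hour so that results come out in hours.

```python
def client_server_time(F, u_s, d):
    """Client/server distribution time: max{N*F/u_s, F/min(d_i)}.

    F   -- file size
    u_s -- server upload bandwidth
    d   -- list of peer download bandwidths (one entry per peer)
    """
    N = len(d)
    return max(N * F / u_s, F / min(d))


def p2p_time(F, u_s, u, d):
    """P2P distribution time: max{F/u_s, F/min(d_i), N*F/(u_s + sum(u_i))}.

    u -- list of peer upload bandwidths (one entry per peer)
    """
    N = len(d)
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))


# Illustrative values (assumed): 30 peers, each uploading at rate 1,
# server uploading at rate 10, every peer downloading at rate 10.
N = 30
print(client_server_time(1.0, 10.0, [10.0] * N))       # 3.0
print(p2p_time(1.0, 10.0, [1.0] * N, [10.0] * N))      # 0.75
```

The client/server time grows with every added peer, while in the P2P case each added peer also adds upload capacity to the denominator.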
Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, $u_s = 10u$, $d_{min} \ge u_s$
[Figure: Minimum Distribution Time (0 to 3.5) vs. N (0 to 35); curves: P2P, Client-Server]
Client-server ~ $NF/u_s$ vs. P2P ~ $NF/(u_s + \sum_i u_i)$
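A worked version of this example, assuming every peer uploads at the same rate u (so that $\sum_i u_i = Nu$) and using $d_{min} \ge u_s$ to bound $F/d_{min}$ by $F/u_s$:

```latex
\begin{align*}
d_{cs}  &= \max\left\{\frac{NF}{u_s},\ \frac{F}{d_{min}}\right\}
         = \frac{NF}{10u} = \frac{N}{10}\ \text{hours} \\
d_{P2P} &= \max\left\{\frac{F}{u_s},\ \frac{F}{d_{min}},\ \frac{NF}{u_s + Nu}\right\}
         = \max\left\{\frac{1}{10},\ \frac{N}{10 + N}\right\}\ \text{hours}
\end{align*}
```

For N = 30 this gives 3 hours for client-server and 0.75 hours for P2P. The P2P time stays below 1 hour for any N, because N/(10 + N) < 1.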
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
P2P content distribution issues
Issues
  Group management and data search
  Reliable and efficient file exchange
  Security/privacy/anonymity/trust
Approaches for group management and data search (i.e., who has what?)
  Centralized (e.g., BitTorrent tracker)
  Unstructured (e.g., Gnutella)
  Structured (Distributed Hash Tables [DHT])
Centralized model (Napster)
original “Napster” design
1) when a peer connects, it informs the central server of its IP address and content
2) Alice queries for “Hey Jude”; the server notifies her that Bob has the file
3) Alice requests the file from Bob
[Figure: peers, including Alice and Bob, around a centralized directory server; arrows numbered 1 to 3 match the steps above]
Centralized model
[Figure: peers Bob, Alice, Jane, and Judy; Q: “Hey Jude”, A: Bob has it]
file transfer is decentralized, but locating content is highly centralized
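The centralized lookup can be sketched as a small in-memory directory. This is an illustration only, not Napster's actual protocol: the class name, method names, and peer addresses below are hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical central index: content title -> addresses of peers holding it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_addr, titles):
        """Step 1: a connecting peer reports its address and its content."""
        for title in titles:
            self._index[title].add(peer_addr)

    def unregister(self, peer_addr):
        """Drop a peer that went offline from every entry."""
        for peers in self._index.values():
            peers.discard(peer_addr)

    def query(self, title):
        """Step 2: return the peers that have the title (possibly none)."""
        return sorted(self._index.get(title, ()))


directory = CentralDirectory()
directory.register("bob.example:6699", ["Hey Jude", "Let It Be"])
directory.register("jane.example:6699", ["Let It Be"])

# Alice asks the directory, then (step 3) would fetch the file from Bob directly.
print(directory.query("Hey Jude"))   # ['bob.example:6699']
```

Only the lookup goes through the directory; the transfer itself happens between the two peers, which is why the directory is both the scaling limit and the single point of failure.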
Centralized model
Benefits:
  Low per-node state
  Limited bandwidth usage
  Short search time
  High success rate
  Fault tolerant
Drawbacks:
  Single point of failure
  Limited scale
  Possibly unbalanced load
[Figure: peers Bob, Alice, Jane, and Judy]
File distribution: BitTorrent
P2P file distribution
tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers, then trades chunks with them]
BitTorrent (1)
file divided into 256KB chunks.
peer joining torrent:
  has no chunks, but will accumulate them over time
  registers with a tracker to get list of peers, connects to subset of peers (“neighbors”)
while downloading, peer uploads chunks to other peers.
peers may come online and go offline
once peer has entire file, it may (selfishly) leave or (altruistically) remain
BitTorrent (2)
Pulling Chunks
  at any given time, different peers have different subsets of file chunks
  periodically, a peer (Alice) asks each neighbor for a list of chunks that it has.
  Alice sends requests for her missing chunks
    rarest first
Sending Chunks: tit-for-tat
  Alice sends chunks to four neighbors currently sending her chunks at the highest rate
    re-evaluate top 4 every 10 secs
  every 30 secs: randomly select another peer, starts sending chunks
    newly chosen peer may join top 4
    “optimistically unchoke”
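The rarest-first ordering can be sketched as follows. This is a simplified illustration rather than a real client's implementation; the function name and the neighbor names are assumptions, and ties are broken by chunk index only to keep the output deterministic.

```python
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order the chunks we are missing from rarest to most common.

    my_chunks       -- set of chunk indices we already hold
    neighbor_chunks -- dict: neighbor name -> set of chunk indices it reported
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        # Count only chunks we still need.
        counts.update(chunks - my_chunks)
    # Fewest holders first; chunk index breaks ties.
    return sorted(counts, key=lambda chunk: (counts[chunk], chunk))


alice_has = {0}
neighbors = {
    "bob": {0, 1, 2},
    "carol": {1, 3},
    "dave": {1, 2},
}
print(rarest_first(alice_has, neighbors))   # [3, 2, 1]
```

Chunk 3 is held by a single neighbor, so it is requested first; chunk 1 is held by all three and can wait.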
BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With higher upload rate, can find better trading partners & get file faster!
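One round of the tit-for-tat choice can be sketched like this. It is a simplification: a real client re-evaluates the top four every 10 seconds and picks the optimistic peer every 30 seconds, while this sketch makes both choices in one call. The function name, the `rng` parameter, and the rates are assumptions.

```python
import random


def choose_unchoked(download_rates, top_n=4, rng=random):
    """Pick the neighbors to send chunks to.

    download_rates -- dict: neighbor -> rate at which it is sending us chunks
    Returns (top, optimistic): the top_n fastest senders, plus one randomly
    chosen other neighbor (None if there is nobody left to choose from).
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    top = ranked[:top_n]
    others = ranked[top_n:]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


rates = {"bob": 50, "carol": 40, "dave": 30, "eve": 20, "frank": 10, "grace": 5}
top, optimistic = choose_unchoked(rates)
print(top)          # ['bob', 'carol', 'dave', 'eve']
print(optimistic)   # 'frank' or 'grace', chosen at random
```

The random pick is what lets a newcomer with nothing to offer yet receive its first chunks and start reciprocating.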
P2P Case study: Skype
inherently P2P: pairs of users communicate.
proprietary application-layer protocol (inferred via reverse engineering)
hierarchical overlay with super nodes (SNs)
Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]
Contents
P2P architecture and benefits
P2P content distribution
Content distribution network (CDN)
Why Content Networks?
More hops between client and Web server lead to more congestion!
Same data flowing repeatedly over links between clients and Web server
[Figure: clients C1 to C4 reach server S through IP routers]
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt
Why Content Networks?
Origin server is bottleneck as number of users grows
Flash Crowds (for instance, Sept. 11)
The Content Distribution Problem: Arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)
Example: Web Server Farm
Simple solution to the content distribution problem: deploy a large group of servers
Arbitrate client requests to servers using an “intelligent” L4-L7 switch
Pretty widely used today
[Figure: an L4-L7 switch distributes requests from grad.umd.edu and ren.cis.udel.edu across www.cnn.com (Copy 1), (Copy 2), and (Copy 3)]
Example: Caching Proxy
Mainly motivated by ISP business interests: reducing the bandwidth the ISP consumes from the Internet
Reduced network traffic
Reduced user-perceived latency
[Figure: inside the ISP, interceptors divert TCP port 80 traffic from clients ren.cis.udel.edu and merlot.cis.udel.edu to a proxy on the path to www.cnn.com over the Internet; other traffic is not diverted]
But on Sept. 11, 2001
[Figure: user mslab.kaist.ac.kr and two groups of 1,000,000 other hosts send requests for the new content (WTC News!) to the Web server www.cnn.com; the ISP caching proxies hold only old content. Legend: caching proxy; congestion / bottleneck]
Problems with discussed approaches: Server farms and Caching proxies
Server farms do nothing about problems due to network congestion
Caching proxies serve only their clients, not all users on the Internet
Content providers (say, Web servers) cannot rely on existence and correct implementation of caching proxies
Accounting issues with caching proxies. For instance, www.cnn.com needs to know the number of hits to the webpage for advertisements displayed on the webpage
Again on Sept. 11, 2001 with CDN
[Figure: the Web server www.cnn.com sends the new content (WTC News!) over the distribution infrastructure to surrogates in FL, IL, DE, NY, MA, MI, CA, and WA; user mslab.kaist.ac.kr and two groups of 1,000,000 other users request the new content. Legend: surrogate; distribution infrastructure]
Web replication - CDNs
Overlay network to distribute content from origin servers to users
Avoids large amount of same data repeatedly traversing potentially congested links on the Internet
Reduces Web server load
Reduces user perceived latency
Tries to route around congested networks
CDN vs. Caching Proxies
Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
Caches are reactive; CDNs are proactive
Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and the clients
CDNs give control over the content to the content providers; caching proxies do not
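The reactive/proactive distinction can be sketched with two tiny classes. Both are hypothetical illustrations, not any product's API. The surrogate shows only the push variant; a CDN may also pull content from the origin.

```python
class CachingProxy:
    """Reactive: content is fetched from the origin only after a client asks."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin   # callable: url -> body (assumed)
        self._store = {}

    def get(self, url):
        if url not in self._store:        # miss: go to the origin first
            self._store[url] = self._fetch(url)
        return self._store[url]


class Surrogate:
    """Proactive: the content provider places content before any request."""

    def __init__(self):
        self._store = {}

    def push(self, url, body):
        self._store[url] = body           # provider controls what is stored

    def get(self, url):
        return self._store.get(url)       # None if nothing was pushed


origin_hits = []

def origin(url):
    origin_hits.append(url)
    return "body of " + url

proxy = CachingProxy(origin)
proxy.get("/sept11")
proxy.get("/sept11")
print(len(origin_hits))   # 1: only the first request reached the origin

surrogate = Surrogate()
surrogate.push("/sept11", "body of /sept11")
print(surrogate.get("/sept11"))   # body of /sept11
```

With the proxy, the first request for new content always reaches the origin; with the pushed surrogate, it never has to.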
CDN Architecture
[Figure: the CDN, containing surrogates, a request routing infrastructure, and a distribution & accounting infrastructure, sits between the origin server and the clients]
CDN Components
Distribution Infrastructure: Moving or replicating content from content source (origin server, content provider) to surrogates
Request Routing Infrastructure: Steering or directing content request from a client to a suitable surrogate
Content Delivery Infrastructure: Delivering content to clients from surrogates
Accounting Infrastructure: Logging and reporting of distribution and delivery activities
Server Interaction with CDN
1. Origin server pushes new content to CDN OR CDN pulls content from origin server
2. Origin server requests logs and other accounting info from CDN OR CDN provides logs and other accounting info to origin server
[Figure: origin server www.cnn.com interacts with the CDN's distribution infrastructure (1) and accounting infrastructure (2)]
Client Interaction with CDN
1. Hi! I need www.cnn.com/sept11
2. Go to surrogate newyork.cnn.akamai.com
3. Hi! I need content /sept11
Q: How did the CDN choose the New York surrogate over the California surrogate?
[Figure: the client talks to the CDN's request routing infrastructure (1, 2), then to Surrogate (NY), newyork.cnn.akamai.com (3); Surrogate (CA), california.cnn.akamai.com, is not used]
Request Routing Techniques
Request routing techniques use a set of metrics to direct users to “best” surrogate
Proprietary, but underlying techniques known:
  DNS based request routing
  Content modification (URL rewriting)
  Anycast based (how common is anycast?)
  URL based request routing
  Transport layer request routing
  Combination of multiple mechanisms
DNS based Request-Routing
Common due to the ubiquity of DNS as a directory service
Specialized DNS server inserted in a DNS resolution process
DNS server is capable of returning a different set of A, NS or CNAME records based on policies/metrics
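A policy-driven answer of this kind can be sketched as below. The actual policies are proprietary, so picking the lowest RTT is an assumption made for illustration. The function names, the measurement table, and all addresses (taken from the documentation ranges) are hypothetical.

```python
def pick_surrogate(requesting_dns, measurements):
    """Return the surrogate address with the lowest RTT to the requesting DNS server.

    measurements -- dict: (requesting DNS address, surrogate address) -> {"rtt_ms": ...}
    """
    candidates = {
        surrogate: m["rtt_ms"]
        for (dns, surrogate), m in measurements.items()
        if dns == requesting_dns
    }
    if not candidates:
        raise LookupError("no measurements for " + requesting_dns)
    return min(candidates, key=candidates.get)


def resolve(name, requesting_dns, measurements, ttl=10):
    """Build an A-record style answer whose address depends on who is asking."""
    return {
        "name": name,
        "type": "A",
        "address": pick_surrogate(requesting_dns, measurements),
        "ttl": ttl,   # a short TTL lets the mapping change quickly
    }


# Hypothetical measurements toward one client-side DNS server.
measurements = {
    ("203.0.113.53", "192.0.2.10"): {"rtt_ms": 10},
    ("203.0.113.53", "198.51.100.20"): {"rtt_ms": 100},
}
answer = resolve("www.example.com", "203.0.113.53", measurements)
print(answer["address"])   # 192.0.2.10
```

The choice is keyed on the requesting DNS server, not on the end client, since that is the only address the specialized DNS server sees.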
DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) sends a DNS query for www.cnn.com to its local DNS server dns.nyu.edu (128.4.4.12), which queries the Akamai DNS; the DNS response is A 145.155.10.15, and the client opens a session with that surrogate. The Akamai CDN has surrogates 145.155.10.15 and 58.15.100.152, labeled newyork.cnn.akamai.com and california.cnn.akamai.com]
Q: How does the Akamai DNS know which surrogate is closest?
DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) queries its local DNS server dns.nyu.edu (128.4.4.12), which queries the Akamai DNS for www.cnn.com; the surrogates in the Akamai CDN measure to the client DNS and report their measurement results to the Akamai DNS]
DNS based Request-Routing
[Figure: client 76.43.35.53 uses client DNS 76.43.32.4, which asks the Akamai DNS for www.cnn.com. The two surrogates, 145.155.10.15 and 58.15.100.152, report measurements toward requesting DNS 76.43.32.4: available bandwidth = 10 kbps, RTT = 10 ms for one and available bandwidth = 5 kbps, RTT = 100 ms for the other. The Akamai DNS maps requesting DNS 76.43.32.4 to surrogate 145.155.10.15 and answers: www.cnn.com A 145.155.10.15, TTL = 10s]
DNS based Request Routing: Discussion
Originator Problem: Client may be far removed from client DNS
Client DNS Masking Problem: Virtually all DNS servers, except for root DNS servers, honor requests for recursion
  Q: Which DNS server resolves a request for test.nyu.edu?
  Q: Which DNS server performs the last recursion of the DNS request?
Hidden Load Factor: A DNS resolution may result in drastically different load on the selected surrogate, which complicates load balancing of requests and predicting load on surrogates
CDN Strategies
Pushing content closer to the users: hop count reduction (overall network traffic reduction)
CDN Strategies:
  Limelight: placing CDN servers near a small # of ISP core nets
  Akamai: placing CDN servers deep into a large # of ISP networks’ sites
  Nano Data Center (NaDa): home gateways (STBs/modems) as CDN servers (peer-to-peer delivery among NaDa servers)
[Figure: NaDa digital media delivery platform across the access network (modem, DSLAM, ONT, OLT), the metro/edge network (edge router), and the core network (core router)]
Summary
P2P architecture and its benefits
P2P content distribution
  BitTorrent, Skype
Content distribution network (CDN)
  DNS-based request routing