Jin Li, Principal Researcher
Microsoft Research
Outline
Introduction
P2P today
Anatomy of BitTorrent, Skype & PPLive
Components and tools for P2P applications
P2P deployment issues
Summary
Why P2P applications
Advantages
○ Economical to run: saves centralized bandwidth and/or storage
○ Robustness: no single point of failure
○ Super-scalability: system capacity increases with the number of nodes
P2P is ideal for serving the long tail
P2P Application History
1st Generation
○ Napster, 05/1999
2nd Generation
○ Gnutella, early 2000
○ FastTrack (Kazaa, Grokster, iMesh), 03/2001
○ eDonkey
3rd Generation
○ BitTorrent, 2002
○ Skype, 08/2003
○ PPLive, 12/2004
P2P Isn’t New
Existing P2P technology may find its origins in
○ IP routers
○ DNS
○ Distributed computing
What is P2P
“Peer-to-peer is a class of applications that take advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet.” (Clay Shirky)
Nodes serve as both server & client
Every node pays for service by providing access to some of its resources (bandwidth, storage, etc.)
No single bottleneck or point of failure
Distributed algorithms for
○ Service/content discovery
○ Status tracking
○ Application-layer routing
○ Resilience
○ …
P2P Traffic Today
1999 to present: fuelled by Napster, KaZaA, eDonkey and BitTorrent
[Figure: Internet Protocol Breakdown 1993-2006. Source: CacheLogic Research]
P2P Networks: Large, Growing and Active
Estimated 200M P2P users worldwide
420 million searches are conducted daily on P2P networks, rivaling searches on Google, Yahoo and Live.
The number of P2P files downloaded in the US was up 24% in 2006.
60% of Internet backbone traffic is P2P, and up to 90% of upstream user traffic is now consumed by P2P applications.
Source: CacheLogic
P2P App 1: BitTorrent
Information
○ Debut: 2002, by Bram Cohen
○ For file sharing (content location is by tracker, which is a centralized server rather than P2P)
○ Accounts for 35% of traffic (according to analysis by CacheLogic)
○ Numerous clients: official client (Python), Azureus (Java), BitComet (C++)
Authorized Use of BitTorrent
Many adopters report that only with BitTorrent could they afford to distribute their files
○ Major open source & free software projects
○ Game updates & games (e.g., World of Warcraft)
○ Films (Warner Bros., fan films)
○ Other materials
Step 2. Tracker: Centralized or DHT
Trackerless BitTorrent
○ Eliminates the need for a tracker; more robust
○ Less efficient
○ Lacks content distribution control
○ Lacks content distribution statistics
[Figure: centralized tracker vs. trackerless BitTorrent (DHT)]
Step 3. Make/Download Torrent File
.torrent file (tracker addr, hash w/ integrity check)
[Figure: the .torrent file reaches clients via web, torrent search, BBS or mail]
Scheduling Rule 1: Tit-For-Tat
BitTorrent rule
○ A node preferentially uploads to the neighbors that provide it with the best download rate (top m)
○ [Jumpstart] Optimistic unchoking: every 30 s, unchoke a random neighbor regardless of its download rate
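The tit-for-tat rule can be sketched in a few lines of Python. This is a minimal illustration, assuming each peer keeps a table of the download rate it observes from every neighbor; the function name `choose_unchoked`, the sample rates and m=3 are hypothetical and not taken from the BitTorrent specification. A real client would re-run the selection periodically and rotate the optimistic slot every 30 s.

```python
import random

def choose_unchoked(download_rates, m, rng=random):
    """Pick the neighbors to upload to.

    download_rates: dict neighbor -> download rate observed from that neighbor.
    Returns (regular, optimistic): the top-m neighbors by rate, plus one
    randomly chosen other neighbor (optimistic unchoke), or None if no
    other neighbor exists.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    regular = ranked[:m]
    others = ranked[m:]
    optimistic = rng.choice(others) if others else None
    return regular, optimistic

# Hypothetical rates in kB/s.
rates = {"A": 120.0, "B": 15.0, "C": 80.0, "D": 0.0, "E": 42.0}
regular, optimistic = choose_unchoked(rates, m=3, rng=random.Random(0))
print(regular, optimistic)
```

The optimistic slot is what lets a newcomer with nothing to offer (peer D above) receive its first blocks.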
Scheduling Rule 2: Local Rarest Rule
[Figure: tracker and swarms]
Client: downloads pieces in rarest-first order.
End game: sends requests for all missing blocks, and sends a cancel every time a block arrives.
P2P App 2: Skype
Information
○ Debut: 08/2003, by N. Zennström and J. Friis, who founded KaZaA
○ A P2P overlay network for VoIP and other applications
○ Free intra-network VoIP and fee-based SkypeOut/SkypeIn
Skype Usage (Apr. 2008)
11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1 FY08)
$126M revenue in Q1 FY08 (61% YoY growth; 5.6 billion SkypeOut minutes in FY2007)
100 billion cumulative Skype-to-Skype minutes
Skype Gadgets
Netgear Skype Wi-Fi phone
Motorola CN620 Wi-Fi cellphone
IPEVO Free-1 USB Skype phone
IPDRUM mobile Skype cable
50 hardware partners, 150+ Skype-certified devices.
Skype vs. VoIP
Public VoIP standards
○ H.323, SIP
Skype is a proprietary VoIP solution
Relies on a P2P network for the user directory
○ Scalable without costly infrastructure
Routes calls through supernodes in Skype
○ Universal firewall/NAT traversal
Encrypted traffic (but you have to trust eBay/Skype)
Skype Network
Supernode: any computer with sufficient CPU, memory & network bandwidth that is not behind a firewall
○ For distributed directory service
○ Relays traffic for computers behind NAT/firewall
[Figure: Skype server (authentication) and supernode overlay]
NAT Traversal (Skype)
NAT/firewall detection
○ Try UDP connection
○ Try TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
Traversal
○ Direct connection if a) neither client is behind a NAT, or b) one client has no NAT and the other is behind a cone NAT
○ Relay by supernode otherwise
Since Skype doesn’t need to pay for relay cost
○ High-bitrate wideband voice codec (>24 kbps)
Tutorial, Jin Li
Skype: Call Routing Through Supernode
[Figure: Skype server (authentication) and supernode overlay]
Route calls through supernodes
High-bitrate wideband voice codec (>24 kbps)
Skype Encryption
256-bit AES over 128-bit data blocks
1536/2048-bit RSA for key negotiation (2048/2048 for paid service)
[Figure: Peer 1 and Peer 2]
Skype: Complete Black Box (Security by Obfuscation)
Almost everything is obfuscated
Many protections, anti-debugging tricks, ciphered code
○ Avoid static disassembly: XOR the binary with a hard-coded key, erase the beginning of the code, custom packer
○ Code integrity check: use checksums to detect breakpoints
○ Anti-debugging techniques: anti-SoftICE, integrity check
Code obfuscation
Network obfuscation
Online Video Usage On the Rise
China: 70% of broadband users watch TV over broadband
○ 18-24 year old broadband users: 87% watch music videos online, 82% watch TV programs
UK
○ 18-24 year old broadband users: 77% watch music videos online, 60% watch TV programs
US
○ 72% stream news videos consistently
○ 59% watch short clips from movies or TV
○ 48% watch music videos
○ 44% stream sports highlights
○ 43% watch user-generated home videos
○ 23% stream concert clips
○ 22% download a full-length movie or TV show
○ 17% stream live sporting events
Video Dominates P2P Traffic
More than 60% of P2P traffic is video
Asia: 50% of objects > 2.5Gb
Source: CacheLogic (www.cachelogic.com)
[Figure: Breakdown of File-Types on Major P2P Networks, 2007. Source: CacheLogic Research]
Video Streaming as a Major Video Distribution Vehicle
Video streams served increased 38.8% in 2006 to 24.92 billion across all entertainment and media sites (excluding UGC). Source: AccuStream iMedia Research
Is CDN the Answer?
CDN capacity
○ Akamai: 400 Gbps
○ Limelight: 1000 Gbps
TV quality is around 500 kbps
○ 100,000 viewers = 50 Gbps
○ 2.8 million viewers in total from the top two CDNs
Current TV audience
○ 2.5 billion watched the Olympics
○ 1.1 billion still watch Baywatch EVERY day?
○ Soccer World Cup final: 3 billion
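The capacity figures on this slide follow from simple arithmetic, sketched below in Python. The helper names are hypothetical, and the sketch assumes both CDN capacities are in Gbps (400 and 1000), which is the reading that reproduces the 2.8 million viewer total. The last line applies the same 500 kbps rate to the 1.48 million peak viewers quoted on the PPLive history slide.

```python
STREAM_KBPS = 500  # "TV quality" bitrate used on the slide

def required_gbps(viewers, kbps=STREAM_KBPS):
    """Aggregate server bandwidth needed to unicast one stream per viewer."""
    return viewers * kbps / 1e6

def max_viewers(capacity_gbps, kbps=STREAM_KBPS):
    """Number of simultaneous unicast viewers a given capacity can carry."""
    return int(capacity_gbps * 1e6 / kbps)

print(required_gbps(100_000))    # bandwidth for 100,000 viewers, in Gbps
print(max_viewers(400 + 1000))   # viewers served by both CDNs combined
print(required_gbps(1_480_000))  # bandwidth for the PPLive peak audience
```

Set against TV audiences in the billions, the gap of three orders of magnitude is the argument for peer assistance.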
Peer Assisted Streaming
Peer assisted streaming is the only solution for a popular site to distribute
○ A large number of streams (virtual channels)
○ Without IP multicast
○ In a cost-effective manner
PPLive History
Started in Dec. 2004
Status: Aug. 2007
○ 75 million installed base
○ 3.5 million daily active viewers
○ 2.2 million peak concurrent viewers
○ 1.48 million peak concurrent viewers per show (an NBA play-off game with the Houston Rockets, live, China, Q2 2007)
○ 740 Gbps bandwidth bill if not P2P
PPLive: Contact Channel Server
[Figure: client contacts the channel list server]
Approx. 300-400 channels; all viewers of the same channel watch the media at the same point
PPLive: Protocol Analysis (Hei’06)
Chunk size: 1 s of video
Buffer map: coded chunk availability
Proprietary exchange protocol & algorithm
Subsidizing Behavior (Hei’06)
High-bandwidth nodes subsidize low-bandwidth nodes
High-bandwidth nodes upload much more than low-bandwidth nodes
[Figure: campus node vs. home node]
Content is King
P2P is popular when it facilitates content distribution that is prohibitively expensive by other means
○ Much content (e.g., software from small vendors & movies from small providers) would not be available without BitTorrent or other P2P applications
Quality matters
○ P2P attracts users because they receive a much higher quality of service than with the client-server approach
○ BitTorrent and PPLive offer file download/media streaming with unparalleled quality of service
Server Role
The server is not necessarily evil in P2P
○ Both BitTorrent & Skype have server components
○ They greatly simplify the design
○ Replacing all server components with distributed components may lead to high implementation cost and lower quality of service for the user
Design guidance
○ Identify the primary source of P2P savings
○ Identify which key components can be better served by the server to simplify system design
Proper Incentive is Crucial
Incentive is crucial
BitTorrent does succeed in discouraging free riding
○ Tit-for-Tat: if you reduce your upload rate, your download rate suffers
○ Additional community features further help
Skype:
○ Users are online (and contribute) during work hours
Users don’t like, but may tolerate, subsidizing behavior
○ Skype: supernodes subsidize other nodes by relaying
○ PPLive: high-bandwidth nodes subsidize low-bandwidth nodes
NAT/Firewall Traversal
NAT traversal is always a pain point in developing P2P applications
Skype wins mainly because it provides “free relay” capacity
Our Vision of Microsoft Internet Infrastructure: Inner Layer
Inner layer: massive data center (DC)
○ 40 Gbps egress, 20k+ servers, 50 pairs of switch/router chassis
○ Good for: computation-intensive applications, e.g., Live Search
Our Vision of Microsoft Internet Infrastructure: Middle Layer
Middle layer: edge computing network (ECN)
○ A few dozen sites strategically placed all over the world
○ Hundreds or more servers per site
○ Good for: forming a high-speed network core of the Internet; latency-sensitive/throughput-hungry applications
Our Vision of Microsoft Internet Infrastructure: Outer Layer
Outer layer: P2P delivery
○ Peers contribute resources (network bandwidth, CPU, memory, hard drive)
○ Good for: throughput-intensive applications; improves server scalability and uses locality to improve throughput to the end peers
[Figure: three concentric layers, DC inside ECN inside P2P]
P2P Components and Tools
Overlay network
Scheduling algorithm
Erasure resilient coding (ERC)
NAT/firewall traversal
Why Study Overlay
1st step in building a P2P application
Overlay graph affects
○ Content distribution efficiency
○ Robustness of the P2P application
Overlay Building Methods
Tracker-based overlay construction
○ Random overlay w/ super peer: BitTorrent
Distributed overlay construction
○ Pure random overlay: Gnutella
○ DHT: trackerless BitTorrent
Tracker-Based Overlay Construction: Random Overlay (BitTorrent)
Client sends a request to the tracker asking for a set of peers
The tracker randomly selects peers to include in the response
○ The tracker returns numwant peers [default = 50, fewer if there are fewer peers]
Upper & lower limits
○ 30 peers is plenty (below this, new connections will be formed)
○ 55 peers is too many (the client will refuse connections)
The parameter is important to performance
○ Too few peers: not enough for the scheduling algorithm to work with
○ Too many peers: high overhead in exchanging HAVE messages
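The tracker's role in building the random overlay can be sketched as follows. This is a minimal Python illustration, assuming the tracker holds the swarm as a plain list; the function names are hypothetical, while the constants 50, 30 and 55 are the values quoted on the slide.

```python
import random

NUMWANT_DEFAULT = 50  # peers returned per request
MIN_PEERS = 30        # below this, the client forms new connections
MAX_PEERS = 55        # at this many, the client refuses connections

def tracker_response(swarm, requester, numwant=NUMWANT_DEFAULT, rng=random):
    """Return a uniformly random subset of the swarm, excluding the requester."""
    candidates = [p for p in swarm if p != requester]
    return rng.sample(candidates, min(numwant, len(candidates)))

def should_seek_peers(n_connected):
    """Client side: open new connections while under the lower limit."""
    return n_connected < MIN_PEERS

def should_accept_connection(n_connected):
    """Client side: refuse incoming connections at the upper limit."""
    return n_connected < MAX_PEERS

swarm = [f"peer{i}" for i in range(200)]
response = tracker_response(swarm, "peer7", rng=random.Random(1))
print(len(response))
```

Because every client receives an independent random subset, the resulting overlay approximates a random graph without any coordination between peers.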
Distributed Overlay Building: Random Walk (Gnutella)
Each node maintains a neighborhood table (IP addresses)
○ Symmetric table
○ With upper and lower bounds on the # of entries
A joining node uses a random walk from a bootstrap node to find other nodes for its neighborhood table
○ The neighbor discovery msg carries a count of to-be-filled entries
○ Upon receiving the neighbor discovery msg, a node checks whether its # of neighbors has reached the upper bound
○ If not, it invites the joining node to become its neighbor
○ It forwards the neighbor discovery msg to a random node in its neighborhood if the counter is still greater than zero
○ Failure recovery: acknowledge all neighbor discovery msgs
Detect failure
○ Every t sec, send a keep-alive to every neighbor
○ No response: probe; still no response: assume failure
○ A cache is maintained to replace failed neighbors
○ Cache empty: send a neighbor discovery msg to a randomly chosen neighbor
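The join procedure above can be sketched as a random walk over an in-memory graph. This is a simplified Python illustration: the overlay is a dict of neighbor sets, message passing and acknowledgements are omitted, and the names `random_walk_join`, `upper` and `max_hops` are hypothetical (`max_hops` is an added safety limit, not part of the described protocol).

```python
import random

def random_walk_join(graph, new_node, bootstrap, want, upper=4,
                     rng=random, max_hops=100):
    """Fill the joining node's neighbor table by a random walk.

    graph: dict node -> set of neighbors (kept symmetric).
    want:  count of to-be-filled entries carried by the discovery message.
    """
    graph.setdefault(new_node, set())
    current, remaining = bootstrap, want
    for _ in range(max_hops):
        if remaining == 0:
            break
        # A visited node below its upper bound invites the joining node.
        if new_node not in graph[current] and len(graph[current]) < upper:
            graph[current].add(new_node)
            graph[new_node].add(current)
            remaining -= 1
        # Forward the discovery message to a random existing neighbor.
        candidates = sorted(n for n in graph[current] if n != new_node)
        if not candidates:
            break
        current = rng.choice(candidates)
    return graph[new_node]

# Six nodes in a ring; node 99 joins via bootstrap node 0.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
neighbours = random_walk_join(ring, new_node=99, bootstrap=0, want=3,
                              rng=random.Random(42))
print(sorted(neighbours))
```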
Distributed Overlay Building: DHT (BitTorrent)
Trackerless BitTorrent
○ All trackerless BitTorrent clients of all shared files form a DHT
○ Each peer becomes a virtual tracker
○ The ID (hash) of the file determines which peers will serve as the tracker
BitTorrent: Tracker and Trackerless
A centralized tracker is a single point of failure
Multi-tracker
○ Defined by BitTornado
○ Specifies the order in which the trackers should be accessed
Trackerless BitTorrent
○ Uses Kademlia DHT
○ Azureus BitTorrent: 1.3 million members, Kademlia with k=20
○ Official BitTorrent: 200k members, Kademlia with k=8
Distributed Hash Table
Partitions ownership of a set of keys among participating nodes
Basic functionality (routing): route a msg to the unique owner of any given key
DHT:
○ Store(ID, value)
○ Retrieve(ID)
Examples
○ CAN, Chord, Pastry, Kademlia
DHT: The Key is Routing
A P2P cloud
Each peer has a unique ID
For a given VALUE, which is the peer with the largest ID that is smaller than VALUE?
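The ownership question on this slide has a direct answer when the full ID list is known, which is what DHT routing approximates without global knowledge. A minimal Python sketch, assuming integer IDs kept in a sorted list; the wrap-around to the largest ID when no smaller ID exists is an assumption (a ring-shaped ID space), not something stated on the slide.

```python
import bisect

def owner(sorted_ids, value):
    """Return the peer with the largest ID that is smaller than value."""
    i = bisect.bisect_left(sorted_ids, value)  # sorted_ids[:i] are all < value
    return sorted_ids[i - 1]                   # i == 0 wraps to the largest ID

ids = [10, 40, 70, 95]
print(owner(ids, 50))
```

A real DHT reaches the same owner in a logarithmic number of hops using each node's partial routing table.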
DHT: Two Sets of Routing Entries
Leaf set
○ Which nodes have the IDs immediately before and immediately after the current node?
○ Correctness of DHT routing is guaranteed by the leaf set
Finger set
○ A set of fingers that stick out for fast routing
○ May consider node proximity in finger set construction
DHT schemes differ primarily in leaf set construction
Kademlia DHT: History
Designed by P. Maymounkov and D. Mazières (NYU, 2002)
Used by
○ eDonkey & eMule
○ BitTorrent Azureus DHT
○ Trackerless BitTorrent (official client, µTorrent, BitSpirit, BitComet)
Kademlia DHT: XOR-Based Routing
Uses an XOR-based distance measure
Node ID: 160-bit
Each node is treated as a leaf whose position is determined by the shortest unique prefix of its ID
Subtrees of a node:
○ 1st: the half of the binary tree not containing the node
○ 2nd: the half of the remaining tree not containing the node
○ …
A node knows at least one node in each of its subtrees (and can know more)
Kademlia DHT: Distance
Each node
○ For each subtree (distance 2^i to 2^(i+1)), keeps a k-bucket: a list of at most k nodes, sorted by time last seen (default: k=20)
When a new msg arrives from node x
○ Node x already in a k-bucket: move it to the tail
○ Node x not in a k-bucket
- Associated bucket has fewer than k nodes: add x
- Associated bucket has k nodes: ping the least recently seen node; no response: evict that node and add x
A live node is never evicted
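The k-bucket update rule can be sketched in Python as below. This is an illustrative model, not any client's implementation: IDs are plain integers, `ping` is a caller-supplied stand-in for the PING RPC, and the class and function names are hypothetical. Bucket i holds nodes at XOR distance in [2^i, 2^(i+1)).

```python
from collections import OrderedDict

ID_BITS = 160

def bucket_index(self_id, other_id):
    """Index i such that 2**i <= (self_id XOR other_id) < 2**(i+1)."""
    return (self_id ^ other_id).bit_length() - 1

class RoutingTable:
    def __init__(self, self_id, k=20, ping=lambda node_id: True):
        self.self_id = self_id
        self.k = k
        self.ping = ping  # returns True if the node answers
        # Each bucket is ordered from least to most recently seen.
        self.buckets = [OrderedDict() for _ in range(ID_BITS)]

    def seen(self, node_id):
        """Update the table after receiving any message from node_id."""
        if node_id == self.self_id:
            return
        bucket = self.buckets[bucket_index(self.self_id, node_id)]
        if node_id in bucket:
            bucket.move_to_end(node_id)        # already known: move to tail
        elif len(bucket) < self.k:
            bucket[node_id] = True             # room left: add at tail
        else:
            oldest = next(iter(bucket))        # least recently seen
            if self.ping(oldest):
                bucket.move_to_end(oldest)     # live node is never evicted
            else:
                del bucket[oldest]             # dead: evict, admit newcomer
                bucket[node_id] = True

# Tiny demo with k=2; node 4 is assumed to be offline.
table = RoutingTable(self_id=0, k=2, ping=lambda n: n != 4)
for node in (4, 5, 6, 7):
    table.seen(node)
print(list(table.buckets[2]))
```

Preferring long-lived contacts is deliberate: nodes that have been up for a long time are more likely to stay up.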
Kademlia DHT: Protocol
Kademlia protocol
PING(ID)
○ Ping a node
FIND_NODE(ID)
○ Returns the k nodes closest to the target ID
STORE(ID, VALUE)
○ Store the <ID, VALUE> pair at a node
FIND_VALUE(ID)
○ Similar to FIND_NODE, except that if a value is stored under ID, the stored value is returned
Kademlia DHT: Lookup
Node x: find the k closest nodes to some given node ID
○ Call FIND_NODE(ID) on x; of the k nodes closest to the target, pick the α closest nodes (default α=3): x1, x2, x3
○ Node x sends FIND_NODE(ID) to each xi
- If xi fails to respond, remove xi from the k-bucket and resend the query
- If a closer node is found, repeat the step
○ If a round of FIND_NODE(ID) fails to return a node any closer than the closest already seen, resend FIND_NODE(ID) to all k closest neighbors
○ Terminate if, after FIND_NODE(ID) has been sent to all k closest neighbors, no closer neighbor is found
Kademlia operations rely on lookup
○ Store <ID, VALUE> is implemented by sending STORE(ID, VALUE) to the k closest nodes found in lookup
○ Retrieve <ID, VALUE> is implemented by sending FIND_VALUE(ID) instead of FIND_NODE(ID) during lookup
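The iterative lookup can be sketched as the loop below. It is a simplified Python variant under stated assumptions: queries are issued sequentially rather than in parallel, `find_node` is a caller-supplied stand-in for the FIND_NODE RPC (returning `None` on timeout), and the loop simply keeps querying the closest unqueried candidates until the k closest known nodes have all been asked, which yields the same termination condition as described above. The toy network is hypothetical.

```python
def iterative_find_node(target, start_contacts, find_node, k=20, alpha=3):
    """Return up to k known live nodes closest (by XOR) to target."""
    shortlist = set(start_contacts)
    queried, failed = set(), set()
    while True:
        closest = sorted(shortlist - failed, key=lambda n: n ^ target)[:k]
        batch = [n for n in closest if n not in queried][:alpha]
        if not batch:
            return closest          # all k closest have been queried
        for node in batch:
            queried.add(node)
            reply = find_node(node, target)
            if reply is None:
                failed.add(node)    # no response: drop the node
            else:
                shortlist.update(reply)

# Toy network (assumption for the demo): 64 nodes with 6-bit IDs, each
# knowing the 6 nodes whose ID differs from its own in exactly one bit.
def toy_find_node(node, target, k=4):
    known = [node ^ (1 << b) for b in range(6)]
    return sorted(known, key=lambda n: n ^ target)[:k]

start = [1, 2, 4, 8, 16, 32]  # contacts of node 0
result = iterative_find_node(45, start, toy_find_node, k=4)
print(result)
```

Swapping `find_node` for a FIND_VALUE call that returns early when a value is present gives the retrieve operation.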
Scheduling Algorithm
Given a certain overlay, how can we efficiently move content in a P2P network?
Tree-based distribution (push)
○ Content is distributed in a deterministic way in the overlay
Mesh-based distribution (pull/push)
○ Content flows dynamically, with the specific delivery path negotiated by the sender & receiver
Key measurements
○ Efficiency
○ Robustness
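Content Delivery Efficiency
P2P content delivery efficiency:
○ Content delivery throughput in P2P / bandwidth in P2P
For example, in P2P file delivery
○ N: # of nodes
○ L: length of file
○ T: session length (time the last node finishes)
○ B_i: upload bandwidth of node i
○ B_s: upload bandwidth of source node s
Efficiency:
$$\eta = \frac{L/T}{\min\left(B_s,\ \dfrac{B_s + \sum_i B_i}{N}\right)} = \begin{cases} \dfrac{L}{T\,B_s}, & B_s \le \dfrac{\sum_i B_i}{N-1} \\[2ex] \dfrac{N\,L}{T\left(B_s + \sum_i B_i\right)}, & B_s > \dfrac{\sum_i B_i}{N-1} \end{cases}$$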
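As a worked example of the efficiency ratio, the Python sketch below divides the achieved throughput L/T by an upper bound on the per-node delivery rate. The bound used, min(B_s, (B_s + ΣB_i)/N), is an assumption taken from the standard analysis of P2P file distribution, with N counted as the number of receiving peers; the function name and the numbers are hypothetical.

```python
def p2p_efficiency(L, T, B_s, B_peers):
    """Achieved throughput divided by the best achievable throughput.

    L: file length, T: time until the last node finishes,
    B_s: source upload bandwidth, B_peers: upload bandwidth of each peer.
    Units must be consistent (e.g., MB, seconds, MB/s).
    """
    N = len(B_peers)
    # Assumed bound: limited either by the source alone or by the total
    # upload capacity shared among the N receivers.
    bound = min(B_s, (B_s + sum(B_peers)) / N)
    return (L / T) / bound

# 100 MB file, source at 1 MB/s, four peers at 0.5 MB/s, finished in 160 s.
eff = p2p_efficiency(L=100, T=160, B_s=1.0, B_peers=[0.5] * 4)
print(round(eff, 3))
```

Here the bound is 0.75 MB/s, so the fastest possible session lasts about 133 s.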
Content Delivery Robustness
How the scheduling algorithm behaves when
○ Nodes join/leave the network gracefully/abruptly
○ A certain node/network link is congested
○ A certain node slows down due to a CPU or network constraint
○ etc.
Tree-Based Delivery: CoopNet
FEC/MDC striped across trees
Up/download bandwidths equalized
[Figure: multiple distribution trees with a failed node]
Tree vs. Mesh
Comparison        Single tree   Multiple trees   Mesh
Efficiency        Poor          Fair             Good
Robustness        Poor          Fair             Good
Balancing         Poor          Fair             Good
Latency           Low           Low              High
Implementation    Easy          Fair             Tricky
Mesh Delivery In-Depth
Pull vs. push
Peer selection (flow control)
Block selection
Bandwidth (resource) allocation
Mesh Delivery: Pull vs. Push
Pull (receiver-driven)
○ Receiver first learns what data exists & where
○ Then requests the data
Push (sender-driven)
○ Sender learns what the receiver has (to avoid receipt of duplicates)
Hybrid approach (pull-push)
Pull vs. Push: Comparison
Comparison                Pull   Push   Hybrid
40 nodes, 14 neighbors    74%    99%    99%
40 nodes, 20 neighbors    99%    99%    99%
Push & hybrid can achieve more efficient distribution with a sparse overlay
When the overlay becomes dense, the gap between push/hybrid and pull shrinks
Pull can better control QoS for the receiver
Mesh Delivery: Peer Selection & Flow Control
[Figure: the client keeps one request queue per peer (queue 1 … queue n for Peer 1 … Peer n); requests go out and replies come back on each link]
Ensure the network link between the client & each peer is fully utilized.
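Life of A Request & Reply
[Figure: a request travels from the client app through the staging queue (client), the TCP sending buffer (client), the network and the TCP receiving buffer (server) to the server app and its hard drive; the reply returns through the TCP sending buffer (server). Labels: pending requests, bottleneck]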
Mesh Delivery: Peer Selection & Flow Control
Maintain a queue between the receiver and each sender
○ Queue size: amount of data pending from the sender
○ The queue may 1) identify data from the sender, 2) redirect lost/delayed requests, 3) provide flow control
Maintain a constant request-reply time between the receiver and all senders
○ This is equivalent to letting the queue size be proportional to the link bandwidth
Link bandwidth
○ Amount of replied data from the link
○ Packet size / inter-packet arrival time
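The sizing rule can be written out in a few lines. This Python sketch assumes bandwidth is estimated from packet size and inter-packet arrival time as listed above; the function names, the 2-second target and the sample measurements are hypothetical.

```python
def link_bandwidth(packet_bytes, inter_arrival_s):
    """Estimate link bandwidth (bytes/s) from one packet spacing."""
    return packet_bytes / inter_arrival_s

def queue_sizes(bandwidths, request_reply_time_s):
    """Bytes to keep pending per sender so every request waits equally long."""
    return {peer: bw * request_reply_time_s for peer, bw in bandwidths.items()}

bandwidths = {
    "fast": link_bandwidth(1000, 0.01),  # 1000-byte packets every 10 ms
    "slow": link_bandwidth(1000, 0.04),  # 1000-byte packets every 40 ms
}
pending = queue_sizes(bandwidths, request_reply_time_s=2.0)
print(pending)
```

With a common request-reply time, a link that is four times faster is given a queue four times deeper, so no link runs dry while another is backlogged.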
Idle Peer Detection
Redirect of requests
○ A peer becomes idle: no data received for a while (say 1 s)
○ Requests to the idle peer are redirected to other active peers (using the same peer selection policy)
Follow-up
○ If packets come from the idle peer, it is reactivated
○ If the peer is disconnected due to a TCP disconnect event/timeout, it is removed from the neighbor list
Mesh Delivery: Block Selection
Block selection
○ Sequential: receive the blocks in sequence [poor performance]
○ Random: receive a random block [trails close behind]
○ Rarest: receive the rarest block in the neighborhood, with no method for tie breaking (BitTorrent) [trails close behind]
○ Rarest random: among the rarest blocks, select a random one [found to be best, Kostic’05 & Liu’08]
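The rarest-random policy differs from plain rarest-first only in how ties are broken. A minimal Python sketch, assuming the client knows which blocks each neighbor holds; the function name and the sample data are hypothetical.

```python
import random
from collections import Counter

def rarest_random(missing, neighbor_have, rng=random):
    """Pick a block to request: random choice among the rarest available.

    missing:       set of block indices the client still needs.
    neighbor_have: dict neighbor -> set of block indices it holds.
    Returns a block index, or None if no neighbor has a missing block.
    """
    counts = Counter(b for have in neighbor_have.values() for b in have)
    available = [b for b in sorted(missing) if counts[b] > 0]
    if not available:
        return None
    fewest = min(counts[b] for b in available)
    return rng.choice([b for b in available if counts[b] == fewest])

have = {"p1": {0, 1, 2}, "p2": {1, 2, 3}, "p3": {2}}
choice = rarest_random({0, 1, 2, 3}, have, rng=random.Random(3))
print(choice)
```

Blocks 0 and 3 each have a single holder, so either may be returned; randomizing the tie keeps neighboring clients from all requesting the same block.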
Sender Bandwidth Allocation
Fair vs. subsidizing
BitTorrent: Tit-for-Tat
○ A node preferentially uploads to the neighbors that provide it with the best download rate (top m)
Subsidizing
○ A sender uploads blocks to receivers in round-robin order
○ Subsidizing is desirable for resource-intensive P2P applications (e.g., peer assisted streaming)
Erasure Resilient Coding
[Figure: original data of k messages (1, 2, 3, …, k) is encoded by ERC into n coded blocks (1, 2, 3, …, k, k+1, …, n); at a certain instance some blocks, marked X, are lost]
Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.
ERC in P2P File Sharing
Split the file into k blocks
Generate n encoded blocks
Perform P2P file sharing (e.g., in a BitTorrent-like fashion)
The peer succeeds in receiving the file if it receives any k of the n coded blocks
ERC Terms
Number of original blocks: k
Number of coded blocks: n
Rate of ERC: k/n
MDS: Maximum Distance Separable
○ Any k of the n coded blocks can recover the original
○ The theoretically optimal performance
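Erasure Encoding: Mathematics
$$\begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1k} \\ g_{21} & g_{22} & \cdots & g_{2k} \\ \vdots & \vdots & & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nk} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Original data: $x_1, x_2, \ldots, x_k$
Coded data: $y_1, y_2, \ldots, y_n$
$x_i, y_i$: vectors on Galois Field.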
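Example: ERC of 10MB
$$\begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1k} \\ g_{21} & g_{22} & \cdots & g_{2k} \\ \vdots & \vdots & & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nk} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Original data (10MB): $x_1, x_2, \ldots, x_k$
Coded data (n=30): $y_1, y_2, \ldots, y_n$
k=10, GF($2^8$), each vector is 1MB.
The generator matrix is 30 × 10; each $x_i$ and each $y_i$ is a 1MB vector.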
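The encode/decode cycle can be demonstrated end to end with a small MDS code. The Python sketch below makes two loudly stated substitutions: it works over the prime field GF(257) instead of GF(2^8) so that ordinary modular arithmetic suffices, and it uses a Vandermonde generator matrix (row r is 1, r, r², …), which guarantees that any k rows are invertible. Each symbol stands in for one position of the 1MB vectors; a real coder would repeat the same operation at every position. All names are hypothetical.

```python
P = 257  # prime modulus; a stand-in for GF(2^8)

def generator(n, k):
    """n x k Vandermonde matrix over GF(P): row r is [r^0, r^1, ..., r^(k-1)]."""
    return [[pow(r, c, P) for c in range(k)] for r in range(1, n + 1)]

def encode(x, n):
    """Multiply the generator matrix by the k original symbols."""
    return [sum(g * xi for g, xi in zip(row, x)) % P
            for row in generator(n, len(x))]

def decode(received, k):
    """Recover the originals from any k coded symbols.

    received: dict {coded block index (0-based): symbol}.
    Solves the k x k sub-generator system by Gauss-Jordan elimination.
    """
    idx = sorted(received)[:k]
    g = generator(max(idx) + 1, k)
    a = [g[i][:] + [received[i]] for i in idx]  # augmented matrix [G' | y']
    for col in range(k):
        pivot = next(r for r in range(col, k) if a[r][col])
        a[col], a[pivot] = a[pivot], a[col]
        inv = pow(a[col][col], P - 2, P)        # inverse via Fermat's theorem
        a[col] = [v * inv % P for v in a[col]]
        for r in range(k):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(vr - f * vc) % P for vr, vc in zip(a[r], a[col])]
    return [row[k] for row in a]

x = [10, 20, 30, 40]          # k = 4 original symbols
y = encode(x, 8)              # n = 8 coded symbols
survivors = {i: y[i] for i in (1, 4, 6, 7)}
print(decode(survivors, 4))
```

Any four of the eight coded symbols recover the original, which is the MDS property.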
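Erasure Decoding: Mathematics
$$\begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1k} \\ g_{21} & g_{22} & \cdots & g_{2k} \\ \vdots & \vdots & & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nk} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Original data: $x_1, x_2, \ldots, x_k$
Coded data: $y_1, y_2, \ldots, y_n$
Select the rows of the generator matrix that correspond to the available coded blocks:
$$\begin{bmatrix} g'_{11} & g'_{12} & \cdots & g'_{1k} \\ g'_{21} & g'_{22} & \cdots & g'_{2k} \\ \vdots & \vdots & & \vdots \\ g'_{k1} & g'_{k2} & \cdots & g'_{kk} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix} = \begin{bmatrix} y'_1 \\ y'_2 \\ \vdots \\ y'_k \end{bmatrix}$$
Original data can be recovered if the sub-generator matrix has full rank k.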
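Systematic vs. Non-Systematic ERC
Systematic ERC
○ Slightly lower encoding & decoding complexity
[Figure: original data of k messages (1, 2, 3, …, k). Non-systematic ERC: n coded blocks (1, 2, 3, …, k, k+1, …, n), none of which is an original message. Systematic ERC: the first k coded blocks are the original messages, followed by coded blocks k+1, …, n]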
Reed-Solomon
Only known MDS code for arbitrary k and n
Has been around for decades
Has a systematic form
Cauchy Reed-Solomon code
$$B_{ij} = \frac{1}{r_i + c_j}, \qquad r_i, c_j:\ \text{distinct numbers for row \& column}$$
Network Coding in P2P File Delivery
[Figure: original data of k messages is source-coded into n coded messages (n >> k) and sent to host/friend nodes; intermediate nodes mix the blocks they hold and generate new blocks for the client node]
As long as we get at least k1 messages, we can decode the original data. For an MDS code, k1=k; otherwise k1>k.
ERC & Network Coding
ERC in P2P:
○ The source sends different ERC blocks to the connected peers
○ ERC blocks are forwarded, but not mixed, during delivery
Network coding in P2P:
○ The source sends different network-coded blocks to the connected peers
○ The coded blocks are mixed during delivery
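Network Coding (Random Linear Code)
Each coded block is a randomly formed linear combination of the original blocks
A generator vector is attached to each coded block
Block mixing
○ Start with blocks $c_0$, $c_1$:
$$c_0 = [\alpha_0, \alpha_1, \ldots, \alpha_{k-1}]\,[x_0, x_1, \ldots, x_{k-1}]^t, \quad \text{and} \quad c_1 = [\beta_0, \beta_1, \ldots, \beta_{k-1}]\,[x_0, x_1, \ldots, x_{k-1}]^t$$
○ Get block $c_2$:
$$c_2 = a\,c_0 + b\,c_1 = [\gamma_0, \gamma_1, \ldots, \gamma_{k-1}]\,[x_0, x_1, \ldots, x_{k-1}]^t, \quad \text{with } \gamma_i = a\,\alpha_i + b\,\beta_i$$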
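Block mixing works because the coding is linear: combining two coded blocks and combining their generator vectors with the same coefficients gives a consistent new block. The Python sketch below checks this over the prime field GF(257), used here as an assumed stand-in for the Galois field a real implementation would choose; all names are hypothetical and each block is reduced to a single symbol for brevity.

```python
import random

P = 257  # prime modulus; stand-in for the Galois field

def encode_block(x, vec):
    """Coded symbol = generator vector times the original symbols (mod P)."""
    return sum(v * xi for v, xi in zip(vec, x)) % P

def mix(block_a, block_b, a, b):
    """Mix two (generator vector, symbol) pairs with coefficients a and b."""
    (vec_a, val_a), (vec_b, val_b) = block_a, block_b
    vec = [(a * p + b * q) % P for p, q in zip(vec_a, vec_b)]
    return vec, (a * val_a + b * val_b) % P

rng = random.Random(7)
x = [3, 1, 4, 1, 5]                               # original symbols
alpha = [rng.randrange(P) for _ in x]             # generator vector of c0
beta = [rng.randrange(P) for _ in x]              # generator vector of c1
c0 = (alpha, encode_block(x, alpha))
c1 = (beta, encode_block(x, beta))
gamma, c2 = mix(c0, c1, a=rng.randrange(1, P), b=rng.randrange(1, P))
print(encode_block(x, gamma) == c2)
```

An intermediate node never needs the original data to mix; it only needs the blocks it holds and their attached generator vectors.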
How Useful is ERC in P2P Delivery
Theory
○ Broadcast (Edmonds, 1972), all nodes are receivers: $\mathrm{Maxflow}(s,T) = \min_{t_i \in T}\{\mathrm{mincut}(s,t_i)\}$
○ Routing is enough; block coding/mixing cannot further improve the theoretical throughput
Network Coding vs. ERC
Simulation by Gkantsidis (Infocom 2005)
In a homogeneous topology
○ Network coding performs slightly better than ERC at the source, which performs slightly better than no coding
But with heterogeneous capacity, & especially in topologies with clusters
○ Network coding performs better than ERC, which performs better than no coding
Network Coding / ERC at Source
Implementation by Kostic (Usenix 2005)
○ In a well-connected graph, ERC doesn’t help
Implementation by Wang (IWQoS 2006)
○ Network coding offers inferior performance, due to its need to wait for at least two blocks before it can redistribute
○ Computational complexity hinders the use of network coding in high-capacity nodes (e.g., core routers)
NAT/Firewall Traversal
A very important component in consumer P2P applications
○ You have to build the component
○ Its performance greatly affects the system performance
○ NAT/firewall traversal behavior may also affect system design decisions
[Figure: a NAT with public address 4.18.133.70 connects hosts 192.168.0.2 and 192.168.0.3 to the Internet]
NAT/Firewall Traversal
NAT: Network Address Translation
○ An Internet standard that enables a local-area network (LAN) to use one set of IP addresses for internal traffic and a second set of addresses for external traffic.
○ A NAT box located where the LAN meets the Internet makes all necessary IP address translations.
NAT serves three main purposes:
○ Provides a type of firewall by hiding internal IP addresses
○ Enables a company to use more internal IP addresses. Since they're used internally only, there's no possibility of conflict with IP addresses used by other companies and organizations.
○ Allows a company to combine multiple ISDN connections into a single Internet connection.
Firewall
A piece of hardware and/or software that functions in a networked environment to prevent communications forbidden by the security policy
Egress filtering
○ Only allow certain outbound traffic (to certain IP:port, from a selected set of IP addresses)
Ingress filtering
○ Only allow certain inbound traffic (to certain IP:port, following known outbound traffic)
NAT/Firewall Traversal: Naïve Approach Under Windows
Use IPv6 (Windows)
○ Windows XP SP2 & Vista implement Teredo tunneling
- Turned on by default in Vista
- Turned off by default in Windows XP (needs to be turned on)
○ Supports STUN traversal (and TCP on top of UDP)
○ About 60% traversal success rate
Build Your Own NAT/Firewall Traversal
NAT/firewall discovery
Peer address advertisement
Port prediction & traversal
Traversal Procedure 2: Peer Address Advertisement
How do I
○ Advertise my contact information to other peers?
○ Know if there are peers who want to connect to me?
Traversal Procedure 3: NAT/Firewall Traversal
How to establish direct connections between peers that are behind NAT/firewall?
NAT/Firewall Detection
With one or more echo servers, determine what type of NAT/firewall I am behind.
[Figure: client probing the echo servers]
NAT/Firewall Detection
Ext Addr:Port = Int Addr:Port?
○ Yes: public Internet. Incoming UDP/TCP allowed: no ingress firewall filtering, can serve as a new public peer
○ No: behind NAT/firewall; continue with NAT/firewall discovery
UDP/TCP connection to server allowed?
○ Yes: no egress firewall filtering
○ No: behind a firewall with egress filtering
Connect to a popular port (say TCP 80, TCP 443)
- Yes: may connect by relay
- No: connection failed
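The detection steps form a small decision tree, sketched here in Python. The inputs are assumed to come from probes against an echo server (the address it reports back, and whether the probes got through at all); the function name and the returned labels are hypothetical, and the egress check is placed first because the external address is only known if the echo server can be reached.

```python
def classify(internal, external, server_reachable, popular_port_reachable):
    """Classify connectivity from echo-server probe results.

    internal: (addr, port) the client bound locally.
    external: (addr, port) reported by the echo server, or None.
    server_reachable: plain UDP/TCP connection to the server succeeded.
    popular_port_reachable: connection to TCP 80/443 succeeded.
    """
    if not server_reachable:
        # Egress filtering: only a popular port may get out.
        if popular_port_reachable:
            return "egress-filtering firewall: may connect by relay"
        return "connection failed"
    if external == internal:
        return "public Internet: can serve as a public peer"
    return "behind NAT/firewall: continue with NAT/firewall discovery"

print(classify(("192.168.0.2", 5000), ("4.18.133.70", 36721), True, True))
```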
Cone NAT (70%)
[Figure: internal host 192.168.0.1:8000 talks to 131.107.224.9:3074, 131.107.224.9:3075 and 4.35.148.9:8100; the NAT maps every flow to the same external port 36721]
Symmetric NAT: Sequential (30%)
[Figure: same hosts; the NAT assigns consecutive external ports 36721, 36722, 36723, one per destination]
Symmetric NAT: Random (1%)
[Figure: same hosts; the NAT assigns an unpredictable external port to each destination]
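The difference between the NAT types is in how the external port depends on the destination. The toy Python model below reproduces the port numbers from the figures; the class `Nat` and the helper `predict_next_port` are hypothetical and model only the mapping table, not packet filtering.

```python
class Nat:
    """Toy port-mapping model for a cone or sequential symmetric NAT."""

    def __init__(self, symmetric, first_port=36721):
        self.symmetric = symmetric
        self.next_port = first_port
        self.table = {}

    def external_port(self, internal, dest):
        # Cone NAT: one mapping per internal endpoint.
        # Symmetric NAT: one mapping per (internal endpoint, destination).
        key = (internal, dest) if self.symmetric else internal
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return self.table[key]

def predict_next_port(observed_port):
    """Port prediction for a sequential symmetric NAT."""
    return observed_port + 1

internal = ("192.168.0.1", 8000)
dests = [("131.107.224.9", 3074), ("131.107.224.9", 3075), ("4.35.148.9", 8100)]
cone_ports = [Nat(symmetric=False).external_port(internal, d) for d in dests[:1]]
cone = Nat(symmetric=False)
cone_ports = [cone.external_port(internal, d) for d in dests]
seq = Nat(symmetric=True)
seq_ports = [seq.external_port(internal, d) for d in dests]
print(cone_ports, seq_ports)
```

For a cone NAT, the port learned from an echo server is valid for any peer; for a sequential symmetric NAT, the peer must be told the predicted next port instead.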
Advertise for Access
To identify a peer
○ Addr:Port of the presence server
○ Public port of the peer (useful for cone NAT)
○ Private port of the peer
NAT Traversal
1. Direct connection
2. STUN
3. Direct connection or connect back
4. Direct connection or connect back to specific port
5. Symmetric NAT traversal (symmetric/restricted cone)
6. Symmetric NAT traversal (symmetric/symmetric)
R. Relay
            D-IP  UPnP  Full Cone  Res Cone  Seq Sym  Sym Rand  Firewall
D-IP        1     1     1          3         3        3         4
UPnP        1     1     1          3         3        3         R
Full Cone   1     1     1          3         3        3         R
Res Cone    3     3     3          2         5        R         R
Seq Sym     3     3     3          5         6        R         R
Sym Rand    3     3     3          R         R        R         R
Firewall    4     R     R          R         R        R         R
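The traversal matrix is easy to carry in code as a lookup table. The Python sketch below transcribes the seven rows of the matrix; the names `TYPES`, `MATRIX`, `METHODS` and `traversal_method` are hypothetical.

```python
TYPES = ["D-IP", "UPnP", "Full Cone", "Res Cone", "Seq Sym", "Sym Rand", "Firewall"]

# One string per row, one character per column, in TYPES order.
MATRIX = [
    "1113334",
    "111333R",
    "111333R",
    "33325RR",
    "33356RR",
    "333RRRR",
    "4RRRRRR",
]

METHODS = {
    "1": "direct connection",
    "2": "STUN",
    "3": "direct connection or connect back",
    "4": "direct connection or connect back to specific port",
    "5": "symmetric NAT traversal",
    "6": "symmetric NAT traversal",
    "R": "relay",
}

def traversal_method(type_a, type_b):
    """Look up the traversal method for a pair of peer connectivity types."""
    code = MATRIX[TYPES.index(type_a)][TYPES.index(type_b)]
    return METHODS[code]

print(traversal_method("Res Cone", "Seq Sym"))
```

The matrix is symmetric, so the order of the two peers does not matter for the lookup.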
Direct Connection (A→B)
A, B: D-IP, UPnP, Full Cone
[Figure: Client A and Client B, each behind a NAT, with the presence server and echo servers]
Connect Back (A→B)
A: D-IP, UPnP, Full Cone; B: not firewall
[Figure: same setup]
Connect Back @ Port 80/443 (A→B)
A: D-IP; B: firewall
[Figure: same setup]
STUN (A→B)
A, B: restricted cone NAT
[Figure: same setup, with probing between the two NATs]
Symmetric/Restricted Cone
A: sequential symmetric NAT; B: restricted cone
[Figure: same setup, with three numbered steps: get recent port mapping, send predicted port mapping to peer, probing (multiple tries)]
Symmetric/Symmetric
A, B: both sequential symmetric
Doable through an algorithm similar to the one above
If NAT A has k1 uncertain ports after port range prediction, and NAT B has k2 uncertain ports after port range prediction, the probability of successful traversal in a single pass is 1/(k1·k2)
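A short calculation shows why multiple tries are needed in the symmetric/symmetric case. The Python sketch below evaluates the single-pass probability from the slide and, under the added assumption that passes are independent, the probability of succeeding within a number of tries; the function names and the sample values are hypothetical.

```python
def single_pass_success(k1, k2):
    """Both sides must guess right: 1 / (k1 * k2)."""
    return 1.0 / (k1 * k2)

def success_within(k1, k2, tries):
    """Probability of at least one success, assuming independent passes."""
    p = single_pass_success(k1, k2)
    return 1.0 - (1.0 - p) ** tries

print(single_pass_success(4, 5))
print(round(success_within(4, 5, tries=20), 3))
```

With 4 and 5 uncertain ports, a single pass succeeds 5% of the time, so tens of tries are needed before success becomes likely.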
TCP NAT Traversal
Simple implementation:
○ Just like the UDP NAT traversal scheme
○ Do bind(), listen() & bind(), connect() on the same port
○ Whichever socket successfully establishes a connection completes the traversal
○ Requires OS support for TCP simultaneous open (Windows XP SP2; each peer may launch a connection attempt separately)
TCP Failure Due to NAT Filtering
TCP failure case
○ The NAT responds with RST to an unseen SYN
○ Reason: it causes the socket at the other end to close prematurely
TCP NAT Traversal
Variations to counter RST
○ Low-TTL SYN
○ Spoofed/raw SYN-ACK
Neither works well in deployment
○ Not supported by the OS
○ Not supported by ISP routers
○ Security risk
P2P Deployment Issues
P2P economy & incentive
Attacks in P2P networks
Proximity and heterogeneity
P2P monitoring and debugging aids
P2P Economy & Incentive
P2P social behavior observation
People like to free ride (if given the choice)
○ Shown in studies of Gnutella (70% of clients do not share) and Kazaa-Lite
However, people are OK with contributing if there is no choice (most people don’t bother to hack) or if contribution improves performance
○ BitTorrent: sharing improves performance
○ Skype & PPLive: one class of nodes subsidizes another class
Existing P2P Incentive
Reciprocal incentive
○ A certain kind of fair exchange mechanism directly between two peers
Forced sharing
○ Skype & PPLive
Micropayment system
○ Needs extensive server infrastructure
Reciprocal Incentive
Fair exchange mechanism directly between two peers
Can be used in an open protocol
Can easily deter hacking & attacks
Most straightforward to implement
Iterated Prisoner’s Dilemma
Payoff matrix (row player’s payoff, column player’s payoff)
             Cooperate         Defect
Cooperate    R=d-u, R=d-u      S=-u, T=d
Defect       T=d, S=-u         P=0, P=0
Cost
○ d (>0): utility of downloading a fragment
○ u (>0): cost of uploading a fragment
○ d>u
We have: T>R>P>S and 2R>S+T>2P
Axelrod’s Tournament & TIT-FOR-TAT
Axelrod’s tournaments in 1981 & 1984
○ 14 entries & 62 entries
Four principles for highly effective strategies
○ Nice, retaliatory, forgiving, clear
TIT-FOR-TAT
○ Cooperate on the first move
○ Then copy the opponent’s last move
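The payoff matrix and the TIT-FOR-TAT strategy can be played out in a short simulation. This Python sketch uses the payoffs from the prisoner's dilemma slide (T=d, R=d-u, P=0, S=-u) with hypothetical values d=3 and u=1; the strategy functions and `play` are illustrative names, and always-defect is added only as an opponent to compare against.

```python
def payoff_table(d, u):
    """Payoffs for (row move, column move); requires d > u > 0."""
    T, R, P, S = d, d - u, 0, -u
    assert T > R > P > S and 2 * R > S + T > 2 * P
    return {("C", "C"): (R, R), ("C", "D"): (S, T),
            ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tit_for_tat(opponent_moves):
    """Cooperate first, then copy the opponent's last move."""
    return opponent_moves[-1] if opponent_moves else "C"

def always_defect(opponent_moves):
    return "D"

def play(strategy_a, strategy_b, rounds, d=3, u=1):
    """Return the total payoffs of both players after the given rounds."""
    table = payoff_table(d, u)
    moves_a, moves_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(moves_b), strategy_b(moves_a)
        pay_a, pay_b = table[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))
print(play(tit_for_tat, always_defect, 10))
```

Two TIT-FOR-TAT players earn R every round, while a defector gains T once and nothing afterwards, which is the intuition behind BitTorrent's choking rule.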
Tit-For-Tat In BitTorrent
BitTorrent rule
○ Preferentially uploads to the m neighbors that provide it with the best download rate
Surprisingly simple yet effective
○ Free riding leads to relatively poor performance and is thus deterred (although research shows that in a network with a large # of seed nodes, free riding may incur only a small penalty)
However, it may not lead to the best utilization of peer resources.
Force Sharing
Force sharing
Skype & PPLive
○ Pro: superior system performance for all users, as users with more resources subsidize users with fewer resources
○ Con: inherently unfair
Relies on proprietary implementation
Subject to hacking
○ Skype is hacked, leading to Skype Lite?
Force sharing results in poor system performance
○ PPLive slows other apps to a crawl
145
Micropayments
Basics
Virtual money (may or may not be convertible to real currency)
Peer pays for resource
Used in MMORPG, Xbox 360
Problem
Need server support
Subject to hack
Mental transaction cost argument:
○ Each price, no matter how small, carries a burden of decision
Minors and those without credit card may be deterred
147
P2P Threat Scenario
P2P vs. client-server
In client-server, client only needs to trust server
In P2P, all peers are servers, so the trust issue is severe. P2P networks must assume some nodes are malicious
P2P attack scenario
DoS Attack
○ Sybil Attack
Pollution & Poisoning Attack
Other Attack
Denial of Service (DoS) Attack
DoS Attack
On the P2P application itself
○ "Berman's bill would give us [copyright owners] the right to launch denial-of-service attacks, known as 'interdiction,' that would deluge P2P file servers with false file requests to slow the system or bring it to a halt." Network World Fusion, 8/5/02
Against systems that are not necessarily participants in the P2P network
○ Naoumov (IWP2P, 2006) demonstrates a DDoS attack launched from Overnet
By poisoning the distributed index
By poisoning the routing table
148
Sybil Attacks
Sybil is a well-known character from the 1970s: a woman with multiple personality disorder, with 16 personalities
Sybil Attack
A single faulty entity masquerades as multiple identities, and thus controls a substantial portion of the network.
149
Why Use Sybil Attacks in P2P?
The tracker is a single weak point in P2P systems
Sybil attack
Create a massive number of false identities that bring down the tracker
Counter
Make identities more expensive
A trusted central agency certifies identities
Attach certain real-world information to each identity, to create accountability for each identity
150
Pollution & Index Poisoning Attack
Index poisoning attack
Content pollution attack
151
MediaSentry.com OverPeer.com
154
Counter: Pollution Attack
BitTorrent
Provide block hash in torrent file (counter pollution)
○ Signature works as well
Automatically contact peer (counter index poisoning)
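A minimal sketch of the block-hash check, assuming SHA-1 piece hashes as used in torrent files. The helper names, the sample content, and the 16-byte piece length are illustrative assumptions:

```python
import hashlib


def piece_hashes(data, piece_length):
    """Publisher side: hash each fixed-size piece of the content."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index, piece, expected_hashes):
    """Downloader side: accept a piece only if its hash matches."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]


content = b"abcdefghij" * 10
hashes = piece_hashes(content, piece_length=16)

good_piece = content[16:32]
polluted_piece = b"X" * 16
print(verify_piece(1, good_piece, hashes))      # True
print(verify_piece(1, polluted_piece, hashes))  # False
```

A polluted piece is detected and discarded after downloading only that piece, rather than after the whole file.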
Pollution Attack in Network Coding
Problem:
Peer node may mix content
Hash/signature calculated by the source may not be available for the newly mixed content
Solution:
Request new hash/signature from source
○ Source has to be online, heavy computation burden
Homomorphic hash
○ Computationally expensive
Secure Random Checksums (SRCs)
○ Server needs to be online for each client to distribute SRCs
Homomorphic Signature
○ Need to use large Galois Field, computationally simple, but with large overhead
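To make the homomorphic hash idea concrete, here is a toy sketch of one common discrete-log construction: the hash of a linear combination of blocks can be computed from the hashes of the original blocks, so a receiver can check mixed content without contacting the source. The parameters (P=2039, Q=1019, four generators) are tiny and insecure; they and the function names are assumptions chosen only so the example runs quickly:

```python
P = 2039  # toy prime with P = 2*Q + 1 (far too small for real use)
Q = 1019  # prime order of the subgroup of squares mod P
# Squares mod P (other than 1) have order Q, so exponents can be reduced mod Q.
GENERATORS = [pow(i, 2, P) for i in range(2, 6)]


def hhash(block):
    """Hash a block (a vector over Z_Q) as the product of g_i ** b_i mod P."""
    h = 1
    for g, b in zip(GENERATORS, block):
        h = (h * pow(g, b, P)) % P
    return h


def mix(alpha, a, beta, b):
    """Network-coding style mix: alpha*a + beta*b, element-wise mod Q."""
    return [(alpha * x + beta * y) % Q for x, y in zip(a, b)]


a = [5, 17, 1000, 42]
b = [900, 3, 77, 1018]
alpha, beta = 7, 11

mixed = mix(alpha, a, beta, b)
# Expected hash derived only from the source hashes and the coefficients.
expected = (pow(hhash(a), alpha, P) * pow(hhash(b), beta, P)) % P
print(hhash(mixed) == expected)  # True
```

The modular exponentiations per block element are where the "computationally expensive" cost on the slide comes from.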
155
Counter: Index Poisoning
Signing techniques to verify content authenticity
This helps with the piracy issue in P2P as well
Piracy in P2P
P2P is closely associated with piracy today
○ The history is with Napster, Kazaa, Grokster, PirateBay, …
Commercial P2P networks must make an effort to prevent illegal publishers
○ E.g. Grokster Case, http://www.eff.org/IP/P2P/MGM_v_Grokster
156
Other Attacks (1)
To grab peer identity
Man-in-the-middle, replay, password guessing attack
Counter: end-to-end encryption
Identity attack
Tracking and harassing user
Debugging/reverse-engineering: e.g., reverse-engineering of Skype
Counter: security by obfuscation
157
Other Attacks (2)
Insertion of viruses
Spyware in the P2P software
Spamming (sending junk)
158
Peer Parameter Estimation
Peer Selection Based on
Latency (RTT)
Bandwidth/throughput estimation (heterogeneity)
○ Link bandwidth (upload)
ISP locality (proximity)
Packet loss
Availability (outage)
This section covers
How to calculate heterogeneity (bandwidth) & proximity (ISP locality)
Building heterogeneity- & proximity-aware P2P applications is still an ongoing research topic
160
Bandwidth Estimation
Problem: what is my available bandwidth?
TCP Throughput
○ Intrusive measurement
○ Resource consuming
○ Slow to get result
Back-to-back packet pair
○ Measuring bottleneck link capacity
○ Not for available bandwidth measurement
Various packet train approaches, e.g., Pathneck
161
Available Bandwidth Measure
Definition
Consider an end-to-end path of n links: L1,L2,…,Ln
Their capacities are B1,B2,…,Bn
Their traffic loads are C1,C2,…,Cn
The bottleneck link is
○ Bb=min(B1,B2,…,Bn)
The tight link is
○ Bt-Ct=min(B1-C1,B2-C2,…,Bn-Cn)
○ Bt-Ct is the available bandwidth
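A minimal sketch of the definition, showing that the bottleneck link and the tight link need not be the same link. The function name and the sample capacities and loads (in Mbps) are illustrative assumptions:

```python
def bottleneck_and_tight_link(capacities, loads):
    """Return (bottleneck index, tight-link index, available bandwidth).

    capacities[i] is B_i and loads[i] is C_i for link i of the path.
    """
    if not capacities or len(capacities) != len(loads):
        raise ValueError("need one load per link on a non-empty path")
    available = [b - c for b, c in zip(capacities, loads)]
    bottleneck = min(range(len(capacities)), key=capacities.__getitem__)
    tight = min(range(len(available)), key=available.__getitem__)
    return bottleneck, tight, available[tight]


capacities = [100, 50, 1000]  # B1, B2, B3
loads = [20, 10, 995]         # C1, C2, C3
print(bottleneck_and_tight_link(capacities, loads))  # (1, 2, 5)
```

Here the second link has the smallest capacity, but the heavily loaded third link determines the available bandwidth.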
162
Basic Tool
A back-to-back packet pair will estimate the bandwidth of the bottleneck link
A packet train will estimate the bandwidth of the tightest link
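A minimal sketch of the packet-pair estimate: two back-to-back packets of size L leave the bottleneck link separated by roughly L/B, so the capacity is about L divided by the arrival gap. Taking the median over several pairs to reduce the effect of cross traffic is an assumption of this sketch, as are the function name and the sample gaps:

```python
from statistics import median


def packet_pair_capacity(packet_size_bytes, arrival_gaps_s):
    """Estimate bottleneck capacity in bits/s from packet-pair arrival gaps."""
    gaps = [g for g in arrival_gaps_s if g > 0]
    if not gaps:
        raise ValueError("need at least one positive arrival gap")
    return packet_size_bytes * 8 / median(gaps)


# Five pairs of 1500-byte packets; one gap is inflated by cross traffic.
gaps = [0.0012, 0.0012, 0.0013, 0.0030, 0.0012]
estimate = packet_pair_capacity(1500, gaps)
print(f"{estimate / 1e6:.1f} Mbps")  # 10.0 Mbps
```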
163
Available Bandwidth Measurement
164
[Figure: probing packets and background packets, with the sending gap and arriving gap marked; plot with axes labeled arriving gap and sending gap, marking the turning point and the available bandwidth]
Pathneck (Hu, Sigcomm’04)
Load packets are used to measure available bandwidth
Measurement packets are used to obtain location information
165
[Figure: packet train layout. 60 load packets of 500 B with TTL 255 in the middle; 30 measurement packets of 60 B at each end, with TTL 1, 2, …, 30 at the head and 30, …, 2, 1 at the tail]
Recursive Packet Train (RPT)
Transmission of RPT (Hu, Sigcomm’04)
166
[Figure: an RPT travelling from source S through routers R1, R2, R3; the TTL values decrease by one at each hop, measurement packets whose TTL reaches 0 are dropped, and gaps g1, g2, g3 are obtained at R1, R2, R3]
Gap values are the raw measurement
Choke Point Detection (Hu, Sigcomm’04)
167
[Figure: gap vs. hop # (hops 1 to 8), marking the choke points and the bottleneck point; available bandwidth (a_bw) vs. hop #]
ISP Locality
P2P reality
P2P represents 60% of Internet traffic and is still growing
92% of P2P traffic crosses transit/peering links
80% of upstream capacity is consumed by P2P
P2P protocols will aggressively consume all available capacity
P2P & ISP
P2P affects QoS levels for ALL subscribers
As content providers use P2P to deliver content, their costs are being passed on to service providers
Weak linkage between traffic generated/served to end user and charging
Solution
Try best to deliver content from nearby peers
Convert cross-transit traffic to intra-ISP traffic
168
Internet Today
169
[Figure: Internet topology showing the MSN AS, Comcast AS, and Verizon AS, BGP routers & peering points, core routers, gateway routers, and end users]
Internet Topology Discovery
Information used
Peer external IP address
Peer subnet mask
Topology Discovery
BGP feed: external IP -> AS
GeoLocation Service (e.g., IP2Location): external IP -> country, region, city, latitude, longitude, ISP domain
External IP + subnet mask: POP
External IP: home/corporation
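A minimal sketch of locality-aware peer ranking built on the information above: peers in the same subnet are preferred, then peers in the same AS, then everyone else. The `ip_to_as` table stands in for a lookup derived from a BGP feed; it, the peer records, and the addresses and AS numbers (taken from documentation-reserved ranges) are all hypothetical:

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """True if ip_b falls inside the subnet defined by ip_a and netmask."""
    net = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    return ipaddress.ip_address(ip_b) in net


def rank_by_locality(me, candidates, ip_to_as):
    """Sort candidate peers: same subnet first, then same AS, then others."""
    my_as = ip_to_as.get(me["ip"])

    def tier(peer):
        if same_subnet(me["ip"], peer["ip"], me["netmask"]):
            return 0
        if my_as is not None and ip_to_as.get(peer["ip"]) == my_as:
            return 1
        return 2

    return sorted(candidates, key=tier)


ip_to_as = {
    "192.0.2.10": 64500,
    "192.0.2.99": 64500,
    "198.51.100.20": 64500,
    "203.0.113.7": 64501,
}
me = {"ip": "192.0.2.10", "netmask": "255.255.255.0"}
candidates = [{"ip": "203.0.113.7"}, {"ip": "198.51.100.20"}, {"ip": "192.0.2.99"}]

ranked = [p["ip"] for p in rank_by_locality(me, candidates, ip_to_as)]
print(ranked)  # ['192.0.2.99', '198.51.100.20', '203.0.113.7']
```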
173
Debugging of P2P Application
Problems
Debugging is difficult, but debugging a P2P application is especially hard
Execution is non-deterministic
○ Two executions may yield different results
○ Bugs are not reproducible
○ Bugs that occur only in large-scale runs may be difficult to debug
177
Nondeterminism in P2P Application
Nondeterminism
Two executions may lead to different results
Race
Two or more processes attempt to access the same resource, and at least one of the processes is storing (writing) into the resource
○ Read-write race
○ Write-write race
Deadlock
○ Two or more processes are waiting for events such that none of the events occurs
Solution:
Deterministic replay
Distributed assert
178
Deterministic Replay
Deterministic replay
Log into a trace all incoming network traffic, timing events, and thread switching events that affect the execution path of the P2P application
During debugging, have the capability to replay the activity of a set of peers from the trace
Tools available
Liblog & Friday (UC Berkeley)
WiDS (MSRA)
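A minimal sketch of the record-and-replay idea, not the design of any of the tools listed: every nondeterministic input (here, incoming pieces and timer events) is appended to a trace before the peer handles it, and a fresh peer fed the same trace reaches the same state. The `Peer` class and event format are hypothetical:

```python
import random


class Peer:
    """A toy peer whose state depends on the order of incoming events."""

    def __init__(self):
        self.pieces = []        # pieces in arrival order
        self.requests_sent = 0  # re-requests triggered by timers

    def handle(self, event):
        kind, value = event
        if kind == "piece" and value not in self.pieces:
            self.pieces.append(value)
        elif kind == "timer":
            self.requests_sent += 1

    def state(self):
        return tuple(self.pieces), self.requests_sent


def live_run(peer, trace, steps=20):
    """Simulate a nondeterministic run, logging each event before handling it."""
    rng = random.Random()  # unseeded: every live run differs
    for _ in range(steps):
        if rng.random() < 0.7:
            event = ("piece", rng.randrange(8))
        else:
            event = ("timer", None)
        trace.append(event)
        peer.handle(event)


def replay(trace):
    """Re-execute a fresh peer from the trace alone."""
    peer = Peer()
    for event in trace:
        peer.handle(event)
    return peer


trace = []
original = Peer()
live_run(original, trace)
print(replay(trace).state() == original.state())  # True
```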
179
Distributed Assert
Distributed Assert
Collect status information (e.g., state, % task finished, bandwidth usage, etc.) with timestamp in a distributed fashion
Send the information to a central reporting server
Create snapshot of the whole system
Tools available
D3S: debugging deployed distributed systems (MSRA)
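A minimal sketch of the three steps above, not the D3S interface: peers submit timestamped status reports, a central server assembles a snapshot as of a given time, and a global predicate is checked against it. The class names, fields, and the "minimum progress" predicate are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Report:
    peer_id: str
    timestamp: float
    percent_done: float
    upload_kbps: float


class ReportingServer:
    """Central server that collects reports and builds system snapshots."""

    def __init__(self):
        self.reports = []

    def submit(self, report):
        self.reports.append(report)

    def snapshot(self, at_time):
        """Latest report from each peer with timestamp <= at_time."""
        latest = {}
        for r in sorted(self.reports, key=lambda r: r.timestamp):
            if r.timestamp <= at_time:
                latest[r.peer_id] = r
        return latest


def peers_below_progress(snapshot, min_percent):
    """Global check: which peers violate the minimum-progress predicate?"""
    return [pid for pid, r in snapshot.items() if r.percent_done < min_percent]


server = ReportingServer()
server.submit(Report("A", 1.0, 10.0, 64.0))
server.submit(Report("B", 8.0, 5.0, 0.0))
server.submit(Report("A", 9.0, 60.0, 128.0))
server.submit(Report("B", 12.0, 50.0, 96.0))

snap = server.snapshot(at_time=10.0)
print(peers_below_progress(snap, min_percent=20.0))  # ['B']
```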
180
Summary
P2P can add significant value for content owners
Reduce cost of delivery
Scale up to meet customer demand without significant infrastructure investment
P2P can deliver better quality of service (QoS) to the end user
Users have access to more resources (bandwidth, storage) on the network, leading to improved QoS
But needs to be done right!
181