Jin Li, Principal Researcher, Microsoft Research, [email protected]



Outline
○ Introduction
○ P2P today
○ Anatomy of BitTorrent, Skype & PPLive
○ Components and tools for P2P applications
○ P2P deployment issues
○ Summary

Why P2P Applications
Advantages:
○ Economical to run: saves centralized bandwidth and/or storage
○ Robustness: no single point of failure
○ Super-scalability: system capacity increases with the number of nodes
P2P is ideal for serving the long tail

P2P Application History
1st generation: Napster (05/1999)
2nd generation: Gnutella (early 2000); FastTrack (Kazaa, Grokster, iMesh), 03/2001; eDonkey
3rd generation: BitTorrent (2002); Skype (08/2003); PPLive (12/2004)

P2P Isn't New
Existing P2P technology may find its origins in:
○ IP routers
○ DNS
○ Distributed computing

What is P2P?
"Peer-to-peer is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet." (Clay Shirky)
○ Nodes serve both as server & client
○ Every node pays for service by providing access to some of its resources (bandwidth, storage, etc.)
○ No single point of bottleneck or failure
○ Distributed algorithms for service/content discovery, status tracking, application-layer routing, resilience, …

P2P Traffic Today
1999 to present: fuelled by Napster, KaZaA, eDonkey and BitTorrent
[Figure: CacheLogic Research, Internet Protocol Breakdown 1993-2006]

P2P Today
[Figure source: Sandvine, www.sandvine.com]

P2P Networks: Large, Growing and Active
○ An estimated 200M P2P users worldwide
○ 420 million P2P searches are conducted daily on P2P networks, rivaling searches on Google, Yahoo and Live
○ The number of P2P files downloaded in the US was up 24% in 2006
○ 60% of Internet backbone traffic is P2P, and up to 90% of upstream user traffic is now consumed by P2P applications
Source: CacheLogic

P2P App 1: BitTorrent Information
○ Debut: 2002, by Bram Cohen
○ For file sharing (content location is handled by the tracker, which is a centralized server rather than P2P)
○ Accounts for 35% of traffic (according to analysis by CacheLogic)
○ Numerous clients: official client (Python), Azureus (Java), BitComet (C++)

Authorized Use of BitTorrent
Many adopters report that only with BitTorrent could they afford to distribute their files:
○ Major open source & free software projects
○ Game updates & games (e.g., World of Warcraft)
○ Films (Warner Bros., fan films)
○ Other materials

Step 1. Install BitTorrent Client

Step 2. Tracker: Centralized or DHT
Trackerless BitTorrent:
○ Eliminates the need for the tracker; more robust
○ Less efficient
○ Lacks content distribution control
○ Lacks content distribution statistics
[Figure: centralized tracker vs. trackerless BitTorrent (DHT)]

Step 3. Make/Download Torrent File
The .torrent file (tracker address, hashes for integrity checking) is distributed via the web, torrent search sites, BBS and mail

Step 4. Overlay Forming & Sharing
[Figure: clients obtain peers from the tracker and form swarms]

Scheduling Rule 1: Tit-for-Tat
BitTorrent rule: a node preferentially uploads to the neighbors that provide it with the best download rate (top m)
[Jumpstart] Optimistic unchoking: every 30 s, unchoke one random neighbor regardless of its download rate
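The two rules above can be sketched as a per-round choking decision. This is a minimal illustration, not code from any BitTorrent client: `choose_unchoked`, `is_optimistic_round` and their parameters are hypothetical names, and the 30 s optimistic slot is modelled as a simple boolean flag.

```python
import random

# Sketch only: function and parameter names are hypothetical.
def choose_unchoked(download_rates, m, optimistic_round, rng):
    """Pick the neighbors this node uploads to (unchokes) for the next round.

    download_rates: dict neighbor -> download rate we currently get from it
    m: number of tit-for-tat slots ("top m")
    optimistic_round: True once every 30 s, when one extra random neighbor
        is unchoked regardless of its rate
    """
    # Tit-for-tat: reciprocate with the m neighbors that serve us best.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:m])
    # Optimistic unchoking: give one currently choked neighbor a chance.
    if optimistic_round:
        choked = [p for p in ranked if p not in unchoked]
        if choked:
            unchoked.add(rng.choice(choked))
    return unchoked

def is_optimistic_round(t_seconds, period=30):
    return t_seconds % period == 0
```

A newly joined peer that has nothing to offer yet still receives data through the optimistic slot, which is what allows it to start reciprocating.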

Scheduling Rule 2: Local Rarest Rule
[Figure: tracker, swarms and client]
Client: downloads pieces in rarest-first order
End game: sends requests for all missing blocks, and sends a cancel every time a block arrives

P2P App 2: Skype Information
○ Debut: 08/2003, by N. Zennstrom and J. Friis, who founded KaZaA
○ A P2P overlay network for VoIP and other applications
○ Free VoIP within the Skype network; fee-based SkypeOut/SkypeIn

Skype Usage (Apr. 2008)
○ 11 million concurrent Skype users online at peak time (180,000+ simultaneous calls)
○ 309 million registered users worldwide, the largest registered user base within the eBay portfolio (33 million users added in Q1 FY08)
○ $126M revenue in Q1 FY08 (61% YoY growth; 5.6 billion SkypeOut minutes in FY2007)
○ 100 billion cumulative Skype-to-Skype minutes

Skype Share of International VoIP Traffic

Skype Gadgets
○ Netgear Skype Wi-Fi Phone
○ Motorola CN620 WiFi Cellphone
○ IPEVO Free-1 USB Skype Phone
○ IPDRUM mobile Skype Cable
50 hardware partners, 150+ Skype-certified devices

Skype vs. VoIP
Public VoIP standards: H.323, SIP
Skype is a proprietary VoIP solution:
○ Relies on a P2P network for the user directory, so it scales without costly infrastructure
○ Routes calls through supernodes, giving universal firewall/NAT traversal
○ Encrypted traffic (but you have to trust eBay/Skype)

Skype Ingredients
[Figure: the user retrieves an ID from a Skype server (authentication) and joins the supernode overlay of the Skype network]
Supernode: any computer with sufficient CPU, memory & network bandwidth that is not behind a firewall
○ Provides the distributed directory service
○ Relays traffic for computers behind NAT/firewall

NAT Traversal (Skype)
NAT/firewall detection:
○ Try a UDP connection
○ Try a TCP connection (arbitrary port, 80 (HTTP), 443 (HTTPS))
Traversal:
○ Direct connection if a) neither client is behind NAT, or b) one client has no NAT and the other is behind a cone NAT
○ Relay by supernode otherwise
Since Skype doesn't need to pay for relay cost, it can use a high-bitrate wideband voice codec (>24 kbps)

Tutorial, Jin Li

Skype: Call Routing Through Supernodes
[Figure: Skype server (authentication) and supernode overlay]
Calls are routed through supernodes, with a high-bitrate wideband voice codec (>24 kbps)

Skype Encryption
○ 256-bit AES over 128-bit data blocks
○ 1536/2048-bit RSA for key negotiation (2048/2048 for paid service)
[Figure: encrypted session between Peer 1 and Peer 2]

Skype: Complete Black Box (Security by Obfuscation)
Almost everything is obfuscated:
○ Many protections, anti-debugging tricks, ciphered code
○ Avoids static disassembly: XORs the binary with a hard-coded key, erases the beginning of the code, uses its own packer
○ Code integrity checks: checksums detect software breakpoints
○ Anti-debugging techniques: anti-SoftICE, integrity checks
○ Code obfuscation and network obfuscation

P2P App 3: PPLive

Online Video Usage on the Rise
China: 70% of broadband users watch TV over broadband
18-24 year old broadband users: 87% watch music videos online, 82% watch TV programs
UK, 18-24 year old broadband users: 77% watch music videos online, 60% watch TV programs
US:
○ 72% stream news videos consistently
○ 59% watch short clips from movies or TV
○ 48% watch music videos
○ 44% stream sports highlights
○ 43% watch user-generated home videos
○ 23% stream concert clips
○ 22% download a full-length movie or TV show
○ 17% stream live sporting events

Video Dominates P2P Traffic
○ More than 60% of P2P traffic is video
○ Asia: 50%
○ Objects > 2.5Gb!!
Source: CacheLogic, www.cachelogic.com
[Figure: CacheLogic Research, Breakdown of File-Types on Major P2P Networks, 2007]

Video Streaming as a Major Video Distribution Vehicle
Video streams served increased 38.8% in 2006 to 24.92 billion across all entertainment and media sites* (*excluding UGC)
Source: Accustream iMedia Research

Is CDN the Answer?
CDN capacity: Akamai 400 Gbps; Limelight 1,000 Gbps
TV quality is around 500 kbps: 100,000 viewers = 50 Gbps; 2.8 million viewers in total from the top two CDNs
Current TV audience:
○ 2.5 billion watched the Olympics
○ 1.1 billion still watch Baywatch EVERY day?
○ Soccer World Cup final: 3 billion
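The capacity figures above can be checked with a short calculation. This assumes Limelight's capacity is 1,000 Gbps (the value that makes the 2.8 million total work out) and that every viewer receives a 500 kbps stream.

```latex
\begin{aligned}
100{,}000 \times 500\ \text{kbps} &= 5 \times 10^{10}\ \text{bps} = 50\ \text{Gbps} \\
\frac{(400 + 1000)\ \text{Gbps}}{500\ \text{kbps}} &= \frac{1.4 \times 10^{12}\ \text{bps}}{5 \times 10^{5}\ \text{bps}} = 2.8 \times 10^{6}\ \text{viewers}
\end{aligned}
```

Under these assumptions, both top CDNs combined could serve less than 0.1% of a 3-billion World Cup audience at TV quality.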

Peer Assisted Streaming
Peer assisted streaming is the only solution for a popular site to distribute a large number of streams (virtual channels) without IP multicast in a cost-effective manner

PPLive History
Started in Dec. 2004
Status (Aug. 2007):
○ 75 million installed base
○ 3.5 million daily active viewers
○ 2.2 million peak concurrent viewers
○ 1.48 million peak concurrent viewers per show (an NBA playoff game with the Houston Rockets, live, China, Q2 2007)
○ 740 Gbps bandwidth bill if not P2P

PPLive – Contact Channel Server
[Figure: the client contacts the channel list server]
Approx. 300-400 channels; all viewers of the same channel watch the media at the same point

PPLive – Obtain Peer List
[Figure: the client obtains IP lists of peers via the channel list server and the tracker]

PPLive: Video Display
[Figure: PPLive engine with a P2P queue connected to the Internet; the media player is fed via HTTP requests; PPLive UI]

PPLive: Protocol Analysis (Hei’06)
○ Chunk size: 1 s of video
○ Buffer map: coded chunk availability
○ Proprietary exchange protocol & algorithm

Subsidizing Behavior (Hei’06)
High-bandwidth nodes subsidize low-bandwidth nodes: a high-bandwidth node uploads much more than a low-bandwidth node
[Figure: upload of a campus node vs. a home node]

Other P2P Streaming Apps
○ PPStream.com (World Cup Soccer)
○ Roxbeam
○ Mysee.com

Content is King
P2P is popular when it facilitates content distribution that is prohibitively expensive by other means
○ Much content (e.g., software from small vendors & movies from small providers) would not be available without BitTorrent or other P2P applications
Quality matters
○ P2P attracts users because they receive a much higher quality of service than with the server-client approach
○ BitTorrent and PPLive offer file download/media streaming with unparalleled quality of service

Server Role
A server is not necessarily evil in P2P
Both BitTorrent & Skype have server components, which greatly simplify the design
Replacing all server components with distributed components may lead to high implementation cost and lower quality of service for the user:
○ Identify the primary source of P2P savings
○ Identify which key components can be better served by the server to simplify system design

Security

Proper Incentive is Crucial
Incentive is crucial: BitTorrent does succeed in discouraging free riding
○ Tit-for-Tat: if you reduce your upload rate, your download rate suffers
○ Additional community features help further
Skype:
○ Users are online (and contribute) during work hours
Users don't like, but may tolerate, subsidizing behavior:
○ Skype: supernodes subsidize other nodes by relaying
○ PPLive: high-bandwidth nodes subsidize low-bandwidth nodes

NAT/Firewall Traversal
NAT traversal is always a pain point in developing P2P applications
Skype wins mainly because it provides "free relay" capacity

Our Vision of Microsoft Internet Infrastructure: Inner Layer
Inner layer: massive data center (DC)
○ 40 Gbps egress, 20k+ servers, 50 pairs of switch/router chassis
○ Good for: computation-intensive applications, e.g., Live Search

Our Vision of Microsoft Internet Infrastructure: Middle Layer
Middle layer: edge computing network (ECN)
○ A few dozen sites strategically placed all over the world
○ Hundreds or more servers per site
○ Good for: forming a high-speed network core of the Internet; latency-sensitive/throughput-hungry applications

Our Vision of Microsoft Internet Infrastructure: Outer Layer
Outer layer: P2P delivery
○ Peers contribute resources (network bandwidth, CPU, memory, hard drive)
○ Good for: throughput-intensive applications; improves server scalability and uses locality to improve throughput to the end peers
[Figure: three layers: DC, ECN, P2P]

P2P Components and Tools
○ Overlay network
○ Scheduling algorithm
○ Erasure resilient coding (ERC)
○ NAT/Firewall traversal

Overlay Network
An overlay network is a computer network built on top of another network.

Why Study Overlays
○ The first step in building a P2P application
○ The overlay graph affects content distribution efficiency and the robustness of the P2P application

Overlay Building Methods
Tracker-based overlay construction:
○ Random overlay with super peer: BitTorrent
Distributed overlay construction:
○ Pure random overlay: Gnutella
○ DHT: Trackerless BitTorrent

Tracker Based Overlay Construction: Random Overlay (BitTorrent)
The client sends a request to the tracker asking for a set of peers
The tracker randomly selects peers to include in the response
The tracker returns numwant peers [default = 50; fewer if there are fewer peers]
Upper & lower limits:
○ 30 peers is plenty (below this, new connections will be formed)
○ 55 peers is too many (the client will refuse connections)
The parameter is important to performance:
○ Too few peers: not enough for the scheduling algorithm to work with
○ Too many peers: high overhead in exchanging HAVE messages

Distributed Overlay Building: Random Walk (Gnutella)
Each node maintains a neighborhood table (IP addresses):
○ Symmetric table
○ With upper and lower bounds on the number of entries
A joining node uses a random walk from a bootstrap node to find other nodes for its neighborhood table:
○ The neighbor discovery msg carries a count of the entries still to be filled
○ Upon receiving the neighbor discovery msg, a node checks whether its number of neighbors has reached the upper bound; if not, it invites the joining node to become its neighbor
○ The neighbor discovery msg is forwarded to a random node in the neighborhood if the counter is still greater than zero
Failure recovery: acknowledge all neighbor discovery msgs
Failure detection:
○ Every t sec, send a keep-alive to every neighbor
○ No response: probe; still no response: assume failure
○ A cache is maintained to replace failed neighbors
○ Cache empty: send a neighbor discovery msg to a randomly chosen neighbor
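The join procedure above can be sketched as a single-process simulation. It is a simplified illustration under stated assumptions: the overlay is a dict of neighbor sets, node IDs are integers, and `random_walk_join` and its `ttl` safety bound are hypothetical; keep-alives, acknowledgements and the replacement cache are omitted.

```python
import random

# Sketch only: names and the ttl bound are hypothetical.
def random_walk_join(tables, new_node, bootstrap, want, max_deg, rng, ttl=50):
    """Add `new_node` to a random overlay via a random walk from `bootstrap`.

    tables: dict node -> set of neighbors; links are kept symmetric
    want: number of neighborhood entries the joiner still needs to fill
    max_deg: upper bound on the size of any neighborhood table
    ttl: safety bound on the number of walk steps
    """
    tables.setdefault(new_node, set())
    current, count = bootstrap, want
    for _ in range(ttl):
        if count == 0:
            break
        # The node holding the discovery msg invites the joiner if it has room.
        if (current not in tables[new_node]
                and len(tables[current]) < max_deg
                and len(tables[new_node]) < max_deg):
            tables[current].add(new_node)
            tables[new_node].add(current)
            count -= 1
        # Forward the msg to a random neighbor other than the joiner.
        candidates = sorted(n for n in tables[current] if n != new_node)
        if not candidates:
            break
        current = rng.choice(candidates)
    return tables[new_node]
```

Because every link is added to both tables at once, the symmetric-table property holds by construction, and the degree checks enforce the upper bound.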

Distributed Overlay Building: DHT (BitTorrent)
Trackerless BitTorrent:
○ All trackerless BitTorrent clients, across all shared files, form a single DHT
○ Each peer becomes a virtual tracker; the ID (hash) of the file determines which peers serve as its tracker

BitTorrent: From Tracker to Trackerless
A centralized tracker is a single point of failure
Multi-trackers:
○ Defined by BitTornado; specifies the order in which the trackers should be accessed
Trackerless BitTorrent: uses the Kademlia DHT
○ Azureus BitTorrent: 1.3 million members, Kademlia with k=20
○ Official BitTorrent: 200k members, Kademlia with k=8

Distributed Hash Table
Partitions ownership of a set of keys among participating nodes
Basic functionality (routing): route a msg to the unique owner of any given key
DHT interface:
○ Store(ID, value)
○ Retrieve(ID)
Examples: CAN, Chord, Pastry, Kademlia

DHT: The Key is Routing
[Figure: a P2P cloud]
Each peer has a unique ID
For a key VALUE, the owner is the peer with the largest ID that is smaller than VALUE
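The ownership rule above can be sketched with a sorted list of peer IDs. This assumes, as in ring-based DHTs, that a key smaller than every peer ID wraps around to the peer with the largest ID, and uses SHA-1 (160 bits, the same ID length as Kademlia) as a hypothetical way to derive a key from a file name; `owner` and `key_for` are illustrative names.

```python
import bisect
import hashlib

# Sketch only: names and the SHA-1 key derivation are assumptions.
def key_for(name: str) -> int:
    """Map a file name to a 160-bit key."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def owner(sorted_ids, key):
    """Peer with the largest ID smaller than `key`; wraps to the largest ID
    when no smaller ID exists (the ID space is treated as a ring)."""
    pos = bisect.bisect_left(sorted_ids, key)
    return sorted_ids[pos - 1]  # pos == 0 gives index -1, i.e. the largest ID
```

Routing in a real DHT replaces the global sorted list with per-node routing entries, but the owner it converges to is the one this function computes.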

DHT: Two Sets of Routing Entries
Leaf set:
○ The nodes whose IDs are immediately before and immediately after the current node
○ Correctness of DHT routing is guaranteed by the leaf set
Finger set:
○ A set of fingers that stick out for fast routing
○ May consider node proximity in finger set construction

DHT schemes differ primarily in finger set construction

Kademlia DHT: History
Designed by P. Maymounkov and D. Mazières (NYU, 2002)
Used by:
○ eDonkey & eMule
○ BitTorrent Azureus DHT
○ Trackerless BitTorrent (official client, µTorrent, BitSpirit, BitComet)

Kademlia DHT: XOR-Based Routing
Uses an XOR-based distance measure
Node ID: 160 bits
Each node is treated as a leaf of a binary tree, with its position determined by the shortest unique prefix of its ID
Subtrees of a node:
○ 1st: the half of the binary tree not containing the node
○ 2nd: the half of the remaining tree not containing the node
○ …
A node knows at least one node in each of its subtrees (it can know more)

Kademlia DHT
Node: 0011…
○ 1st subtree: 1
○ 2nd subtree: 01
○ 3rd subtree: 000
○ 4th subtree: 0010

Kademlia DHT: Distance
For each subtree (distance 2^i to 2^(i+1)), each node keeps a k-bucket:
○ A list of at most k nodes
○ Sorted by time last seen
○ Default: k=20
When a new msg is received from node x:
○ Node x already in a k-bucket: move it to the tail
○ Node x not in a k-bucket, and the associated bucket has fewer than k nodes: add x
○ Node x not in a k-bucket, and the associated bucket has k nodes: ping the least recently seen node; if there is no response, evict that node
Live nodes are never evicted
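The k-bucket update rule above can be sketched as follows. This is an illustrative model, not code from any Kademlia client: `RoutingTable`, `seen` and the `ping` callback (which stands in for a PING round trip) are hypothetical, and buckets are indexed by the highest differing bit, so bucket i holds nodes at XOR distance 2^i to 2^(i+1).

```python
from collections import OrderedDict

# Sketch only: class, method and callback names are hypothetical.
def bucket_index(self_id, other_id):
    """Bucket i covers XOR distances 2**i <= d < 2**(i+1)."""
    d = self_id ^ other_id
    if d == 0:
        raise ValueError("a node does not store itself")
    return d.bit_length() - 1

class RoutingTable:
    def __init__(self, self_id, ping, k=20):
        self.self_id, self.ping, self.k = self_id, ping, k
        # index -> OrderedDict of node IDs; head = least recently seen, tail = most
        self.buckets = {}

    def seen(self, x):
        """Update the table after receiving any message from node x."""
        bucket = self.buckets.setdefault(bucket_index(self.self_id, x), OrderedDict())
        if x in bucket:
            bucket.move_to_end(x)            # already known: move to the tail
        elif len(bucket) < self.k:
            bucket[x] = None                 # room left: append x
        else:
            oldest = next(iter(bucket))      # least recently seen node
            if self.ping(oldest):
                bucket.move_to_end(oldest)   # live node is never evicted; x dropped
            else:
                del bucket[oldest]           # no response: evict it and add x
                bucket[x] = None
```

Preferring the old, still-live contact over the newcomer is deliberate: nodes that have been online for a long time are likely to stay online.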

Kademlia DHT: Fewer than k Nodes
[Figure: one k-bucket]

Kademlia DHT: k < NUM < 2k
[Figure: one k-bucket; at most k fingers to each k-bucket]

Kademlia DHT: 2k < NUM < 4k
[Figure: one k-bucket]

Kademlia DHT: Protocol
○ PING(ID): ping a node
○ FIND_NODE(ID): returns the k nodes closest to the target ID
○ STORE(ID, VALUE): store the <ID, VALUE> pair at a node
○ FIND_VALUE(ID): similar to FIND_NODE, except that if a value is stored under ID, the stored value is returned

Kademlia DHT: Lookup
Node x finds the k closest nodes to a given node ID:
○ Call FIND_NODE(ID) on x; of the k nodes closest to the target, pick the closest ones (default 3): x1, x2, x3
○ Node x resends FIND_NODE(ID) to each xi
○ If xi fails to respond, remove xi from the k-bucket and resend the query
○ If a closer node is found, repeat the step
○ If a round of FIND_NODE(ID) fails to return any node closer than the closest already seen, resend FIND_NODE(ID) to all k closest neighbors
○ Terminate if, after FIND_NODE(ID) has been sent to all k closest neighbors, no closer neighbor is found
Kademlia operations rely on lookup:
○ Store <ID, VALUE>: send STORE(ID, VALUE) to the k closest nodes found in the lookup
○ Retrieve <ID, VALUE>: send FIND_VALUE(ID) instead of FIND_NODE(ID) during the lookup
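The lookup above can be sketched as an iterative loop. This is a simplified model that assumes a synchronous `find_node(peer, target)` callback returning the peer's closest known contacts, or None on timeout; real clients send the first-round queries in parallel. The function and parameter names are hypothetical, and `alpha` is the "default 3" from the slide.

```python
# Sketch only: synchronous model; names are hypothetical.
def iterative_find_node(target, seeds, find_node, k=20, alpha=3):
    """Return up to k live node IDs closest (by XOR) to target.

    find_node(peer, target) -> list of node IDs the peer knows near target,
    or None if the peer does not respond.
    """
    def dist(p):
        return p ^ target                      # XOR distance

    shortlist, queried, dead = set(seeds), set(), set()
    closest = min((dist(p) for p in shortlist), default=None)
    width = alpha                              # start with alpha queries per round
    while True:
        best_k = sorted(shortlist - dead, key=dist)[:k]
        batch = [p for p in best_k if p not in queried][:width]
        if not batch:                          # all k closest queried: done
            return best_k
        for p in batch:
            queried.add(p)
            reply = find_node(p, target)
            if reply is None:
                dead.add(p)                    # unresponsive: drop it
            else:
                shortlist.update(reply)
        new_closest = min((dist(p) for p in shortlist - dead), default=None)
        if new_closest is not None and (closest is None or new_closest < closest):
            closest, width = new_closest, alpha  # progress: keep alpha-wide rounds
        else:
            width = k                          # no progress: query all k closest
```

The loop always terminates because every round queries at least one node that was never queried before.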

Scheduling Algorithm
Given an overlay, how can we efficiently move content in a P2P network?
Tree-based distribution (push):
○ Content is distributed in a deterministic way in the overlay
Mesh-based distribution (pull/push):
○ Content flows dynamically, with the specific delivery path negotiated by the sender & receiver
Key measurements:
○ Efficiency
○ Robustness

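Content Delivery Efficiency
P2P content delivery efficiency = content delivery throughput in P2P / bandwidth in P2P
For example, in P2P file delivery:
○ N: # of nodes
○ L: length of the file
○ T: session length (time when the last node finishes)
○ B_i: upload bandwidth of node i
○ B_s: upload bandwidth of source node s
Efficiency (the shortest possible session length divided by the actual session length T):
$$
\eta = \frac{L}{T \cdot \min\left(B_s,\ \frac{1}{N}\left(B_s + \sum_{i=1}^{N} B_i\right)\right)}
= \begin{cases}
\dfrac{N L}{T \left(B_s + \sum_{i=1}^{N} B_i\right)}, & B_s \ge \dfrac{1}{N}\left(B_s + \sum_{i=1}^{N} B_i\right) \\[6pt]
\dfrac{L}{T\, B_s}, & \text{otherwise}
\end{cases}
$$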
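The efficiency formula above can be evaluated directly. This sketch assumes L in bits, bandwidths in bits/s, and unconstrained download bandwidth at every node; `min_session_time` and `efficiency` are illustrative names.

```python
# Sketch only: assumes unconstrained download bandwidth; names are hypothetical.
def min_session_time(L, B_s, B_peers):
    """Shortest possible session: L / min(B_s, (B_s + sum B_i) / N)."""
    N = len(B_peers)
    best_rate = min(B_s, (B_s + sum(B_peers)) / N)
    return L / best_rate

def efficiency(L, T, B_s, B_peers):
    """eta = shortest possible session length / actual session length T."""
    return min_session_time(L, B_s, B_peers) / T

# Peer-limited case: (10 + 4 * 2) / 4 = 4.5 Mbps < 10 Mbps, so T_min = 1000 / 4.5 s
print(round(efficiency(1000e6, 250, 10e6, [2e6] * 4), 3))  # prints 0.889
```

When the source is the bottleneck (B_s below the average of total upload capacity), the same function reduces to L / (T · B_s), matching the second case of the formula.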

Content Delivery Robustness
How does the scheduling algorithm behave when:
○ Nodes join/leave the network gracefully/abruptly
○ A certain node/network link is congested
○ A certain node slows down due to CPU constraints, network constraints, etc.

Tree Based Delivery: CoopNet
○ FEC/MDC striped across trees
○ Up/download bandwidths equalized
[Figure: multiple distribution trees, one with a failed node]

Mesh Based Delivery: BitTorrent

Tree vs. Mesh Comparison
Comparison       Tree (single)   Tree (multiple)   Mesh
Efficiency       Poor            Fair              Good
Robustness       Poor            Fair              Good
Balancing        Poor            Fair              Good
Latency          Low             Low               High
Implementation   Easy            Fair              Tricky

Mesh Delivery In-Depth
○ Pull vs. push
○ Peer selection (flow control)
○ Block selection
○ Bandwidth (resource) allocation

Mesh Delivery: Pull vs. Push
Pull (receiver-driven):
○ The receiver first learns what data exists and where, then requests the data
Push (sender-driven):
○ The sender learns what the receiver has (to avoid delivering duplicates)
Hybrid approach (pull-push)

Pull vs. Push: Comparison
Comparison                 Pull   Push   Hybrid
40 nodes, 14 neighbors     74%    99%    99%
40 nodes, 20 neighbors     99%    99%    99%
○ Push & hybrid can achieve more efficient distribution with a sparse overlay
○ When the overlay becomes dense, the gap between push/hybrid and pull shrinks
○ Pull can better control QoS for the receiver

Mesh Delivery: Peer Selection & Flow Control
[Figure: the client keeps a request queue (queue 1 … queue n) to each of Peer 1 … Peer n; requests go out and replies come back on each link]
Goal: ensure the network link between the client & each peer is fully utilized.

Life of a Request & Reply
[Figure: path of a request and its reply through the client app, staging queue (client), TCP sending buffers (client and server), network, TCP receiving buffer (server), server app and hard drive; pending requests and the bottleneck are marked]

Mesh Delivery: Peer Selection & Flow Control
Maintain a queue between the receiver and each sender:
○ Queue size: amount of data pending from the sender
○ The queue may 1) identify data from the sender, 2) redirect lost/delayed requests, 3) perform flow control
Maintain a constant request-reply time between the receiver and all senders:
○ This is equivalent to making the queue size proportional to link bandwidth
Link bandwidth is estimated from:
○ Amount of replied data from the link
○ Packet size / inter-packet arrival time
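The flow-control rule above can be sketched in two steps: estimate each link's bandwidth from received packets, then size each per-sender request queue so that it drains in roughly the same target time. The estimator, the 16 KB block size used in the checks and the function names are assumptions for illustration.

```python
# Sketch only: estimator and names are hypothetical.
def estimate_bandwidth(arrivals):
    """arrivals: (timestamp_s, packet_bytes) tuples for one link, in time order.
    Returns bytes/s: data received after the first packet / elapsed time."""
    if len(arrivals) < 2 or arrivals[-1][0] == arrivals[0][0]:
        return None
    elapsed = arrivals[-1][0] - arrivals[0][0]
    return sum(size for _, size in arrivals[1:]) / elapsed

def queue_limits(bandwidth, target_delay_s, block_bytes):
    """Outstanding-request limit per sender so every queue drains in about
    target_delay_s, i.e. queue size proportional to link bandwidth.
    At least one request stays outstanding so each link keeps being measured."""
    return {peer: max(1, int(bw * target_delay_s // block_bytes))
            for peer, bw in bandwidth.items()}
```

With a 0.5 s target and 16 KB blocks, a 1 MB/s sender is allowed 30 outstanding blocks and a 100 KB/s sender 3, so both queues take about the same time to drain.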

Idle Peer Detection & Redirect of Requests
A peer becomes idle when no data has been received from it for a while (say 1 s)
Requests to the idle peer are redirected to other active peers (using the same peer selection policy)
Follow-up:
○ If packets come from the idle peer, it is reactivated
○ If the peer is disconnected due to a TCP disconnect event/timeout, it is removed from the neighbor list

Mesh Delivery: Block Selection
○ Sequential: receive blocks in sequence [poor performance]
○ Random: receive a random block [trails close behind]
○ Rarest: receive the rarest block in the neighborhood, with no method for tie breaking (BitTorrent) [trails close behind]
○ Rarest random: among the rarest blocks, select a random one [found to be best, Kostic’05 & Liu’08]
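Rarest-random selection can be sketched with a counter over the blocks the neighbors advertise (for example, collected from HAVE messages or buffer maps). `pick_block` and its inputs are hypothetical names; ties among equally rare blocks are broken uniformly at random.

```python
import random
from collections import Counter

# Sketch only: names and input format are hypothetical.
def pick_block(have, neighbor_maps, rng):
    """Rarest-random block selection.

    have: set of block ids this peer already holds
    neighbor_maps: list of sets, the block ids each neighbor advertises
    Returns a block id to request, or None if no neighbor has anything new.
    """
    counts = Counter(b for m in neighbor_maps for b in m if b not in have)
    if not counts:
        return None
    rarest = min(counts.values())
    candidates = sorted(b for b, c in counts.items() if c == rarest)
    return rng.choice(candidates)  # random tie-break among the rarest blocks
```

Dropping the `rng.choice` and always taking `candidates[0]` would give the plain "rarest" rule without random tie breaking.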

Sender Bandwidth Allocation: Fair vs. Subsidizing
BitTorrent: Tit-for-Tat
○ A node preferentially uploads to the neighbors that provide it with the best download rate (top m)
Subsidizing:
○ A sender uploads blocks to receivers in round-robin
○ Subsidizing is desirable for resource-intensive P2P applications (e.g., peer-assisted streaming)

Erasure Resilient Coding (ERC)
[Figure: original data of k messages (1, 2, 3, …, k) is encoded into n ERC blocks (1, 2, 3, …, k, k+1, …, n); at a certain instant, some of the blocks (marked X) have been lost]
Some of the blocks may be lost in delivery. However, as long as at least k blocks are delivered, the original data can be reconstructed.

ERC in P2P File Sharing
○ Split the file into k blocks
○ Generate n encoded blocks
○ Perform P2P file sharing (e.g., in a BitTorrent-like fashion)
○ A peer succeeds in receiving the file if it receives any k of the n coded blocks

ERC Terms
○ Number of original blocks: k
○ Number of coded blocks: n
○ Rate of ERC: k/n
○ MDS (Maximum Distance Separable): any k of the n coded blocks can recover the original; the theoretical optimal performance

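Erasure Encoding: Mathematics
$$
\begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1k} \\ g_{21} & g_{22} & \cdots & g_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ g_{n1} & g_{n2} & \cdots & g_{nk} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
$$
Original data: x_1, x_2, …, x_k. Coded data: y_1, y_2, …, y_n. x_i, y_i: vectors over a Galois field.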

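Example: ERC of 10 MB
$$
\underbrace{\begin{bmatrix} g_{11} & \cdots & g_{1k} \\ \vdots & \ddots & \vdots \\ g_{n1} & \cdots & g_{nk} \end{bmatrix}}_{30 \times 10}
\underbrace{\begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}}_{10 \times 1\,\text{MB}}
=
\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}}_{30 \times 1\,\text{MB}}
$$
Original data (10 MB): k = 10, GF(2^8), each vector is 1 MB. Coded data: n = 30.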

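Erasure Decoding: Mathematics
[Figure: from the n coded blocks y_1, …, y_n, k available blocks are selected; the matching rows of the generator matrix form the sub-generator matrix]
$$
\begin{bmatrix} g'_{11} & g'_{12} & \cdots & g'_{1k} \\ g'_{21} & g'_{22} & \cdots & g'_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ g'_{k1} & g'_{k2} & \cdots & g'_{kk} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{bmatrix}
=
\begin{bmatrix} y'_1 \\ y'_2 \\ \vdots \\ y'_k \end{bmatrix}
$$
The original data can be recovered if the sub-generator matrix has full rank k, by multiplying the received blocks y'_1, …, y'_k by its inverse.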

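Systematic vs. Non-Systematic ERC
Systematic ERC: slightly lower encoding & decoding complexity
[Figure: original data of k messages (1, …, k); a non-systematic ERC produces coded blocks 1, …, n; a systematic ERC reproduces the k original messages as its first k blocks and adds blocks k+1, …, n]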

Reed-Solomon
○ The only known MDS code for arbitrary k and n (given a large enough Galois field)
○ Has been around for decades
○ Has a systematic form
Cauchy Reed-Solomon code:
$$B_{ij} = \frac{1}{r_i + c_j}$$
where r_i, c_j are distinct numbers for the rows & columns

Reed-Solomon Decoding
[Figure: receive k coded blocks, then invert the sub-generator matrix]
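The Cauchy Reed-Solomon construction above can be sketched end to end in pure Python: GF(2^8) arithmetic, encoding with entries 1/(r_i + c_j), and decoding by inverting the k×k sub-generator matrix. Assumptions: the primitive polynomial 0x11D, the choice r_i = k + i and c_j = j (which keeps every row number distinct from every column number), and n + k ≤ 256. The code is deliberately unoptimized and all function names are hypothetical.

```python
# Sketch only: field polynomial, r_i/c_j choice and names are assumptions.
# GF(2^8) log/antilog tables for the primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D)
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    return EXP[255 - LOG[a]]  # a must be nonzero

def cauchy_matrix(n, k):
    """Entry (i, j) = 1 / (r_i + c_j) with r_i = k + i, c_j = j; '+' is XOR."""
    assert n + k <= 256
    return [[gf_inv((k + i) ^ j) for j in range(k)] for i in range(n)]

def combine(coeffs, blocks):
    """Byte-wise linear combination sum_j coeffs[j] * blocks[j] over GF(2^8)."""
    out = bytearray(len(blocks[0]))
    for c, blk in zip(coeffs, blocks):
        if c:
            for t, byte in enumerate(blk):
                out[t] ^= gf_mul(c, byte)
    return bytes(out)

def encode(data_blocks, n):
    return [combine(row, data_blocks) for row in cauchy_matrix(n, len(data_blocks))]

def gf_invert(M):
    """Gauss-Jordan inversion over GF(2^8); raises StopIteration if singular."""
    size = len(M)
    A = [row[:] + [int(i == j) for j in range(size)] for i, row in enumerate(M)]
    for col in range(size):
        piv = next(r for r in range(col, size) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, v) for v in A[col]]
        for r in range(size):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [v ^ gf_mul(f, p) for v, p in zip(A[r], A[col])]
    return [row[size:] for row in A]

def decode(received, n, k):
    """received: dict coded_index -> block, with at least k entries."""
    idx = sorted(received)[:k]
    G = cauchy_matrix(n, k)
    G_inv = gf_invert([G[i] for i in idx])
    return [combine(row, [received[i] for i in idx]) for row in G_inv]
```

Because every square submatrix of a Cauchy matrix is nonsingular, any k of the n coded blocks decode successfully, which is the MDS property described earlier.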

Network Coding in P2P File Delivery
[Figure: source coding turns k original messages into n coded messages (n >> k) that are sent to friend nodes; intermediate nodes mix the blocks they hold and generate new blocks; a client node collects blocks]
As long as we get at least k1 messages, we can decode the original data. For an MDS code, k1 = k; otherwise k1 > k.

ERC & Network Coding
ERC in P2P:
○ The source sends different ERC blocks to the connected peers
○ ERC blocks are forwarded, but not mixed, during delivery
Network coding in P2P:
○ The source sends different network-coded blocks to the connected peers
○ The coded blocks are mixed during delivery

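Network Coding (Random Linear Code)
○ Each coded block is a randomly formed linear combination of the original blocks
○ A generator vector is attached to each coded block
Block mixing: start with blocks c_0, c_1
$$
c_0 = [g_{0,0}\ g_{0,1}\ \cdots\ g_{0,k-1}]\,[x_0\ x_1\ \cdots\ x_{k-1}]^t, \qquad
c_1 = [g_{1,0}\ g_{1,1}\ \cdots\ g_{1,k-1}]\,[x_0\ x_1\ \cdots\ x_{k-1}]^t
$$
Get block c_2 by mixing, with random coefficients β_0, β_1:
$$
c_2 = \beta_0 c_0 + \beta_1 c_1 = [g_{2,0}\ g_{2,1}\ \cdots\ g_{2,k-1}]\,[x_0\ x_1\ \cdots\ x_{k-1}]^t, \quad \text{with } g_{2,i} = \beta_0 g_{0,i} + \beta_1 g_{1,i}
$$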
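Random linear network coding can be sketched over the smallest field, GF(2), where every coefficient is a bit and mixing is XOR. This is a simplification chosen for brevity (practical systems typically use a larger field such as GF(2^8), which makes mixed blocks far less likely to be redundant), and all function names are hypothetical. The intermediate node mixes the attached generator vectors with the same coefficients as the payloads, which is what lets the receiver decode without knowing the delivery path.

```python
import random

# Sketch only: GF(2) instead of a larger field; names are hypothetical.
def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))

def source_block(originals, rng):
    """Source: random nonzero GF(2) generator vector plus matching payload."""
    k = len(originals)
    coeffs = [0] * k
    while not any(coeffs):
        coeffs = [rng.randint(0, 1) for _ in range(k)]
    payload = bytes(len(originals[0]))
    for c, blk in zip(coeffs, originals):
        if c:
            payload = xor_bytes(payload, blk)
    return coeffs, payload

def recode(held, rng):
    """Intermediate node: c_new = sum_j beta_j * c_j over the blocks it holds;
    the generator vector is mixed with the same beta_j."""
    betas = [0] * len(held)
    while not any(betas):
        betas = [rng.randint(0, 1) for _ in held]
    coeffs, payload = [0] * len(held[0][0]), bytes(len(held[0][1]))
    for b, (vec, data) in zip(betas, held):
        if b:
            coeffs = [u ^ v for u, v in zip(coeffs, vec)]
            payload = xor_bytes(payload, data)
    return coeffs, payload

def decode(blocks, k):
    """Gauss-Jordan elimination over GF(2); None until rank k is reached."""
    rows = [(list(v), d) for v, d in blocks]
    for col in range(k):
        piv = next((r for r in range(col, len(rows)) if rows[r][0][col]), None)
        if piv is None:
            return None
        rows[col], rows[piv] = rows[piv], rows[col]
        for r in range(len(rows)):
            if r != col and rows[r][0][col]:
                rows[r] = ([a ^ b for a, b in zip(rows[r][0], rows[col][0])],
                           xor_bytes(rows[r][1], rows[col][1]))
    return [rows[i][1] for i in range(k)]
```

A receiver simply keeps collecting blocks and calling `decode` until it returns the originals; blocks whose generator vectors are linearly dependent on earlier ones add nothing.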

How Useful is ERC in P2P Delivery?
Theory:
○ Broadcast (Edmonds, 1972): all nodes are receivers
$$\text{Maxflow}(s, T) = \min_{t_i \in T} \text{mincut}(s, t_i)$$
○ Routing is enough: block coding/mixing cannot further improve the theoretical throughput

Network Coding vs. ERC
Simulation by Gkantsidis (Infocom 2005):
○ In a homogeneous topology, network coding performs slightly better than ERC at the source, which performs slightly better than no coding
○ With heterogeneous capacity, and especially in topologies with clusters, network coding performs better than ERC, which performs better than no coding

Network Coding / ERC at Source
Implementation by Kostic (USENIX 2005):
○ In a well-connected graph, ERC doesn't help
Implementation by Wang (IWQoS 2006): network coding offers inferior performance
○ Due to its need to wait for at least two blocks before it can redistribute
○ Computational complexity hinders the use of network coding in high-capacity nodes (e.g., core routers)

NAT/Firewall Traversal
A very important component in consumer P2P applications:
○ You have to build the component
○ Its performance greatly affects system performance
○ NAT/firewall traversal behavior may also affect system design decisions

[Figure: a NAT with public address 4.18.133.70 connects hosts 192.168.0.2 and 192.168.0.3 to the Internet]

NAT/Firewall Traversal
NAT: Network Address Translation
○ An Internet standard that enables a local-area network (LAN) to use one set of IP addresses for internal traffic and a second set of addresses for external traffic
○ A NAT box located where the LAN meets the Internet performs all necessary IP address translations
NAT serves three main purposes:
○ Provides a type of firewall by hiding internal IP addresses
○ Enables a company to use more internal IP addresses; since they're used internally only, there's no possibility of conflict with IP addresses used by other companies and organizations
○ Allows a company to combine multiple ISDN connections into a single Internet connection

Firewall
A piece of hardware and/or software that functions in a networked environment to prevent some communications forbidden by the security policy
○ Egress filtering: only allow certain outbound traffic (to certain IP:port, from a selected set of IP addresses)
○ Ingress filtering: only allow certain inbound traffic (to certain IP:port, following known outbound traffic)

NAT/Firewall Traversal: Naïve Approach Under Windows
Use IPv6 (Windows): Windows XP SP2 & Vista implement Teredo tunneling
○ Turned on by default in Vista
○ Turned off by default in Windows XP (needs to be turned on)
○ Supports STUN traversal (and TCP on top of UDP)
○ About 60% traversal success rate

Build Your Own NAT/Firewall Traversal
○ NAT/Firewall discovery
○ Peer address advertisement
○ Port prediction & traversal

Traversal Procedure 1: NAT/Firewall Discovery
○ What type of NAT/firewall am I behind?

Traversal Procedure 2: Peer Address Advertisement
○ How do I advertise my contact information to other peers?
○ How do I know if there are peers who want to connect to me?

Traversal Procedure 3: NAT/Firewall Traversal
○ How do I establish direct connections between peers that are behind NAT/firewalls?

NAT/Firewall Detection
With one or more echo servers, determine what type of NAT/firewall I am behind.
[Figure: echo servers]
External Addr:Port = Internal Addr:Port?
○ Yes: public Internet. If incoming UDP/TCP is allowed (no ingress firewall filtering), the node can serve as a new public peer
○ No: behind NAT/firewall; proceed with NAT/firewall discovery
UDP/TCP connection to the server allowed?
○ Yes: no egress firewall filtering
○ No: behind a firewall with egress filtering. Connect to a popular port (say TCP 80, TCP 443): if that works, the peer may connect by relay; if not, the connection fails
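The detection logic above can be sketched as a small decision function over echo-server probe results. The argument names and result strings are hypothetical, and the probes themselves (sending UDP/TCP to echo servers and reading back the observed address) are assumed to have been done already.

```python
# Sketch only: inputs are assumed probe results; names are hypothetical.
def classify_connectivity(ext_equals_int, inbound_ok, outbound_ok, port_80_443_ok):
    """Decision procedure following the NAT/firewall detection steps.

    ext_equals_int: echo server saw the same Addr:Port we bound locally
    inbound_ok:     unsolicited incoming UDP/TCP reached us
    outbound_ok:    UDP/TCP connection to the echo server succeeded
    port_80_443_ok: TCP connection to port 80 or 443 succeeded
    """
    if ext_equals_int and inbound_ok:
        return "public peer"                 # no NAT, no ingress filtering
    if outbound_ok:
        return "behind NAT/firewall, no egress filtering: run NAT type discovery"
    if port_80_443_ok:
        return "egress-filtered firewall: may connect by relay"
    return "connection failed"
```

A node classified as a public peer can also act as an echo server or relay for others, which is why it is tested first.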

Cone NAT (70%)
[Figure: internal host 192.168.0.1:8000 is mapped to the same external port 36721 for all destinations (131.107.224.9:3074, 131.107.224.9:3075, 4.35.148.9:8100)]

Symmetric NAT: Sequential (30%)
[Figure: the same host is mapped to sequential external ports 36721, 36722, 36723, one per destination]

Symmetric NAT: Random (1%)
[Figure: the same host is mapped to a random external port per destination]

Advertise for Access
To identify a peer:
○ Addr:Port of the presence server
○ Public port of the peer (useful for cone NAT)
○ Private port of the peer

NAT Traversal
Traversal method by NAT type of the two peers:
1. Direct connection
2. STUN
3. Direct connection or connect back
4. Direct connection or connect back to a specific port
5. Symmetric NAT traversal (sequential symmetric NAT with restricted cone NAT)
6. Symmetric NAT traversal (both sides sequential symmetric NAT)
R. Relay

             D-IP  UPnP  Full Cone  Res Cone  Seq Sym  Sym Rand  Firewall
D-IP          1     1       1          3         3        3         4
UPnP          1     1       1          3         3        3         R
Full Cone     1     1       1          3         3        3         R
Res Cone      3     3       3          2         5        R         R
Seq Sym       3     3       3          5         6        R         R
Sym Rand      3     3       3          R         R        R         R
Firewall      4     R       R          R         R        R         R
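The traversal table above can be encoded as a lookup, together with a helper that tells cone, sequential symmetric and random symmetric mappings apart from the external ports echo servers observe. This is a sketch under stated assumptions: the row strings transcribe the table, `mapping_type` uses a strict constant-step test (real NATs can be noisier when other hosts share them), telling full cone from restricted cone would need an extra filtering probe not shown, and all names are hypothetical.

```python
# Sketch only: table transcription and names are hypothetical.
TYPES = ["D-IP", "UPnP", "Full Cone", "Res Cone", "Seq Sym", "Sym Rand", "Firewall"]
# Row = peer A's type, column = peer B's type.
TABLE = ["1113334", "111333R", "111333R", "33325RR", "33356RR", "333RRRR", "4RRRRRR"]
METHODS = {
    "1": "direct connection",
    "2": "STUN",
    "3": "direct connection or connect back",
    "4": "direct connection or connect back to specific port",
    "5": "symmetric NAT traversal",
    "6": "symmetric NAT traversal",
    "R": "relay",
}

def mapping_type(mapped_ports):
    """Classify a NAT from the external ports echo servers observed for the
    same internal socket, one probe per destination, in probe order."""
    if len(set(mapped_ports)) == 1:
        return "cone"                   # same mapping for every destination
    deltas = {b - a for a, b in zip(mapped_ports, mapped_ports[1:])}
    if len(deltas) == 1 and min(deltas) > 0:
        return "sequential symmetric"   # e.g. 36721, 36722, 36723
    return "random symmetric"

def traversal_method(type_a, type_b):
    return METHODS[TABLE[TYPES.index(type_a)][TYPES.index(type_b)]]
```

The table is symmetric, so the lookup gives the same method regardless of which peer initiates.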

Direct Connection (A↔B): A, B: D-IP, UPnP, Full Cone
[Figure: clients A and B, each behind a NAT, with echo servers and presence servers]

Connect Back (A↔B): A: D-IP, UPnP, Full Cone; B: not Firewall
[Figure: clients A and B, each behind a NAT, with echo servers and presence servers]

Connect Back @ Port 80/443 (A↔B): A: D-IP; B: Firewall
[Figure: clients A and B, each behind a NAT, with echo servers and presence servers]

STUN (A↔B): A, B: Restricted Cone NAT
[Figure: clients A and B, each behind a NAT, with echo servers, presence servers and probing]

Symmetric/Restricted Cone: A: Sequential Symmetric NAT, B: Restricted Cone
[Figure: clients A and B, each behind a NAT, with echo servers and presence servers]
1. Probing (via the echo servers)
2. Get the recent port mapping
3. Send the predicted port mapping to the peer
Steps 1-3 may require multiple tries

Symmetric/Symmetric: A, B: Both Sequential Symmetric
Doable through an algorithm similar to the one above
If NAT A has k1 uncertain ports after port range prediction, and NAT B has k2 uncertain ports after port range prediction, the probability of successful traversal in a single pass is 1/(k1·k2)
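Port prediction and the 1/(k1·k2) figure above can be sketched as follows. The extrapolation assumes the NAT's step between consecutive mappings stays constant; the multi-pass formula is an extension that further assumes passes succeed or fail independently (an assumption made here, not stated on the slides). Function names are hypothetical.

```python
# Sketch only: constant-step extrapolation and independent passes are assumptions.
def predict_ports(observed, k):
    """Sequential symmetric NAT: extrapolate the next external port(s).

    observed: external ports the echo servers saw for our most recent probes
    k: number of candidate ports to try, covering ports that other hosts behind
       the same NAT may grab in the meantime (the 'uncertain ports')
    """
    step = observed[-1] - observed[-2]
    return [observed[-1] + step * (i + 1) for i in range(k)]

def success_probability(k1, k2, passes=1):
    """Chance that at least one pass pairs A's true port with B's true port."""
    p = 1 / (k1 * k2)
    return 1 - (1 - p) ** passes
```

For example, with 4 uncertain ports on one side and 5 on the other, a single pass succeeds 5% of the time, while 20 independent passes raise that to roughly 64%.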

TCP NAT Traversal
Simple implementation:
○ Just like the UDP NAT traversal scheme: do bind(), listen() & bind(), connect() on the same port
○ Whichever socket successfully establishes a connection completes the traversal
○ Requires OS support for TCP simultaneous open (Windows XP SP2: each peer may launch a connection attempt separately)

TCP Failure Due to NAT Filtering
TCP failure case:
○ The NAT responds with RST upon an unseen SYN
○ Why it fails: the RST causes the socket at that end to close prematurely

TCP NAT Traversal
Variations to counter RST: low-TTL SYN; spoofed/raw SYN-ACK
Neither works well in deployment:
○ Not supported by the OS
○ Not supported by ISP routers
○ Security risk

P2P Deployment Issues
○ P2P economy & incentive
○ Attacks in P2P networks
○ Proximity and heterogeneity
○ P2P monitoring and debugging aids

P2P Economy & Incentive
P2P social behavior observation:
People like to free ride (if given the choice)
○ Shown in studies of Gnutella (70% of clients do not share) and Kazaa-Lite
However, people are OK to contribute if there is no choice (most people don't bother to hack) or if contribution improves performance
○ BitTorrent: sharing improves performance
○ Skype & PPLive
One class of nodes subsidizes another class


Existing P2P Incentive
Reciprocal incentive
○ Some kind of fair exchange mechanism directly between two peers
Force sharing
○ Skype & PPLive
Micropayment system
○ Needs extensive server infrastructure


Reciprocal Incentive
Fair exchange mechanism directly between two peers
○ Can be used in an open protocol
○ Can easily deter hacking & attacks
○ Most straightforward to implement


Iterated Prisoner's Dilemma
Payoff matrix (first entry: row player's payoff; second entry: column player's payoff):

               Cooperate          Defect
  Cooperate    R=d-u, R=d-u       S=-u, T=d
  Defect       T=d, S=-u          P=0, P=0

Cost:
○ d (>0): utility of downloading a fragment
○ u (>0): cost of uploading a fragment
○ d > u
We have: T > R > P > S and 2R > S+T > 2P
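The two inequalities follow from d > u > 0: T - R = u, R - P = d - u, P - S = u, and 2R - (S+T) = (S+T) - 2P = d - u. A quick check of the payoff ordering for concrete values (the function names are illustrative only):

```python
def payoffs(d: float, u: float) -> dict[str, float]:
    """Payoffs of the fragment-exchange game in the matrix above."""
    return {"T": d, "R": d - u, "P": 0.0, "S": -u}

def is_prisoners_dilemma(d: float, u: float) -> bool:
    """True when T > R > P > S and 2R > S + T > 2P hold."""
    p = payoffs(d, u)
    T, R, P, S = p["T"], p["R"], p["P"], p["S"]
    return T > R > P > S and 2 * R > S + T > 2 * P
```

With d = 3 and u = 1 the payoffs are T = 3, R = 2, P = 0, S = -1, a valid dilemma; if uploading costs as much as downloading is worth (d ≤ u), the ordering breaks.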

Axelrod's Tournament & TIT-FOR-TAT
Axelrod's tournaments in 1981 & 1984
○ 14 entries & 62 entries
Four principles for highly effective strategies: nice, retaliatory, forgiving, clear
TIT-FOR-TAT
○ Cooperate on the first move
○ Copy the opponent's last move
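A toy run of the iterated game makes the strategy concrete. This sketch reuses the payoff matrix above with the illustrative values d = 3 and u = 1 (assumptions, not from the tutorial), and pits TIT-FOR-TAT against itself and against a pure free rider.

```python
from typing import Callable, Dict, List, Tuple

Move = str  # "C" = cooperate (upload the fragment), "D" = defect (don't upload)
Strategy = Callable[[List[Move]], Move]  # sees the opponent's past moves

def tit_for_tat(opponent_history: List[Move]) -> Move:
    # Cooperate on the first move, then copy the opponent's last move.
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history: List[Move]) -> Move:
    # A pure free rider.
    return "D"

def play(s1: Strategy, s2: Strategy, rounds: int,
         d: float = 3.0, u: float = 1.0) -> Tuple[float, float]:
    """Total payoffs of two strategies over `rounds` iterations of the game."""
    payoff: Dict[Tuple[Move, Move], Tuple[float, float]] = {
        ("C", "C"): (d - u, d - u),  # R, R
        ("C", "D"): (-u, d),         # S, T
        ("D", "C"): (d, -u),         # T, S
        ("D", "D"): (0.0, 0.0),      # P, P
    }
    h1: List[Move] = []
    h2: List[Move] = []
    total1 = total2 = 0.0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        p1, p2 = payoff[(m1, m2)]
        total1 += p1
        total2 += p2
        h1.append(m1)
        h2.append(m2)
    return total1, total2
```

Over 10 rounds two TIT-FOR-TAT players earn 20 each, while a free rider facing TIT-FOR-TAT gains only 3 (from the first round) before being shut out.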


Tit-For-Tat in BitTorrent
BitTorrent rule:
○ Preferentially upload to the m neighbors that provide it with the best download rate
Surprisingly simple yet effective:
○ Free riding leads to relatively poor performance and is thus deterred (although research shows that in networks with a large number of seed nodes, free riding may carry only a small penalty)
However, it may not lead to the best utilization of peer resources.
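The rule above amounts to a periodic top-m selection over measured download rates. A minimal sketch, assuming rates are already measured per neighbor; `choose_unchoked` and the default m = 4 are illustrative, and real BitTorrent clients additionally reserve an optimistic unchoke slot, which is omitted here.

```python
import heapq

def choose_unchoked(download_rates: dict[str, float], m: int = 4) -> set[str]:
    """Pick the m neighbors that currently give us the best download rate.

    download_rates maps a peer id to the rate (e.g. bytes/s) recently received
    from that peer. Ties are broken by peer id so the result is deterministic.
    """
    best = heapq.nlargest(m, download_rates.items(), key=lambda kv: (kv[1], kv[0]))
    return {peer for peer, _ in best}
```

A neighbor that uploads nothing (rate 0) is never selected while better uploaders exist, which is how free riding is penalized.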



Force Sharing
Force sharing: Skype & PPLive
○ Pro: superior system performance for all users, as users with more resources subsidize users with fewer resources
○ Con: inherently unfair
Relies on a proprietary implementation; subject to hacking
○ Skype has been hacked, leading to Skype Lite?
Force sharing can result in poor system performance
○ PPLive slows other apps to a crawl


Micropayments
Basics:
○ Virtual money (may or may not be convertible to real currency)
○ Peers pay for resources
○ Used in MMORPGs, Xbox 360
Problems:
○ Needs server support
○ Subject to hacking
○ Mental transaction cost argument: each price, no matter how small, carries a burden of decision
○ Minors and those without a credit card may be deterred


Attacks in P2P Network


P2P Threat Scenario
P2P vs. client-server:
○ In client-server, the client only needs to trust the server
○ In P2P, all peers are servers, so the trust issue is severe: P2P networks must assume some nodes are malicious
P2P attack scenarios:
○ DoS attack
○ Sybil attack
○ Pollution & poisoning attack
○ Other attacks


Denial of Service (DoS) Attack
DoS attack on the P2P application itself
○ "Berman's bill would give us [copyright owners] the right to launch denial-of-service attacks, known as 'interdiction,' that would deluge P2P file servers with false file requests to slow the system or bring it to a halt." Network World Fusion, 8/5/02
DoS attack towards systems that are not necessarily participants in the P2P network
○ Naoumov (IWP2P, 2006) demonstrates a DDoS attack launched from Overnet:
  ○ By poisoning the distributed index
  ○ By poisoning the routing table


Sybil Attacks
Sybil is a well-known character from the 1970s, a woman diagnosed with multiple personality disorder, with 16 personalities
Sybil attack
○ A single faulty entity masquerades as multiple identities, thus controlling a substantial portion of the network


Why Use Sybil Attacks in P2P?
The tracker is a single weak point in P2P systems
Sybil attack
○ Create massive numbers of false identities that bring down the tracker
Counter
○ Make identities more expensive
○ A trusted central agency certifies identities
○ Attach each identity to certain real-world information, to create accountability for each identity


Pollution & Index Poisoning Attack

Index poisoning attack
Content pollution attack
Commercial providers: MediaSentry.com, OverPeer.com

[Figure: Index poisoning & pollution level in the FastTrack network (Liang, Infocom 2006)]

[Figure: Index poisoning & pollution level in Overnet (Liang, Infocom 2006)]

Counter: Pollution Attack
BitTorrent
○ Provides block hashes in the torrent file (counters pollution); signatures work as well
○ Automatically contacts peers (counters index poisoning)
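Block hashes counter pollution because a downloader can reject any block that does not match. The sketch below computes SHA-1 hashes of fixed-size pieces in the style of BitTorrent v1 and checks a received piece against them; it computes the hash list directly rather than parsing a .torrent file, and the function names are illustrative.

```python
import hashlib

def piece_hashes(data: bytes, piece_len: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece (the last piece may be shorter)."""
    return [hashlib.sha1(data[i:i + piece_len]).digest()
            for i in range(0, len(data), piece_len)]

def verify_piece(index: int, piece: bytes, hashes: list[bytes]) -> bool:
    """Accept a piece from a peer only if it matches the published hash."""
    return 0 <= index < len(hashes) and hashlib.sha1(piece).digest() == hashes[index]
```

A polluted piece fails verification and can simply be re-requested from another peer.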


Pollution Attack in Network Coding
Problem:
○ Peer nodes may mix content; the hash/signature calculated by the source may not be available for the newly mixed content
Solution:
○ Request a new hash/signature from the source
  ○ The source has to be online; heavy computation burden
○ Homomorphic hash
  ○ Computationally expensive
○ Secure Random Checksums (SRCs)
  ○ The server needs to be online to distribute SRCs to each client
○ Homomorphic signature
  ○ Needs a large Galois field; computationally simple, but with large overhead
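The homomorphic-hash idea can be illustrated with a discrete-log style hash h(x) = ∏ g_i^{x_i} mod p, whose key property is h(a·x1 + b·x2) = h(x1)^a · h(x2)^b. The toy parameters below (p = 23, q = 11, generators 2, 3, 4 of order 11) are assumptions chosen only to make the arithmetic visible; they are far too small to be secure, and this is not presented as the specific scheme the tutorial refers to.

```python
# Toy parameters: q = 11 divides p - 1 = 22, and 2, 3, 4 all have order 11 mod 23.
P, Q = 23, 11
G = [2, 3, 4]

def hhash(block: list[int]) -> int:
    """h(x) = prod g_i^(x_i) mod p, with block symbols taken mod q."""
    h = 1
    for g, x in zip(G, block):
        h = h * pow(g, x % Q, P) % P
    return h

def combine(blocks: list[list[int]], coeffs: list[int]) -> list[int]:
    """Network-coding style linear combination of source blocks over Z_q."""
    return [sum(c * b[i] for c, b in zip(coeffs, blocks)) % Q for i in range(len(G))]

def verify_combined(mixed: list[int], coeffs: list[int], source_hashes: list[int]) -> bool:
    """Check a mixed block using only the source blocks' hashes and the coefficients."""
    expected = 1
    for c, h in zip(coeffs, source_hashes):
        expected = expected * pow(h, c % Q, P) % P
    return hhash(mixed) == expected
```

A peer that receives a mixed block plus its coefficients can therefore verify it without contacting the source, which is exactly what plain hashes cannot do once content is re-encoded.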


Counter: Index Poisoning
Signing techniques to verify content authenticity
○ This helps with the piracy issue in P2P as well
Piracy in P2P
○ P2P is closely associated with piracy today
○ The history: Napster, Kazaa, Grokster, PirateBay, …
○ Commercial P2P networks must make an effort to prevent illegal publishers
  ○ E.g., the Grokster case, http://www.eff.org/IP/P2P/MGM_v_Grokster


Other Attacks (1)
To grab peer identity
○ Man-in-the-middle, replay, password-guessing attacks
○ Counter: end-to-end encryption
Identity attack
○ Tracking and harassing users
Debugging/reverse engineering
○ E.g., reverse engineering of Skype
○ Counter: security by obfuscation


Other Attacks (2)
Insertion of viruses
Spyware in the P2P software
Spamming (sending junk)


Proximity and Heterogeneity


Peer Parameter Estimation
Peer selection based on:
○ Latency (RTT)
○ Bandwidth/throughput estimation (heterogeneity), e.g., link (upload) bandwidth
○ ISP locality (proximity)
○ Packet loss
○ Availability (outage)
This section covers how to estimate heterogeneity (bandwidth) and proximity (ISP locality)
Building heterogeneity- and proximity-aware P2P applications is still an ongoing research topic

Bandwidth Estimation
Problem: what is my available bandwidth?
TCP throughput
○ Intrusive measurement
○ Resource consuming
○ Slow to get a result
Back-to-back packet pair
○ Measures bottleneck link capacity
○ Not for available bandwidth measurement
Various packet train approaches, e.g., Pathneck


Available Bandwidth: Definition
Consider an end-to-end path of n links: L1, L2, …, Ln
Their capacities are B1, B2, …, Bn
Their traffic loads are C1, C2, …, Cn
The bottleneck link is the link with the smallest capacity:
○ Bb = min(B1, B2, …, Bn)
The tight link is the link with the least unused capacity:
○ Bt - Ct = min(B1 - C1, B2 - C2, …, Bn - Cn)
○ Bt - Ct is the available bandwidth
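The bottleneck link and the tight link need not be the same link. The sketch below applies the two definitions to a hypothetical three-link path (numbers chosen only for illustration), numbering links from 1 as L1..Ln.

```python
def bottleneck_and_tight(capacities: list[float],
                         loads: list[float]) -> tuple[int, float, int, float]:
    """Return (bottleneck link #, Bb, tight link #, Bt - Ct); links numbered from 1."""
    if not capacities or len(capacities) != len(loads):
        raise ValueError("need one load per link")
    unused = [b - c for b, c in zip(capacities, loads)]
    b = min(range(len(capacities)), key=lambda i: capacities[i])
    t = min(range(len(unused)), key=lambda i: unused[i])
    return b + 1, capacities[b], t + 1, unused[t]
```

With capacities (100, 20, 1000) Mbps and loads (95, 2, 500) Mbps, L2 is the bottleneck (20 Mbps) but L1 is the tight link, so the available bandwidth is only 5 Mbps.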


Basic Tool
Back-to-back packets estimate the capacity of the bottleneck link
A packet train estimates the available bandwidth of the tight link
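For the back-to-back case, the bottleneck spreads two packets apart by the time it needs to transmit one of them, so capacity ≈ packet size / arrival gap. A minimal sketch; taking the median over several pairs to damp cross-traffic noise is an assumption added here, and the function names are illustrative.

```python
from statistics import median

def packet_pair_capacity(packet_size_bytes: int, arrival_gap_s: float) -> float:
    """Bottleneck capacity in bit/s from the dispersion of two back-to-back packets."""
    if arrival_gap_s <= 0:
        raise ValueError("arrival gap must be positive")
    return packet_size_bytes * 8 / arrival_gap_s

def estimate_capacity(packet_size_bytes: int, gaps_s: list[float]) -> float:
    """Median over several packet pairs, to damp noise from cross traffic (assumed)."""
    return median(packet_pair_capacity(packet_size_bytes, g) for g in gaps_s)
```

For 1500-byte packets arriving 1.2 ms apart, the estimate is 12,000 bits / 0.0012 s = 10 Mbit/s.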


Available Bandwidth Measurement

[Figure: probing packets interleaved with background packets along the path (sending gap vs. arriving gap); plot of arriving gap against sending gap, where the turning point marks the available bandwidth.]
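The turning point can be located by sweeping the probing rate and looking for the highest rate whose arriving gap has not expanded. The sketch below generates data from a simple single-hop fluid model (an assumption used only for testing; real measurements are noisy) and then estimates the available bandwidth from it. Function names and the 1% tolerance are illustrative.

```python
def fluid_arriving_gap(send_gap: float, size_bits: float,
                       capacity: float, cross_rate: float) -> float:
    """Single-hop fluid model: the gap expands once the probe rate exceeds C - X."""
    if size_bits / send_gap <= capacity - cross_rate:
        return send_gap
    # Each probe plus the cross traffic arriving in one gap must be served at rate C.
    return (size_bits + cross_rate * send_gap) / capacity

def estimate_available_bw(size_bits: float, send_gaps: list[float],
                          arriving_gaps: list[float], tol: float = 0.01) -> float:
    """Highest probing rate whose arriving gap stayed within tol of the sending gap."""
    best = 0.0
    for gs, ga in zip(send_gaps, arriving_gaps):
        if ga <= gs * (1 + tol):
            best = max(best, size_bits / gs)
    return best
```

With a 100 Mbit/s link carrying 40 Mbit/s of cross traffic and probing rates from 10 to 100 Mbit/s in 10 Mbit/s steps, gaps start expanding at 70 Mbit/s, so the estimate is 60 Mbit/s.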

Pathneck (Hu, Sigcomm’04)
Load packets are used to measure available bandwidth
Measurement packets are used to obtain location information

[Figure: Recursive Packet Train (RPT): 30 measurement packets (60 B, TTL 1, 2, …, 30), followed by 60 load packets (500 B, TTL 255), followed by 30 measurement packets (60 B, TTL 30, …, 2, 1).]

Transmission of RPT (Hu, Sigcomm’04)

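[Figure: sender S transmits the RPT through routers R1, R2 and R3; each router decrements every TTL by one (255 → 254 → 253 for the load packets), the first and last measurement packets expire (TTL reaches 0) at each hop, and the time between them gives the gap values g1, g2, g3.]

Gap values are the raw measurement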
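The RPT layout explains why each router yields exactly one gap: the packet sent with TTL k expires at hop k, and there is one such packet at each end of the train. A sketch of the layout with the figure's default sizes (the function names are illustrative, and packets are modeled only as (TTL, size) pairs):

```python
from typing import List, Tuple

def build_rpt(n_measure: int = 30, n_load: int = 60, measure_size: int = 60,
              load_size: int = 500, load_ttl: int = 255) -> List[Tuple[int, int]]:
    """Recursive packet train as (TTL, size in bytes) pairs, in sending order."""
    head = [(ttl, measure_size) for ttl in range(1, n_measure + 1)]
    load = [(load_ttl, load_size)] * n_load
    tail = [(ttl, measure_size) for ttl in range(n_measure, 0, -1)]
    return head + load + tail

def expiring_at(rpt: List[Tuple[int, int]], hop: int) -> List[int]:
    """Indices of the packets whose TTL runs out at router number `hop`."""
    return [i for i, (ttl, _) in enumerate(rpt) if ttl == hop]
```

At hop 3, the packets at indices 2 and 117 expire; the time between the two resulting reports is the gap g3.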

Choke Point Detection (Hu, Sigcomm’04)

[Figure: gap value vs. hop # (hops 1–8), marking the choke points and the bottleneck point; available bandwidth (a_bw) vs. hop #.]
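Choke points show up as hops where the gap grows noticeably over the previous hop. The sketch below is a simplified reading of the figure, not Pathneck's full algorithm (which includes further processing not reproduced here): it flags hops whose gap increase exceeds a threshold and, as a stated simplifying assumption, labels the choke point with the largest increase as the bottleneck.

```python
from typing import Dict, List, Optional, Tuple

def choke_points(gaps: List[float], threshold: float,
                 initial_gap: float = 0.0) -> Tuple[List[int], Optional[int]]:
    """Return (choke point hops, assumed bottleneck hop); hops are numbered from 1.

    gaps[k] is the gap measured at hop k + 1; initial_gap is the gap at the sender.
    """
    increases: Dict[int, float] = {}
    prev = initial_gap
    for hop, gap in enumerate(gaps, start=1):
        if gap - prev > threshold:  # the train was stretched noticeably at this hop
            increases[hop] = gap - prev
        prev = gap
    chokes = sorted(increases)
    # Simplifying assumption: the largest gap increase marks the bottleneck.
    bottleneck = max(increases, key=increases.get) if increases else None
    return chokes, bottleneck
```

For gaps (in µs) of 100, 105, 300, 310, 700, 705, 710, 712 over hops 1 to 8, a 50 µs threshold flags hops 3 and 5, with hop 5 as the bottleneck.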

ISP Locality
P2P reality:
○ P2P represents 60% of Internet traffic and is still growing
○ 92% of P2P traffic crosses transit/peering links
○ 80% of upstream capacity is consumed by P2P
○ P2P protocols will aggressively consume all available capacity
P2P & ISP:
○ P2P affects QoS levels for ALL subscribers
○ As content providers use P2P to deliver content, their costs are being passed on to service providers
○ Weak linkage between the traffic generated/served to end users and charging
Solution:
○ Try best to deliver content from nearby peers
○ Convert cross-transit traffic into intra-ISP traffic
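Preferring nearby peers can be as simple as sorting candidates by a locality cost. A minimal sketch, assuming each peer's AS number and POP label are already known (e.g. from the topology discovery described next); `Peer`, `rank_by_locality` and the three-level cost are hypothetical, and the AS numbers in the example come from the documentation range.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Peer:
    peer_id: str
    asn: int   # autonomous system the peer's external IP maps to
    pop: str   # hypothetical POP label, e.g. derived from external IP + subnet mask

def rank_by_locality(me: Peer, candidates: list[Peer]) -> list[Peer]:
    """Order candidates so intra-POP peers come first and cross-ISP peers last."""
    def cost(p: Peer) -> int:
        if p.asn == me.asn and p.pop == me.pop:
            return 0  # same POP: traffic stays local
        if p.asn == me.asn:
            return 1  # same ISP, different POP: intra-ISP traffic
        return 2      # different ISP: crosses a transit/peering link
    return sorted(candidates, key=lambda p: (cost(p), p.peer_id))
```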


Internet Today

[Figure: the MSN, Comcast and Verizon ASes, interconnected through BGP routers and peering points, each with core routers and gateway routers serving end users.]

Inside ISP


ISP POP (Point of Presence)


Home Networking


Internet Topology Discovery
Information used:
○ Peer external IP address
○ Peer subnet mask
Topology discovery:
○ BGP feed: external IP → AS
○ GeoLocation service (e.g., IP2Location): external IP → country, region, city, latitude, longitude, ISP domain
○ External IP + subnet mask → POP
○ External IP → home/corporation
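The "BGP feed: external IP → AS" step is a longest-prefix match, and "external IP + subnet mask" gives a rough same-network test. A sketch using Python's `ipaddress` module; the prefix table below is a hypothetical stand-in for a real BGP feed, built from documentation prefixes and AS numbers.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical prefix -> AS table standing in for a BGP feed (documentation ranges).
PREFIX_TO_AS: Dict[str, int] = {
    "198.51.100.0/24": 64500,
    "198.51.100.128/25": 64501,
    "203.0.113.0/24": 64502,
}

def ip_to_as(ip: str, table: Dict[str, int] = PREFIX_TO_AS) -> Optional[int]:
    """Longest-prefix match of an external IP against the prefix table."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return None if best is None else best[1]

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Rough proxy for 'same POP': is ip_b inside ip_a's subnet under the given mask?"""
    net = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    return ipaddress.ip_address(ip_b) in net
```

The more specific /25 wins for 198.51.100.200, so it maps to AS 64501 rather than 64500.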


Hybrid CDN-P2P – Internet video


[Figure: Akamai network and Limelight network]

Hybrid CDN-P2P – software distribution


P2P Monitoring Tools and Debugging Aids

Debugging of P2P Applications
Problems:
○ Debugging is difficult, but debugging a P2P application is especially hard
○ Execution is non-deterministic
  ○ Two executions may yield different results
  ○ Bugs are not reproducible
  ○ Bugs that occur only in large-scale runs may be difficult to debug


Nondeterminism in P2P Applications
Nondeterminism:
○ Two executions may lead to different results
Race:
○ Two or more processes attempt to access the same resource, and at least one of them writes to it
  ○ Read-write race
  ○ Write-write race
Deadlock:
○ Two or more processes wait for events such that none of the events ever occurs
Solution:
○ Deterministic replay
○ Distributed assert


Deterministic Replay
Log into a trace all incoming network traffic, timing events and thread-switching events that affect the execution path of the P2P application
During debugging, replay the activity of a set of peers from the trace
Tools available:
○ Liblog & Friday (UC Berkeley)
○ WiDS (MSRA)
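The record/replay idea can be sketched by routing every nondeterministic input of a peer (received messages, clock reads, random numbers) through one object that logs values live and returns the logged values during replay. This is a minimal single-threaded sketch under that assumption; it does not capture thread switching, and `Env` and `run_peer` are hypothetical, not the API of liblog, Friday or WiDS.

```python
import random
import time
from typing import Any, Callable, List, Optional, Tuple

class Env:
    """Nondeterministic inputs of one peer, recorded live or replayed from a trace."""

    def __init__(self, trace: Optional[List[Tuple[str, Any]]] = None):
        self.replaying = trace is not None
        self.trace: List[Tuple[str, Any]] = list(trace) if trace is not None else []
        self._pos = 0

    def _event(self, kind: str, produce: Callable[[], Any]) -> Any:
        if self.replaying:
            logged_kind, value = self.trace[self._pos]
            if logged_kind != kind:
                raise RuntimeError(f"replay diverged: trace has {logged_kind}, code asked for {kind}")
            self._pos += 1
            return value  # the live source is never touched during replay
        value = produce()
        self.trace.append((kind, value))
        return value

    def now(self) -> float:
        return self._event("time", time.time)

    def rand(self) -> float:
        return self._event("rand", random.random)

    def recv(self, source: Callable[[], Any]) -> Any:
        return self._event("recv", source)

def run_peer(env: Env, inbox: List[str]) -> List[Tuple[str, float, float]]:
    """Toy peer logic: timestamp each message and pick a random backoff for it."""
    handled = []
    while True:
        msg = env.recv(lambda: inbox.pop(0) if inbox else None)
        if msg is None:
            break
        handled.append((msg, env.now(), env.rand()))
    return handled
```

Replaying `live.trace` into a fresh `Env` reproduces the exact same timestamps and random choices, even though the replay run receives no real messages.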


Distributed Assert
Collect status information (e.g., state, % of task finished, bandwidth usage, etc.) with timestamps in a distributed fashion
Send the information to a central reporting server
Create a snapshot of the whole system
Tools available:
○ D3S: debugging deployed distributed systems (MSRA)
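A central reporting server can turn timestamped peer reports into per-window snapshots and check a global predicate on each. This sketch assumes loosely synchronized clocks and fixed-length time windows; `Report`, `ReportingServer` and the example predicate are hypothetical and do not reflect D3S's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Report:
    peer: str
    timestamp: float          # peer's local clock (assumed loosely synchronized)
    state: Dict[str, float]   # e.g. {"upload": 40.0, "progress": 0.7}

class ReportingServer:
    def __init__(self, window: float):
        self.window = window
        self.reports: List[Report] = []

    def submit(self, report: Report) -> None:
        self.reports.append(report)

    def snapshots(self) -> Dict[int, Dict[str, Dict[str, float]]]:
        """Group reports into time windows, keeping each peer's latest report per window."""
        buckets: Dict[int, Dict[str, Dict[str, float]]] = defaultdict(dict)
        for r in sorted(self.reports, key=lambda r: r.timestamp):
            buckets[int(r.timestamp // self.window)][r.peer] = r.state
        return dict(sorted(buckets.items()))

    def check(self, predicate: Callable[[Dict[str, Dict[str, float]]], bool]) -> List[int]:
        """Return the ids of windows whose snapshot violates the global predicate."""
        return [w for w, snap in self.snapshots().items() if not predicate(snap)]
```

For example, asserting that total upload across all peers stays at or below 100 flags any window in which the peers' combined reported usage exceeds that cap.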


Summary
P2P can add significant value to content owners
○ Reduce the cost of delivery
○ Scale up to meet customer demand without significant infrastructure investment
P2P can deliver better quality of service (QoS) to end users
○ Users have access to more resources (bandwidth, storage) on the network, leading to improved QoS
But it needs to be done right!
