traffic patterns in peer-to-peer-networking christian...

Post on 07-Jul-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Traffic Patterns in Peer-to-Peer-Networking

Christian Schindelhauer

joint work with Amir Alsbih

Thomas Janson

Albert-Ludwig University FreiburgDepartment of Computer Science

Computer Networks and Telematics

to be presented at ITA

Global Internet Traffic Shares1993-2004

Source: CacheLogic 2005

E-Mail

FTP

Peer-to-Peer

Web

1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004

CacheLogic Research Trends of Internet Protocols 1993-2004

Share

of In

tern

et tr

affi

c

0

10

20

30

40

50

60

70

2

P2P Share Germany 2007

Quelle: Ipoque 2007

3

What Germans Download 2007by Volume

4Quelle: Ipoque 2007

P2P Systems Germany 2007by Volume

5Quelle: Ipoque 2007

Motivation

Peer-to-Peer-Networks- Quality depends on user behavior- High churn rates - Egoistic users

Only a small number of independent studies of Internet traffic

We analyze the complete traffic of 20,000 users in August 2009 of a German digital cable TV based Internet provider.- Traffic was centrally monitored - Type classification by deep packet inspection- Looked at BitTorrent traffic

6

Network Monitoring

Network monitoring systems- installed in recent

years- allow to monitor the

behavior of each user.

Motivation- new governmental

regulations - detection and

prevention of • Internet fraud

• denial-of-service attacks

• spam mailers• phishing attacks, • criminal conspiracies, • forbidden contents • copyright violations.

ISPs are (usually) not the juridical target - are required to uphold

an infrastructure, which allows law enforcement to take action in such cases.

7

Background of the Study

Our study- limited access to anonymized

user data - gathered by a network

monitoring systems using deep packet inspection (DPI)

Main product of „our“ ISP: digital cable TV- thousands of German

households- byproduct they also offer

telephone and Internet service

German households are connected via DSL- rural area the bandwidth is

rather low

- urban areas high data rates

Mobile phone carriers providing GPRS, EDGE and HSDPA gain in traffic.

Digital TV cable is a stable market- Installation of the necessary

infrastructure is expensive- Television is still important

media of Germans- No open market for digital

cable TV. - Cable TV users extend their

contracts to include Internet service because of low prices and high bandwidth rates.

8

Internet over Digital TV Cable

Each user needs a digital cable modem- encodes and decodes the data traffic

Throughput rates range from 32-100 MBit/s download - DSL traffic: 2 to 16 kBit/s- HSDPA ending at 7.2 MBit/s

No network bottleneck - measured traffic behavior directly reflects the users wishes.

Ideal opportunity- What do Internet users want? - How long are users online? - How much data do users download or upload? - What are the network services they use?

9

BitTorrent and „Friends“

BitTorrent- most successful peer-to-peer network protocol- BitTorrent encourages to upload data using incentives

Several BitTorrent clients deviate from the original protocol - BitTyrant

• achieves a download gain up to 70 percent • strategic selection of peers in the swarm

- BitThief • free riding client• allows downloading without any upload • achieve higher download rates than the official client.

10

Bittorrent

Bram Cohen Bittorrent is a real (very successful) peer-to-peer network

- concentrates on download- uses (implicitly) multicast trees for the distribution of the parts of a file

Protocol is peer oriented and not data oriented Goals

- efficient download of a file using the uploads of all participating peers- efficient usage of upload

• usually upload is the bottleneck• e.g. asymmetric protocols like ISDN or DSL

- fairness among peers• seeders against leeches

- usage of several sources

11

Bittorrent: Coordination

Central coordination - by tracker host- for each file the tracker outputs a set of random peers from the set of

participating peers• in addition hash-code of the file contents and other control information

- tracker hosts to not store files• yet, providing a tracker file on a tracker host can have legal consequences

- Is often replaced with a decentralized peer-to-peer network

File- is partitions in smaller pieces

• as describec in tracker file- every participating peer can redistribute downloaded parts as soon as he

received it- Bittorrent aims at the Split-Stream idea

Interaction between the peers- two peers exchange their information about existing parts- according to the policy of Bittorrent outstanding parts are transmitted to the other

peer

12

BittorrentPart Selection

Problem- The Coupon-Collector-Problem is the reason for a uneven distribution of parts

• if a completely random choice is used

Measures- Rarest First

• Every peer tries to download the parts which are rarest- density is deduced from the comunication with other peers (or tracker host)

• in case the source is not available this increases the chances the peers can complete the download

- Random First (exception for new peers)• When peer starts it asks for a random part• Then the demand for seldom peers is reduced

- especially when peers only shortly join

- Endgame Mode• if nearly all parts have been loaded the downloading peers asks more connected

peers for the missing parts• then a slow peer can not stall the last download

13

BittorrentPolicy

Goal- self organizing system- good (uploading, seeding) peers are rewarded- bad (downloading, leeching) peers are penalized

Reward- good download speed- un-choking

Penalty- Choking of the bandwidth

Evaluation- Every peers Peers evaluates his environment from his

past experiences

14

BittorrentChoking

Every peer has a choke list- requests of choked peers are not served for some time- peers can be unchoked after some time

Adding to the choke list- Each peer has a fixed minimum amount of choked peers (e.g. 4)- Peers with the worst upload are added to the choke list

• and replace better peers Optimistic Unchoking

- Arbitrarily a candidate is removed from the list of choking candidates• the prevents maltreating a peer with a bad bandwidth

15

Deep Packet Inspection

Internet Service Provider- deep packet inspection system for analyzing the type of traffic

Using heuristics- Analyze the first few packets to identify a protocol

• Assumption further data exchange over the connection (IP socket) belongs to the same protocol.

- Only protocol headers of the first few packet are inspected- Applications without encryption can be identified this way

Encrypted protocols- can only identified by version numbers and other unencrypted

information- Up to 20 packets have to be inspected- User data cannot be processed

16

Traffic Types

17

Mea

n Ho

st T

raffi

c [k

b/s]

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

HTTP in

HTTP out

BitTorre

nt in

BitTorre

nt ou

t

NNTP in

NNTP out

eDon

key in

eDon

key o

utSSL

in

SSL ou

t

SHOUTcast

in

SHOUTcast

out

RTMP in

RTMP out

FTP trans

fer in

FTP trans

fer o

ut

Micros

oft BITS in

Micros

oft BITS o

ut

Gnutel

la in

Gnutel

la ou

t

Rest i

n

Rest o

ut

3.25

0.12

0.680.52

0.34

0.010.210.24 0.21

0.03 0.140.02 0.11 0 0.060.01 0.05 0 0.04 0

0.390.21

HTTP 44.4 %

BitTorrent 24.1 %

NNTP 14.2 %SHOUTcast 6.4 %

RTMP 5 %

eDonkey 4 %RTSP 1.2 %Skype 0.8 %

HTTP 14.6 %BitTorrent 64.3 %

NNTP 0.7 %SHOUTcast 0.7 %RTMP 0.4 %

eDonkey 16.3 %

RTSP 0.1 %Skype 3 %

Internet Traffic of a Geman ISPAugust 2009

18

Download

Upload

Our Data Source

After 15 minutes the DPI systems- reports the number of incoming and outgoing bytes for each

protocol for each user.- rollected in log files. - We have received the data without IP addresses

• replaced by anonymized IDs integer

For each interval of 15 minutes over a month - we know for each anonymized user the number of open

connections- the incoming and outgoing overall traffic- the incoming and outgoing unencrypted BitTorrent traffic- the sum of HTTP traffic of all users.

We have received the sum of overall traffic in this month for each host for each service type.

19

0

5000

10000

15000

20000

25000

30000

08 01 08 08 08 15 08 22 08 29 0

20

40

60

80

100

120

140

#hos

ts

perc

enta

ge

date

#hosts complete#hosts having any traffic

#hosts using BitTorrent complete#hosts not using BitTorrent

#hosts using BitTorrent

Overvie of all Traffic in August 2009

20

0 2 4 6 8

10 12 14 16

08 01 08 08 08 15 08 22 08 29

traffi

c [k

b/s]

date

complete incomplete outBitTorrent in

BitTorrent out

Overall Traffic Devolopent in August averaged over days 2009

21

0 2 4 6 8

10 12 14 16

08 01 08 08 08 15 08 22 08 29

traffi

c [k

b/s]

date

complete incomplete outBitTorrent in

BitTorrent out

Measured Traffic

22

Shortcomings of Data Set

Identification of each user by the IPv4 address is not completely reliable No reconnection every 24 hours

- unlike other ISPs- IPv4 address of a network user remains the same until the modem is rebooted

Possible reasons for a modem reboot are- hardware reset- disconnecting of the modem - power outage.

Error types- user occurs under several IP addresses

• leads to an overestimation of users. - different user might reuse a free IP address

• ISP assured us that IPv4 addresses are rarely reused

Look at the intervals when an IP address is used and count the number of such simultaneous time intervals.

- This number gives us a lower bound of the number of distinct users

23

0 1000 2000 3000 4000 5000 6000 7000 8000 9000

08 01 08 08 08 15 08 22 08 29 09 05

#Hos

ts m

inim

um o

verla

p

Date

maxmeasured

random

Overlapping TechniqueBitTorrent Traffic only

24

0

5000

10000

15000

20000

25000

08 01 08 08 08 15 08 22 08 29 09 05

#Hos

ts m

inim

um o

verla

p

Date

maxmeasured

random

Overlap with Internet Traffic

25

0

5000

10000

15000

20000

25000

08 01 08 08 08 15 08 22 08 29

#hos

ts

date

#hosts with trafficmin. overlapping complete traffic

#hosts using BitTorrentmin overlapping BitTorrent traffic

Comparison

26

BitTorrent Traffic

Scatterplots for Up/Download Traffic BitTorrent and other traffic not related Remember: correlation coefficient

27

0

5

10

15

20

0 5 10 15 20

mea

n up

load

[kb/

s]

mean download [kb/s]

Scatterplott - BitTorrent Traffic

28

correlation coefficient: 0.58

0

0.1

0.2

0.3

0.4

0.5

0 0.1 0.2 0.3 0.4 0.5

mea

n up

load

[kb/

s]

mean download [kb/s]

Scatterplott - BitTorrent Traffic (Zoom in)

29

0

10

20

30

40

50

0 10 20 30 40 50

dow

nloa

d Bi

tTor

rent

[kb/

s]

download rest [kb/s]

mean download host traffic50% BitTorrent traffic

BitTorrent Traffic versus Other Traffic Downstream

30

correlation coefficient: −0.38.

0

10

20

30

40

50

0 10 20 30 40 50

uplo

ad B

itTor

rent

[kb/

s]

upload rest [kb/s]

mean upload host traffic50% BitTorrent traffic

Upload BitTorrent versus Other Traffic Upload

31

correlation coefficient −0.53

2 82 62 42 22022242628

210

0 2000 4000 6000 8000

>= s

hare

ratio

(up/

dow

n) [l

og]

#user

share ratioshare ratio 1:1

Upload / DownloadDistribution

32

2 82 62 42 22022242628

210

1 4 16 64 256 1024 4096

>= s

hare

ratio

(up/

dow

n) [l

og]

#user [log]

share ratioshare ratio 1:1

Upload / DownloadDistribution (Log-Log-Plot)

33

−300 −200 −100 0 100

020

040

0

Share−Ratio (Upload−Download) to Traffic (Upload+Download)

Upload − Download [Kb/Sec]

Up.

+ D

own.

[Kb/

Sec]

Host TrafficAvg. Host + Imbalance Traffic

Traffic Difference versus Traffic Sum

34

35

0

10

20

30

40

50

0 10 20 30 40 50

dow

nloa

d Bi

tTor

rent

[kb/

s]

download rest [kb/s]

mean download host traffic50% BitTorrent traffic

Figure 8: Scatterplot of(rest, BitTorrent) download pointsfor all hosts.

0

10

20

30

40

50

0 10 20 30 40 50

uplo

ad B

itTor

rent

[kb/

s]

upload rest [kb/s]

mean upload host traffic50% BitTorrent traffic

Figure 9: Scatterplot of(rest, BitTorrent) upload pointsfor all hosts.

1 2 5 10 20 50

1e−0

45e−0

31e−0

1

share−difference [kb/s] [log]

prob

abilit

y [lo

g]

share−dif. up−downshare−dif. Up−Down approx.share−dif. down−upshare−dif. down−up approx.

Figure 10: Probability distribution of share difference (upload− download).

to the same extent.From the scatterplot we have already seen that there is no sharp distribution for

BitTorrent traffic. In fact, if one considers the difference of download and uploadtraffic one observers a piecewise power law (Pareto) distribution 10.

Pd−u [share-difference x] ≈�

0.68 · (x + 1.1)−2.04 for x ≥ 0, (σ = 0.0008)4.33 · (3.07− x)−2.33 for x < 0 (σ = 0.0006)

To our knowledge we are the first to observe a heavy-tail distribution in the Bit-Torrent traffic difference. An explanation may be the power law distribution of theoverall BitTorrent upload and download, see Fig. 11 and the power law distributionof the continuous online period, see Fig. 12.

The online period can be very well described by a piecewise defined power law.The interval bounds are at 16h and 24h, which reflect the users’ decisions whether

9

From scatterplot: no sharp distribution for BitTorrent traffic.

Difference of download and upload traffic is a piecewise power law (Pareto) distribution

Explanation: maybe the power law distribution of the overall BitTorrent upload and download?

Share Difference of Torrent Traffic

1 2 5 10 20 50

1e−0

45e−0

31e−0

1

share−difference [kb/s] [log]

prob

abilit

y [lo

g]

share−dif. up−downshare−dif. Up−Down approx.share−dif. down−upshare−dif. down−up approx.

Share Difference of Torrent Traffic

36

2 42 22022242628

210

20 22 24 26 28 210 212

>= tr

affic

[kb/

s]

#user

download all timedownload when online

upload all timeupload when online

Overall Traffic Distribution

37

Periodicity

Obviously there is a perodicity in the data New idea:

- Look at Fourier Transformation- And normalized by frequency to receive the „energy“

level- and verify with averaged plots

38

0

500

1000

1500

2000

2500

3000

3500

4000

08 01 08 08 08 15 08 22 08 29 09 05

Traf

fic [B

ytes

per

Sec

ond]

Date

Incoming TrafficOutgoing Traffic

Average Traffic per Host

39

0 50

100 150 200 250 300

0 24 48 72 96 120 144 168 192 216 240

ener

gy [M

B/ho

ur]

period [hours]

Incoming TrafficOutgoing Traffic

Fourier Transformation of Traffic Pattern

40

0 0.5

1 1.5

2 2.5

3 3.5

4

0 2 4 6 8 10 12 14 16 18 20 22

traffi

c [k

b/s]

daytime [hours]

incoming trafficoutgoing traffic

Traffic Average per Hour of the Day

41

0 0.5

1 1.5

2 2.5

3 3.5

4

0 1 2 3 4 5 6 7 8 9 10 11

traffi

c [k

b/s]

half daytime [hours]

incoming trafficoutgoing traffic

Is there a 12h Periodicity?

42

0

2

4

6

8

10

Sat Sun Mon Tue Wed Thu Fri Sat

Traf

fic [K

iloby

tes/

Seco

nd]

Time [Hours]

Incoming TrafficOutgoing Traffic

Traffic Averaged per Weekday

43

How Long are Users Online

Online period- continuous amount of time when users are using

HTTP

Online times- sum of periods over a day/week/month

44

2 82 62 42 22022242628

210

0 2000 4000 6000 8000

>= s

hare

ratio

(up/

dow

n) [l

og]

#user

share ratioshare ratio 1:1

Figure 13: Cumulative share ratio distri-

bution (upload/download) in linear-log

scale.

2 8

2 6

2 4

2 2

20

22

24

26

0 2000 4000 6000 8000 10000

>= o

nlin

e tim

e [h

ours

] [lo

g]

#user

24 hours limitaverage daily online time [hours]

Figure 14: Cumulative daily online time

distribution in linear-log scale.

to let the hosts run over night.

P [online period t] ≈

0.18 · t−0.82for t ≥ 16, (σ = 0.013)

2782 · t−4.40for 16 < t ≤ 24, (σ = 0.00006)

11 · t−2.58for t > 24 (σ = 0.000015)

If one considers the ratio of upload and download (instead of the difference)

one observes an super-exponential decline and decrease on the left and right bound-

aries, which shows to a high concentration around the optimum ratio of 1. We could

not find a reasonable approximation of this functions, while we observe a similar

function of the cumulative daily online time, i.e. the number of hours a users is

producing any traffic.

We are also interested in describing the periodicity of the user behavior. While

the 24h frequency is obvious when one takes a look at Figures 2 and 3 one might

assume that also longer frequencies like a one week frequency (168h) or possibly

smaller frequency might influence user behavior. To approach this problem with-

out any personal bias we analyze the incoming and outgoing traffic of all users

using a Fourier analysis, see Fig. 15. For this we compute the absolute value of

the complex Fourier coefficient for each frequency f and divide this value by the

squared frequency. This method is well-known in the analysis of the radio waves

to measure the energy of each frequency. We observe a strong peak at 24h and

a smaller one at 12h. While there is a local maximum at 168h its size is rather

slow. To verify whether the 24h and 12h frequency actually describe some peri-

odic behavior we overlay all traffic in the12h period in Fig. 16 and 24h period in

Fig. 17. Obviously the 12h term does not show much periodicity while the 24h

period roughly resembles a sinus curve. So, the 12 hour period is a harmonic of

11

!!

!!

!!

!!

! !! ! ! ! !!

!!!!!!

!!!!!

!!!!!!!!!!!

!!

!!

!!!

!!

!!!!!

!!

!

!!!!!

!

!!!!

!

!

!!!!!!!!

!!

!

!!

!

!!

!!

!

!

!

!

!

!!!

!

!

!

!

!!!

!

!

!

!

!!!!!

!

!!

!

!

!

!

!

!!

!!

!!

!

!!

!!

!!

!!

!

!

!

!!!!!!!!

!

!!

!

!

!

!

!!!

1 2 5 10 20 50 100

5e−0

51e−0

35e−0

2Probability Distribution Online Period

online period [hours][log]

prob

abilit

y [lo

g]

16 24

probability for online period length [in hours]approximated function

cases of piecewise definition

Distribution of Online Periods

45

2 8

2 6

2 4

2 2

20

22

24

26

0 2000 4000 6000 8000 10000

>= o

nlin

e tim

e [h

ours

] [lo

g]

#user

24 hours limitaverage daily online time [hours]

Online Times per Day (Lin-Log-Plot)

46

2 8

2 6

2 4

2 2

20

22

24

26

1 4 16 64 256 1024 4096

>= o

nlin

e tim

e [h

ours

] [lo

g]

#user [log]

24 hours limitaverage daily online time [hours]

Online Times (log log Plot)

47

48

Conclusions

Analysis of web traffic of 21,766 hosts of an Internet service provider (ISP) in Germany

Emphasis BitTorrent traffic August in 2009 50% used BitTorrent At most 40% of BitTorrent users online at the

same time Many users participate in this peer-to-peer

network only for some short time periods Most Internet traffic is HTTP

top related