CHARACTERIZING CYBERLOCKER TRAFFIC FLOWS
Aniket Mahanti, Niklas Carlsson, Martin Arlitt, Carey Williamson
Introduction
• Cyberlocker services provide an easy Web interface to
upload, manage, and share content.
• Recent academic and industry studies suggest that
cyberlocker traffic accounts for a significant fraction of
Internet traffic volume.
• Usage, content characteristics, performance, and
infrastructure of selected cyberlockers have been
analyzed in previous work.
• In this work, we analyze flows originating from several
cyberlockers, and study their properties at the transport
layer and their impact on edge networks.
METHODOLOGY
Data Collection
• Flow-level summaries were collected using Bro at a large
university edge router from Jan. 2009 to Dec. 2009.
• HTTP transaction summaries were used to extract the IP addresses of
the top-10 cyberlocker services for mapping the flows.
Characterization Metrics
• Flow-level characterization
  • Flow size: The total number of bi-directional bytes transferred within a single TCP flow.
  • Flow duration: The time between start and end of a flow.
  • Flow rate: The average data transfer rate of a TCP connection.
  • Flow inter-arrival time: The time between two consecutive flow arrivals.
• Host-level characterization
  • Transfer volume: The total traffic volume transferred by a campus host during the trace period.
  • On-time: The total time the campus host was active during the trace period.
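The metrics above can be computed from per-flow records along the following lines. This is a minimal sketch, not the authors' tooling: the `Flow` record and its field names are hypothetical, flow rate is expressed in bits per second, and on-time is interpreted as the union of a host's flow intervals, which is an assumption about how "active" was defined.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical per-flow record; field names are illustrative only."""
    host: str       # campus host identifier
    start: float    # flow start time (seconds)
    end: float      # flow end time (seconds)
    bytes_out: int  # bytes sent by the campus host
    bytes_in: int   # bytes received by the campus host


def flow_size(f):
    """Total bi-directional bytes in the flow."""
    return f.bytes_out + f.bytes_in


def flow_duration(f):
    return f.end - f.start


def flow_rate(f):
    """Average rate in bits per second (0.0 for zero-length flows)."""
    d = flow_duration(f)
    return flow_size(f) * 8 / d if d > 0 else 0.0


def inter_arrival_times(flows):
    """Gaps between consecutive flow start times."""
    starts = sorted(f.start for f in flows)
    return [b - a for a, b in zip(starts, starts[1:])]


def transfer_volume(flows):
    """Total bytes transferred per host."""
    volume = defaultdict(int)
    for f in flows:
        volume[f.host] += flow_size(f)
    return dict(volume)


def on_time(flows):
    """Per-host time with at least one active flow (merged intervals)."""
    by_host = defaultdict(list)
    for f in flows:
        by_host[f.host].append((f.start, f.end))
    result = {}
    for host, intervals in by_host.items():
        intervals.sort()
        total = 0.0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlaps the current interval
                cur_end = max(cur_end, end)
            else:                         # gap: close the current interval
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
        result[host] = total
    return result
```

Merging overlapping intervals keeps parallel downloads from being counted twice in a host's on-time.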
Distribution Characterization and Fitting
[Figure: histogram of the number of flows versus metric value, showing many small values and few big values.]
[Figure: number of flows versus metric value, with the distribution of the metric annotated. The many small values (body) are viewed with the CDF; the few big values (tail) are viewed with the CCDF.]
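The body and tail views can be illustrated with an empirical CDF and CCDF. The sketch below is only an illustration of the two views, with hypothetical function names; it makes no claim about how the plots in the talk were produced.

```python
def empirical_cdf(values):
    """Return (x, P[X <= x]) points, one per distinct value, in ascending order."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        if points and points[-1][0] == x:
            points[-1] = (x, i / n)   # repeated value: keep the highest rank
        else:
            points.append((x, i / n))
    return points


def empirical_ccdf(values):
    """Return (x, P[X > x]) points; the complement of the empirical CDF."""
    return [(x, 1.0 - p) for x, p in empirical_cdf(values)]
```

The CDF spreads out the many small values, while the CCDF, typically drawn on log-log axes, makes the rare large values visible.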
Distribution Fitting and Model Selection
• The complexity of the empirical distributions required hybrid fits, in which candidate distributions are fitted to each empirical distribution piece-wise.
• Each empirical distribution was divided into pieces based on manual inspection.
• We fitted seven well-known non-negative candidate statistical distributions (including Lognormal, Pareto, Gamma, Weibull, Levy, and Log Logistic) to each piece and calculated the sum of squared errors of the nonlinear least-squares fit.
• The statistical distribution with the lowest error was chosen for each piece.
• After fitting all the pieces of the empirical distribution, we generated P-P and Q-Q plots; the goodness of fit was determined by manually inspecting these plots.
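The selection step for a single piece can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a coarse grid search stands in for a nonlinear least-squares optimizer, only three of the candidate distributions are included, and the parameter grids and function names are hypothetical.

```python
import math


def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def pareto_cdf(x, xm, alpha):
    return 1.0 - (xm / x) ** alpha if x >= xm else 0.0


def weibull_cdf(x, scale, shape):
    return 1.0 - math.exp(-((x / scale) ** shape))


def sse(cdf, params, points):
    """Sum of squared errors between a candidate CDF and empirical CDF points."""
    return sum((cdf(x, *params) - p) ** 2 for x, p in points)


def grid(lo, hi, steps):
    """Evenly spaced parameter values from lo to hi (inclusive)."""
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


# Hypothetical two-parameter search grids for each candidate.
CANDIDATES = {
    "lognormal": (lognormal_cdf, grid(0.0, 2.0, 21), grid(0.1, 1.5, 15)),
    "pareto": (pareto_cdf, grid(0.1, 2.0, 20), grid(0.5, 3.0, 26)),
    "weibull": (weibull_cdf, grid(0.5, 6.0, 23), grid(0.5, 4.0, 36)),
}


def fit_piece(points, candidates=CANDIDATES):
    """Return (error, name, params) of the lowest-error candidate for one piece.

    `points` are (x, empirical CDF value) pairs that fall inside the piece.
    """
    best = None
    for name, (cdf, grid_a, grid_b) in candidates.items():
        for a in grid_a:
            for b in grid_b:
                err = sse(cdf, (a, b), points)
                if best is None or err < best[0]:
                    best = (err, name, (a, b))
    return best
```

A hybrid fit would call `fit_piece` once per manually chosen piece and join the winners, for example a Lognormal body with a Pareto tail.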
Goodness of Fit
[Figure: (a) Fit of body (majority of flows); (b) Fit of tail (rare, extreme values).]
DATASET OVERVIEW
Trace Summary
Characteristic                       Count
Flow summary log size                1 TB
HTTP traffic                         4 billion flows
HTTP traffic volume                  488 TB
Top-10 cyberlockers                  7 million flows (0.19%)
Top-10 cyberlocker traffic volume    22 TB (4.5%)
Campus hosts using cyberlockers      13,000 hosts

Service                Hosts   Flows   Bytes
Mega Network (%)       75      43      68
RapidShare (%)         41      42      13
zSHARE (%)             35      4       8
MediaFire (%)          34      8       3
Hotfile (%)            5       0       2
Enterupload (%)        30      1       2
Sendspace (%)          11      1       1
2Shared (%)            7       0       1
Depositfiles (%)       8       1       1
Uploading (%)          5       0       0
Top-10 cyberlockers    13K     7 mil   22 TB
Campus Usage Trends
FLOW-LEVEL CHARACTERIZATION
Flow Size
• Although content flows represent only 5% of the cyberlocker flows, they consume over 99% of the total traffic volume.
• Content flows are orders of magnitude larger because they transfer large content hosted on the sites.
• These flows are significantly larger than typical Web objects.
Cyberlocker Model: Lognormal-Pareto
Cyberlocker Content Model: Lognormal
Flow Duration
• Content flows are long-lived, partly due to wait times and
bandwidth throttling.
• Most content flows last less than 10 minutes
because they download medium-sized content.
Cyberlocker Model: Gamma-Lognormal-Pareto
Cyberlocker Content Model: Lognormal-Gamma
Flow Rate
• Cyberlocker content flows are larger and longer-lived, and
they achieve higher flow rates.
• Both free and premium hosts
download content from the services.
Cyberlocker Model: Gamma
Cyberlocker Content Model: Gamma-Lognormal
Flow Inter-arrival
• Parallel downloading increases flow concurrency and decreases flow inter-arrivals.
• Content flow inter-arrivals are longer because there are far fewer such flows; most of the flows are due to objects being retrieved from sites.
Cyberlocker Model: Lognormal-Gamma
Cyberlocker Content Model: Gamma-Lognormal
HOST-LEVEL CHARACTERIZATION
Host Transfer Volume
• Some hosts transfer a lot of data, while others transfer much less.
• Most of the transfer volume is due to content flows.
Cyberlocker Model: Lognormal-Pareto
Heavy Hitters
• The top-100 ranked hosts account for more than 85% of
the cyberlocker and cyberlocker content traffic volume.
• The high skews are well-modeled by non-linear power-law
distributions.
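The top-100 share quoted above is a simple rank-and-sum over per-host transfer volumes. The sketch below shows the calculation on hypothetical input; the function names and the host-to-bytes mapping are assumptions, not part of the original analysis.

```python
def top_n_share(volume_by_host, n):
    """Fraction of the total volume contributed by the n largest hosts."""
    ranked = sorted(volume_by_host.values(), reverse=True)
    total = sum(ranked)
    return sum(ranked[:n]) / total if total else 0.0


def rank_volume(volume_by_host):
    """(rank, volume) pairs with rank 1 for the largest host."""
    ranked = sorted(volume_by_host.values(), reverse=True)
    return list(enumerate(ranked, start=1))
```

Plotting the `rank_volume` pairs on log-log axes is a common way to inspect how skewed the host volumes are before fitting a power-law model.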
Host On-time
• On-times of cyberlocker hosts are heavy-tailed.
• Hosts spend most of their on-time downloading content.
• Users with premium subscriptions may have shorter on-times since they can download more content in less time.
Cyberlocker Model: Gamma-Lognormal
CONCLUDING REMARKS
Conclusions
• Cyberlockers introduced many small and large flows.
• Most cyberlocker content flows are long-lived and
durations follow a heavy-tailed distribution.
• Cyberlocker flows achieved high transfer rates.
• Cyberlocker heavy-hitter transfers followed power-law
distributions.
• Increased cyberlocker usage can have significant impact
on edge networks.
• Long-lived content flows transferring large amounts of
data can strain network resources.
QUESTIONS?
Aniket Mahanti – University of Auckland, New Zealand
Niklas Carlsson – Linköping University, Sweden
Martin Arlitt – HP Labs, USA
Carey Williamson – University of Calgary, Canada