TOWARDS UNDERSTANDING DEVELOPING WORLD TRAFFIC
Sunghwan Ihm (Princeton), KyoungSoo Park (KAIST), Vivek S. Pai (Princeton)

IMPROVING NETWORK ACCESS IN THE DEVELOPING WORLD

• Internet access is a scarce commodity in the developing world: expensive and slow
• Our focus: improving the performance of network access where connectivity already exists
• Non-focus: providing or extending connectivity (e.g., DTN, WiLDNet)

Sunghwan Ihm, Princeton University

POSSIBLE OPTIONS

• Web proxy caching
  - Whole objects
  - Single endpoint (local)
  - Designated cacheable traffic only
• WAN acceleration
  - Packet-level caching
  - Mostly for enterprise
  - Two (or more) endpoints, coordinated
• Effective in the first world

DEVELOPING WORLD QUESTIONS

• How effective are these approaches?
  - Systems designed for first-world use
  - Most traffic studies are small and first-world focused
• How similar is developing-region traffic?
• Any new opportunities to exploit?
  - Differences in traffic
  - Differences in cost/tradeoffs
  - System design issues

UNDERSTANDING DEVELOPING WORLD TRAFFIC

Goal

Shape system design by better understanding the traffic optimization opportunities

Requirements

Large-scale, content-focused analysis

PRIOR TRAFFIC ANALYSIS WORK

• Large-scale traffic analysis
  - Internet Study 2007, 2008/2009 by ipoque
  - One million users
  - High-level characteristics via DPI
  - First-world focus
• Developing world traffic analysis
  - Du et al. WWW’06, Johnson et al. NSDR’10
  - Proxy-level analysis from kiosks, Internet cafes, and community centers

OUR APPROACH

• Combine the best features
  - Large-scale and content-focused
  - First world and developing world
• Use traffic from the CoDeeN content distribution network (CDN)
  - Global proxy (500+ PlanetLab nodes)
  - Running since 2003
  - 30+ million requests per day

WHAT TO ANALYZE?

1. Traffic profile

2. Caching opportunities

3. User behavior

DATA COLLECTION

[Figure: data collection path: user browser cache → local proxy cache → WAN → CoDeeN cache → origin Web server]

• Assume local proxy caches
• Focus on cache misses only
• Capture full content

DATA SET

Duration: 1 week (March 25-31, 2010)

# Requests: 157 million

Volume: 3 TB

# Clients (unique IPs): 348K

# Countries/Regions: 190

/8 network coverage: 61.3%

/16 network coverage: 24.1%

TOP COUNTRIES

[Figure: share of requests (%), bytes (%), and clients (%) by country. Labeled countries: DE (Germany), US (United States), RU (Russian Federation), AE (United Arab Emirates), PL (Poland), CN (China), SA (Saudi Arabia); each bar groups the remaining countries as "Etc. (185 countries)".]

OECD VS. DEVREG

• OECD: the first world
  - 27 high-income economies among the OECD member countries
  - 25% of total traffic
• DevReg: the developing world
  - The remaining 163 countries, including 3 OECD members: Mexico, Poland, and Turkey
  - 75% of total traffic
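The OECD/DevReg split amounts to labeling each country and summing traffic per label. A minimal Python sketch, assuming per-country byte counts are already available; `OECD_EXAMPLE` is a hypothetical five-country subset (not the 27-economy list the study used), and the byte counts are made up.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical subset for illustration only; the study used 27 high-income OECD economies.
OECD_EXAMPLE: Set[str] = {"US", "DE", "JP", "FR", "GB"}


def region_of(country: str, oecd: Set[str]) -> str:
    """Label an ISO country code as 'OECD' or 'DevReg'."""
    return "OECD" if country in oecd else "DevReg"


def traffic_share(bytes_by_country: Dict[str, int], oecd: Set[str]) -> Dict[str, float]:
    """Return each region's fraction of the total bytes (empty dict if there is no traffic)."""
    totals: Dict[str, int] = defaultdict(int)
    for country, nbytes in bytes_by_country.items():
        totals[region_of(country, oecd)] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {region: n / grand_total for region, n in totals.items()}


if __name__ == "__main__":
    # Made-up byte counts, not measurements.
    sample = {"US": 100, "DE": 150, "CN": 400, "PL": 200, "SA": 150}
    print(traffic_share(sample, OECD_EXAMPLE))  # {'OECD': 0.25, 'DevReg': 0.75}
```

Because everything not in the OECD set falls into DevReg, OECD members such as Poland are placed in DevReg simply by leaving them out of the set.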

ANALYSIS #1: TRAFFIC PROFILE

Conjecture: DevReg users visit low-bandwidth Web pages (small objects and text-heavy)

We often hear a variant of “Offline Wikipedia content suffices for developing world users”

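OBJECT SIZE

[Figure: object size distribution, OECD vs. DevReg; 16 KB is marked on the plot]

• Small objects: median 3 KB (DevReg) vs. 5 KB (OECD)
• Large objects: similar demand/profile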
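Comparing object sizes between the two groups comes down to a median and an empirical CDF per group. A minimal sketch, assuming a list of object sizes in bytes for each region; `ecdf`, `median_sizes`, and the sample sizes are hypothetical.

```python
import bisect
import statistics
from typing import Callable, Dict, List


def ecdf(sizes: List[int]) -> Callable[[float], float]:
    """Empirical CDF: F(x) is the fraction of objects whose size is <= x bytes."""
    ordered = sorted(sizes)
    if not ordered:
        raise ValueError("need at least one size")
    n = len(ordered)

    def F(x: float) -> float:
        return bisect.bisect_right(ordered, x) / n

    return F


def median_sizes(sizes_by_region: Dict[str, List[int]]) -> Dict[str, float]:
    """Median object size in bytes for each region label."""
    return {region: statistics.median(sizes) for region, sizes in sizes_by_region.items()}


if __name__ == "__main__":
    # Made-up sizes for illustration; not the study's data.
    sizes = {"OECD": [1200, 5000, 5200, 40_000, 2_000_000],
             "DevReg": [800, 3000, 3100, 35_000, 2_500_000]}
    print(median_sizes(sizes))              # {'OECD': 5200, 'DevReg': 3100}
    print(ecdf(sizes["DevReg"])(4 * 1024))  # 0.6
```

The median captures the "small object" difference, while the tail of the CDF shows whether large objects are requested similarly.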

TEXT AND IMAGES

• DevReg traffic has a higher fraction of images
• The exact opposite of the low-bandwidth conjecture

VIDEO AND AUDIO

• DevReg: higher fraction of video and audio
  - Music videos and MP3 songs

APPLICATION (FLASH)

DevReg has a higher fraction of application traffic

Median near 7%
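The text, image, video/audio, and application breakdowns can be computed by bucketing each response by the major MIME type in its Content-Type header and summing bytes. This is a sketch under that assumption; the study's exact classification rules are not stated here, so the hypothetical `category` helper simply uses the part before the slash (which, for example, counts `application/javascript` as application rather than text).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

KNOWN = ("text", "image", "video", "audio", "application")


def category(content_type: str) -> str:
    """Bucket a Content-Type value by its major MIME type, e.g. 'image/png' -> 'image'."""
    mime = content_type.split(";", 1)[0].strip().lower()  # drop parameters such as charset
    major = mime.split("/", 1)[0]
    return major if major in KNOWN else "other"


def byte_fractions(responses: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Fraction of response bytes per category; `responses` holds (content_type, nbytes) pairs."""
    totals: Counter = Counter()
    for content_type, nbytes in responses:
        totals[category(content_type)] += nbytes
    total = sum(totals.values())
    return {cat: n / total for cat, n in totals.items()} if total else {}


if __name__ == "__main__":
    sample = [("text/html; charset=utf-8", 30), ("image/png", 50),
              ("application/x-shockwave-flash", 20)]
    print(byte_fractions(sample))  # {'text': 0.3, 'image': 0.5, 'application': 0.2}
```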

ANALYSIS #1 SUMMARY

Some evidence that DevReg-visited sites have smaller objects, but

DevReg users visit large pages as well, and

DevReg users seek a higher fraction of rich content than OECD users

ANALYSIS #2: CACHING OPPORTUNITY

Conjecture: little gain from larger caches
• Some analysis suggests 1 GB is sufficient
• Typical cache size < 20 GB
• Object-based caching

CONTENT-BASED CHUNK CACHING

• Split content into chunks
• Name chunks by content (SHA-1 hash)
• Cache chunks instead of objects
• Fetch content, send only modified chunks
• Two endpoints needed
• Applies to “uncacheable” content

[Figure: an object divided into chunks A, B, C, D, E]
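A minimal sketch of chunk caching between two endpoints, assuming fixed-size 1 KB chunks (`CHUNK_SIZE` is a hypothetical parameter) and a remote endpoint that remembers which chunk names it has already sent. Chunks are named by their SHA-1 digest as on the slide; only chunks the local side has not seen cross the WAN, and the rest travel as names. All function names here are illustrative.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

CHUNK_SIZE = 1024  # hypothetical fixed chunk size in bytes

Message = List[Tuple[str, Optional[bytes]]]  # (SHA-1 name, payload or None)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split an object into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def chunk_name(chunk: bytes) -> str:
    """Name a chunk by its content: the hex SHA-1 digest."""
    return hashlib.sha1(chunk).hexdigest()


def encode(data: bytes, sent: Set[str]) -> Message:
    """Remote endpoint: ship raw bytes only for chunks not sent before.

    `sent` records names already shipped, which we assume the local
    endpoint still holds in its chunk cache.
    """
    msg: Message = []
    for chunk in split_chunks(data):
        name = chunk_name(chunk)
        if name in sent:
            msg.append((name, None))   # reference only
        else:
            msg.append((name, chunk))  # new content
            sent.add(name)
    return msg


def decode(msg: Message, cache: Dict[str, bytes]) -> bytes:
    """Local endpoint: store new chunks, then rebuild the object from the cache."""
    parts = []
    for name, payload in msg:
        if payload is not None:
            cache[name] = payload
        parts.append(cache[name])
    return b"".join(parts)


def payload_bytes(msg: Message) -> int:
    """Raw chunk bytes that crossed the WAN (ignoring per-chunk name overhead)."""
    return sum(len(p) for _, p in msg if p is not None)


if __name__ == "__main__":
    sent, cache = set(), {}
    v1 = b"A" * 1024 + b"B" * 1024 + b"C" * 1024
    v2 = b"A" * 1024 + b"X" * 1024 + b"C" * 1024  # only the middle chunk changed
    m1, m2 = encode(v1, sent), encode(v2, sent)
    assert decode(m1, cache) == v1 and decode(m2, cache) == v2
    print(payload_bytes(m1), payload_bytes(m2))  # 3072 1024
```

Fixed-size chunking keeps the sketch short, but a single inserted byte shifts every later boundary and defeats reuse; content-defined boundaries avoid that problem.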

OVERALL REDUNDANCY

Redundancy by chunk size:
• 40% @ 64 KB: whole objects or parts of large objects
• 60% @ 1 KB: parts of text pages
• 65% @ 128 bytes: paragraphs or sentences
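Redundancy at a given chunk size can be measured as the fraction of bytes that fall in chunks already seen earlier in the trace. The sketch below assumes that definition and uses content-defined chunking driven by a simple polynomial rolling hash (Rabin-Karp style, not the Rabin fingerprints a production system might use); `WINDOW`, `BASE`, and `MOD` are hypothetical parameters, and `avg_size` must be a power of two.

```python
import hashlib
from typing import List

WINDOW = 16          # bytes covered by the rolling hash (hypothetical)
BASE = 257           # polynomial base (hypothetical)
MOD = (1 << 61) - 1  # large prime modulus


def cdc_chunks(data: bytes, avg_size: int) -> List[bytes]:
    """Content-defined chunking: cut after byte i when the hash of the last
    WINDOW bytes has all low bits set, giving chunks of about avg_size bytes."""
    if avg_size <= 0 or (avg_size & (avg_size - 1)) != 0:
        raise ValueError("avg_size must be a power of two")
    mask = avg_size - 1
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        if i + 1 - start >= WINDOW and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def redundancy(objects: List[bytes], avg_size: int) -> float:
    """Fraction of bytes in chunks whose SHA-1 name was already seen earlier in the trace."""
    seen = set()
    total = redundant = 0
    for obj in objects:
        for chunk in cdc_chunks(obj, avg_size):
            name = hashlib.sha1(chunk).digest()
            total += len(chunk)
            if name in seen:
                redundant += len(chunk)
            else:
                seen.add(name)
    return redundant / total if total else 0.0


if __name__ == "__main__":
    page = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(500))
    edited = b"<ad>" + page  # same content with a small prefix inserted
    for size in (128, 1024):
        print(size, redundancy([page, edited], size))  # most of `edited` reuses chunks of `page`
```

Because boundaries depend only on local content, the inserted prefix disturbs only the first chunk of `edited`; smaller average chunk sizes can also match shorter repeated runs, such as sentences.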

CACHE BEHAVIOR SIMULATION

• Simulate one week’s traffic
  - Cache misses only
  - LRU cache replacement policy
• Determine the size for a near-ideal hit rate
  - Calculate byte hit ratio (BHR)
  - Vary storage size (from 10 MB to max)
• Results for US, China, and Brazil
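The simulation can be sketched as an object-level LRU cache replayed over a trace of cache-miss requests, recording the byte hit ratio (bytes served from cache divided by bytes requested) for each storage size. This sketch assumes each object keeps a fixed size and that objects larger than the cache are never stored; `byte_hit_ratio` and `near_ideal_size` are hypothetical names, and `slack` is an assumed tolerance for "near-ideal".

```python
from collections import OrderedDict
from typing import Iterable, List, Sequence, Tuple

Request = Tuple[str, int]  # (object key, size in bytes)


def byte_hit_ratio(requests: Iterable[Request], capacity: int) -> float:
    """Replay requests through an LRU cache of `capacity` bytes; return hit bytes / requested bytes."""
    cache: "OrderedDict[str, int]" = OrderedDict()  # key -> size, least recently used first
    used = hit_bytes = total_bytes = 0
    for key, size in requests:
        total_bytes += size
        if key in cache:
            hit_bytes += size
            cache.move_to_end(key)  # mark as most recently used
            continue
        if size > capacity:
            continue  # never fits; served without caching
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU entry
            used -= evicted_size
        cache[key] = size
        used += size
    return hit_bytes / total_bytes if total_bytes else 0.0


def near_ideal_size(requests: Sequence[Request], capacities: Iterable[int], slack: float = 0.01) -> int:
    """Smallest capacity whose byte hit ratio is within `slack` of the best one observed."""
    results = [(cap, byte_hit_ratio(requests, cap)) for cap in sorted(capacities)]
    best = max(bhr for _, bhr in results)
    return next(cap for cap, bhr in results if bhr >= best - slack)


if __name__ == "__main__":
    trace: List[Request] = [("a", 4), ("b", 4), ("a", 4), ("c", 4), ("b", 4), ("a", 4)]
    for cap in (4, 8, 12, 16):
        print(cap, round(byte_hit_ratio(trace, cap), 3))
    print(near_ideal_size(trace, (4, 8, 12, 16)))  # 12
```

An `OrderedDict` gives O(1) recency updates and evictions, so sweeping many storage sizes over a large trace stays linear in the trace length per size.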

US – 213 GB

CHINA – 559 GB

BRAZIL – 44 GB

ANALYSIS #2 SUMMARY

• Chunk caching is useful
  - Reduces WAN (cache miss) traffic
  - Complements existing Web proxies
• Larger caches are useful
  - Useful reduction in miss rate
  - Cheap compared to bandwidth costs

ANALYSIS #3: USER BEHAVIOR

Conjecture: as first-world Web pages get larger, DevReg users suffer delays

Mechanism: observe aborted transfers
• Intentional termination
• Automatic termination when browsing away

An abort suggests that users are bored or downloads are slow
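One way to observe aborted transfers in proxy logs is to compare the body bytes delivered with the advertised Content-Length. The sketch below assumes each log record carries both values (the `Transfer` fields and helper names are hypothetical); responses without a Content-Length cannot be judged this way and are counted as complete.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Transfer:
    content_length: Optional[int]  # advertised body size (Content-Length), if any
    bytes_sent: int                # body bytes delivered before the connection ended


def is_cancelled(t: Transfer) -> bool:
    """A transfer counts as cancelled if fewer bytes were delivered than advertised."""
    return t.content_length is not None and t.bytes_sent < t.content_length


def cancellation_summary(transfers: Iterable[Transfer]) -> Tuple[float, float]:
    """Return (fraction of transfers cancelled, fraction of delivered bytes from cancelled transfers)."""
    transfers = list(transfers)
    cancelled = [t for t in transfers if is_cancelled(t)]
    delivered = sum(t.bytes_sent for t in transfers)
    frac_transfers = len(cancelled) / len(transfers) if transfers else 0.0
    frac_bytes = sum(t.bytes_sent for t in cancelled) / delivered if delivered else 0.0
    return frac_transfers, frac_bytes


if __name__ == "__main__":
    log = [Transfer(100, 100), Transfer(1000, 250), Transfer(None, 50), Transfer(100, 100)]
    print(cancellation_summary(log))  # (0.25, 0.5)
```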

CANCELLED OBJECT SIZE C-CDF

• Cancelled objects are larger than normal objects (red)
• The complete sizes of cancelled objects (green) are much larger than the bytes actually downloaded (blue)
• Most downloads are less than 10 MB

CANCELLED TRANSFER VOLUME

• 17% of transfers are terminated early
• Because of early termination, these transfers make up only 25% of actual traffic
• If fully downloaded, they would have been 80% of all bytes
• Overall traffic would grow to 375% of its actual volume
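The percentages on this slide are tied together by simple arithmetic. Normalize actual traffic to 1: if cancelled transfers carry a share a of it and, fully downloaded, would make up a share f of the new total, the completed transfers contribute 1 - a and the new total is (1 - a) / (1 - f). A short check, where `projected_total` is a hypothetical helper name:

```python
def projected_total(cancelled_share_actual: float, cancelled_share_full: float) -> float:
    """Total traffic, as a multiple of actual traffic, if cancelled transfers had completed.

    With actual traffic normalized to 1, completed transfers contribute 1 - a.
    Fully downloaded, cancelled objects would total X with X / (X + 1 - a) = f,
    so X = f * (1 - a) / (1 - f) and the new total X + (1 - a) equals (1 - a) / (1 - f).
    """
    a, f = cancelled_share_actual, cancelled_share_full
    if not (0 <= a <= 1 and 0 <= f < 1):
        raise ValueError("shares must lie in [0, 1], with f < 1")
    return (1 - a) / (1 - f)


if __name__ == "__main__":
    # Shares from the slide: 25% of actual traffic, 80% of bytes if fully downloaded.
    print(round(projected_total(0.25, 0.80), 2))  # 3.75
```

With a = 0.25 and f = 0.80, the total comes to 0.75 / 0.20 = 3.75 times the actual volume.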

CANCELLED CONTENT TYPES

• Most cancelled responses were text
• Most cancelled bytes came from video, audio, and application content

% CANCELLED REQUESTS CDF

• OECD users cancel more often than DevReg users
• The OECD median is almost double the DevReg median
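A cancellation CDF like this one can be built from per-client cancellation rates. The sketch assumes the unit is the client (the slide does not say) and that each request record carries a client ID, a region label, and a cancelled flag; `median_cancel_rate` is a hypothetical helper returning the median rate per region, and the sample records are made up.

```python
import statistics
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

Record = Tuple[str, str, bool]  # (client id, region, cancelled?)


def median_cancel_rate(records: Iterable[Record]) -> Dict[str, float]:
    """Median, across clients in each region, of the fraction of a client's requests that were cancelled."""
    counts: DefaultDict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])  # [cancelled, total]
    for client, region, cancelled in records:
        tally = counts[(region, client)]
        tally[0] += int(cancelled)
        tally[1] += 1
    rates: DefaultDict[str, List[float]] = defaultdict(list)
    for (region, _client), (n_cancelled, n_total) in counts.items():
        rates[region].append(n_cancelled / n_total)
    return {region: statistics.median(values) for region, values in rates.items()}


if __name__ == "__main__":
    reqs = ([("o1", "OECD", c) for c in (True, True, False, False)]
            + [("o2", "OECD", c) for c in (True, False, False, False)]
            + [("d1", "DevReg", c) for c in (True, False, False, False)]
            + [("d2", "DevReg", c) for c in (False, False, False, False)])
    print(median_cancel_rate(reqs))  # {'OECD': 0.375, 'DevReg': 0.125}
```

Sorting each region's per-client rates gives the points of the CDF; the median is just its 50th percentile.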

ANALYSIS #3 SUMMARY

• Many transactions aborted
  - Previewing video files
  - Content-based caching is effective
• OECD users less patient than DevReg users
  - Cheap bandwidth = more sampling?

CONCLUSIONS

• First glimpse at CoDeeN traffic
  - Large-scale, content-focused analysis
  - OECD and developing world
• Many DevReg assumptions are false
  - In fact, strong desire for rich content, and
  - Patient despite slow connections
• Systems implications
  - Chunk caching worth more exploration
  - Larger caches very useful

http://www.cs.princeton.edu/~sihm/