YouLighter: An Unsupervised Methodology to Unveil YouTube CDN Changes Alok Tongankar, Sabyasachi Sasha Narus Inc. Danilo Giordano, Stefano Traverso, Luigi Grimaudo, Marco Mellia, Elena Baralis Politecnico di Torino


YouLighter: An Unsupervised Methodology

to Unveil YouTube CDN Changes

Alok Tongankar, Sabyasachi Sasha

Narus Inc.

Danilo Giordano, Stefano Traverso,

Luigi Grimaudo, Marco Mellia, Elena Baralis

Politecnico di Torino


Outline

• Motivation

• YouLighter methodology

• Parameter settings

• Results

• Conclusion


Why monitor the YouTube CDN?

A massive, distributed infrastructure…
• Generates 20+% of worldwide traffic
• Several thousand caches (single servers)
• Grouped into hundreds of edge-nodes (groups of caches)

…However…
• Almost no information about the YouTube CDN is available
• Internet Service Providers (ISPs) have no control over the YouTube CDN

…which…
• …changes frequently and suddenly…
• …sometimes causing QoE degradations…
• …possibly causing user complaints


Scenario

• Passive measurements only

• Probe: Tstat
  • 100+ per-flow statistics
  • RTT, TTL, number of packets sent and received, number of bytes sent and received, hostname, etc.

[Figure: network diagram with labels Internet, PoP1, PoP2, Edge-node 1, Edge-node 2, Edge-node 3, and two Tstat probes]

Experiments and Validation

• Three Tstat probes
  • Two probes in ISP1
  • One probe in ISP2
• Consider traffic snapshots
  • Sliding window which moves forward each day (ΔT=1d)

Trace    Period                    Volume     # Unique Videos   Caches
ISP1-A   01/04/2013 - 28/02/2014   138.7 TB   2,892,452         8,664
ISP1-B   01/04/2013 - 28/02/2014   152.9 TB   2,848,625         8,899
ISP1-C   01/04/2013 - 28/02/2014   134.8 TB   2,711,179         9,028
ISP2     01/03/2014 - 17/07/2014   48.3 TB    305,802           3,755
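
The daily sliding window over the traces can be illustrated with a minimal sketch. The function name `snapshot_windows` and the 7-day window length are assumptions made for illustration; the slides only state that the window moves forward by one day (ΔT=1d).

```python
from datetime import date, timedelta

def snapshot_windows(start, end, window_days, step_days=1):
    """Yield (first_day, last_day) pairs for a window that slides by step_days."""
    step = timedelta(days=step_days)
    span = timedelta(days=window_days - 1)
    first = start
    while first + span <= end:
        yield first, first + span
        first += step

# Hypothetical usage: the 7-day window length is an assumption,
# not a value taken from the slides.
windows = list(snapshot_windows(date(2013, 4, 1), date(2013, 4, 10), window_days=7))
for first_day, last_day in windows:
    print(first_day, "->", last_day)
```

Each pair delimits one traffic snapshot; consecutive snapshots overlap on all but one day, so gradual shifts appear as small differences between neighbouring snapshots.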

Problem 1: unveil the infrastructure

• Monitoring single caches is not effective
  • Load distribution changes very frequently
  • The ranking of the most-used caches changes substantially every day!

Idea: monitor edge-nodes, not caches

Problem 2: Edge-node change over time

• Get RTT data
• Generate the RTT distribution for each cache
• Evaluate the behavior of 5 percentiles
  • Points of the distribution
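
The per-cache RTT summary described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the helper names, the `(cache_name, rtt_ms)` input format, and the linear-interpolation percentile are assumptions. The five percentile levels are the ones listed later in the deck (20th, 35th, 50th, 65th, 80th).

```python
from collections import defaultdict

PERCENTILES = (20, 35, 50, 65, 80)

def percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted, non-empty list."""
    if len(sorted_values) == 1:
        return float(sorted_values[0])
    rank = (p / 100) * (len(sorted_values) - 1)
    low = int(rank)
    high = min(low + 1, len(sorted_values) - 1)
    frac = rank - low
    return sorted_values[low] + frac * (sorted_values[high] - sorted_values[low])

def rtt_features(flows):
    """Map each cache name to the tuple of RTT percentiles seen in one snapshot."""
    samples = defaultdict(list)
    for cache_name, rtt_ms in flows:
        samples[cache_name].append(rtt_ms)
    return {
        cache: tuple(percentile(sorted(rtts), p) for p in PERCENTILES)
        for cache, rtts in samples.items()
    }

# Hypothetical flow records: (cache name, RTT in milliseconds).
flows = [("cache-a", r) for r in (10, 20, 30, 40, 50)] + [("cache-b", 100)]
features = rtt_features(flows)
print(features)
```

Each cache becomes one point in a five-dimensional space, which is the kind of input a clustering step can work on.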


[Figure: snapshots X(1) and X(2), with caches grouped into edge-nodes]

How can we measure edge-node change over time?


Our Solution: YouLighter

1. Find edge-nodes: Design an unsupervised methodology to group single caches

2. Observe changes: Define a distance to highlight changes in edge-nodes definition and usage over time

[Figure: network diagram with labels Internet, PoP1, PoP2, Edge-node 1, Edge-node 2, Edge-node 3, two Tstat probes, and YouLighter]

Validation

Can we exploit an “oracle” to validate the results?
Yes! YouTube uses IATA codes to name edge-nodes and caches:

rx--ABCxxtxx.c.youtube.com

where ABC corresponds to the cache’s closest airport, e.g., r7--fra07t16.c.youtube.com (Frankfurt)

But this turns out not to be reliable:

ping r7--fra07t16.c.youtube.com ~= 50ms, 60TTL
ping r7--fra09t21.c.youtube.com ~= 110ms, 55TTL

Some caches sharing the same IATA code behave differently!

[more complicated than this… see the paper]
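
Extracting the airport code from a cache hostname can be sketched with a regular expression. The pattern below is inferred from the template rx--ABCxxtxx.c.youtube.com shown on the slide and is an assumption; real hostnames may deviate from it, and the function name `iata_code` is hypothetical.

```python
import re

# Pattern inferred from the slide's template rx--ABCxxtxx.c.youtube.com (assumption).
CACHE_NAME = re.compile(r"^r\d+--([a-z]{3})\d{2}t\d{2}\.c\.youtube\.com$")

def iata_code(hostname):
    """Return the upper-cased three-letter code embedded in a cache hostname, or None."""
    match = CACHE_NAME.match(hostname.lower())
    return match.group(1).upper() if match else None

print(iata_code("r7--fra07t16.c.youtube.com"))  # FRA
print(iata_code("www.youtube.com"))             # None
```

As the ping example shows, two hostnames with the same code can still behave differently, so such a label is only an approximate ground truth.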


Solution to Problem 1: Group caches

1. Get YouTube traffic logs from Tstat probes
2. Measurement consolidation and filtering
3. Feature selection
   • Extract RTT, TTL
   • Compute stats (percentiles, mean, variance, etc.)
   • Standardize the features (stats) to a unitary space
4. Multi-dimensional clustering using DBSCAN
   • Input:
     • List of (cache_name, <feature1, feature2, … , featureN>) tuples
   • Output:
     • A set of clusters (possibly edge-nodes) + a cluster of outliers
   • Parameters:
     • Min Points (MP): minimum number of points to create a cluster
     • Epsilon (ε): maximum distance between any given point in a cluster and its MPth closest neighbor in the same cluster (critical)
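
Steps 3 and 4 above can be sketched in plain Python. This is a textbook DBSCAN written for illustration, not the authors' implementation. The assumptions are: min-max scaling as the standardization, Euclidean distance, ε used as a neighbourhood radius, and a neighbour count that includes the point itself. The names `min_max_scale`, `dbscan`, and the sample caches are hypothetical.

```python
import math

def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]

def dbscan(points, eps, min_points):
    """Return one label per point: 0, 1, ... for clusters, -1 for outliers."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1               # provisional outlier, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_points:
                queue.extend(reach)      # j is a core point: keep expanding

    return labels

# Hypothetical input: (cache_name, (median RTT in ms, TTL)) tuples.
caches = [
    ("cache-a1", (10.0, 60.0)), ("cache-a2", (11.0, 60.0)), ("cache-a3", (12.0, 60.0)),
    ("cache-b1", (50.0, 55.0)), ("cache-b2", (51.0, 55.0)), ("cache-b3", (52.0, 55.0)),
    ("cache-x", (30.0, 40.0)),
]
names = [name for name, _ in caches]
scaled = min_max_scale([feats for _, feats in caches])
labels = dbscan(scaled, eps=0.1, min_points=3)
edge_nodes = dict(zip(names, labels))
print(edge_nodes)
```

In this toy input the two groups of three caches form two clusters and `cache-x` is labelled -1, which corresponds to the cluster of outliers in the output described above.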

DBSCAN Performance and Parameter Sensitivity

• Tuning of DBSCAN
  • Which features? Different statistics
    • Percentiles (20th, 35th, 50th, 65th, 80th), mean+variance, etc.
  • Which ε choice?
• Performance indices for clustering evaluation:
  • True Positive Rate: ratio between # of correctly labeled caches and total # of caches
  • Fragmentation: ratio between # of clusters and the # of observed labels
  • Pureness: ratio between # of assigned labels and total # of labels in the ground truth

Best result when all indices are equal to 1
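
One possible reading of the three indices is sketched below. The slides do not say how a cluster receives its label, so this sketch assumes that each cluster takes the most common ground-truth label among its caches and that outliers (cluster id -1) count as not correctly labeled. The function name and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict

def clustering_indices(cluster_ids, true_labels, all_labels):
    """Return (true_positive_rate, fragmentation, pureness) for one clustering.

    cluster_ids: cluster id per cache, -1 for outliers.
    true_labels: ground-truth label per cache, in the same order.
    all_labels: every label present in the ground truth.
    """
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        if cid != -1:
            members[cid].append(label)

    # Assumption: a cluster is assigned the most common label among its caches.
    assigned = {cid: Counter(lbls).most_common(1)[0][0] for cid, lbls in members.items()}

    correct = sum(
        1 for cid, label in zip(cluster_ids, true_labels)
        if cid != -1 and assigned[cid] == label
    )
    tpr = correct / len(cluster_ids)
    fragmentation = len(members) / len(set(true_labels))
    pureness = len(set(assigned.values())) / len(set(all_labels))
    return tpr, fragmentation, pureness

# Hypothetical example: two clusters, one outlier, two ground-truth labels.
cluster_ids = [0, 0, 0, 1, 1, 1, -1]
true_labels = ["fra", "fra", "fra", "mil", "mil", "mil", "fra"]
tpr, fragmentation, pureness = clustering_indices(cluster_ids, true_labels, {"fra", "mil"})
print(tpr, fragmentation, pureness)
```

Here fragmentation and pureness both equal 1, while the true positive rate falls just short of 1 because of the single outlier.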

DBSCAN Performance and Parameter Sensitivity (cont’d)

• 1st week of November 2013 in ISP1
  • Stable situation, confirmed by manual verification
• Percentiles vs Mean+Standard Deviation using different ε
  • Percentiles are more robust
  • Fix ε = 0.04

Now we can find Edge-nodes….

…And now…

how can we analyze Edge-Node evolution over time?

Solution to Problem 2: Constellation distance

• Surprisingly, no methodology in the literature

1. Summarize each cluster in a single point called a star; the stars of snapshot n form the constellation Ĉ(n)
2. Astral Distance
   • For each star in Ĉ(n), compute all distances to the stars in Ĉ(n+1) and take the minimum
   • Repeat in the opposite direction
3. Constellation Distance
   • Sum of all Astral Distances

[Figure: snapshots X(1) and X(2) with constellations Ĉ(1) and Ĉ(2), showing the astral distances AD(x1, Ĉ(2)) and AD(y4, Ĉ(1))]
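
The three steps above can be sketched as follows. The slides do not specify how a cluster is summarized or which metric is used, so this sketch assumes that a star is the cluster centroid and that distances are Euclidean. All function names are hypothetical, and both constellations are assumed to be non-empty.

```python
import math
from collections import defaultdict

def constellation(points, labels):
    """Summarize each cluster by its centroid ("star"); outliers (-1) are skipped."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != -1:
            groups[label].append(point)
    return [
        tuple(sum(axis) / len(members) for axis in zip(*members))
        for members in groups.values()
    ]

def astral_distance(star, other):
    """Distance from one star to its closest star in the other constellation."""
    return min(math.dist(star, s) for s in other)

def constellation_distance(c_a, c_b):
    """Sum of the astral distances taken in both directions."""
    forward = sum(astral_distance(s, c_b) for s in c_a)
    backward = sum(astral_distance(s, c_a) for s in c_b)
    return forward + backward

# Hypothetical snapshots: the second one gains a new star at (1, 1).
stars_day1 = [(0.0, 0.0), (1.0, 0.0)]
stars_day2 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
print(constellation_distance(stars_day1, stars_day2))  # 1.0
print(constellation([(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)], [0, 0, -1]))  # [(1.0, 0.0)]
```

Taking the minimum in both directions matters: a star that appears only in the later snapshot contributes nothing in the forward pass, but it is counted in the backward pass.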


Change observed from ISP1-A

• Constellation Distance is effective at detecting changes

• Big changes during May – July 2013
  • Also highlighted in the ISP1-B dataset

What happened in May 2013

[Figure: three panels titled Before change, During change, and After change]

Impact on QoE

• Frank splits into
  • Bad Frank: caches with large std(RTT)
  • Good Frank: caches with small std(RTT)
• Download throughput distribution of Frank vs Milan

Changes observed from ISP2

• Changes detected in March 2014

• No QoE impairment


Conclusion

• YouLighter is effective at detecting changes in YouTube’s CDN infrastructure

• Effective clustering to unveil YouTube infrastructure

• Definition of a novel Constellation Distance
  • To identify changes between different clusterings
• Help ISPs, network admins, etc. to react to changes
  • To limit QoE impairments
  • To improve traffic engineering
• Patent requested in 2014
  • Accepted in Italy
  • Pending in USA

Thanks for your attention