youlighter: an unsupervised methodology to unveil youtube ...youlighter: an unsupervised methodology...
TRANSCRIPT
YouLighter: An Unsupervised Methodology
to Unveil YouTube CDN Changes
Alok Tongankar, Sabyasachi Sasha
Narus Inc.
Danilo Giordano, Stefano Traverso,
Luigi Grimaudo, Marco Mellia, Elena Baralis
Politecnico di Torino
Outline
• Motivation
• YouLighter methodology
• Parameter settings
• Results
• Conclusion
2
Outline
• Motivation
• YouLighter methodology
• Parameter settings
• Results
• Conclusion
3
A massive, distributed infrastructure…• Generates 20+% of world wide traffic• Several thousands of caches (single servers)• Grouped into Hundreds of edge-nodes (groups of
caches)
…However…• Almost no information about YouTube CDN are available• Internet Service Provider (ISP) have no control over YouTube CDN
…which… • …changes frequently and suddenly…• …sometimes causing QoE degradations…• …possibly causing users’ complaining
4
Why monitoring YouTube CDN?
A massive, distributed infrastructure…• Generates 20+% of world wide traffic• Several thousands of caches (single servers)• Grouped into Hundreds of edge-nodes (groups of
caches)
…However…• Almost no information about YouTube CDN are available• Internet Service Provider (ISP) have no control over YouTube CDN
…which… • …changes frequently and suddenly…• …sometimes causing QoE degradations…• …possibly causing users’ complaining
5
Why monitoring YouTube CDN?
Scenario
• Passive measurements only
• Probe: Tstat• 100+ per-flow statistics
• RTT, TTL, number of packets sent and received, number of byte sent and received, hostname, etc..
6
Internet
PoP1 PoP2
Edge-node 2 Edge-node 3Edge-node 1
TstatTstat
Experiments and Validation
• Three Tstat probes• Two probes in ISP1
• One probe in ISP2
• Consider traffic snapshots• sliding window which moves forward each day (ΔT=1d)
7
Trace Period Volume # UniqueVideos
Caches
ISP1-A 01/04/2013 - 28/02/2014 138.7 TB 2,892,452 8,664
ISP1-B 01/04/2013 - 28/02/2014 152.9 TB 2,848,625 8,899
ISP1-C 01/04/2013 - 28/02/2014 134.8 TB 2,711,179 9,028
ISP2 01/03/2014 - 17/07/2014 48.3 TB 305,802 3,755
Problem 1: unveil the infrastructure
8
1. monitoring the single cache is not effective• Load distribution changes very frequently
• The rank of most used caches changes deeply everyday!
Idea: monitor edge-nodes, not caches
Problem 2: Edge-node change over time
• Get RTT data
• Generate RTT Distribution for each cache
• Evaluate behavior of 5 percentiles • Points of Distribution
Problem 2: Edge-node change over time
• Get RTT data
• Generate RTT Distribution for each cache
• Evaluate behavior of 5 percentiles • Points of Distribution
10
X(1)
Problem 2: Edge-node change over time
11X(2)
10Edge-Nodes
X(1)How can we measure edge-node change over time?
Our Solution: YouLighter
1. Find edge-nodes: Design an unsupervised methodology to group single caches
2. Observe changes: Define a distance to highlight changes in edge-nodes definition and usage over time
12
Internet
PoP1 PoP2
Edge-node 2 Edge-node 3Edge-node 1
TstatTstatYouLighter
Can we exploit an “oracle” to validate the results?Yes! YouTube use IATA codes to name edge-nodes and caches
rx--ABCxxtxx.c.youtube.com
Where ABC corresponds to the cache’s closest airport e.g., r7--fra07t16.c.youtube.com Frankfurt
But, turns out to be not reliable ping r7--fra07t16.c.youtube.com ~= 50ms, 60TTL
ping r7--fra09t21.c.youtube.com ~= 110ms, 55TTL
Some caches sharing same IATA behave differently!
[more complicated than this… see the paper]
13
Validation
Outline
• Motivation
• YouLighter methodology
• Results
• Conclusion
14
Solution to Problem 1: Group caches 1. Get YouTube traffic logs from Tstat probes
2. Measurement consolidation and filtering
3. Feature selection• extract RTT, TTL• compute stats (percentiles, mean, variance, etc…)• Standardize the features (stats) to a unitary space
4. Multi-dimensional clustering using DBSCAN• Input:
• list of (cache_name, <feature1, feature2, … , featureN>) tuples
• Output:• A set of clusters (possibly edge-nodes) + Cluster of outlier
• Parameters:• Min Points (MP): min number of points to create a cluster• Epsilon (ε): maximum distance between any given point in a cluster and its
MPth closest neighbor in the same cluster (critical)15
16
DBSCAN Performance and Parameter Sensitivity
Best result when all indices are
equal to 1
• Tuning of DBSCAN• Which features? Different statistics
• Percentiles (20th, 35th, 50th, 65th, 80th) , mean+variance, etc..
• Which ε choice?
• Performance indices for clustering evaluation:• True Positive Rate: ratio between # of correctly labeled caches and total # of
caches
• Fragmentation: ratio between # of clusters and the # of observed labels
• Pureness: ratio between # of assigned labels and total # of labels in the ground truth
17
DBSCAN Performance and Parameter Sensitivity (cont’d)
• 1st week of November 2013 in ISP1• stable situation controlled by manual verification
• Percentiles vs Mean+Standard Deviation using different ε• Percentiles are more robust• Fix ε = 0.04
Now we can find Edge-nodes….
…And now…
how can we analyze Edge-Node evolution over time?
18
Solution to Problem 2: Constellation distance• Surprisingly, no methodology in literature1. Summarize each cluster in a single point called star Ĉ(n)
2. Astral Distance• For each star in Ĉ(n) compute all distances to stars in Ĉ(n+1) and take min• Repeat in the opposite direction
3. Constellation Distance• Sum of all Astral Distances
19Ĉ(2)
AD y4,C(1)( )
AD x1,C(2)( )
X(1) X(2)Ĉ(1)
Outline
• Motivation
• YouLighter methodology
• Results
• Conclusion
20
Change observed from ISP1-A
• Constellation Distance is effective at detecting changes
• Big changes during May – July 2013 • Highlighted also in ISP1-B dataset
21
22
What happened in May 2013Before change
During change
23
What happened in May 2013During change
After change
24
What happened in May 2013Before change
During changeAfter change
Impact on QoE
• Frank splits in• Bad Frank: caches with large std(RTT)
• Good Frank : caches with small std(RTT)
• Download throughput distribution of Frank vs Milan
25
Changes observed from ISP2
• Changes detected in March 2014
• No QoE impairment
26
Outline
• Motivation
• YouLighter methodology
• Results
• Conclusion
27
Conclusion
• YouLighter is effective at detecting changes in YouTube’s CDN infrastructure
• Effective clustering to unveil YouTube infrastructure
• Definition of a novel Constellation Distance• to identify changes between different clustering
• Help ISPs, network admins, etc. to react to changes• To limit QoE impairments• To improve traffic engineering
• Patent requested in 2014• Accepted in Italy• Pending in USA
28
Thanks for you Attention
29