the plane distributed measurement infrastructure overview, insights & hindsights journée du...

36
The Plane distributed measurement infrastructure Overview, insights & hindsights Journée du Conseil Scientifique de l’Afnic , #JCSA2015, July 9 th 2015 Grant Agreement n. 318627 Dario Rossi Professor [email protected] http://www. telecom-paristech .fr/~drossi

Upload: ambrose-freeman

Post on 23-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

The Plane distributed measurement infrastructure Overview, insights & hindsights

Journée du Conseil Scientifique de l’Afnic , #JCSA2015, July 9th 2015

Grant Agreement n. 318627

Dario RossiProfessor

[email protected]://www. telecom-paristech.fr/~drossi

Today Internet

Which ISP is the best in my area ?

Why is not working?

How does it work?How to do better?

User viewpoint ISP viewpointCloud viewpoint

How to optimize my network for users apps?

Where is traffic coming from?

Motivations Measurement Architecture Ecosystem Example of use Summary

How to keep usershappy & engaged?

Researcher viewpoint

Today Internet

“The Internet is the first thing that humanity has built that humanity doesn't understand, the largest experiment in anarchy that we have ever had.”

Eric Schmidt – ex Google Exec. Chairman

Motivations Measurement Architecture Ecosystem Example of use Summary

3

…. issing orchestration

Internet measurementShed light on the Internet operational obscurity

Lot of measurement tools…

Plane: avoid to reinvent the wheel& assist in building automated pilots!

Motivations Measurement Architecture Ecosystem Example of use Summary

Plane:

4

5

Two kinds of measurements

• Passive– Observe network traffic

without interference– Similar to Aristotle’s

observational method

• Active– Perturb the network &

measure its reaction– Similar to Newton’s

experimental method

Ἀριστοτέλης Sir Isaac Newton@aristotle @newton

1388 372 2411 353

All science is either practical, poetical or theoretical (Metaphysics)

Every body continues in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by forces impressed upon it (Principia)

Motivations Measurement Architecture Ecosystem Example of use Summary

5

Passive m easurements

Deploypassive sensors

Extract meaningful

analytics

Post Process

Passive sensor

Raw (or half-cooked) data

Data collection

User traffic

Motivations Measurement Architecture Ecosystem Example of use Summary

6

Active m easurements

Raw (or half-cooked) data

Data collectionActuator

Control Controlactive sensors

A

Post ProcessActive probes

Perform active measurements

Extract meaningful

analytics

Motivations Measurement Architecture Ecosystem Example of use Summary

7

erging all together

Activesensor

Control

Passivesensor

Data

Integration with existingmonitoring frameworks

A

Active and passive analysisfor iterative root-cause-analysis

Motivations Measurement Architecture Ecosystem Example of use Summary

8

~~~Big data hype~~~

• Traffic measurements orders of m agnitude – 40Gbps (157PB/yr) full-packet processing at each passive

sensor, with up to O(107) traffic classifications per second…– O(109) active probes in an Internet anycast census…

(more on that later if time allows)

• To be compared with – The Large Hadron Collider (LHC), generates ~25PB/yr

and O(108) collisions per second– Sloan Digital Sky Survey (SDSS), generates ~73TB/yr– Capacity of the Human genome with 2-bits bases ~ O(109)

(the Big Data moment)

[Image from http://irevolution.net/category/big-data/][courtesy of Sony Classical]

Note: simplified view, yet the devil hides in the details

Plane architecture walkthrough

ARepositoriesProbes

SSupervisor

Client/Reasoner

Motivations Measurement Architecture Ecosystem Example of use Summary

10

Plane architecture: entities

ARepositoriesProbes

SSupervisor

Client/Reasoner

Orchestrate measurements; expert system (automatic pilot)

Access control; delegation; boostrap

Past sensor data; legacy databases;

Big data processing

Sensor data;Proxies for

legacy tools/platforms

Motivations Measurement Architecture Ecosystem Example of use Summary

11

Plane architecture: interfaces

AProbes

SSupervisor

Client/Reasoner

Native probe

Proxy probe/repo

Native repo

Repositories

(avoid reinventing the wheel!)

Motivations Measurement Architecture Ecosystem Example of use Summary

12

Plane architecture: messages

ARepositoryProbe

SSupervisor

Client/Reasoner

Schema-centric definition, with a type registrySchema completely describesthe measurement, JSON format

C

S

R

Motivations Measurement Architecture Ecosystem Example of use Summary

13

Note: simple ping example,for illustrational purposes

Plane architecture: messagesC S R

{"capability": “measure”,"version": 0,"registry": http://ict-mplane.eu/reg,"label": “ping-aggregate”"when": “now .. future / 1s”"parameters": {"source.ip4“: "137.3.1.1", "destination.ip4": "*"},“results”: ["delay.twoway.icmp.us.min", "delay.twoway.icmp.us.mean", "delay.twoway.icmp.us.max", "delay.twoway.icmp.count"]}

{“specification": “measure”,"version": 0,"registry": http://ict-mplane.eu/reg,"label": “ping-aggregate-experiment1”,"when": “now + 30s / 1s”,"token" : "0f31c9033f8fce0c9be41d4944""parameters": {"source.ip4“: "137.3.1.1", "destination.ip4": “137.194.164.1"},“results”: ["delay.twoway.icmp.us.min", "delay.twoway.icmp.us.mean", "delay.twoway.icmp.us.max", "delay.twoway.icmp.count"]}

{“specification": “measure”,"version": 0,"registry": http://ict-mplane.eu/reg,"label": “ping-aggregate-experiment1”,"when": “2015-09-07 14:30:02.123 […] /1s“,"token" : "0f31c9033f8fce0c9be41d4944“,"parameters": {"source.ip4“: "137.3.1.1", "destination.ip4": “137.194.164.1"},“results”: ["delay.twoway.icmp.us.min", "delay.twoway.icmp.us.mean", "delay.twoway.icmp.us.max", "delay.twoway.icmp.count"]“resultsvalue” : [[ 2390,2983, 6600,30]]}

*grin* If I'd known then that [ping] would be my most famous accomplishment in life, I might have worked on it another day or two and added some more options.

Mike Muuss - US Army Research Laboratory

Motivations Measurement Architecture Ecosystem Example of use Summary

14

Plane architecture: transport

RepositoryProbe

SSupervisor

Client/Reasoner

mPlane over: • HTTPS (HTTP over TLS)• WSS (WebSockets over TLS)• SSH (Secure Shell)

Indirect export uses custom protocols:IPFIX, FTP, BitTorrent, Apache Flume, RFC1149, RFC6214, etc.

Motivations Measurement Architecture Ecosystem Example of use Summary

15

Plane architecture: workflow

RepositoriesProbes

SSupervisor

Client/Reasoner

1. Send high levelspecification

2. Decompose specification

3. Send lowlevelspecification

4. Measure

5. Indirect export of Big data

6.Analyze

7.Return results

8. Aggregate results

9.Return high-levelresults

10. Decide and iterate

Motivations Measurement Architecture Ecosystem Example of use Summary

16

Plane architecture: Inter-domain

RepositoriesProbes

SupervisorS

S

S

Client/Reasoner

Motivations Measurement Architecture Ecosystem Example of use Summary

17

The broader easurement ecosystem

• “From global measurements to local management”– 10 partners, 3.8 MEUR, FP7 STREP– More focused use case: Build a framework out of probes– Knowledge sharing (e.g., joint work, Dagsthul seminars, etc.)

• IETF Large-Scale Measurement of Broadband Performance (LMAP)– Defines the components, protocols, rules, etc., but does not specifically

target adding “a brain” to the system– Common core set, Largely interoperable

• IETF IP Performance Metrics (IPPM)– Registry related, we use its vocabulary as much as possible

• IETF IP Flow Information Export (IPFIX)– Indirect export related, active contributors

• Others in scope– IETF DOCTORS; tcpm; ConEx; NETCONF; IRTF NMRG; ETSI STQ; ITU SG12

Motivations Measurement Architecture Ecosystem Example of use Summary

18

Plane consortium

Marco MelliaPOLITO

Saverio NicoliniNEC

Dina PapagiannakiTelefonica

Ernst BiersackEurecom

Brian TrammellETH

Tivadar Szemethy NetVisor

Dario RossiENST

Fabrizio InvernizziTelecom Italia

Guy LeducUniv. Liege

Pietro MichiardiEurecom

Pedro CasasFTW

Andrea FregosiFastweb

• 16 partners– 3 ISPs– 6 research

centers– 5 universities– 2 SMEs

• FP7 IP– 11 MEUR– 3 years long– ends early 2016

Motivations Measurement Architecture Ecosystem Example of use Summary

19

Plane achievements so far• Standardization &dissemination– 3 RFCs, 5 drafts– 80+ papers, including 4 Best paper awards and

ACM SIGCOMM IMC & CoNEXT, IEEE INFOCOM• Software

https://www.ict-mplane.eu/public/software and https://github.com/fp7mplane/(Ready-to-use Virtual Machines under preparation)

• Check the demos on https://www.youtube.com/channel/UCHGS6UlUKvGZTyt5DemmPaw

“Biased sampling”: Anycast geolocation as a showcase of mPlane achievements

Note: hard to summarize!

Motivations Measurement Architecture Ecosystem Example of use Summary

20

IP Anycast • Anycast: set of equivalent replicated servers, traffic

routed to closest replica (in BGP sense)• Question: where are Google DNS servers 8.8.8.8?

Mountain View, CA (IP2Location) New York, NY (Geobytes)United States (Maxmind)

?????1ms ≈100Km

2ms from EU to US ?????

Speed of light violation !!

Tools using distributed measurement aren’t better !

Time varying answers Unknown accuracy

Commercial databases Distributed measurement

Motivations Measurement Architecture Ecosystem Example of use Summary

21

Geolocation techniqueMotivations Measurement Architecture Ecosystem Example of use Summary

MeasureLatency

Planetlab/RIPE Atlas

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassification

Maximum likelihood

DetectSpeed of light

violations

Scalability

Infrastructure ?Lightweight

Iterate Feedback

Filter latency noise

Measure Detect Enumerate Geolocate Iterate

D. Cicalese, et al. “A fistful of Pings: Accurate and lightweight Anycast enumeration and Geolocation”, IEEE INFOCOM 2015 22

measurement infrastructure

• RIPE Atlas vs PlanetLab

• Anecdotal understanding – For some deployments,

RIPE includes PlanetLab – For others, the union is greater than the sum of the parts !

• mPlane enables/simplifies systematic studies

Motivations Measurement Architecture Ecosystem Example of use Summary

Infrastructure ?Footprint VPs ASes Countries

RIPE Atlas 7k 2k 150PlanetLab ~300 180 30

(Microsoft IP/24 Example)

23

Anycast censusMotivations Measurement Architecture Ecosystem Example of use Summary

• O(107) targets x O(102) active sensors• O(103) targets /sensor /second• O(103) targets are anycast – needle in the IPv4 haystack

http://www.telecom-paristech.fr/~drossi/anycast

ScalableLightweight

24

Su ary

• Overview– Measurements to shed light on Internet operational obscurity

• Insights– Measurement plane to facilitate expression

of measurement capabilities and needs– Allow users to concentrate effort on hard problems– Avoiding time consuming tasks

• Hindsights – Crucial to foster adoption (GitHub, ready-to-use VMs)

to reach a critical mass (incentive in proxying)

Motivations Measurement Architecture Ecosystem Example of use Summary

25

26

any thanks

?? || //

Motivations Measurement Architecture Ecosystem Example of use Summary

27

Backup slides

Plane architecture: simplistic view

Repo

sito

ry c

ompo

nent

s

Measurement components (aka Probes)

mPlaneAPI

A

mPlaneAPI

Reasoner

ClientSSupervisor

Client

proxy

S

Plane architecture: gory details

Methodology overview

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassification

Maximum likelihood

Iterate Feedback

DetectSpeed of light

violations

Measure• PlanetLab– 300 vantage points– Geolocated with Spotter (ok for unicast)– Freedom in type/rate of measurement

ICMP, DNS, TCP-3way delay, etc• RIPE– 6000 vantage points– Geolocated with MaxMind (ok for unicast)– More constrained (ICMP, traceroute)

In this talk: min over 10 ICMP samples

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassificationMaximum likelihood

Iterate Feedbac

k

DetectSpeed of light

violations

Detect

The vantage points p and q are referring to two different instances if:

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassificationMaximum likelihood

Iterate Feedbac

k

DetectSpeed of light

violations

Packets cannot travel faster than the speed of light

Enumerate • Find a maximum independent set E

– of discs such that:– Brute force (optimum) vs Greedily from smallest (5-approximation)

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassificationMaximum likelihood

Iterate Feedbac

k

DetectSpeed of light

violations

Geolocate• Classification task

– Map each disk Dp to most likely city– Compute likelihood (p) of each city in disk

based on:• ci: Population of city i• Ai: Location of ATA airport of city i• d(x,y): Geodesic distance• a: city vs distance weighting

• Output policy• Proportional: Return all cities in Dp with

respective likelihoods• Argmax: Pick city with highest likelihood

Frankfurt p=0.30

Zurichp=0.10

Munich p=0.60Munich

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassificationMaximum likelihood

Iterate Feedbac

k

DetectSpeed of light

violations

rationale: users lives in densely populated area; to serve users,

servers are placed close to cities

airports: simplifies validation against ground truth (DNS)

Iterate

• Collapse– Geolocated

disks to city area

• Rerun– Enumeration

on modifiedinput set

Increase recall of anycast replicas enumeration

Measure Latency

Planetlab/Ripe

EnumerateSolve MIS

Optimum (brute force)5-approx (greedy).

GeolocateClassificationMaximum likelihood

Iterate Feedbac

k

DetectSpeed of light

violations

Performance at a glance– Protocol agnostic and lightweight

• Based on a handful of delay measurement O(100) VPs• 1000x fewer VPs than state of the art

– Enumeration• iGreedy use75% of the probes (25% discarded due to overlap)• Overall 50% recall (depends on VPs; stratification is promising)

– Geolocation• Correct geolocation for 78% of enumerated replicas• 361 km mean geolocation error for all enumerated replicas

(271 km median for erroneous classification)