network monitoring and measurements : techniques …csiat/network...monitoring.pdf · monitoring...
TRANSCRIPT
1Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Network Monitoring and Measurements : Techniques and Experience
Supratik BhattacharyyaSue Moon
2Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction – Internet architecture– The Sprint IP Backbone– Monitoring : requirements and challenges
• Current techniques– taxonomy, tools and techniques
• The IPMON project• The future of monitoring
3Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction – Internet architecture– The Sprint IP Backbone– Monitoring : requirements and challenges
• Current techniques– taxonomy, tools and techniques
• The IPMON project• The future of monitoring
4Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Internet Design Philosophy
• Packet switching• Continued communication around failures• Support for diverse services and protocols• Distributed management of resources• No access control• Simplicity at the core, complexity at the edge
5Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Internet Hourglass (Deering@IETF)
IP
TCP, UDP, ...
SMTP, HTTP, RTP, ...
email, WWW, phone, ...
ethernet, PPP, ...
CSMA, Sonet, ...
Copper, fiber, radio, ...
6Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
What is the Internet today?
UUnet
Sprint (AS)
BBN
Dial-up ISP
Tier 2 ISP
BT
Peering point
7Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Today’s Tier-1 Backbones
• Technologies– IP over SONET (POS)– IP over ATM– IP over MPLS
• Topology - points-of-presence (POPs)connected by long-haul fiber– numerous small POPs (e.g., UUNet)– relatively few large POP (e.g., Sprint)
8Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction – Internet architecture– The Sprint IP Backbone– Monitoring : requirements and challenges
• Current techniques– taxonomy, tools and techniques
• The IPMON project• The future of monitoring
9Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Sprint IP Backbone
10Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
A Backbone POP
Peer
Core Router
Other POPs
11Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Common Engineering Practices
• “Over-subscription” at edge• Protection
– large scale outages happen regularly– “over-provisioned” core– MPLS/TE does not help
• Service Level Agreements :– delay (e.g., 55 msec round-trip in USA)– loss (e.g., 0.3%)– port availability (e.g., 99.99%)
12Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Over-subscription and Over-Provisioning
1:3
1:3
customers
accessrouter
corerouters
Rest ofBackbone
Over-subscription Over-provisioning
13Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Intradomain Routing : IS-IS
• Link weights– inter-POP links >> intra-POP links
• Updating weights:– set by hand– modified infrequently, usually for large-scale
failures
• One-way latency across backbone governsweight selection (hence routes)
14Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IS-IS Load Balancing and Protection
R1
R3
R4
R6
R2 R5
• Multiple parallel links between POP pairs• Per-prefix splitting over equal cost paths
POP POP
21
1
1
115
34
34
34
1
1
1
15Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Interdomain Routing : E-BGP & I-BGP
R1
AS1
R4R5
B
AS3
E-BGP
R2R3
AAS2 announce B
I-BGP
IS-IS
16Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Multi-homing and Hot Potato Routing
SprintUUnet
Client
Server
Border Routers
Edge Routers
Request
Response
Peers need to be “consistent” intheir route announcements
17Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Transit vs. Non-transit AS
AS1
ISP1 ISP2
r1
r2 r2
r3
r2
r1 r3
• Transit AS– carries traffic between two other ASes– propagates routes learnt from other ASes
18Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Sprint practices
• Multi-homing with large peers and customers• Hot Potato Routing• Non-transit for peers• Transit for customers
19Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Sprint Service-Level Agreements
• Peers : no money exchanged– at a network exchange points (NAP)– point-to-point links(OC-3/12/48)
• Customers :– flat fee by link bandwidth
•Performance Guarantees :– 55 msec latency across USA– 0.3% loss– 99.9% availability
20Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction – Internet architecture– The Sprint IP Backbone– Monitoring : requirements and challenges
• Current techniques– taxonomy, tools and techniques
• The IPMON project• The future of monitoring
21Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring Implications of Network Design
• Network elements do not collect data• Protocols do not provide feedback• Network adapts to failures/congestion• No centralized collectors• Distributed administration:
– operator has no control beyond own network
22Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring Requirements
• Planning and design• Traffic engineering• Troubleshooting and fault diagnosis• Customer feedback• Research
23Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Planning and design
• Capacity Planning– where to put additional capacity, when?
• Controlled evolution– new link/router/customer– new applications/protocols
• New technologies– MPLS, QOS, etc.
• Router Design– buffer dimensioning, AQM, etc.
24Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Traffic Engineering
• Selecting primary and backup routes• Accommodating changes
– topology, customer usage pattern, newapplications
• Routing protocols– configuring BGP policies, ...
25Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Troubleshooting and Fault Diagnosis
• Some examples :– a link is suddenly overloaded– a route suddenly disappears from BGP table– DOS attack on a customer– router shows high CPU utilization– is a peer using me for transit?
26Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Operations and Management
• Need– up-to-date traffic and routing info– tools to correlate them
• But…– do no harm– simplicity above all !– minimal changes to network– cannot change router configuration
27Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Customer Requirements
• Adherence to SLA– difficult to verify/prove
• Protection for attacks– block, trace back to attackers
• Traffic Engineering :– how are packets being routed?– what are the delay/loss statistics on these routes?– protocol/application/AS number breakup– multi-homed customers
28Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Research
• Monitoring is key to evolution of the Internet :– understanding its global complexity– designing new algorithms/protocols– paradigm shifts (e.g., peer-to-peer)– changes in design philosophy
Challenge :collecting and analyzing massive volumes ofdata!
29Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Limitations
• Technical– poor support in routers– must not interfere with network operations– need storage and analysis infrastructure
• Non-technical– Proprietary– Legal– Privacy– Cost
30Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction :– current Internet architecture– benefits and challenges of monitoring
• Current Techniques:– taxonomy, tools and techniques
• The IPMon Project• The future of monitoring
31Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Taxonomy: Information Types
Data Plane– Packet-level
• port #, src and dst addr, type, length– Flow-level
• duration, total # of packets and bytes, application– Network element-specific:
• link utilization, dropped packet count– Network-level
• Can be derived from packet/flow-level• ingress vs. egress traffic
32Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Taxonomy: Information Types
Control Plane– Routing information
• static - snapshots of routing tables• dynamic - update messages, policies
– Topology• Databased in facility management system• Physical/L2/L3 layer
– Configuration• relatively static, but not widely available.
33Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Taxonomy: Observation Points
• Edge vs. core:– Access is often limited– Link-level statistics not accessible from edge– Important to combine observations from
different points
34Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Metrics Derived from Measurements
Loss– Drops per interface– end-to-end loss– per-AS loss rate
Delay & Jitter– One-way vs. RTT– Single vs. multi-hop– Avg vs. max
35Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Metrics Derived from Measurements
Utilization levels– Derived from byte counts
Routing reachability– Duration and frequency of unreachability– # of ASs and prefixes affected
Failure– Frequency– Duration
36Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Issues
Amount of data– Storage– Processing and transfer overhead
Level of aggregation– Temporal aggregation
• Sub-second, minutes, daily, and weekly
– Research vs. operation• Packet-level vs. flow/link/network-level
37Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Network Matrices
• Network-wide representations of variousmetrics– traffic volume (Traffic Matrices)– delay, loss, utilization, …
• Many statistics– average, peak, variance, etc.
• Many aggregation levels– POP-to-POP, router-to-router, AS-to-AS, ...
38Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Example : AT&T Latency Matrix
263714355018
6906
4264
4719
2040 21
3634
3254
22
61
ATL
CAM
NY
LA
DEN
DAL
CHI
CAM
LA
DEN
DAL
CHI
NY
Latency in milliseconds
Current Average : 35 msec
39Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Which Network Matrices to Build?
• What do applications need?– network design, TE, customer reports,
operations, ...– may only need some components
• What is scalable?– e.g. router-to-router is far bigger than POP-to-
POP
40Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Active Measurement
Probes are injected into the network– Can overload the network
• careful calibration needed• Poisson-generated traffic often used
– Measure from the edges• Does not need access to inside of network• Can only infer network-internal performance
41Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Active Measurement Tool: Ping
ICMP-based tool for host reachability• Algorithm
– Sends an ICMP echo request with:• Identifier for unique ping process• Sequence number per echo request
– Receiving host returns an ICMP echo reply– Prints out RTT, TTL, and seq. #.
• Issues– different processing path from other packets
42Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Active Measurement Tool: Traceroute
Used to find out the forward path to a host• Algorithm
– Send an IP datagram with TTL=1– First router sends back ICMP time exceeded– Then send a datagram with TTL=2– Continue till destination is reached/TTL expired
• Issues– not suited for performance measurements
43Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Packet-Pair Based Bandwidth Estimation
RTT
Source Router Bottleneck Sink
bottleneckrate
rateestimate
Time
44Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
PathcharHop n-1 Hop n
FE FE
c :link capacityS : probe sizep : prop. delaye : noise
Vary L, fit line to min[RTT(n) - RTT(n-1)]
Slope of line gives 1/c
2p = RTT(n) - RTT(n-1) - [2L/c + e]
Based on TTL expiration (like traceroute)
45Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Active Measurement Projects
National Internet Measurement Infrastructure(NIMI)– Provides infrastructure for active measurements– BSD-based hosts, scalable, dynamic
NLANR AMPRIPE NCC
46Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Commercialized Monitoring Services
Matrix– Ping-based monitoring of customers
KeyNote– Subscription-based service of application-
specific monitoring
Shortcomings:– Techniques often depend on starting points, not
on the network
47Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Passive Measurement
No traffic injected for measurement purpose– Not invasive
• Only data collection increases traffic
– Access limited• Measurement about total traffic• Privacy/Security - serious concern
48Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tcpdump
Puts a network interface to promiscuous mode– captures only frames matching filters– shows
• network interface info, time, src and dst, etc.• or entire payload
Issues– security: sees all frames on the Ethernet– host performance penalty– timestamps not always dependable
49Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Passive Measurement Examples
Packet monitors– Tcpdump for Unix-based hotst– Dedicated measurement systems
• OC3MON, IPMON
Router/switch traffic statistics– Network internal behavior– SNMP MIBs– flow-level information
50Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
SNMP
Managing entity
Manageddevice
Manageddevice
data
data data
Simple Network Management Protocol
Agent Agent
51Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Packet-level measurements
• Pros :– very fine granularity
• Challenges :– link speeds are increasing!– Large volumes of data– system design issues:
• disk/PCI bus speeds• installation cost
52Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Flow-level measurements
• Many granularities– five-tuple, BGP prefixes, IP header fields, etc.
• Pros– aggregated information
• Challenges– packet-to-flow mapping– terminating expired flows– CPU/memory requirements on routers
53Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Flow-level Monitoring : Cisco Netflow
NetFlow: mechanism to gather flow statistics–caches flow statistics per router interface andsends records by UDP
–Route-based aggregation possible• By AS, protocol port, src prefix, dst prefix, prefix
–Data flows from NetFlow-enabled device toFlowCollector
• No MIB or SNMP• More aggregation possible at FlowCollector
54Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Netflow Pros and Cons
Pros–provide statistics on every flow–detailed prefix-to-prefix traffic matrix
Cons–High memory requirement for short flows–feasible at access routers, not backbone routers– route changes flushes flow cache, may causeCPU overload
–Flow statistics in UDP: losses during peak usage
55Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Commercial Passive Measurement Tools
HP’s OpenView– Integrated network management tool based on
SNMP statistics– Combined with configuration, visualization,
reporting mechanismsAgilent NetMatrix:
– Passive monitoring gear for LAN and up toOC-12 ATM
Niksun’s NetVCR solutions– Packet monitor based management system
56Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Research Project: CAIDA
Cooperative Association for Internet DataAnalysis (http://www.caida.org)– Measurement tools:
• OCxMON, cflowd, CoralReef, NeTraMet: passive• Skitter: global coverage by traceroute
– Visualization tools:• Walrus, Otter, Plankton: graph visualization• Gtrace: traceroute visualization• MapNet: major ISP maps
57Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Some Other Research Projects
• AT&T Labs Network Measurement Tools– active/passive, flow-level data, routing updates
• Surveyor:– one-way loss and delay as defined by IPPM
NLANR’s PMA and AMP– Passive and active measurement sites– Packet traces from passive measurement made
public
58Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Routing Research Projects
Routeviews– 50+ peering at route-views.oregon-ix.net– MRT format RIBs and BGP updates, “show ip
bgp” dumps, route dampening data– only E-BGP
RIPE (Réseaux IP Européens)– routing updates from 9 mostly European IXs– “Looking Glass” services for BGP– Routing information service (RIS)
59Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IETF/IRTF Activity
• IETF Working Groups– IPPM– IPFIX– BMWG– PSAMP
• IRTF– IMRG
60Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction • Current techniques• The IPMON project
– infrastructure and data– selected results
• The future of monitoring
61Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction • Current techniques• The IPMON project
– infrastructure and data– selected results
• The future of monitoring
62Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON Goals
• Identify open problems for an operationalnetwork, influence its evolution
• Feedback to network design and engineering• Tools for operations and management
63Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Data
• Packet traces• SNMP statistics• IS-IS/BGP Listeners• Active Measurement Probes
64Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Collecting packet traces
• Insert optical splitter on links in multiplePOPs (OC-3/12/48)
• Collect and timestamp TCP/IP headers (44bytes) - 2us clock accuracy
• Collect routing information (IS-IS, BGP)• Transfer data to lab for off-line analysis
(SAN based analysis platform)
65Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring System Architecture
BackboneRouter
Backbone links
Peering points
AccessRouter
AccessRouter
AccessRouter
customer customer customer
Analysis platform(located @ Sprint ATL)
66Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring System Architecture
OC-3/12/48link
opticalsplitter
GPSclock
SONET
DAG Card
main memory buffer
diskarray
IPMON system
Linux PC with multiple PCI buses
90%
10%
67Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Current Status
• Three POPs :– one west coast, two east coast– 50+ IPMON systems– OC-3 to OC-48 capabilities
68Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Practical Constraints
• Difficult to monitor operational network :– complex and expensive procedure– network evolves too fast
• Technology constraints :– line speeds are increasing...– PCI bus, disk speeds need to keep up!– need sophisticated storage systems
69Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Python Routeing Toolkit (PyRT)
Listener software to collect IS-IS/BGP updates :– IS-IS listener :
• collects updates on link failures, weight changes, etc.
– BGP listener :• implements minimal subset of BGP state machine
– Publicly available– Data format compatible with other projects
(Routeviews, etc.)
70Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON Data Management System
• Motivation– management of 10+ terabytes– keep track of “metadata”– information sharing
• Entities to manage– packet traces, routing updates
• location, date of collection, link speed, etc.– analysis results
71Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction • Current techniques• The IPMON project
– infrastructure and data– selected results
• The future of monitoring
72Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON : Selected Results
• Part I: Traffic mix and patterns• Part II: Packet delay and reordering• Part III: SNMP analysis• Part IV: Routing analysis
73Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON: Selected Results
• Part I: Traffic mix and patterns– volume– application breakup– packet sizes– detecting DOS attacks
• Packet delay and ordering• SNMP analysis• Routing analysis
74Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Volume on an OC-12 Link
75Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Application Breakup on OC-12 Links
Tier-2 access CDN access
www
peer-to-peer
www
streaming
76Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Packet size distribution on OC-48 Link
40 - 23%1500 - 14% 576 - 10%
Trace : 1 hour, 676 million packets
77Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Detecting DOS Attacks
TCP SYN bursts
78Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Lessons Learnt - Part I
• Links differ in characteristics• Web traffic dominates, but p2p is making
inroads• Monitoring reveals changes in traffic mix
and characteristics
79Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON: Selected Results
• Traffic mix and patterns• Packet delay and reordering• SNMP analysis• Routing analysis
80Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Delay through a router - OC-12
[Papagiannaki et al 02]
81Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Delay across the USA
82Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
4 6 5 4 3 2 1
measurement point
Out of sequence
Out of Sequence packets
• Challenge– how much of packet reordering?– where do they occur?– what are the causes?
• Our approach : analyze TCP flows in themid-point of its path
83Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
OOS Classification [Agarwal et al. 02]
Out-of-sequence
Unknown
Reordering
Network Duplicate
Unneeded Retransmission
Retransmissions
CDN Tier-1 ISP Tier-2 ISP
2
19
7772
7
0.01
1
3
54
5
211
0.04
11
0.07
88
1.53
All numbers in p.c.
84Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Lessons Learnt - Part IIb
• Delays– speed of light– no jitter
• Out-of-Sequence– about 5% of packets– mostly due to loss– little duplication due to loops, etc.
85Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON: Selected Results
• Traffic mix and patterns• Packet delay and ordering• SNMP analysis• Routing analysis
86Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Maximum Link Load on Sprint Backbone
87Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
On/Off Periods of Link Overload
88Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Effect of Failures
89Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Lessons Learnt - Part III
• At all times, there are some overloaded links :– difficult to distribute traffic evenly– Link failures happen often– difficult to plan for link failures
• Average utilization conveys incomplete picture• Essential to dampen peaks before increasing
average utilization
90Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IPMON: Selected Results
• Traffic mix and patterns• Packet delay and ordering• SNMP analysis• Routing analysis
91Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Information Sources
• BGP table snapshots• ISIS/BGP listener logs• Router logs• Controlled bi-directional active probes• SNMP data
92Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Duration of Link Failures at IS-IS Level
Many short failures, usually one at a timeSome long outages, usually several links
10 min, 80%
1 min, 50%
93Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
BGP Activity: Temporal Trends
Exceptions
Background noise:• 50-250 BGP udpates/min• O(1000) prefixes~ 1% routing table
Peaks at late night/early morning
20-40% of routing table
94Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Active BGP Prefixes
No. of entries in BGP table ~120K !
95Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Elephant and Mice Prefixes
Most elephant prefixesare /16 to /26
96Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Other BGP findings
• About 5-6% of BGP updates affect prefixescarrying 80% of traffic– little effect on packet loss/reordering– but what about shifts in traffic patterns?
• Elephant flows are not stable– hard to use for load balancing
97Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Lessons Learnt - Part IV
• Need to understand and control effect of“failures”
• Routing is poorly understood– interaction of BGP/IS-IS– how to set BGP policies– how to configure IS-IS (timers, etc.)
• Beneficial to correlate different data sets
98Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Outline
• Introduction • Current techniques• The IPMON project
– infrastructure and data– selected results
• The future of monitoring
99Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Future of Monitoring: Topics
• Obtaining traffic matrices– inference techniques– direct measurements
• Router support for path tracing– IP traceback– Trajectory sampling
• Sampling– approaches and challenges
100Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Future of Monitoring : Topics
• Obtaining traffic matrices– inference techniques– direct measurements
• Router support for path tracing– IP traceback– Trajectory sampling
• Sampling– approaches and challenges
101Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
POP-to-POP Topology
An under-specifiedproblem :n links, O(n2) O-D pairs
Need statistical techniquesto make “smart” guessAssume routing is known
Estimating TMs from SNMP Link Counts
ATL
ORL
NYC
PEN
RLY
CHI
KCSJ
SEA
STK
3
3
3
3
3
3
102Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
POP-to-POP Topology
Formal Problem Statement
Xj: Traffic demand for POP pair j
n: number of POPsr: number of linksc: Number of POP pairs
– c = n * (n - 1)A: r by c 1-0 routing matrixYi: Link count on link i
ArxcXc = Yr
ATL
ORL
NYC
PEN
RLY
CHI
KCSJ
SEA
STK
83
3
3
10
2
3
5
5
4
15
13
4
4
103Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Proposed Statistical Techniques
• Linear Programming [Goldschmit 00]• Network Tomography [Vardi JASA 96]• Bayesian Estimation [Tebaldi & West 98]• Maximum Likelihood Estimation [Cao et al
98]
104Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
• SNMP data is not enough– large errors
• Bayesian : 20%-60%• EM : 11% - 40%• LP is larger
– statistical assumptions not valid– marginal gain from partial information
• All methods sensitive to prior• No good model for O-D traffic flows
Comparison [Medina et al. 2002]
105Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Estimating POP Fan-out [Medina et al 02]
• Why does POP i choose POP j– user behavior– network design and configuration
• Use choice models– criterion: maximize utility
Uij = Vi
j + eij
deterministic random
Vij = m1* Wi
j(1) +m2* Wij(2) + … + gj
106Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
• Demand between i and j : Xij = Oi * αij
Oi = total traffic sourced by POP iαij = fan-out factor
• Model for fanout
• Initial Model: estimate m1 m2 and gj using– Wj(1): outgoing bytes at POP i
– Wi(2): incoming bytes at POP j
Estimating POP Fan-out (contd.)
exp (Vij)
exp (Vik)
107Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Measured fanouts
nyc pen sj0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
POP name
Fan
outs
to e
gres
s P
OP
sObserved Fanouts
108Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Initial Validation: Fitting Observed Data
• Parameters estimated using MLE with– SNMP link counts– fanout for three POPs
nyc tac che sea kc orl sj fw stk pen atl ana rly chi0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
POP number
Fra
ctio
n of
traf
fic fr
om N
YC
NYC:SRC: UINTERLOUT - DST: UINTERLIN
observedpredictedValidates
exponential formfor fan-out
109Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Direct Measurement at Ingress
Measuring at egress : hard to disambiguateingress
110Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Challenges of Direct Measurements
• Scalability– monitor every ingress interface– how much data, what granularity– router overhead
• Possible directions– build components of TM for specific purposes
• customer request, problem peers, etc.
– can we monitor periodically?
111Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Future of Monitoring : Topics
• Obtaining traffic matrices– inference techniques– direct measurements
• Router support for path tracing– IP traceback– Trajectory sampling
• Sampling– approaches and challenges
112Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IP Traceback : Basic Idea
• probability of receiving mark from d hops = p(1-p)d-1
• reconstruct path based on counting marks from each router
a
b
c
Victim
a
a b
cba
router marks itsaddress in apacket withprobability p
Goal : Trace sink tree from victim to source of attack,even with address spoofing
113Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
IP Traceback : Edge Sampling
• Need additional (short) field to mark distance• Further compression possible...
a
b
c
Victim
a XOR b
b XOR c
c
•Issues•slow process : p = 0.51, d= 15, need 42K packets•not robust to multiple attackers
114Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Trajectory Sampling [Duffield et al 00]
• Key Idea :– Track the “trajectory” of a packet by consistent
sampling everywhere
• No statistics or inference techniques• Uses :
– how is a customer’s packet routed?– is routing working as expected?– diagnosing source address spoofing
115Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Trajectory Sampling
MeasurementSystem
label src dest length
g(x1) a.b.c.d w.x.y.z 500Ingressnode
Egress node
Sample same pkt everywhere or nowhereg(x) is deterministic
116Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
The Future of Monitoring : Topics
• Obtaining traffic matrices– inference techniques– direct measurements
• Router support for path tracing– IP traceback– Trajectory sampling
• Sampling/Filtering– approaches and challenges
117Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Why Sampling/Filtering?
Problems with large volumes of data– feasibility of collection at high-speeds
• memory/bus/processor requirements
– storage limitations– complexity of analysis
118Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
State-of-Art
• Cisco– sampled netflow
• capture 1 in N• aggregate by five-tuple
• Juniper– filter on any combination of header fields– sample 1 in N
• recommends 1 in 1000 or less
119Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Comparison of Approaches [Claffy et al 93]
• Traffic Characteristics– packet size distribution– inter-arrival times
• Key Results– Packet-triggered techniques better than time-triggered– difference within each class is small
systematic
simple random
stratified random
120Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Flow-Level Sampling [Duffield et al 01]
• Motivation– charge for measured usage
• fixed charge below threshold• usage-based charged above threshold
– sample a subset of flows with commonproperty, e.g., source/destination address
• Key Idea :– flow size distribution is heavy-tailed– need lower estimation error for larger flows
121Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Flow-level Sampling [Duffield et al 01]
• Approach– Each flow i has size and color– Goal is to estimate total usage
– Sample flow i with probability min {1, / Z}
• Important results– yields unbiased estimate– variance decreases as increases– much smaller errors than 1 in N sampling
∑=
=cci
ii
xcX:
)(ix ic
ix
ix
122Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Towards a Two-Tier Monitoring System
Goal : make monitoring an integral part ofbackbone engineering and operations
• Tier 1– network-wide continuous monitoring– coarse-grained, long time-scales
• Tier 2– on-demand monitoring at specific points– fine-grained, short time-scales
123Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Challenges
Collecting fine-grained packet data is hard!
• Sampling techniques– should be requirement-driven– need complete traces for evaluation
• (Re)Design hardware to support monitoring– custom-built equipment– router architecture...
124Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring on Routers
• Pros– no additional
infrastructure– up-to-date routing
tables
• Cons– space and storage
constraints– switched backplane
Interface
CrossbarSwitch
RouteProcessor
Modern Router Architecture
Interface
Interface
Interface
125Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Monitoring on Routers: Open Issues
• Feasibility– how much memory?– what operations can
be performed?– What part of the
packet to capture?– What to export, how
often?
Interface Card
CrossbarSwitch
RouteProcessor
Periodicexport
To collectionstation
CPU
memory
Incoming packetFIB
126Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
Tutorial Summary
• Monitoring IP networks– difficult… but essential
• Some tools/techniques exist– but a lot more remains to be done
• Need integrated infrastructure– router support– wider data collection and sharing– storage/analysis capabilities– sampling and scalable export
127Advanced Technology LaboratoriesAdvanced Technology LaboratoriesCopyright
How To Contact Us
• Email– [email protected]– [email protected]
• Visit us on-line– www.sprintlabs.com– under development: ipmon.sprintlabs.com