1
RENATER Network management approach
Frederic LOUI, François-Xavier ANDREU
Network Backbone Operation & Engineering
Rome, June 22nd–25th 2009
[email protected]@renater.fr
2
Housekeeping
• We value your feedback - don't forget to complete your training session evaluations
• Please switch your mobile phones to silent
• All the slides will be made available
• Don’t hesitate to ask questions ☺
3
Round table
• Who are you?
• About me ☺
• Do you have any specific expectations regarding this training course?
4
Training course objective
• Present the RENATER network management approach
• A management approach shaped by several constraints (legacy, technical, organizational)
• Case studies and hands-on
• MAIN GOAL: provide you with enough material so that YOU can start or adjust your own network management approach
5
Agenda
• A few words about RENATER
• Basic network management concepts
• Approach based on several constraints
• Data sources
• Network Management Tools
• Case studies
• Hands-on on tools
6
RENATER big picture
• French NREN (National Research and Education Network)
• Geographic scope:
  - 58 PoPs (at least one per region) + overseas territories PoPs
  - 800+ sites connected
• Connectivity:
  - Generally, sites are connected via MANs or regional networks
  - A few sites are directly connected (mainly universities)
• Additional services:
  - LIR: Local Internet Registry for the education & research community
  - SFINX: Service for French Internet Exchange
7
RENATER big picture
• RENATER network version: fifth iteration
• Optical networking
  - Newly “owned” dark fiber infrastructure
  - Links = n*10Gbps
  - Lightpaths for dedicated research projects
  - CIENA DWDM equipment
• Layer 2 switching
  - C6500, C4500, C3750 for L2
• Layer 3 routing
  - CRS-1, 12K, 7609, 7200 for L3
  - Powered by IOS and IOS-XR
8
RENATER big picture
• What network services do we provide?
• Basic IP
  - IPv4 unicast / multicast
  - IPv6 unicast / multicast
• VPN services
  - L3VPN, aka “MPLS-VPN”
  - L2VPN (VPWS/802.1q)
• Additional network services
  - IP telephony
  - SSO (Single Sign On)
  - Anti-spam
• And in the near future…
  - 6VPE
  - MVPN
  - VPLS
10
Network Management concepts
• Network management? What for?
• Monitoring
  - Ease and improve supervision of the network
  - Get diagnostic tools to help us analyse the network behaviour
• Planning
  - Optimize the architecture
  - Ensure that the network scales well in terms of capacity
  - Establish a clear trend forecast
• Security
  - Increase network security by detecting security incidents proactively and quantifying their effects
• Accounting / Billing
  - Ensure a good correlation between cost, SLA and effective usage of the network
• Performance
  - Make sure that the network is ready and is behaving as expected with respect to the service level agreement
• …
11
Network Management concepts
• TYPICAL TELECOM ORGANIZATION STRUCTURE
• Network Operation Control department
  - Customer Level 1 SPOC
  - Customer Level 2 SPOC (escalates issues to experts in the engineering team)
• Planning department
  - Monitor network backbone and customer link usage
  - Drive network backbone evolution in terms of bandwidth capacity
• Provisioning / Configuration
  - Configure the SP equipment as per the change management process
  - Ensure the customer’s network “life cycle management”
• Engineering department
  - Traffic priority / congestion management
  - Flappy error conditions
• Product marketing department
  - Inspect the market share according to customer needs (business case)
  - Study new services in terms of financial cost & revenue
• Commercial department
  - Promote key products
  - Elaborate the price list
• Billing department
  - Ensure that billing is timely and accurate
13
Network Management concepts
• FCAPS, has anybody heard of it?
• Fault
  - Pro-active fault detection
  - Alarm management
• Configuration
  - Change management process
  - Configuration management
• Accounting
  - Per customer capacity planning
  - Per link capacity planning
• Performance
  - Traffic priority / congestion management
  - Flappy error conditions
• Security
  - Single Sign On
  - Centralized AAA policy
15
Constraint
• As an NREN we are a service provider
  - A small TELECOM company
  - The previous topics apply to our case
• Customers are:
  - Education & research organizations
  - Universities
  - European projects spread across several countries
16
Constraint
• RENATER organization
  - 30 people in RENATER (6 technical staff dedicated to the network “at large”)
  - Managing a national backbone is a huge task
  - Need for 24x7 duty coverage
  - Need for a large Network Management infrastructure
  - At minimum, 15 to 20 people are needed to run such a network
17
Constraint
• Outsourced NOC
  - 10 dedicated staff
  - 24x7 duty coverage
  - Guarantee of having “skilled” staff
• Shared Network Management
  - NMS already deployed (pollers, SNMP trap servers)
  - Web portal, trouble ticket system
  - Etc.
18
Constraint
• But … “If you want something done right, do it yourself” (“On n’est jamais mieux servi que par soi-même”)
  - The NOC needs to be closely followed
  - The NMS is not always accurate
  - Sometimes the NOC’s perspective of the network differs from ours
  - Staff turnover
• After all, we still need to deploy our own Network Management System
  - Control/check the network behaviour in detail
  - Provide detailed reports on network usage
  - Ensure a light NOC function
  - A fully automated network management suite is not required
• In-house / home-made tools
20
Data sources
• Passive measurements
  - SNMP
  - NetFlow
  - (Command line interface)
  - (XML)
• Active measurements
  - Home-made beacons
  - RIPE TTM
  - Symmetricom (RENATER’s choice)
21
Passive measurements: SNMP
• MIB: tree-organized database residing inside the equipment that can be queried through the Simple Network Management Protocol (SNMP)
• Open-source software available: MRTG, Cricket, RTG, CACTI...
• For HACKERS, C library: UCD-SNMP (now Net-SNMP)
• Different SNMP versions:
  - v1: Get, GetNext, Set
  - v2: Get, GetNext, GetBulk, Set, Inform
  - v3: added security features and administration
22
Passive measurements: SNMP
• Interface load:
  - In Mb/s or packets/s
  - Different reports available:
    - 24 h graph report
    - Weekly
    - Half-year report
    - Yearly
• CPU load:
23
Passive measurements: SNMP
• MIB v6
• MIB MPLS
• SNMP collection mode limits:
  - Graph reports are averaged (average over 5 minutes)
  - Provides only IP reports (IPv4 + IPv6)
  - No information above layer 4
  - Difficult to define alarms based on a traffic pattern behaviour
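The interface load and the 5-minute averaging limit can be illustrated with a short sketch. It assumes the poller reads a 64-bit octet counter (such as `ifHCInOctets` from the IF-MIB) every 300 s; the function name and the traffic figures are hypothetical, chosen only to show how a short burst is diluted in the average.

```python
COUNTER64_MAX = 2**64  # 64-bit SNMP counters wrap at this value


def rate_mbps(prev_octets, curr_octets, interval_s, counter_max=COUNTER64_MAX):
    """Average rate in Mb/s between two octet-counter samples.

    The modulo handles a single counter wrap between the two polls.
    """
    delta_octets = (curr_octets - prev_octets) % counter_max
    return delta_octets * 8 / interval_s / 1e6


# Hypothetical 300 s polling window: 100 Mb/s of background traffic,
# plus a 30 s burst during which the link carries 1000 Mb/s.
background_octets = 100e6 / 8 * 300
burst_extra_octets = (1000e6 - 100e6) / 8 * 30
average = rate_mbps(0, int(background_octets + burst_extra_octets), 300)

print(f"5-minute average: {average:.0f} Mb/s (burst peak was 1000 Mb/s)")
```

With these assumed figures the graph would show about 190 Mb/s, so a threshold set on the averaged value can easily miss the burst.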
25
NetFlow
• Flow definition:

Flow characteristics (key fields):
- Source IP
- Destination IP
- Source port
- Destination port
- Protocol
- Type of Service
- Input interface SNMP index

+ for each flow:
- Number of packets of the flow
- Number of bytes of the flow
- Time: start and stop of the flow
- Outgoing interface SNMP index
- Source and destination AS
- Source and destination subnet mask
- Cumulative TCP flags

[Figure: traffic between a user desktop and a mail server, exported to a NetFlow collector]
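The flow definition above can be sketched as a small aggregation routine: packets sharing the same seven key fields are counted in the same flow record. This is a minimal illustration, not the router implementation; the class names, the packet dictionary layout and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """The seven fields that identify a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    tos: int
    input_if: int


@dataclass
class FlowStats:
    """Counters accumulated for each flow."""
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0
    tcp_flags: int = 0  # cumulative OR of the TCP flags seen


def aggregate(packets):
    """Group packets (dicts, hypothetical layout) into flow records."""
    flows = {}
    for p in packets:
        key = FlowKey(p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"],
                      p["protocol"], p["tos"], p["input_if"])
        stats = flows.get(key)
        if stats is None:
            stats = flows[key] = FlowStats(first_seen=p["ts"], last_seen=p["ts"])
        stats.packets += 1
        stats.octets += p["length"]
        stats.last_seen = p["ts"]
        stats.tcp_flags |= p.get("tcp_flags", 0)
    return flows


def packet(src, dst, sport, dport, in_if, ts, length, flags):
    return {"src_ip": src, "dst_ip": dst, "src_port": sport, "dst_port": dport,
            "protocol": 6, "tos": 0, "input_if": in_if, "ts": ts,
            "length": length, "tcp_flags": flags}


sample = [
    packet("193.49.159.141", "194.199.8.10", 1164, 80, 16, 0.00, 60, 0x02),
    packet("194.199.8.10", "193.49.159.141", 80, 1164, 14, 0.01, 60, 0x12),
    packet("193.49.159.141", "194.199.8.10", 1164, 80, 16, 0.02, 52, 0x10),
]
flows = aggregate(sample)
for key, stats in flows.items():
    print(key.src_ip, "->", key.dst_ip, stats.packets, "packets", stats.octets, "bytes")
```

Note that the two directions of one TCP connection produce two separate flows, since the key fields are not symmetric.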
26
TCP/IP Headers
[Figure: IP header and TCP header layout, bits 0 to 31.
IP header fields: Version, HLEN, ToS, Total Length, Identification, Flags, Fragment Offset, Time to Live, Protocol, Header Checksum, Source IP Address, Destination IP Address.
TCP header fields: Source Port Number, Destination Port Number, Sequence Number, Acknowledgement Number, Header Length, Reserved, TCP Flags, Window Size, TCP Checksum, Urgent Pointer.]
27
NetFlow
• Architecture example:
• Some figures:
  - NetFlow export traffic towards the collector is estimated at 60 Mb/s during the day and 3 Mb/s at night
  - 20 million flows per 5 minutes during business hours, with a minimum of 2 million overnight (note: 80% of the equipment runs NetFlow in sampled mode)
28
NetFlow
• Web server connection example:
• Storage of the information of interest related to:
  - A router
  - An interface
  - An Autonomous System
  - A network prefix (e.g. traffic rate related to a customer netblock)
  - Some well-known ports

Source address   Dest. address    Router          In idx  Out idx  Src port  Dst port  Prot  Bytes  Packets  Src AS  Dst AS
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1164      80        6     821    10       0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1164      6     14456  14       65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1165      80        6     544    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1165      6     6582   7        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1166      80        6     552    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1166      6     6641   7        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1167      80        6     551    7        0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1167      6     6895   8        65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1168      80        6     755    12       0       65037
194.199.8.10     193.49.159.141   193.51.179.66   14      16       80        1168      6     20203  17       65037   0
193.49.159.141   194.199.8.10     193.51.179.66   16      14       1169      80        6     460    5        0       65037
29
NetFlow (prefix information)
The NetFlow information stored in the database is available through an HTML/PHP interface; here are some screenshots:
The traffic corresponding to a specific IP address block can be visualized, either in flows per second or in bit/s. There are about 4000 IP address blocks for the RENATER network (all of them allocated to RENATER sites).
NR = nœud RENATER (RENATER node)
30
Netflow information per customer
Volume of traffic by class of services (IPv4)
Daily and monthly IPv4 traffic volume
31
NetFlow
• Distribution of traffic by port
• Alarms based on routing issues
• Traffic matrix
• Flow capture based on a specific traffic pattern or signature at the collector level (ports, number of packets, sizes, …). A report is generated every day listing the top 40 addresses with "suspicious" traffic (P2P, FTP warez). These reports are forwarded to CERT-RENATER so that security issues can be handled.
32
NetFlow
[Figure: Distribution by protocol (pie chart): tcp 84%, udp 14%, icmp 2%, others 0%]
[Figure: Distribution by port (bar chart, scale 0 to 50): share of flows and of bytes for the traffic of the Lyon NR, for ports 0, 20, 22, 25, 53, 80, 137, 445, 4662, 5020 and others]
33
NetFlow
• DoS detection:
  - When a threshold is crossed:
• NetFlow report extract:

Source address   Dest. address    Router          In idx  Out idx  Src port  Dst port  Prot  Bytes  Packets  Src AS  Dst AS
194.57.222.66    163.15.163.247   193.51.177.42   5       3        1445      80        6     40     1        1715    7539
194.57.222.211   163.15.163.247   193.51.177.42   5       3        1414      63        6     40     1        1715    7539
194.57.222.170   163.15.163.247   193.51.177.42   5       3        1191      53        6     40     1        1715    7539
194.57.222.190   163.15.163.247   193.51.177.42   5       3        1232      34        6     40     1        1715    7539
194.57.222.18    163.15.163.247   193.51.177.42   5       3        1610      25        6     40     1        1715    7539
194.57.222.10    163.15.163.247   193.51.177.42   5       3        1582      23        6     40     1        1715    7539
194.57.222.139   163.15.163.247   193.51.177.42   5       3        1103      1         6     40     1        1715    7539
194.57.222.203   163.15.163.247   193.51.177.42   5       3        1590      116       6     40     1        1715    7539
194.57.222.116   163.15.163.247   193.51.177.42   5       3        1877      113       6     40     1        1715    7539
194.57.222.34    163.15.163.247   193.51.177.42   5       3        1566      98        6     40     1        1715    7539
194.57.222.166   163.15.163.247   193.51.177.42   5       3        1122      90        6     40     1        1715    7539
194.57.222.1     163.15.163.247   193.51.177.42   5       3        1975      86        6     40     1        1715    7539
194.57.222.182   163.15.163.247   193.51.177.42   5       3        1248      82        6     40     1        1715    7539
194.57.222.163   163.15.163.247   193.51.177.42   5       3        1696      70        6     40     1        1715    7539
194.57.222.6     163.15.163.247   193.51.177.42   5       3        1270      62        6     40     1        1715    7539
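The pattern visible in this extract (many sources, one destination, single 40-byte packets) lends itself to a simple threshold check. The sketch below is one possible heuristic, not the detection rule actually used at RENATER; the function name, the record layout and all three thresholds are assumptions for illustration.

```python
from collections import defaultdict


def suspicious_destinations(flows, min_sources=10, max_packets=2,
                            min_small_ratio=0.9):
    """Return destinations hit by many distinct sources with tiny flows.

    flows: iterable of dicts with 'src', 'dst' and 'packets' keys
    (hypothetical layout). Thresholds are illustrative only.
    """
    sources = defaultdict(set)
    total = defaultdict(int)
    small = defaultdict(int)
    for flow in flows:
        dst = flow["dst"]
        sources[dst].add(flow["src"])
        total[dst] += 1
        if flow["packets"] <= max_packets:
            small[dst] += 1
    return sorted(
        dst for dst in total
        if len(sources[dst]) >= min_sources
        and small[dst] / total[dst] >= min_small_ratio
    )


# 15 one-packet flows towards a single address, as in the extract above
attack = [{"src": f"194.57.222.{i}", "dst": "163.15.163.247", "packets": 1}
          for i in range(1, 16)]
# A few ordinary web flows from one client
web = [{"src": "193.49.159.141", "dst": "194.199.8.10", "packets": n}
       for n in (10, 7, 7)]

flagged = suspicious_destinations(attack + web)
print(flagged)
```

In practice the thresholds would have to be tuned per network, and sampled NetFlow changes the packet counts seen by the collector.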
34
NetFlow
• Additional problem:
  - As traffic grows, NetFlow processing on the equipment becomes a problem
• One answer: sampling (Sampled NetFlow)
  - Within RENATER: 10% of the packets (~1/2 of the flows)
35
NetFlow
• New transport format (v9):
  - Use of "templates"
  - New protocols taken into account:
    - IPv6
    - Multicast
    - MPLS
• NetFlow egress vs ingress
• Different sampling modes:
  - Deterministic
  - Random
36
NetFlow sampling mode comparison: « Full » and « Sampled »

Metric                 Full          Sampled      Loss
Amount of packets      48802075      5303552      90%
Amount of bytes        19344774677   2078190250   90%
Number of flows        3416043       1522338      55%
Number of TCP flows    3204712       1487522      54%
Number of UDP flows    211089        34713        84%
Number of ICMP flows   241           103          58%
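Why sampling 10% of the packets still keeps roughly half of the flows can be sketched with a small probability model. It assumes random sampling where each packet is kept independently with probability 0.1 (deterministic 1-in-N sampling behaves differently); the function names and the flow-size mix are hypothetical.

```python
def p_flow_seen(packets_in_flow, sampling_rate=0.1):
    """Probability that at least one packet of the flow is sampled,
    assuming independent random sampling of each packet."""
    return 1 - (1 - sampling_rate) ** packets_in_flow


def expected_flows_seen(flow_sizes, sampling_rate=0.1):
    """Expected fraction of flows that remain visible after sampling."""
    return sum(p_flow_seen(n, sampling_rate) for n in flow_sizes) / len(flow_sizes)


for n in (1, 7, 20, 50):
    print(f"{n:3d}-packet flow seen with probability {p_flow_seen(n):.2f}")

# Hypothetical mix: many very short flows and a few long ones
mix = [1] * 40 + [7] * 30 + [50] * 30
print(f"Expected share of flows still visible: {expected_flows_seen(mix):.2f}")
```

Under this model a single-packet flow is seen only 10% of the time while a 50-packet flow is almost always seen, which is consistent with short UDP and ICMP flows suffering the highest loss.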
DoS attack
Denial of service attack traces and investigation:

The attack can be seen among all the flows that cross the RENATER backbone. The number of flows increases at 12:30, which is not usual traffic behaviour.
The attack appears to come from source IP addresses that do not belong to a RENATER site. The flows show this because the source IP address has been spoofed. The source address is forged, but the destination IP address is real and is located in an ISP network.
Information about such flows can be found in the tool logs:
- the destination address is always the same
- the source address is different (but in the same block)
- router IP address
- different source ports
- same destination port
These traces come from the NetFlow tool.

Source address    Dest. address    Router          Index IN  Index OUT  Src port  Dst port  Prot
171.24.11.213     217.172.184.27   193.51.177.35   5         2          1142      8767      6
195.110.78.31     217.172.184.27   193.51.177.35   5         2          1885      8767      6
20.117.44.79      217.172.184.27   193.51.177.35   5         2          1185      8767      6
202.140.234.35    217.172.184.27   193.51.177.35   5         2          1108      8767      6
142.242.37.16     217.172.184.27   193.51.177.35   5         2          1784      8767      6
131.128.177.4     217.172.184.27   193.51.177.35   5         2          1966      8767      6
61.30.170.221     217.172.184.27   193.51.177.35   5         2          1715      8767      6
219.218.36.159    217.172.184.27   193.51.177.35   5         2          1746      8767      6
31.129.210.2      217.172.184.27   193.51.177.35   5         2          1672      8767      6
23.124.245.196    217.172.184.27   193.51.177.35   5         2          1960      8767      6
106.250.168.39    217.172.184.27   193.51.177.35   5         2          1285      8767      6
181.25.228.4      217.172.184.27   193.51.177.35   5         2          1058      8767      6
159.225.242.122   217.172.184.27   193.51.177.35   5         2          1274      8767      6
167.166.50.239    217.172.184.27   193.51.177.35   5         2          1809      8767      6
2.104.106.121     217.172.184.27   193.51.177.35   5         2          1729      8767      6
82.210.101.233    217.172.184.27   193.51.177.35   5         2          1162      8767      6
203.20.11.217     217.172.184.27   193.51.177.35   5         2          1397      8767      6
210.153.129.121   217.172.184.27   193.51.177.35   5         2          1632      8767      6
220.206.215.31    217.172.184.27   193.51.177.35   5         2          1644      8767      6
182.20.179.145    217.172.184.27   193.51.177.35   5         2          1079      8767      6
183.48.240.70     217.172.184.27   193.51.177.35   5         2          1920      8767      6
189.192.51.92     217.172.184.27   193.51.177.35   5         2          1493      8767      6
151.16.75.43      217.172.184.27   193.51.177.35   5         2          1573      8767      6
39
DoS observation with NetFlow and SNMP

[Figure: path of the attack across the backbone, from the ISP through the Sfinx POP and the collect network to the Limoges POP, with three numbered observation points (1, 2, 3) and the NetFlow router of the previous slide]

• This attack can also be seen through an SNMP tool:
  - Traffic increase: + 40 Mbit/s
  - On each network interface that the flows went through
  - But there is no flow-level view, and therefore no access to the IP addresses
  - After detection, the information is sent to the RENATER Computer Emergency Response Team (CERT)
40
Active measurements
• Active measurement agenda
  - Basic concepts
  - Metrics
  - Application requirements
  - Issues related to active measurements
  - Active measurement within a high-speed backbone
  - Some existing active measurement alternatives
41
Basic concept
• Various delay definitions:
  - Propagation delay
    - Time for the signal to propagate through the physical medium
    - Function of distance and of the speed of light, 0.1-0.2 second around the globe
  - Transmission delay
    - Packet size / link rate
  - Queuing delay
    - Number of queued packets * packet size / link rate
• Packet losses due to:
  - Transmit queue full
  - Link noise (negligible today except for wireless technologies)
  - Re-routing taking too long (cf. case study #6)
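The three delay components can be worked through numerically. The sketch below assumes a signal speed of about 2×10^8 m/s in optical fibre (roughly two thirds of the speed of light in vacuum); the function names and the example link (500 km, 10 Gb/s, 1500-byte packets, 100 packets queued) are illustrative assumptions.

```python
FIBRE_SPEED_M_S = 2e8  # assumed propagation speed in optical fibre


def propagation_delay_s(distance_m, speed_m_s=FIBRE_SPEED_M_S):
    """Time for the signal to travel the physical distance."""
    return distance_m / speed_m_s


def transmission_delay_s(packet_bytes, link_bps):
    """Time to push all the bits of one packet onto the link."""
    return packet_bytes * 8 / link_bps


def queuing_delay_s(queued_packets, packet_bytes, link_bps):
    """Time to drain the packets already waiting in the transmit queue."""
    return queued_packets * transmission_delay_s(packet_bytes, link_bps)


prop = propagation_delay_s(500_000)           # 500 km of fibre
trans = transmission_delay_s(1500, 10e9)      # 1500 bytes at 10 Gb/s
queue = queuing_delay_s(100, 1500, 10e9)      # 100 packets ahead in the queue

print(f"propagation  {prop * 1e3:.2f} ms")
print(f"transmission {trans * 1e6:.2f} us")
print(f"queuing      {queue * 1e6:.0f} us")
```

On such a link the propagation delay dominates by several orders of magnitude, which is why backbone delays mostly follow fibre distance.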
42
Metrics (1/2)
• IETF standardization:
  - IPPM (IP Performance Metrics working group)
  - Delay, jitter, packet loss, re-ordering...
• For each class of service and IP protocol type
• Focus also put on time precision
• End-to-end measurement but also segmented measurement:
43
Metrics (2/2)
• Unidirectional delay (cf. RFC 2679):
  - Significant for real-time applications (VoIP, …)
  - Quantifies the quality perceived by the user (e.g. the quality of a TCP application depends on the delay of packet arrival and not on the TCP acknowledgement) → more representative than the RTT
  - Can be influenced by an efficient class of service implementation that takes the forward and return paths into account separately
• Jitter (unidirectional delay variation, cf. RFC 3393):
  - Mostly induced by equipment transmit queues
  - Buffer sizes must be tuned correctly for streaming applications
  - Reflects the dynamics and stability of the network
• Unidirectional packet loss (cf. RFC 2680):
  - Significant for all types of application
  - Mostly due to network congestion
• Reordering (cf. draft-ietf-ippm-reordering):
  - 1 out-of-order packet = 1 delayed packet (cf. IPPM)
  - Significant for streaming applications (a packet with too much delay is dropped)
  - Mostly due to ECMP or non-ECMP multipath in the network, or to protocol retransmission after an error
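The four metrics can be computed from the send and receive timestamps of numbered probe packets. The sketch below is a simplified reading of the IPPM definitions, not a conformant implementation: it assumes synchronized clocks, takes delay variation between consecutive received sequence numbers, and counts a packet as reordered when its sequence number is lower than the next expected one. All names and sample values are hypothetical.

```python
def one_way_metrics(sent, received):
    """sent: {seq: send_time_s}; received: [(seq, recv_time_s)] in arrival order."""
    delays = {seq: recv - sent[seq] for seq, recv in received}

    # Delay variation between consecutive received packets, in sending order
    ordered = [delays[seq] for seq in sorted(delays)]
    variation = [b - a for a, b in zip(ordered, ordered[1:])]

    # A packet is counted as reordered if it arrives after a later one
    next_expected = 0
    reordered = 0
    for seq, _ in received:
        if seq < next_expected:
            reordered += 1
        else:
            next_expected = seq + 1

    loss_ratio = 1 - len(delays) / len(sent)
    return delays, variation, loss_ratio, reordered


# Probes sent every 10 ms; packet 3 is lost, packet 1 arrives after packet 2
sent = {0: 0.000, 1: 0.010, 2: 0.020, 3: 0.030, 4: 0.040}
received = [(0, 0.005), (2, 0.026), (1, 0.027), (4, 0.046)]

delays, variation, loss_ratio, reordered = one_way_metrics(sent, received)
print("delays (ms):", {s: round(d * 1e3, 1) for s, d in delays.items()})
print("variation (ms):", [round(v * 1e3, 1) for v in variation])
print("loss ratio:", round(loss_ratio, 2), "reordered packets:", reordered)
```

The delays are only meaningful if both ends share a common time reference, which is the synchronization issue discussed in the following slides.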
44
Application requirements:
Source : http://www.itu.int/osg/spu/wtpf/wtpf2001/infosession/pettitt1.pdf
45
Issues
• Precision
  - Delay ≈ a few ms → required precision ≈ 100 μs
• Synchronization:
  - NTP:
    - WAN > 1 ms
    - Instability
  - GPS:
    - 10 µs
    - Not cost-effective because of the installation cost
• End-to-end measurement:
  - One beacon at each end
  - Troubleshooting issues with end-to-end PCs

[Figure: unidirectional delay between 2 directly connected stations A and B; B is the NTP server for A, and B synchronizes its clock on its local hardware clock.]
46
Issues : OS
• Measurement station: real-time and precise supervision → hardware stability ≈ 100 μs
• Common OSes are not real-time OSes → latency can reach tens of ms

[Figure: example of the delay measured between 2 computers connected by a cable]
47
Issues : OS (2)
Answers :
• Use a real-time OS (QNX, RTLinux, …) → huge development effort
• Apply a patch to improve the OS (low-latency or preempt kernel for Linux, for instance) → simple but less efficient
• Post-process the results to compensate for the imprecision due to the OS → not an easy task …
48
Issues : other criteria
• Location: ideally, a beacon in each POP, so as to test all combinations of end-to-end paths
• Measurement coordination and centralization for post-processing
• Reliability: do not trigger false alarms
• Security: avoid measurement falsification, intrusion and DoS
• Representative measurements: tune the tests to match user conditions → qualify the customer traffic characteristics and apply them to the test traffic (size, DSCP, …)
• Exhaustiveness: multicast and IPv6 measurements
49
Some figures in RENATER backbone
• Delay ≈ a few ms
• Jitter ≈ ms
• Loss << 10^-3
• Re-ordering very low

[Figure: RENATER backbone map annotated with per-path measurement values (4,1 to 33,4)]
50
Active measurements@ RENATER
• Several boxes placed in strategic PoPs
• GPS synchronization
  - ~1 µs accuracy
• IPPM metrics supported
55
RENATER management
• Public tools
  - Network health:
    - WeatherMap
    - Active measurements
  - Looking Glass
• Private tools
  - Traffic per site (NetFlow)
  - IGP consistency
64
Deployment of active measurement probes on RENATER
http://pasillo.renater.fr/metrologie/get_qosmetrics_results.php
65
Looking Glass
• Get information on a router without a direct connection
• Web interface
• The end user doesn’t need a login
• Allows the user to detect the causes of failures without asking the NOC or the network administrator
67
Traffic per site
• Sites aren’t directly connected to RENATER PoPs
• NetFlow makes it possible to obtain the traffic per site
• Interaction with the Information System (SAGA)
• Demo
72
Case study #2
• Take the 5-minute value that follows the peak
• Compute the average of the two values
• You get the usual bit rate
• Explanation: one polling cycle took more time, so a significant amount of traffic was counted in the previous 5 min
• The next 5-minute value is in turn lower
Case study #3 (start)
• Architecture :
4 x E1 (Toward Metropolitan France)
POP CAYENNE (GUYANE CENTRAL US)
French overseas territory with IP SLA probes activated
76
Case study #6 re-routing
• Delay:
• Jitter:
• Hop count:
• Packet loss:
  - Probe rate: 1 packet every 10 ms
  - 400 packets dropped → about 4 s of outage (400 × 10 ms)
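The outage duration follows directly from the probe rate and the number of consecutive losses. The sketch below assumes a per-probe received/lost flag is available from the measurement tool; the function name and the sample sequence are hypothetical.

```python
def longest_outage_s(received_flags, probe_interval_s):
    """Duration of the longest run of consecutive lost probes."""
    longest = current = 0
    for received in received_flags:
        current = 0 if received else current + 1
        longest = max(longest, current)
    return longest * probe_interval_s


# 1 probe every 10 ms: 100 received, 400 lost during re-routing, 100 received
flags = [True] * 100 + [False] * 400 + [True] * 100
outage = longest_outage_s(flags, 0.010)
print(f"longest outage: {outage:.1f} s")
```

Looking at the longest run rather than the total loss count separates a single re-routing event from scattered congestion losses.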
78
Case study #8
• Link DOWN
• Rerouting at layer 3
  - Alternate path longer
  - Hop count greater
  - Delay shorter
86
Hands on tools
• yaNMP
  - Yet Another Network Management Platform
  - Provides a big picture of your network
  - Fault management and capacity planning
  - Can be coupled with any data source
  - A simple way to show what your network looks like and to keep track of its evolution
87
yaNMP
• yaNMP context
  - RENATER-5 deployment
    - New platform → different configuration types
    - Different types of equipment
    - New OSes: IOS, IOS-XR, IOS-XE
    - Maybe one day JUNOS? ☺
    - Link upgrades and new links → topology changes
• Help !!!!
  - How to reflect the network status on a day-to-day basis?
  - How to keep up with the pace of new deployments?
  - Using the existing weathermap is possible, but the process is not very intuitive
88
yaNMP
• yaNMP intrinsics
  - yaNMP-GUI
    - Java-based GUI
    - “Should” be multi-platform
    - Starts yaNMP in interactive mode!
  - yaNMP-DAEMON
    - Java-based
    - “Should” be multi-platform
    - Starts yaNMP in NON-interactive mode!
89
yaNMP
• yaNMP input files
  - Nodes file
    - Node identifier
    - X position
    - Y position
  - Links file
    - Half-link identifier
    - Link value
  - Links status URL
90
yaNMP
• Hands-on
  - Hands-on objective: depict your network
    - Provide a physical view of your network
    - Provide a logical view of your network
    - Reflect your network state
    - Make your weathermap reflect the evolution of your network
  - Context
    - Each NREN will have its own geographical map background
    - Use it to create your physical network
    - Then build your Layer 3 weathermap
  - Scenario
    - Your engineering team has deployed a new link between 2 of your existing POPs
    - Adjust your physical weathermap
    - Adjust your logical weathermap