

Connect. Communicate. Collaborate

Hades – Going Operational

Roland Karch, RRZE FAU Erlangen-Nürnberg

JRA1 Montpellier Meeting, October 2006


Hades Implementation Status List

• IPv6 Measurements (Up and running in more than half of the JRA1 locations)
• Multicast Measurements (Implementation)
• Alerts
  – Packet Loss Maps (Implemented, Deployed for X-WiN)
  – SNMP Traps (Server needs to be set up)
  – Generic Web Interface (Evaluation)
• Maintenance
  – To be integrated into one interface with Alerts


IPv6 Measurements

• Running in:
  – Amsterdam (SURFnet)
  – Athens (GRNET)
  – Ljubljana (ARNES)
  – Paris (RENATER) (currently offline)
  – Prague (CESNET)
  – Sofia (ISTF)
  – Zagreb (CARNET)
• Do you own a JRA1 Hades measurement box and have an IPv6-capable network, but aren't on the list? Contact us!


Hades weather map (GEANT/NRENs, Geographically)


Hades weather maps (Abstract, domain specific)


Alerts – Packet Loss Maps

• One map to show observed packet loss on all Hades-monitored links
• Colour coding on links to show short and long outages
• Still in development; not yet available in the European context
• Maps for other metrics are under consideration, but the details of those metrics are yet to be determined (see statistical analysis)


Alerts – SNMP traps

• Problem: data on the measurement archive is between 0 and 90 minutes old
• To ensure up-to-date information for alerts, the options are either:
  – Increase the frequency of data polling (causes management network overhead and load on the measurement point and archive)
  – Do the analysis on the measurement point in real time (CPU load on the measurement point only, but raises the problem of how to deliver decentralized alerts)
• Solution: decentralized analysis, with SNMP traps for alerting
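The decentralized approach can be illustrated with a minimal sketch. It assumes a measurement point that tracks packet loss over a sliding window and sends a notification only when the alarm state changes, so no alert traffic is generated while nothing changes. `LossMonitor`, `send_alert`, the window size and the threshold are hypothetical names and values, not part of Hades; in a real deployment `send_alert` would wrap whatever SNMP trap sender is in use.

```python
from collections import deque


class LossMonitor:
    """Hypothetical local analysis on a measurement point (not the Hades code)."""

    def __init__(self, send_alert, window=10, threshold=0.2):
        # send_alert(kind, loss) is an assumed callback, e.g. a trap sender.
        self.send_alert = send_alert
        self.window = deque(maxlen=window)  # most recent received/lost flags
        self.threshold = threshold          # loss ratio that raises an alarm
        self.alarming = False

    def record(self, received):
        """Record one measurement packet and alert only on a state change."""
        self.window.append(bool(received))
        loss = 1 - sum(self.window) / len(self.window)
        now_alarming = loss >= self.threshold
        if now_alarming != self.alarming:
            self.alarming = now_alarming
            self.send_alert("raise" if now_alarming else "clear", loss)
        return loss


if __name__ == "__main__":
    monitor = LossMonitor(lambda kind, loss: print(kind, f"{loss:.2f}"),
                          window=4, threshold=0.5)
    for received in [True, True, False, False, False, True, True, True]:
        monitor.record(received)
```

Alerting on state changes rather than on every sample is one way to keep the "only causes network traffic when necessary" property; the drawback that the alert itself may be lost when the reporting path is affected still applies.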


Alerts – SNMP traps

• Multiple potential use cases for traps
  – Central visualization to subscribe to all alerts in order to create a powerful map and/or alert list with history
  – NOCs might subscribe for their uplinks/sensitive paths to important locations (typically already running SNMP-capable monitoring facilities)


Alerts – SNMP traps

• Benefits
  – Only causes network traffic when necessary
  – Real-time data for analysis available on the measurement point
  – SNMP MP usable?
• Drawbacks
  – SNMP is very often filtered into user networks (web visualisation as an intermediate server might solve that)
  – Won't alert when the reporting path is affected by the network problem itself


Alerts – Statistics

• A higher level of statistical analysis of the measurement data might help to determine a "connection footprint" and show changes in it due to routing changes.
• Possible numbers to play with:
  – Line inherent delay (minimal delay that catches all, or a high percentile of all, measurement packets)
  – Regular IPDV (blurry zone in a plot; delta between the line inherent delay and the maximum of 90 percent of the measurements)
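The two candidate numbers can be sketched as follows. This is one possible reading of the definitions above, not a confirmed Hades algorithm: the line inherent delay is taken as the smallest observed one-way delay, and the regular IPDV as the distance from that floor to the value below which 90 percent of the measurements fall. The function names, the nearest-rank percentile and the `coverage` parameter are assumptions made for illustration.

```python
import math


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in (0, 100]."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def connection_footprint(delays_us, coverage=90):
    """Return the two candidate key values for a list of one-way delays (µs)."""
    if not delays_us:
        raise ValueError("need at least one delay sample")
    ordered = sorted(delays_us)
    line_inherent = ordered[0]                    # floor that catches all packets
    upper = nearest_rank(ordered, coverage)       # top of the "blurry zone"
    return {
        "line_inherent_delay": line_inherent,
        "regular_ipdv": upper - line_inherent,
    }


if __name__ == "__main__":
    samples = [100, 101, 102, 103, 104, 105, 106, 107, 108, 500]
    print(connection_footprint(samples))
```

Comparing these values between measurement intervals is one way a routing change could show up as a shift in the footprint.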


Alerts – Statistics – Key values

• 11.4 ms minimal delay subtracted: "Network intrinsic delay"
• 1 µs gap: timestamp precision
• Lower boundary: timer precision

[Figure: One-way-delay. x-axis: Time / h (00:00 to 04:00); y-axis: Delta Delay / µs (0 to 40)]


Alerts – Statistics – Pathfinders

• First packet in every group of 5: ~7 µs longer delay

• Most probable reason: Receiver process has to be loaded into the CPU cache before processing the first packet

[Figure: Pathfinder packets. x-axis: Delta OWD / µs (0 to 24); y-axis: N (0 to 1800); series: Pathfinder, No pathfinder]
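The pathfinder effect can be checked on raw data with a short sketch. It assumes the delays are listed in send order and that every group of 5 starts at an index divisible by 5; `pathfinder_offset` and its parameters are hypothetical names, and the median is used here only as a robust choice, not because the slides prescribe it.

```python
from statistics import median


def pathfinder_offset(delays_us, group_size=5):
    """Median delay of the first packet per group minus that of the others."""
    first, rest = [], []
    for index, delay in enumerate(delays_us):
        # Index 0, 5, 10, ... is assumed to be the pathfinder of its group.
        (first if index % group_size == 0 else rest).append(delay)
    if not first or not rest:
        raise ValueError("need at least one pathfinder and one follower packet")
    return median(first) - median(rest)


if __name__ == "__main__":
    delays = [107, 100, 100, 100, 100, 107, 100, 100, 100, 100]
    print(pathfinder_offset(delays))
```

A result close to the ~7 µs mentioned above would be consistent with the cache-loading explanation, although it does not prove it.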


Alerts – Statistics – Path fingerprint

• Comparison of paths on different networks (hardware, lines and configuration differ)
• Both: small OWD, narrow distribution of delay
• Path 2: longer distribution tail
• Path 1: reordering!

[Figure: Delay on two network paths. x-axis: Delta Delay / µs (0 to 80); y-axis: N (0 to 800); series: Path 1, Path 2]
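The reordering observation for Path 1 could be quantified with a sketch like the one below. It assumes each measurement packet carries a sequence number and that the receiver records the numbers in arrival order; a packet counts as reordered when it arrives after a packet with a higher sequence number. `count_reordered` is a hypothetical helper, not a Hades function, and it covers only the reordering part of the fingerprint, not the tail of the delay distribution.

```python
def count_reordered(arrival_seq):
    """Count packets that arrive after a packet with a higher sequence number."""
    highest = None
    reordered = 0
    for seq in arrival_seq:
        if highest is not None and seq < highest:
            reordered += 1      # arrived late relative to a newer packet
        else:
            highest = seq       # in order (or duplicate): advance the high mark
    return reordered


if __name__ == "__main__":
    print(count_reordered([1, 2, 4, 3, 5]))
```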


Maintenance

• Most important part of "going operational"
• Current status:
  – Daily check, via the web visualization, of which measurement lines are down (up to 24 hours delay)
  – Scripts run to catch most anomalies (clock status, old data)
  – perfSONAR MAs are monitored externally (ISTF)
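A check for old data, one of the anomalies mentioned above, might look like the following sketch. It is not the actual maintenance script: `stale_links`, the link names and the mapping from link to last-data timestamp are hypothetical, and the 90-minute default is an assumption chosen to match the maximum data age quoted for the measurement archive.

```python
from datetime import datetime, timedelta, timezone


def stale_links(last_seen, now, max_age=timedelta(minutes=90)):
    """Return the links whose newest data is older than max_age, sorted by name."""
    return sorted(link for link, stamp in last_seen.items()
                  if now - stamp > max_age)


if __name__ == "__main__":
    now = datetime(2006, 10, 1, 12, 0, tzinfo=timezone.utc)
    last_seen = {
        "site-a_to_site-b": now - timedelta(minutes=30),  # hypothetical link names
        "site-b_to_site-c": now - timedelta(hours=3),
    }
    print(stale_links(last_seen, now))
```

Run from a scheduler, such a check would shorten the detection delay compared with a daily look at the web visualization.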


Maintenance

• Evaluation of Nagios [1]
• Could serve as a common platform for alert and maintenance visualization
• Provides a front end for both SNMP and scripted surveillance

[1] http://nagios.org/


Maintenance

• Goals
  – Highest possible level of automation
  – Fixing of simple problems either fully automated (e.g. restarting measurements) or via scripts that can be triggered on the web server
  – Transparency for users


Questions / Discussion / Want to contact us?

• Website: http://www.win-labor.dfn.de/
• Email: [email protected]