Post on 05-Jul-2020

Open source tools for optimizing your peering infrastructure

@ DE-CIX TechMeeting 2018-06-06

by Daniel Czerwonk

• Software / Network Engineer at Mauve Mailorder Software

• Head of Network Freifunk Essen e.V.

• AS44821 (Mauve), AS206356 (Freifunk Essen e.V.), AS202739 (routing-rocks)

• birdwatcher and bio-routing contributor

• Twitter: @dan_nrw

• Github: https://github.com/czerwonk

• LinkedIn: https://www.linkedin.com/in/czerwonk/

Who is this guy? About me…

Our journey starts late 2016

A new networking setup is about to be built

But before that:

Let’s talk about monitoring…

• Very small operations team

• Freifunk Essen should demand even less ops effort

• Identify trends/anomalies early

• Capacity planning (beware of retention)

• Source for alerting

• Starting point for traffic engineering, etc.

• Basis for post-mortems (in case of an outage)

• Dashboard to give a quick overview when needed

Why is monitoring important for me?

So, let’s build a monitoring system…

• Prometheus to collect metrics

• Grafana to visualize metrics

• Alertmanager with Pushover integration for alerting

• Everything Ansible managed

What I wanted…

• Bird routing daemon

• JunOS running on a few EX series switches

• Host metrics from bare metal software router machines (statistics, resources)

• External network latencies (RIPE ATLAS, etc.)

What I wanted to scrape?

What I found…

In 2016…

Metric             | Solution                                   | Problem
bird               | -                                          | no exporter available
JunOS              | snmp_exporter                              | complex configuration, bad performance
Host metrics       | node_exporter                              | -
Network latencies  | blackbox_exporter with external probe VMs  | bad coverage, only one request per scrape

• Official Prometheus project

• Runs on Linux hosts (e.g. routers)

• Network interface metrics

• Resource consumption: CPU load, RAM usage, Disk space

• Interrupts / context switches

• License: Apache 2.0

• Source: https://github.com/prometheus/node_exporter

node_exporter

At least we got the host metrics covered.

And the rest?

I had to solve that…

So I started to write some

exporters…

• Performance is a key feature

• Need for concurrent processing

• Single binary / no dependencies

• Easy installation via go get …

• Existing client API for Prometheus

• Love writing code in golang in my spare time

Which programming language?

I chose golang:
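The "need for concurrent processing" point can be illustrated with a minimal sketch that uses only the Go standard library. It assumes a hypothetical `collect` function standing in for the slow per-device I/O of a real exporter; the target names and the metric name `example_metric` are made up for illustration and do not come from any of the exporters in this talk.

```go
package main

import (
	"fmt"
	"sync"
)

// sample is one collected value for one target.
type sample struct {
	target string
	value  float64
}

// collectAll runs collect for every target in its own goroutine and
// waits for all of them. collect is a hypothetical stand-in for the
// slow I/O (SSH, SNMP, ICMP) a real exporter performs per device.
func collectAll(targets []string, collect func(string) float64) []sample {
	results := make([]sample, len(targets))
	var wg sync.WaitGroup
	for i, t := range targets {
		wg.Add(1)
		go func(i int, t string) {
			defer wg.Done()
			// Each goroutine writes only its own slot, so no mutex is needed.
			results[i] = sample{target: t, value: collect(t)}
		}(i, t)
	}
	wg.Wait()
	return results
}

func main() {
	targets := []string{"router1", "router2", "switch1"} // made-up names
	fake := func(t string) float64 { return float64(len(t)) }
	for _, s := range collectAll(targets, fake) {
		// Text exposition style: name{labels} value
		fmt.Printf("example_metric{target=%q} %g\n", s.target, s.value)
	}
}
```

With this pattern the total scrape time is bounded by the slowest target rather than by the sum of all targets.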

Milestones to an exporter suite (timeline 2016 to 2018)

• bird_exporter (Bird 1.x), later: support for bird 2.x

• atlas_exporter (RIPE ATLAS), later: RIPE Labs article

• junos_exporter (Juniper JunOS using SNMP), later: replaced SNMP by SSH

• ping_exporter (ICMP probing)

• mikrotik-exporter (RouterOS)

• Started late 2016

• Communicates with bird via socket

• Bird 1.x and 2.x supported

• Protocols: BGP, OSPFv2, OSPFv3, Kernel, Static, Device, Direct

• License: MIT

• Source: https://github.com/czerwonk/bird_exporter

bird_exporter

bird_exporter

bird_protocol_prefix_import_count{proto=~"BGP|OSPFv3",ip_version="6"}

count(bird_protocol_up{proto="BGP"} == 1)

• BGP session state metrics

• BGP message counts (received, sent, withdrawn, etc.)

• Prefix counts for all supported protocols (imported, exported, filtered, etc.)

• OSPFv2/OSPFv3 neighbour counts

• Protocol uptime

bird_exporter - Features

• Started early 2018

• Replacement for RRD based smokeping

• For ICMP, also a replacement for blackbox_exporter, which lacks loss detection

• Based on go-ping by Digineo: https://github.com/digineo/go-ping

• License: MIT

• Source: https://github.com/czerwonk/ping_exporter

ping_exporter

ping_exporter

ping_rtt_mean_ms{ip_version="6"}

ping_loss_percent{ip_version="4"}

• Sends and aggregates multiple ICMP ECHO requests

• Roundtrip metrics (current, best, worst)

• Simple way to detect loss

• Supports multiple targets

• DNS refresh ensures the correct IP is measured when DNS is changed

• Only ICMP support at the moment

• Warning: ICMP is not user traffic, so keep that in mind when interpreting these metrics

ping_exporter - Features

• Started early 2017

• Metrics by requesting measurement results from RIPE ATLAS

• Useful for getting an outside view from other networks

• License: LGPL3 (since the binding used is under this license)

• Source: https://github.com/czerwonk/atlas_exporter

• More info: https://labs.ripe.net/Members/daniel_czerwonk/using-ripe-atlas-measurement-results-in-prometheus-with-atlas_exporter

atlas_exporter

atlas_exporter

avg(atlas_ping_avg_latency{ip_version="4"}) by (asn)

avg(atlas_traceroute_hops{ip_version="4"}) by (asn)
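For readers less familiar with PromQL, the `avg(...) by (asn)` aggregation in the queries above groups all series by their `asn` label and averages the values within each group. The following sketch reproduces that arithmetic in plain Go; the `sample` type, the values, and the ASNs (taken from the range reserved for documentation) are illustrative assumptions, not data from the talk.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is one series value together with its asn label.
type sample struct {
	asn   string
	value float64
}

// avgByASN groups samples by ASN and returns the mean value per group.
func avgByASN(samples []sample) map[string]float64 {
	sum := map[string]float64{}
	count := map[string]int{}
	for _, s := range samples {
		sum[s.asn] += s.value
		count[s.asn]++
	}
	avg := make(map[string]float64, len(sum))
	for asn, total := range sum {
		avg[asn] = total / float64(count[asn])
	}
	return avg
}

func main() {
	// Made-up latencies in milliseconds from probes in two networks.
	avg := avgByASN([]sample{{"64496", 10}, {"64496", 20}, {"64497", 5}})

	// Sort the keys so the output order is stable.
	asns := make([]string, 0, len(avg))
	for asn := range avg {
		asns = append(asns, asn)
	}
	sort.Strings(asns)
	for _, asn := range asns {
		fmt.Printf("asn=%s avg=%g\n", asn, avg[asn])
	}
}
```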

• Ping (success, min/max/avg latency, dups, size)

• Traceroute (success, hop count, rtt)

• NTP (delay, derivation, ntp version)

• DNS (success, rtt)

• HTTP (return code, rtt, http version, header size, body size)

• SSL Certificates (alert, rtt)

atlas_exporter - Features

• Started late 2017

• snmp_exporter did not perform as required

• First implementation using a simple set of SNMP OIDs

• Early 2018: reimplementation using SSH and XML RPC representation

• Alternative to Juniper's OpenNTI, since telemetry is only supported on newer versions of JunOS and hardware

• License: MIT

• Source: https://github.com/czerwonk/junos_exporter

junos_exporter
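The SSH and XML approach boils down to running an operational command with XML output on the device and decoding the reply. The sketch below covers only the decoding step with Go's `encoding/xml`. It is not code from junos_exporter: the XML is a heavily trimmed, assumed sample of an interface reply, and the metric names are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// physicalInterface holds the fields this sketch extracts per interface.
type physicalInterface struct {
	Name        string `xml:"name"`
	InputBytes  uint64 `xml:"traffic-statistics>input-bytes"`
	OutputBytes uint64 `xml:"traffic-statistics>output-bytes"`
}

// rpcReply maps the assumed reply structure onto Go types.
type rpcReply struct {
	Interfaces []physicalInterface `xml:"interface-information>physical-interface"`
}

// parseInterfaces decodes an XML reply into a list of interfaces.
func parseInterfaces(data []byte) ([]physicalInterface, error) {
	var reply rpcReply
	if err := xml.Unmarshal(data, &reply); err != nil {
		return nil, err
	}
	return reply.Interfaces, nil
}

// sampleReply is a simplified, assumed example of a device reply.
const sampleReply = `<rpc-reply>
  <interface-information>
    <physical-interface>
      <name>ge-0/0/0</name>
      <traffic-statistics>
        <input-bytes>1000</input-bytes>
        <output-bytes>2500</output-bytes>
      </traffic-statistics>
    </physical-interface>
  </interface-information>
</rpc-reply>`

func main() {
	ifaces, err := parseInterfaces([]byte(sampleReply))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, i := range ifaces {
		// Hypothetical metric names, for illustration only.
		fmt.Printf("example_interface_receive_bytes{name=%q} %d\n", i.Name, i.InputBytes)
		fmt.Printf("example_interface_transmit_bytes{name=%q} %d\n", i.Name, i.OutputBytes)
	}
}
```

Decoding into typed structs keeps the mapping from device output to metrics explicit, which is harder to achieve with a large set of SNMP OIDs.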

• Interfaces (bytes transmitted/received, errors, drops)

• Routes (per table, by protocol)

• Alarms (count)

• BGP (message count, prefix counts per peer, session state)

• OSPFv2, OSPFv3 (number of neighbours)

• Interface diagnostics (optical signals)

• ISIS (number of adjacencies, total number of routers)

• Environment (temperatures)

• Routing engine statistics

junos_exporter - Features

• Contribution to existing project

• Only interface and resource metrics at this point

• Added several other features

• License: BSD3

• Source: https://github.com/nshttpd/mikrotik-exporter

mikrotik-exporter

• Interface metrics (RX bytes, TX bytes, drops, errors, etc.)

• BGP session states

• BGP message counts (updates, withdraws)

• DHCP leases

• DHCPv6 bindings

• Optical diagnostics

• IPv4/IPv6 pool counts

• System resources (memory, CPU load, etc.)

• Prefix counts per protocol (in RIB)

mikrotik-exporter - Features

Dashboard examples

How to combine several exporters?

Mauve Network Overview

Mauve Routing

Alerting

When and how?

How to alert?

What the SRE book has taught us:

https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html

How to alert? A few examples…

Port saturation:

Upstream session down:

Thank you for your attention.

Special thanks to everyone who contributed to my projects!
