Monitoring MySQL Replication Lag with Prometheus & pt-heartbeat

Post on 28-Jan-2018

Monitoring MySQL Replication Delay with mysqld_exporter & pt-heartbeat

Julien Pivotto (@roidelapluie)

PromCon Munich

August 18, 2017

SELECT USER();

Julien "roidelapluie" Pivotto

@roidelapluie

Sysadmin at inuits

Automation, monitoring, HA

MySQL/MariaDB user/admin/contributor

Grafana and Prometheus user/contributor

inuits

MySQL Replication

MySQL Master <-> MySQL Master

MySQL Master -> MySQL Slave

MySQL Master -> MySQL Slave -> MySQL Slave

MySQL Masters -> MySQL Slaves -> MySQL Slaves -> MySQL Slaves

MySQL Master -> MySQL Slaves

mysqld_exporter

mysqld_exporter is great

Lots of data

Lots of alert examples

Percona's Grafana dashboards bring dozens of useful graphs

Migrating to Prometheus does not mean that we should forget the past ... or lower our monitoring expectations.

pt-heartbeat

pt-heartbeat is a daemon that updates an entry with the current timestamp on a MySQL server every second.

On the replica, you can check the timestamp and do NOW - timestamp to get the real lag.

+----------------------------+-----------+
| ts                         | server_id |
+----------------------------+-----------+
| 2017-08-17T16:55:01.001030 |         1 |
+----------------------------+-----------+
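The NOW - timestamp subtraction can be sketched in a few lines of Python. This is only an illustration: the function name `heartbeat_lag_seconds` and the replica clock value are assumptions made up for the example, and the timestamp format is taken from the row shown on the slide.

```python
from datetime import datetime

def heartbeat_lag_seconds(stored_ts: str, now: datetime) -> float:
    """Return NOW - ts in seconds for one heartbeat row (hypothetical helper)."""
    # Format of the ts column as it appears in the example row.
    stored = datetime.strptime(stored_ts, "%Y-%m-%dT%H:%M:%S.%f")
    return (now - stored).total_seconds()

# Assumed replica clock, 2.5 seconds after the heartbeat was written on the master.
replica_now = datetime(2017, 8, 17, 16, 55, 3, 501030)
lag = heartbeat_lag_seconds("2017-08-17T16:55:01.001030", replica_now)
print(lag)
```

The result is only meaningful when the master and replica clocks are synchronized, since the timestamp is written on one host and compared with the clock of another.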

pt-heartbeat

GPL

Perl

Part of Percona Toolkit

pt-heartbeat

Our previous monitoring tool (munin) had support for pt-heartbeat. Prometheus mysqld_exporter didn't.

wait, mysql has that natively

mysql> SHOW SLAVE STATUS\G
...
Seconds_Behind_Master: 0
...

aka mysqld_exporter metric:

mysql_slave_lag_seconds

Bugs

Fixes for Seconds_Behind_Master in: 5.7.18, 5.6.36, 5.6.23, 5.6.16.

pt-heartbeat is useful

Okay, so we had that thing. Now that we move to Prometheus, we don't want to lose that thing.

Idea: let's implement this!

Pull Request 183

https://github.com/prometheus/mysqld_exporter/pull/183

Opened Feb 20

Merged Feb 21

How it works

Checks the heartbeat table (SQL query). It's not calling the pt-heartbeat CLI, so it is independent of it.

CLI flags

collect.heartbeat

collect.heartbeat.database

collect.heartbeat.table
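As a rough illustration of how the database and table flags could feed the SQL query against the heartbeat table, here is a small Python sketch. The slides do not show the query itself, so the SQL text, the helper name `heartbeat_query`, and the default names `heartbeat`.`heartbeat` are all assumptions, not the exporter's confirmed implementation.

```python
def quote_identifier(identifier: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"

def heartbeat_query(database: str = "heartbeat", table: str = "heartbeat") -> str:
    """Build a query returning stored timestamp, current timestamp and server_id.

    Hypothetical helper: the column names ts and server_id come from the
    heartbeat row shown earlier; the default names are assumed.
    """
    return (
        "SELECT UNIX_TIMESTAMP(ts), UNIX_TIMESTAMP(NOW(6)), server_id "
        f"FROM {quote_identifier(database)}.{quote_identifier(table)}"
    )

print(heartbeat_query())
print(heartbeat_query(database="percona", table="hb"))
```

Reading both the stored and the current timestamp in one query means both values come from the same server clock at the same moment, which is what makes the two exported metrics comparable.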

Metrics

mysql_heartbeat_stored_timestamp_seconds{server_id="1"}

mysql_heartbeat_now_timestamp_seconds{server_id="1"}

Recording Lag

mysql_heartbeat_lag_seconds =
    mysql_heartbeat_now_timestamp_seconds -
    mysql_heartbeat_stored_timestamp_seconds

https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
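To make the subtraction concrete, the following Python sketch applies the same now minus stored calculation to a hand-written scrape. The sample values and the helper name `lag_by_server` are assumptions for illustration; in practice Prometheus evaluates the recording rule itself.

```python
import re

# Matches lines such as: metric_name{server_id="1"} 123.4
SAMPLE = re.compile(r'^(\w+)\{server_id="(\d+)"\}\s+(\S+)$')

def lag_by_server(exposition: str) -> dict:
    """Return now - stored per server_id from exposition-format text."""
    now, stored = {}, {}
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # skip blank lines and unrelated content
        name, server_id, value = match.groups()
        if name == "mysql_heartbeat_now_timestamp_seconds":
            now[server_id] = float(value)
        elif name == "mysql_heartbeat_stored_timestamp_seconds":
            stored[server_id] = float(value)
    # Only server_ids present in both metrics produce a lag value.
    return {sid: now[sid] - stored[sid] for sid in now if sid in stored}

# Assumed sample values, not real exporter output.
scrape = """
mysql_heartbeat_stored_timestamp_seconds{server_id="1"} 1502988901.0
mysql_heartbeat_now_timestamp_seconds{server_id="1"} 1502988903.5
"""
lags = lag_by_server(scrape)
print(lags)
```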

Alert

ALERT MySQLReplicationLag
  IF
      (mysql_heartbeat_lag_seconds > 30)
    AND on (instance)
      (predict_linear(mysql_heartbeat_lag_seconds[5m], 60*2) > 0)
  FOR 1m
  LABELS {
    severity = "critical"
  }
  ANNOTATIONS {
    summary = "MySQL slave replication is lagging",
    description = "The mysql slave replication has fallen behind and is not recovering",
  }

https://github.com/prometheus/mysqld_exporter/blob/master/example.rules
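The alert condition combines two checks: the lag is above 30 seconds, and a linear extrapolation over the last 5 minutes says it will still be above zero in 2 minutes. The Python sketch below mimics that logic with a plain least-squares fit. It is a simplification under stated assumptions: it extrapolates from the last sample rather than from the rule evaluation time, it ignores the FOR 1m clause, and the function names and sample series are made up for the example.

```python
def predict_linear(samples, horizon_seconds):
    """Least-squares extrapolation, horizon_seconds after the last sample.

    samples is a list of (timestamp_seconds, value) pairs.
    """
    n = len(samples)
    t_ref = samples[-1][0]
    xs = [t - t_ref for t, _ in samples]  # times relative to the last sample
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    intercept = mean_y - slope * mean_x  # fitted value at the last sample
    return intercept + slope * horizon_seconds

def should_alert(samples, threshold=30.0, horizon_seconds=120.0):
    """True when lag exceeds the threshold and is not predicted to recover."""
    current = samples[-1][1]
    return current > threshold and predict_linear(samples, horizon_seconds) > 0

# Assumed lag series, one sample per minute over 5 minutes.
growing = [(0, 10), (60, 20), (120, 30), (180, 40), (240, 50)]
recovering = [(0, 200), (60, 160), (120, 120), (180, 80), (240, 40)]
print(should_alert(growing), should_alert(recovering))
```

The second series is still above 30 seconds, but it is dropping fast enough to reach zero within the horizon, so it does not fire. That is the point of the predict_linear clause: a replica that is catching up should not page anyone.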

Contributing to Percona Grafana Dashboards

less great

PR opened Feb 23

Still open

Takeaways

Contributing to Prometheus is easy

pt-heartbeat is the way to monitor MySQL replication lag

and now it's available in Prometheus

Any volunteers to rewrite pt-heartbeat in Go? :)

Julien Pivotto

roidelapluie

roidelapluie@inuits.eu

Inuits

https://inuits.eu

info@inuits.eu

Contact
