Monitoring MySQL at SCALE

Upload: ovais-tariq

Post on 07-Apr-2017

Who We Are

Ilan Rabinovitch
Dir. Technical Community, Datadog

Ovais Tariq
Storage SRE, Uber (formerly at Lithium & Percona)

Agenda

1. About Lithium and MySQL
2. Background: Monitoring Challenges in a Dynamic World
3. Theory: Monitoring 101
4. Practical: Triaging a Real Incident at Lithium

About Lithium Technologies

Lithium’s platform helps brands connect, engage and understand their customers

MySQL Architecture / Data Flow

• Multi-tenant SaaS applications
• Typical master-slave replication setup
• MySQL running
  ○ On bare metal
  ○ In AWS public cloud
  ○ In OpenStack

Culture, Automation, Metrics, Sharing

Damon Edwards and John Willis, DevOps Day LA

You’re in the cloud and it's everything you dreamed of!

• Autoscaling
• Infinite Storage
• Managed Databases
• Container Orchestration
• Private Clouds

Collecting data is cheap; not having it when you need it can be expensive

Instrument all the things!

Operational Complexity Increases with...

• Number of things to measure
• Velocity of change

How much do we measure?

1 instance
• 10 metrics from CloudWatch

1 operating system (e.g., Linux)
• 100 metrics

MySQL instance
• ~350 metrics

460 metrics per host

100 instances: 46,000 metrics
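
The totals on this slide follow from simple addition and multiplication. A minimal sketch of that arithmetic, using the per-source counts given above (the dictionary keys and function names are illustrative, not from the talk):

```python
# Per-host metric counts taken from the slide. The key names are illustrative.
METRICS_PER_SOURCE = {
    "cloudwatch": 10,
    "operating_system": 100,
    "mysql": 350,  # the slide gives this as an approximate figure
}


def metrics_per_host(sources):
    """Sum the metric counts of every source on one host."""
    return sum(sources.values())


def fleet_metrics(sources, instances):
    """Total metrics across a fleet of identical hosts."""
    return metrics_per_host(sources) * instances


per_host = metrics_per_host(METRICS_PER_SOURCE)
total = fleet_metrics(METRICS_PER_SOURCE, instances=100)
print(per_host, total)
```

Doubling either the number of sources or the number of instances doubles the total, which is why both the number of things to measure and the rate of change drive operational complexity.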

• Earlier: typical Nagios and Cacti setup
• Static config and lack of context
• No correlation between alerts and graphs
• No self-service for developers
• In-house tooling has high cost

When to let a sleeping engineer lie?

Recurse until you find root cause

• Query Time
• Queries Per Second

Data Sources:
• Performance Schema
• MySQL Status Variables

• Query Time
• Queries Per Second

Sources:
• Performance Schema
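
As a rough illustration of how these two work metrics could be derived from the sources named above, the sketch below assumes that query time is read from the `performance_schema.events_statements_summary_by_digest` table (whose timer columns are in picoseconds) and that queries per second is computed from the cumulative `Questions` status variable. The talk does not say which table or variable was used, so both are assumptions. The helper names are hypothetical, and the SQL is shown only as a string, not executed here.

```python
# Assumed query: per-schema statement counts and total latency.
# The table and column names are MySQL's; the aliases are illustrative.
DIGEST_QUERY = """
SELECT schema_name,
       SUM(count_star)     AS statements,
       SUM(sum_timer_wait) AS total_wait_ps
FROM performance_schema.events_statements_summary_by_digest
WHERE schema_name IS NOT NULL
GROUP BY schema_name
"""

# Performance Schema timers are reported in picoseconds.
PICOSECONDS_PER_MILLISECOND = 1_000_000_000


def avg_query_time_ms(statements, total_wait_ps):
    """Average statement latency in milliseconds (0.0 if nothing ran)."""
    if statements == 0:
        return 0.0
    return total_wait_ps / statements / PICOSECONDS_PER_MILLISECOND


def queries_per_second(questions_before, questions_after, interval_seconds):
    """Rate of the cumulative Questions counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = questions_after - questions_before
    if delta < 0:
        # The counter went backwards, so assume the server restarted
        # and count only what accumulated since the restart.
        delta = questions_after
    return delta / interval_seconds
```

Because the digest table accumulates values until it is truncated or the server restarts, an average over a recent window would be computed from the difference between two samples rather than from a single read.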

• Disk Space Usage
• Threads_connected
• Threads_running
• Connection_errors_internal
• Aborted_connects
• Connection_errors_max_connections

Sources:
• Server Status Variables
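
Of the status variables listed above, `Threads_connected` and `Threads_running` are point-in-time gauges, while `Aborted_connects` and the `Connection_errors_*` variables are cumulative counters that only become meaningful as a rate between two samples. A minimal sketch of that distinction, assuming each sample is a dictionary built from `SHOW GLOBAL STATUS` output (the function name and the sample values are made up for illustration):

```python
GAUGES = ("Threads_connected", "Threads_running")
COUNTERS = (
    "Aborted_connects",
    "Connection_errors_internal",
    "Connection_errors_max_connections",
)


def summarize(before, after, interval_seconds):
    """Report gauges as-is and counters as per-second rates."""
    # SHOW GLOBAL STATUS returns values as strings, hence the int() calls.
    summary = {name: int(after[name]) for name in GAUGES}
    for name in COUNTERS:
        delta = int(after[name]) - int(before[name])
        # A negative delta means the counter was reset; report zero for it.
        summary[name + "_per_s"] = max(delta, 0) / interval_seconds
    return summary


# Made-up samples taken 60 seconds apart.
sample_before = {
    "Threads_connected": "40", "Threads_running": "3",
    "Aborted_connects": "100", "Connection_errors_internal": "0",
    "Connection_errors_max_connections": "5",
}
sample_after = {
    "Threads_connected": "42", "Threads_running": "7",
    "Aborted_connects": "130", "Connection_errors_internal": "0",
    "Connection_errors_max_connections": "65",
}
result = summarize(sample_before, sample_after, interval_seconds=60)
print(result)
```

Disk space usage is not exposed as a status variable and would have to come from the operating system, so it is left out of this sketch.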

• Configuration Change
• Code Deployment
• Service Started / Stopped
• MySQL Upgrades
• Failovers
• etc.

A change in the workload, rather than an increase in it, affected the schema ‘groupecasino’

• Workload characteristics changed, making it more CPU bound
• No increase in IO activity
• Increase in number of read operations
• No change in types of read operations
• Similar number of range queries, each reading more rows
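
One coarse way to notice the last finding is to compare rows read per range query across two time windows. The sketch below is not the method used in the incident. It assumes the `Select_range` and `Innodb_rows_read` status counters as inputs, and because `Innodb_rows_read` counts rows read by all statements, the ratio is only a rough proxy. The function name and sample values are hypothetical.

```python
def rows_per_range_query(before, after):
    """Rows read per range select between two status samples.

    Returns None when no range selects ran in the window.
    """
    ranges = after["Select_range"] - before["Select_range"]
    rows = after["Innodb_rows_read"] - before["Innodb_rows_read"]
    if ranges <= 0:
        return None
    return rows / ranges


# Made-up counter samples: same number of range queries, more rows read.
t0 = {"Select_range": 1000, "Innodb_rows_read": 0}
t1 = {"Select_range": 2000, "Innodb_rows_read": 50_000}
t2 = {"Select_range": 3000, "Innodb_rows_read": 450_000}

baseline = rows_per_range_query(t0, t1)
incident = rows_per_range_query(t1, t2)
print(baseline, incident)
```

A stable range-query count combined with a sharply higher ratio matches the pattern described above: the same kind of queries, each touching more rows and therefore using more CPU.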

Monitoring 101: Alerting
https://www.datadoghq.com/blog/monitoring-101-alerting/

Monitoring 101: Collecting the Right Data
https://www.datadoghq.com/blog/monitoring-101-collecting-data/

Monitoring 101: Investigating Performance Issues
https://www.datadoghq.com/blog/monitoring-101-investigation/

Monitoring MySQL Performance Metrics
https://www.datadoghq.com/blog/monitoring-mysql-performance-metrics/

Collecting MySQL Metrics
https://www.datadoghq.com/blog/collecting-mysql-statistics-and-metrics/