mysql performance monitoring

Post on 11-May-2015


MySQL Performance Monitoring using StatsD and Graphite
Art van Scheppingen, Head of Database Engineering

Overview

1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?

Who are we?
Who is Spil Games?

Facts

• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 50M registered users
• 45 portals in 19 languages

• Casual games
• Social games
• Real-time multiplayer games
• Mobile games

• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)

Geographic Reach
180 Million Monthly Active Users(*)

Source: (*) Google Analytics, August 2012

Brands

Girls, Teens and Family

spielen.com, juegos.com, gamesgames.com, games.co.uk

Monitoring
We use(d) many, many, many monitoring tools so far!

Existing monitoring systems we use(d)

• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol’ RRD

Opsview/Nagios

• Strong points:
  • Easy to create (Nagios) plugins
  • Slaves for scaling out

• Weak points:
  • Stats gathering through polling
  • Low granularity (1 to 5 minutes)
  • Difficult URIs for graphs

Cacti

• Strong points:
  • Awesome Percona templates
  • Great overviews and graphs

• Weak points:
  • Hard to add new metrics (to 90+ servers)
  • Not scalable
  • Low granularity (1 to 5 minutes)
  • Hard to correlate

MONyog

• Strong points:
  • Easy to set up
  • Compare any server with another
  • Compare configurations

• Weak points:
  • “Closed source”
  • Not scalable
  • Jack of all trades

Poll limitations

• Limited to a set interval
• Data gets averaged out
• (Host) checks are run serially
• Slowdowns in a run mean no/less data
• Scaling: add more masters/slaves
• Setting up an SSH connection is slow
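To illustrate the "data gets averaged out" point, here is a minimal Python sketch. The numbers are made up for illustration and are not measurements from the talk: a ten-second burst of queries is clearly visible at a 2 second resolution but nearly disappears in a 5 minute average.

```python
# Hypothetical data: a query rate sampled once per second for five minutes,
# with a ten-second burst in the middle.
samples = [100] * 300
for second in range(120, 130):
    samples[second] = 2000


def downsample(values, step):
    """Average consecutive groups of `step` one-second samples."""
    return [sum(values[i:i + step]) / step for i in range(0, len(values), step)]


five_minute = downsample(samples, 300)  # one point, as a 5 minute poller would store
two_second = downsample(samples, 2)     # 150 points at a 2 second resolution

print("peak at 2 s resolution:", max(two_second))
print("value at 5 min resolution:", round(five_minute[0], 2))
```

The burst is still in the data at the fine resolution; at the coarse resolution it only shows up as a slightly raised average.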

Difficult to add a new metric

host065
bash-3.2# netstat -s | grep "listen queue"
    26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
    33 times the listen queue of a socket overflowed

Other things you can’t do!

StatsD + Collectd + Graphite
What are they?

What is Graphite?

• Highly scalable real-time graphing system
• Collects numeric time series
• Backend daemon Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database

Graphite’s capabilities

• Each metric is in its own bucket
• Periods make folders
  • prod.syseng.mmm.<hostname>.admin_offline
• Metric types
  • Counters
  • Gauge
• Retention can be set using a regex
  [mysql]
  pattern = ^prod\.syseng\.mysql\..*$
  retentions = 2s:1d,1m:3d,5m:7d,1h:5y
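To get a feel for what a retention line such as 2s:1d,1m:3d,5m:7d,1h:5y costs per metric, the sketch below counts the datapoints each archive has to hold. It is a rough illustration, not Graphite code: the function names are made up, and it assumes that a year counts as 365 days.

```python
import re

# Seconds per unit; assumes a year is 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token):
    """Convert a token such as '2s' or '5y' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", token.strip())
    if match is None:
        raise ValueError(f"unsupported retention token: {token!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def points_per_archive(retentions):
    """Return the number of datapoints per 'precision:duration' archive."""
    points = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        points.append(to_seconds(duration) // to_seconds(precision))
    return points


points = points_per_archive("2s:1d,1m:3d,5m:7d,1h:5y")
print(points, "total:", sum(points))
```

The 2 second archive for a single day holds about as many points as the hourly archive for five years, which is why high-resolution retention is usually kept short.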

What is Collectd?

• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics

What is StatsD?

• Front-end proxy for Graphite/Carbon (by Etsy)
• NodeJS daemon (also available in other languages)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in just about any language
• Send any metric you like!

StatsD functions

• update_stats
• increment/decrement
• set
• gauge
• timers
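Under the hood, each of these functions sends a short text line over UDP. The Python sketch below shows that line format as used by Etsy's StatsD (c for counters, g for gauges, ms for timers, s for sets). The helper names are made up for illustration and are not part of any particular client library.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build one StatsD line, e.g. 'prod.app1.pages_rendered:1|c'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="localhost", port=8125):
    """Fire-and-forget: UDP does not wait for StatsD to answer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


counter = format_metric("prod.app1.pages_rendered", 1, "c")
gauge = format_metric("prod.app1.page_concurrency", 10, "g")
timer = format_metric("prod.app1.rendering_time", 320, "ms")
unique = format_metric("prod.app1.unique_users", 42, "s")
print(counter, gauge, timer, unique, sep="\n")
```

Calling send_metric(counter) would hand the line to a StatsD daemon listening on localhost:8125. Because it is UDP, the call does not block the application waiting for a reply, even when no daemon is running.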

StatsD PHP code examples

PHP:
$statsd = new StatsD();
$statsd->increment("prod.app1.pages_rendered", 1);
$statsd->gauge("prod.app1.page_concurrency", 10);
$statsd->set("prod.app1.unique_users", $userid);
…
$start = microtime(true);
serve_out_content_to_clients();
$statsd->timing("prod.app1.rendering_time", (microtime(true) - $start) * 1000);

Library:
https://github.com/etsy/statsd/blob/master/examples/php-example.php

Our Graphite cluster(s)

[Diagram labels: Client requesting graphs; Loadbalancer (port 443); Graphite Rendering Cluster; Carbon relay; Loadbalancer (port 2003); Server-1, Server-2, Server-n; storage clusters DEV, SYSENG, SERVICES1, SERVICES2; 8 nodes, 3 nodes, 2 nodes]

Graphite Storage Clusters

Collectd

[Diagram: Collectd gathers data through plugins (CPU, DISK, LOAD, …) and sends it to Carbon over TCP at a 30 second interval]

StatsD

[Diagram: application-level metrics (# of logins, cache hit/miss) and MySQL_Statsd (STATUS, INNODB STATUS) are sent to StatsD on localhost:8125 over UDP; StatsD sends to Carbon over TCP at a 2 second interval]

Global scale?

MySQL + StatsD
How do we use them?

Why use StatsD over Collectd?

• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible

• DBI plugin for Collectd
  • Metrics based on columns

• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC

MySQL StatsD daemon

• Written in Python
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy to set up: no configuration
• Persistent connection
• Baron Schwartz’ InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • MySQL 5.5/5.6 Performance Schema
  • MariaDB specific
  • Galera specific

MySQL StatsD overview

[Diagram: MySQLCollector runs SHOW STATUS, SHOW INNODB STATUS and SHOW VARIABLES over a persistent connection; results are flushed to StatsD every 0.5 seconds]
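The daemon itself was not open source at the time of the talk, so the following Python sketch is only a guess at the general shape of such a collector, not its actual code. All function names and the host name db1 are hypothetical. It takes two snapshots of status counters, turns the differences into StatsD counter lines and repeats on a fixed interval. Fetching the status and sending the lines are passed in as plain callables so the sketch runs without a MySQL driver.

```python
import time


def counter_deltas(previous, current):
    """Difference per counter; counters that went backwards (restart) are skipped."""
    deltas = {}
    for name, value in current.items():
        if name in previous and value >= previous[name]:
            deltas[name] = value - previous[name]
    return deltas


def to_statsd_lines(prefix, deltas):
    """One StatsD counter line per counter that actually changed."""
    return [
        f"{prefix}.{name.lower()}:{delta}|c"
        for name, delta in sorted(deltas.items())
        if delta
    ]


def run(fetch_status, send, prefix, interval=0.5, iterations=None):
    """Poll fetch_status() every `interval` seconds and send the deltas."""
    previous = fetch_status()
    done = 0
    while iterations is None or done < iterations:
        time.sleep(interval)
        current = fetch_status()
        for line in to_statsd_lines(prefix, counter_deltas(previous, current)):
            send(line)
        previous = current
        done += 1


# Usage with canned snapshots instead of a real SHOW GLOBAL STATUS query.
snapshots = iter([
    {"Com_select": 10, "Questions": 50},
    {"Com_select": 15, "Questions": 50},
    {"Com_select": 3, "Questions": 60},  # Com_select reset, e.g. after a restart
])
sent = []
run(lambda: next(snapshots), sent.append, "prod.syseng.mysql.db1.status",
    interval=0, iterations=2)
print(sent)
```

In a real daemon, fetch_status would run the query over the persistent connection and send would write to localhost:8125 over UDP.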

MySQL Multi Master patch

• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite

MMM Perl code example

use Net::Statsd;
$Net::Statsd::HOST = 'localhost'; # Default
$Net::Statsd::PORT = 8125; # Default

…

# ONLINE -> HARD_OFFLINE
unless ($ping && $mysql) {
    Net::Statsd::update_stats('prod.syseng.mmm.'.$host.'.hard_offline', 1);
    FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
    $agent->state('HARD_OFFLINE');
}

…

Other metrics

• Deployments
• User initiated actions
  • Logins
  • High scores
  • Comments / ratings
  • Images uploaded
  • Payments

• Application metrics
  • Error counts
  • Cache statistics (cache hit/miss)
  • Request timers
  • Image sizes

Start graphing!
Now it starts to get interesting!

What is important for you?

• Identify your KPIs
• Don’t graph everything
  • More graphs == less overview
  • Combine metrics
  • Stack clusters

Correlate!

• Include other metrics in your graphs
  • Deployments
  • Failover(s)

• Combine application metrics with your database
• Other influences
  • Solar flares
  • Start of the new Maya calendar

Graphite Graphing Engine

• URI based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)

• Many functions
  • Nth percentile
  • Holt-Winters forecast
  • Timeshift
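Because the rendering API is plain URLs, graphs can be generated from any script. Below is a small Python sketch that builds a render URL from the wildcard targets above. The helper render_url and the host name graphitehost are placeholders, and the from, width and height values are just example settings.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def render_url(base_url, targets, options=None):
    """Build a Graphite /render/ URL with one target parameter per series."""
    query = [("target", target) for target in targets]
    query.extend(sorted((options or {}).items()))
    return base_url.rstrip("/") + "/render/?" + urlencode(query)


url = render_url(
    "https://graphitehost",
    [
        "sumSeries(stats.prod.syseng.mysql.*.status.com_select)",
        "aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)",
    ],
    {"from": "-1d", "width": 722, "height": 357},
)
print(url)
```

urlencode takes care of escaping the parentheses, commas and wildcards, which is most of what makes hand-written render URLs hard to read.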

Graphite Aggregator

syseng => {
    nodes => ["databasehost1", "databasehost2"],
    copying_relay_instances => 8,
    hashing_relay_instances => 8,
    cache_instances => 8,
    aggregation => {
        0 => {
            name => "mysql",
            pattern => '.*\.mysql\..*',
            send_raw => 1,
        },
    }
}

stats.<env>.syseng.mysql.cluster1.status.questions.all (2) = sum stats.<env>.syseng.mysql.*.status.questions
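The aggregation rule above reads as: every 2 seconds, publish the sum of the per-host questions metric under one cluster-wide name. The Python sketch below mimics that single step on made-up values. It is not carbon-aggregator code, and it assumes that * matches exactly one dot-separated name component and that <env> is prod.

```python
import re


def pattern_to_regex(pattern):
    """Translate a dotted pattern; '*' stands for one name component (assumption)."""
    parts = ["[^.]+" if part == "*" else re.escape(part) for part in pattern.split(".")]
    return re.compile(r"\.".join(parts))


def aggregate_sum(pattern, datapoints):
    """Sum the values of all metrics whose full name matches the pattern."""
    regex = pattern_to_regex(pattern)
    return sum(value for name, value in datapoints.items() if regex.fullmatch(name))


# Hypothetical values received within one 2 second window.
window = {
    "stats.prod.syseng.mysql.databasehost1.status.questions": 120,
    "stats.prod.syseng.mysql.databasehost2.status.questions": 80,
    "stats.prod.syseng.mysql.databasehost1.status.com_select": 50,
}
total = aggregate_sum("stats.prod.syseng.mysql.*.status.questions", window)
print("stats.prod.syseng.mysql.cluster1.status.questions.all =", total)
```

Pre-aggregating like this means a cluster-wide graph reads one series instead of summing one per host at render time.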

Graphite web interface

Graphite Example URL

https://graphitehost/render/?width=722&height=357&_salt=1366550446.553&rightDashed=1&target=alias%28sumSeries%28stats.prod.services.profilar.request.total.count.*%29%2C%22Number%20of%20profile%20requests%22%29&target=alias%28secondYAxis%28sumSeries%28stats_counts.prod.syseng.mysql.<node1>.status.questions%2C%20stats_counts.prod.syseng.mysql.<node2>.status.questions%29%29%2C%22Number%20of%20queries%20profiles%20cluster%22%29&from=00%3A00_20130415&until=23%3A59_20130421

Other examples: MMM

Other examples: timeshift

Other examples: multiple weeks

Challenges
The road ahead

What challenges do we have?

• MySQL_statsd rewrite necessary (not open source yet)
• No alerting through Graphite (yet)
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system

What lessons have we learned?

• Persistent connections + repeatable read
  • History list skyrocketed

• Too many metrics slow down graphing
• Too many metrics can kill a host

• EstatsD for Erlang
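The slides do not say how the history list problem was fixed. A common precaution is to make sure the persistent monitoring connection never holds a long-running REPEATABLE READ transaction open, because such a transaction keeps InnoDB from purging old row versions. The Python sketch below shows one way to do that for any DB-API style connection. The function name is made up and the choice of settings is an assumption, not the speaker's method.

```python
# Session settings for a long-lived monitoring connection (an assumption,
# not taken from the talk): no implicit long transaction, no old snapshot.
SESSION_SETUP = (
    "SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED",
    "SET autocommit = 1",
)


def prepare_monitoring_connection(connection):
    """Run the session settings once, right after connecting."""
    cursor = connection.cursor()
    try:
        for statement in SESSION_SETUP:
            cursor.execute(statement)
    finally:
        cursor.close()


# Stand-ins for a real driver, so the sketch runs without a database.
class FakeCursor:
    def __init__(self, log):
        self.log = log

    def execute(self, statement):
        self.log.append(statement)

    def close(self):
        self.log.append("closed")


class FakeConnection:
    def __init__(self):
        self.log = []

    def cursor(self):
        return FakeCursor(self.log)


connection = FakeConnection()
prepare_monitoring_connection(connection)
print(connection.log)
```

With a real driver, the same function would be called once on the persistent connection before the first poll.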

Questions…

Practical links

• Graphite: http://graphite.readthedocs.org/en/latest/
• Collectd: https://collectd.org/
• StatsD on GitHub by Etsy: https://github.com/etsy/statsd/wiki
• Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/

Thank you!

• Presentation can be found at: http://spil.com/perconasc2013
• If you wish to contact me: art@spilgames.com
• Don’t forget to rate my talk!
