Post on 01-Dec-2019

Copyright 2017 Severalnines AB

MongoDB Monitoring

Art van Scheppingen
Senior Support Engineer, Severalnines

Become a MongoDB DBA - Monitoring Essentials

Agenda

● Monitoring and trending
● Why do we collect data?
● What metrics to collect from MongoDB?
● Key MongoDB metrics in depth
● Available MongoDB monitoring tools
● How to monitor MongoDB using ClusterControl

Monitoring and trending

Do you need monitoring and trending?

There is only one person who can land a plane without instruments

Monitoring vs Trending

● Monitoring system (e.g. Nagios)
  ○ Checks if services are healthy
  ○ Sends pages
● Trending system (e.g. Cacti, Graphite, Prometheus)
  ○ Collects metrics
  ○ Generates graphs

Monitoring: Availability

● Do more than just opening a connection
  ○ Measure true status of nodes and cluster
  ○ Test read/write
  ○ Open essential databases and collections
  ○ Keep an eye on the replication lag
    ■ Increase oplog size?
  ○ Check the full topology

Trending: why do we need trends?

● Trending
  ○ Plot trends of key (performance) metrics
  ○ Create timelines of metrics
  ○ Correlate various metrics
  ○ Find problems before they arise
  ○ Pre-emptive problem management
● Trending tools
  ○ Granularity of sampling
  ○ More datapoints = better

Why do we collect data?

● Periodical (daily/weekly) healthchecks
● Insight into all aspects of the database operations
● Post mortem and proactive monitoring
● Capacity planning

Healthchecks

● Healthchecks are a pain
● You want to see aggregated data
● You want to be able to drill down to a particular host
● You want to see the most important data first and dig in later on

Post mortem and proactive monitoring

● Ability to dig into past data
● Data granularity of even less than 5s (hardware-dependent)
● Fine granularity allows you to catch the issue as it evolves; no need to wait 5 minutes for a graph to refresh

Insight into internals, capacity planning

● Graphs based on MongoDB status metrics
● Overall status and per-node graphs
● Ability to get time-shifted graphs, useful for comparing workload changes over time

What metrics to collect from MongoDB?

Type of metrics to collect

● Quite similar to other database systems
  ○ Host metrics
  ○ Operational metrics
  ○ Storage engine metrics
  ○ Replication metrics
  ○ Shard metrics

Host metrics: what for?

● Similar to most other databases
● Understand the utilization of the machine
● Capacity planning
● Determine the type of an issue
  ○ I/O related?
  ○ CPU related?
  ○ Network related?

Host metrics: what to look for?

● CPU utilization (should I add more nodes to the cluster?)
● Network utilization (am I running out of bandwidth?)
● Ping (how badly does latency affect my MongoDB cluster?)
● Disk throughput and IOPS (am I within my hardware limits?)
● Disk space (do I have to plan for larger disks?)
● Memory utilization (do I suffer from a memory leak?)

Operational metrics

● Similar to most other databases
● Throughput of the cluster
● Relate throughput to cluster performance
● Determine the type of an issue
  ○ Request spikes?
  ○ Write amplification related?
  ○ Queueing?

Storage engine metrics

● Storage engine specific
  ○ MMAPv1
  ○ WiredTiger
  ○ MongoRocks
● Insight into how the engine performs
● Internal congestion

Replication metrics

● Throughput of the replication
● Durability of the oplog
● Replication lag
● Cluster replication acknowledgement
  ○ Quorum based
  ○ At least one secondary needs to acknowledge

Eventual consistency

Sharding related metrics

● Shard chunks and balancing
  ○ Chunks per shard
  ○ Disk usage
● Non-sharded collections
  ○ Sharding has to be enabled on collection level
  ○ Non-sharded collections get a primary shard assigned
  ○ Once the primary shard is full, no writes can happen
● Connection pool (mongos)
  ○ All queries will be sent to the primary in a shard
  ○ Range queries will block connections of the connection pool

Key MongoDB metrics to know about

Oplog

● Oplog: a special collection containing all transactions
  ○ Limited in size (configurable)
  ○ Eviction of transactions (FIFO)
  ○ Comparable to a ringbuffer
● Used for replication
  ○ Secondaries copy transactions from the oplog on other nodes
  ○ Full data sync necessary once the last executed transaction has been evicted
● Replication window
  ○ Time between first and last transaction in the oplog
  ○ Time that allows your secondary to be offline before performing a full sync
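The ring-buffer behaviour described on this slide can be sketched with a toy model. This is only an illustration under simplifying assumptions: the real oplog is capped by size rather than by entry count, and `ToyOplog` and `needsFullSync` are hypothetical names, not MongoDB APIs.

```javascript
// Toy model of a capped oplog: a fixed number of entries, oldest evicted first.
// ToyOplog and needsFullSync are hypothetical names used for illustration only.
class ToyOplog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = []; // each entry: { ts: seconds since epoch, op: anything }
  }

  // Append an operation; evict the oldest entry once the cap is exceeded (FIFO).
  append(ts, op) {
    this.entries.push({ ts: ts, op: op });
    if (this.entries.length > this.maxEntries) {
      this.entries.shift();
    }
  }

  firstTs() {
    return this.entries.length ? this.entries[0].ts : null;
  }

  lastTs() {
    return this.entries.length ? this.entries[this.entries.length - 1].ts : null;
  }

  // Replication window: time between the first and last entry still present.
  windowSeconds() {
    return this.entries.length ? this.lastTs() - this.firstTs() : 0;
  }
}

// A secondary whose last applied timestamp is older than the oldest entry
// can no longer catch up from this oplog and would need a full sync.
function needsFullSync(oplog, lastAppliedTs) {
  const first = oplog.firstTs();
  return first !== null && lastAppliedTs < first;
}
```

A secondary that stays offline longer than `windowSeconds()` ends up in the `needsFullSync` case, which is why the replication window is worth trending.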

Oplog: replication window

From the MongoDB CLI

mongo_replica_0:PRIMARY> db.getReplicationInfo()
{
    "logSizeMB" : 1895.7751951217651,
    "usedMB" : 419.86,
    "timeDiff" : 281419,
    "timeDiffHours" : 78.17,
    "tFirst" : "Fri Jul 08 2016 10:56:01 GMT+0000 (UTC)",
    "tLast" : "Mon Jul 11 2016 17:06:20 GMT+0000 (UTC)",
    "now" : "Mon Jul 11 2016 17:15:06 GMT+0000 (UTC)"
}
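As a rough sketch of how this output could be turned into trendable numbers, the helpers below work on a plain object shaped like the `db.getReplicationInfo()` result above. The function names and the threshold parameter are hypothetical.

```javascript
// Replication window in hours, rounded to two decimals like timeDiffHours.
function replicationWindowHours(info) {
  return Math.round((info.timeDiff / 3600) * 100) / 100;
}

// Fraction of the configured oplog size that is currently in use.
function oplogUsedRatio(info) {
  return info.usedMB / info.logSizeMB;
}

// True when the window is shorter than the minimum you are comfortable with.
// minHours is an assumption: pick it from your own maintenance and outage times.
function windowTooShort(info, minHours) {
  return replicationWindowHours(info) < minHours;
}
```

With the values from the slide, `timeDiff` is 281419 seconds, which is the 78.17 hours reported as `timeDiffHours`.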

Oplog: replication window

From the ClusterControl advisor:

function getReplicationWindow(host) {
    var replwindow = {};
    replwindow['newset'] = false;
    // Fetch the first and last record from the oplog and take its timestamp
    var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
    replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
    // A freshly initiated replica set has no meaningful window yet
    if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
        replwindow['newset'] = true;
    }
    res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
    replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
    // Window in seconds between the oldest and newest oplog entry
    replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
    return replwindow;
}
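One possible way to act on the object returned by `getReplicationWindow` is sketched below in plain JavaScript. The function name, the return shape and the threshold are assumptions for illustration; they are not part of the ClusterControl advisor API.

```javascript
// Decide whether a replication window (in seconds) is acceptable.
// replwindow has the shape produced by getReplicationWindow above:
// { newset: boolean, first: seconds, last: seconds, replwindow: seconds }
function assessReplicationWindow(replwindow, minSeconds) {
  if (replwindow.newset) {
    // A newly initiated set has a short window by definition, so do not alert.
    return { ok: true, reason: "new replica set, window not yet meaningful" };
  }
  if (replwindow.replwindow < minSeconds) {
    return { ok: false, reason: "replication window shorter than " + minSeconds + "s" };
  }
  return { ok: true, reason: "replication window is sufficient" };
}
```

Skipping the check for a new set mirrors the `newset` flag in the advisor code, which exists for that purpose.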

Replication lag

● CPU, IO or lock related
● Outcome:
  ○ Secondary not used by Mongo client drivers
  ○ Puts larger strain on other secondaries
  ○ Less likely to be elected during a failover
    ■ If it is elected, it could be disastrous
  ○ Lagging too far behind could cause a full sync

Replication lag

my_mongodb_0:PRIMARY> db.runCommand( { replSetGetStatus: 1 } )
{
    "members" : [
        {
            "_id" : 0,
            "name" : "10.10.32.11:27017",
            "stateStr" : "PRIMARY",
            "optime" : {
                "ts" : Timestamp(1466247801, 5),
                "t" : NumberLong(1)
            },
        },
        {
            "_id" : 1,
            "name" : "10.10.32.12:27017",
            "stateStr" : "SECONDARY",
            "optime" : {
                "ts" : Timestamp(1466247801, 5),
                "t" : NumberLong(1)
            },
        },
    ],
    "ok" : 1
}
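A minimal sketch of deriving lag per secondary from the `members` array follows. It assumes each member also carries the `optimeDate` field as a JavaScript `Date`; that field is part of the `replSetGetStatus` output but is not shown in the abbreviated excerpt above. The function name is hypothetical.

```javascript
// Lag in seconds of every SECONDARY relative to the PRIMARY.
// Returns null when no primary is visible (for example during an election).
function replicationLagSeconds(members) {
  const primary = members.find(function (m) { return m.stateStr === "PRIMARY"; });
  if (!primary) {
    return null;
  }
  const lag = {};
  members.forEach(function (m) {
    if (m.stateStr !== "SECONDARY") {
      return;
    }
    lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
  });
  return lag;
}
```

In the slide's example both members report the same optime, so the lag would be 0.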

Connections

● Like any other database: availability
● Client drivers may support connection pooling
  ○ Multiple non-blocking queries can use the same connection
  ○ Spawns new connections when the number of free connections falls below a threshold
● Increase of connections
  ○ Locking issues
  ○ Application request bursts

Connections

From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().connections
{ "current" : 25, "available" : 794, "totalCreated" : NumberLong(122418) }

From any mongo client
mongo_replica_0:PRIMARY> db.runCommand( { serverStatus: 1 } ).connections
{ "current" : 25, "available" : 794, "totalCreated" : NumberLong(122418) }
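The raw counters are easier to alert on as a ratio. The sketch below, with a hypothetical function name, treats `current + available` as the connection limit of the node.

```javascript
// Fraction of the node's connection limit that is currently in use.
function connectionUtilization(connections) {
  const limit = connections.current + connections.available;
  return limit > 0 ? connections.current / limit : 0;
}
```

For the sample output, 25 of 819 connections are in use, which is roughly 3%.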

Transactions

● Atomicity on document level
  ○ WiredTiger and MongoRocks
● No “real” transactions
● Write data with the $isolated operator
  ○ Similar to READ UNCOMMITTED in MySQL (dirty reads in ANSI SQL)
  ○ No rollback
  ○ Does not work on shards

Transactions

From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().opcounters
{
    "insert" : 1355272,
    "query" : 20712,
    "update" : 8995,
    "delete" : 0,
    "getmore" : 400791,
    "command" : 2405749
}

From any mongo client
mongo_replica_0:PRIMARY> db.runCommand({serverStatus: 1}).opcounters
{
    "insert" : 1355272,
    "query" : 20712,
    "update" : 8995,
    "delete" : 0,
    "getmore" : 400791,
    "command" : 2405749
}
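The opcounters are cumulative since the server started, so a trending system normally stores the difference between two samples. A minimal sketch, assuming the counters arrive as plain numbers and using a hypothetical function name:

```javascript
// Operations per second between two opcounters samples taken
// intervalSeconds apart. A negative delta means the counter was reset
// (for example by a restart); that sample is reported as null.
function opcounterRates(previous, current, intervalSeconds) {
  const rates = {};
  Object.keys(current).forEach(function (op) {
    const delta = current[op] - (previous[op] || 0);
    rates[op] = delta >= 0 ? delta / intervalSeconds : null;
  });
  return rates;
}
```

The shorter the sampling interval, the finer the granularity of the resulting graph.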

Locks

● Three levels of (generic) locking
  ○ Global
  ○ Database
  ○ Collection
● Global lock hardly ever happens (full lock on MongoDB)
● Database locks occur when dropping a collection
● Collection locks occur mostly in MMAP

Locks (generic)

From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().locks
{
    "Global" : {
        "acquireCount" : {
            "r" : NumberLong(6050583),
            "w" : NumberLong(2416551),
            "R" : NumberLong(1),
            "W" : NumberLong(7)
        },
        "acquireWaitCount" : {
            "r" : NumberLong(1),
            "w" : NumberLong(1),
            "W" : NumberLong(1)
        },
…
}
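One way to read these counters is the share of lock acquisitions that had to wait. The sketch below assumes the `NumberLong` values have already been converted to plain numbers, as most drivers do for values in this range; the function name is hypothetical.

```javascript
// For one lock level (for example locks.Global), return the fraction of
// acquisitions per mode (r, w, R, W) that had to wait for the lock.
function lockWaitRatios(lockStats) {
  const acquired = lockStats.acquireCount || {};
  const waited = lockStats.acquireWaitCount || {};
  const ratios = {};
  Object.keys(acquired).forEach(function (mode) {
    const count = Number(acquired[mode]);
    ratios[mode] = count > 0 ? Number(waited[mode] || 0) / count : 0;
  });
  return ratios;
}
```

A ratio that climbs over time hints at lock contention, even when the absolute counts look small.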

Locks (WiredTiger)

● Optimistic concurrency control
  ○ If two write operations conflict, the transaction will be paused and retried
● Document level locking
● Tickets (threads)
  ○ Read
  ○ Write

Locks (WiredTiger)

From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.concurrentTransactions
{
    "write" : {
        "out" : 0,
        "available" : 128,
        "totalTickets" : 128
    },
    "read" : {
        "out" : 0,
        "available" : 128,
        "totalTickets" : 128
    }
}
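Ticket usage can be expressed as the fraction of tickets handed out. This is a small sketch over an object shaped like the output above; the function name is hypothetical.

```javascript
// Fraction of WiredTiger read and write tickets currently in use.
function ticketUsage(concurrentTransactions) {
  const usage = {};
  ["read", "write"].forEach(function (kind) {
    const tickets = concurrentTransactions[kind];
    usage[kind] = tickets.totalTickets > 0 ? tickets.out / tickets.totalTickets : 0;
  });
  return usage;
}
```

Usage that stays close to 1 suggests operations are queueing for tickets inside the storage engine.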

Cache

● MongoDB uses three tiers of cache
  ○ Filesystem
  ○ Active memory
  ○ Storage engine (WiredTiger / MongoRocks)
● Page faults
  ○ Cache miss
● Evictions

Page faults and cache usage

From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().extra_info.page_faults
37912924

mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.cache
{
    "bytes currently in the cache" : 887889617,
    "modified pages evicted" : 561514,
    "tracked dirty pages in the cache" : 626,
    "unmodified pages evicted" : 15823118
}
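A possible summary of these cache counters is sketched below. It uses the fields shown above plus `maximum bytes configured`, which also appears in the `wiredTiger.cache` section but is not part of the excerpt on the slide; when that field is missing the fill ratio is left as `null`. The function name is hypothetical.

```javascript
// Summarise WiredTiger cache counters into two ratios:
// - modifiedEvictionShare: share of evicted pages that were dirty
// - fillRatio: how full the cache is, if the configured maximum is known
function cacheSummary(cache) {
  const modified = cache["modified pages evicted"];
  const unmodified = cache["unmodified pages evicted"];
  const evicted = modified + unmodified;
  const maxBytes = cache["maximum bytes configured"];
  return {
    modifiedEvictionShare: evicted > 0 ? modified / evicted : 0,
    fillRatio: maxBytes ? cache["bytes currently in the cache"] / maxBytes : null
  };
}
```

Like the opcounters, the eviction counters are cumulative, so for trending it is more useful to compute this over the difference between two samples.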

Shard metrics

● Sharding makes write scaling transparent
● Sharding can be solved with two methods:
  ○ Hash key distribution (limited)
  ○ Shard lookup table
● MongoDB uses a combination of hash key distribution and shard lookup table
  ○ Hash key (or range key) distribution gets divided into chunks (ranges)
  ○ The chunk metadata gets stored in the config server
● The config server holds the most important data in a MongoDB sharded cluster!
● The shard router is the second most important component
● Shards can get out of balance
  ○ Non-sharded collections
  ○ Heavy / large writes on a single chunk
  ○ Auto balancing by the primary of the config server (3.4) or mongos (3.2 and earlier)
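The idea of a shard lookup table over chunk ranges can be illustrated with a few lines of JavaScript. This is a simplified model, not how mongos is implemented: the table, its boundaries and the `findShard` name are made up for the example, and keys are plain numbers.

```javascript
// Hypothetical chunk table: each chunk covers the half-open range [min, max)
// of shard key values and lives on exactly one shard.
const chunkTable = [
  { min: -Infinity, max: 100, shard: "sh1" },
  { min: 100, max: 200, shard: "sh2" },
  { min: 200, max: Infinity, shard: "sh3" }
];

// Route a shard key value to the shard that owns the matching chunk.
function findShard(table, key) {
  const chunk = table.find(function (c) {
    return key >= c.min && key < c.max;
  });
  return chunk ? chunk.shard : null;
}
```

Because every write to keys between 100 and 199 lands on `sh2` in this model, heavy writes on a single chunk put a cluster out of balance until the chunk is split or moved.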

Shard chunks and balancing

From the MongoDB CLI:
mongos> sh.status()
--- Sharding Status ---
…
databases:
    { "_id" : "shardtest", "primary" : "sh1", "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : 1 }
            unique: false
            balancing: true
            chunks:
                sh1 1
                sh2 2
                sh3 1

From any mongo client:
mongos> use config
switched to db config
mongos> db.chunks.aggregate([{$group: {"_id": {"ns": "$ns", "shard": "$shard"}, "total_chunks": {$sum: 1}}}])
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_1" }, "total_chunks" : 330 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_0" }, "total_chunks" : 328 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_2" }, "total_chunks" : 335 }
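To turn the per-shard chunk counts into a single number to trend, one option is the difference between the fullest and emptiest shard per namespace. The sketch below takes rows shaped like the aggregation output above; the function name is hypothetical.

```javascript
// For each namespace, the difference between the highest and lowest
// chunk count across its shards. Rows look like:
// { _id: { ns: "db.collection", shard: "name" }, total_chunks: N }
function chunkSpread(rows) {
  const byNamespace = {};
  rows.forEach(function (row) {
    const ns = row._id.ns;
    if (!byNamespace[ns]) {
      byNamespace[ns] = { min: Infinity, max: -Infinity };
    }
    byNamespace[ns].min = Math.min(byNamespace[ns].min, row.total_chunks);
    byNamespace[ns].max = Math.max(byNamespace[ns].max, row.total_chunks);
  });
  const spread = {};
  Object.keys(byNamespace).forEach(function (ns) {
    spread[ns] = byNamespace[ns].max - byNamespace[ns].min;
  });
  return spread;
}
```

For the sample output the spread is 7 chunks (335 versus 328). Note that a shard holding no chunks at all does not appear in the aggregation, so it is not counted here.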

Non-sharded collections

From the ClusterControl non-sharded collection advisor:

use config;
// Collect the namespaces of all sharded collections
var shard_collections = db.collections.find();
var sharded_names = {};
while (shard_collections.hasNext()) {
    var shard = shard_collections.next();
    sharded_names[shard._id] = 1;
}
var admin_db = db.getSiblingDB("admin");
var dbs = admin_db.runCommand({ "listDatabases": 1 }).databases;
// Print every collection that is not registered as sharded
dbs.forEach(function(database) {
    if (database.name != "config") {
        db = db.getSiblingDB(database.name);
        var cols = db.getCollectionNames();
        cols.forEach(function(col) {
            if (col != "system.indexes") {
                if (sharded_names[database.name + "." + col] != 1) {
                    print(database.name + "." + col);
                }
            }
        });
    }
});
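The core of the advisor above is a set difference between all namespaces and the sharded ones. A self-contained sketch of just that step follows, so it can be tried without a cluster; the function name and the input arrays are hypothetical.

```javascript
// Return the namespaces ("db.collection") that are not sharded.
// Like the advisor, skip the config database and system.indexes collections.
function findUnshardedNamespaces(allNamespaces, shardedNamespaces) {
  const sharded = new Set(shardedNamespaces);
  return allNamespaces.filter(function (ns) {
    if (ns.startsWith("config.") || ns.endsWith(".system.indexes")) {
      return false;
    }
    return !sharded.has(ns);
  });
}
```

Every namespace returned here lives entirely on its database's primary shard, which is why it is worth tracking together with that shard's disk usage.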

Connection pool

From the MongoDB CLI
mongos> db.runCommand( { "connPoolStats" : 1 } )
{
    "numClientConnections" : 10,
    "numAScopedConnections" : 0,
    "totalInUse" : 4,
    "totalAvailable" : 8,
    "totalCreated" : 23,
    "hosts" : {
        "10.10.34.11:27019" : {
            "inUse" : 1,
            "available" : 1,
            "created" : 1
        },
        "10.10.34.12:27018" : {
            "inUse" : 3,
            "available" : 1,
            "created" : 2
        }
    },
    ...
    "ok" : 1
}
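A per-host view of the pool can be derived from the `hosts` section. The sketch below, with a hypothetical function name, reports which share of the pooled connections to each host is in use.

```javascript
// Share of pooled connections in use per host, from a connPoolStats result.
function poolUsageByHost(stats) {
  const usage = {};
  const hosts = stats.hosts || {};
  Object.keys(hosts).forEach(function (host) {
    const total = hosts[host].inUse + hosts[host].available;
    usage[host] = total > 0 ? hosts[host].inUse / total : 0;
  });
  return usage;
}
```

In the sample output, 3 of the 4 pooled connections to `10.10.34.12:27018` are in use, so that host is the first place to look when queries start queueing on the mongos.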

Available MongoDB monitoring tools

Alerting solutions

● Open Source
  ○ Nagios
  ○ Zabbix
● Subscription based
  ○ MongoDB Cloud Manager
  ○ VividCortex
  ○ ClusterControl

Nagios

● Nagios-MongoDB
  ○ https://github.com/mzupan/nagios-plugin-mongodb/
  ○ Performs some very important checks
    ■ Replication lag
    ■ Lock time percentage
    ■ Index miss ratio

Zabbix

● MongoDB Zabbix monitoring plugin
  ○ https://github.com/nightw/mikoomi-zabbix-mongodb-monitoring
  ○ All the necessary metrics and more
    ■ Entries in oplog
  ○ Pre-canned triggers

Trending solutions

● Trending tools
  ○ Statsd/Grafana
  ○ Cacti
  ○ Zabbix
● Subscription based
  ○ MongoDB Cloud Manager
  ○ VividCortex
  ○ ClusterControl

Cacti

● Percona MongoDB Monitoring Templates
  ○ https://www.percona.com/doc/percona-monitoring-plugins/1.1/cacti/mongodb-templates.html

Orchestration systems: Percona Monitoring & Management

● PMM
  ○ https://www.percona.com/doc/percona-monitoring-and-management/
  ○ Open Source Monitoring & Management framework
  ○ Can deploy, manage and monitor MySQL & MongoDB
  ○ Uses Prometheus and Grafana

Percona Monitoring & Management sessions:
● MySQL Monitoring with Percona Monitoring and Management, Tue 11:30 - 12:20 in Ballroom E
● Hipster MySQL Monitoring: Serving a deconstructed PMM, Tue 11:30 - 12:20 in Ballroom H
● Monitoring production environment with Percona Monitoring and Management (PMM), Thu 3:00 - 3:50 in room 209

How to monitor MongoDB using ClusterControl

Orchestration systems: ClusterControl

● ClusterControl
  ○ http://www.severalnines.com
  ○ Deploy Mongo shards & replicasets
  ○ Monitor and trend
  ○ Manage configuration and backups
  ○ Scale
  ○ Community edition

Easily deploy and import MongoDB replicaSets and Shards

Monitor and trend

Cluster management

Scale replicaSets and Shards

Convert replicaSet into a Sharded cluster

Q & A

Additional resources

● Blog series: Become a MongoDB DBA
  ○ http://severalnines.com/blog-categories/mongodb
● Webinar series: Become a MongoDB DBA
  ○ http://severalnines.com/upcoming-webinars
● Visit our website for more resources!
  ○ http://www.severalnines.com
● Other sessions by Severalnines at Percona Live 2017
  ○ MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - a close up look, Wed 11:10am - 12:00pm in Ballroom D
  ○ MySQL (NDB) Cluster Best Practices (Die Hard VIII), Wed 3:30pm - 4:20pm in Room 210

Thank You!