Running MongoDB in Production, Part III
Tim Vaillancourt
Sr. Technical Operations Architect, Percona
{name: "tim", lastname: "vaillancourt", employer: "percona", techs: [
    "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr", "mesos",
    "kafka", "couch*", "python", "golang"
]}
`whoami`
Agenda
● Troubleshooting
● Schema
● Data Integrity
● Scaling (Reads/Writes)
Troubleshooting
"The problem with troubleshooting is trouble shoots back" ~ Unknown
Troubleshooting: Usual Suspects
● Locking
○ Collection-level locks
○ Document-level locks
○ Software mutexes/semaphores
● Limits
○ Max connections
○ Operation rate limits
○ Resource limits
● Resources
○ Lack of IOPS, RAM, CPU, network, etc.
Troubleshooting: db.currentOp()
● A function that dumps status info about running operations, plus various lock/execution details
● Only operations currently in progress are shown
● Provides an operation ID (opid), used for killing operations
● Includes
○ Original query
○ Parsed query
○ Query runtime
○ Locking details
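A minimal mongosh sketch of inspecting and then killing a long-running operation (the opid value is a placeholder taken from the currentOp output):

```javascript
// List in-progress operations that have been running longer than 5 seconds
db.currentOp({ "active": true, "secs_running": { "$gt": 5 } });

// Kill one by its opid (placeholder value; copy it from the output above)
db.killOp(12345);
```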
Troubleshooting: db.currentOp()
● Filter documents
○ { "$ownOps": true }: only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
Troubleshooting: db.stats()
● Returns
○ Document-data size (dataSize)
○ Index-data size (indexSize)
○ Real storage size (storageSize)
○ Average object size
○ Number of indexes
○ Number of objects
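For example, in mongosh (the optional scale argument reports all sizes in the given unit; here, megabytes):

```javascript
// Report database statistics with sizes scaled to megabytes
db.stats(1024 * 1024);
```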
Troubleshooting: Log File
● Interesting details are logged to the mongod/mongos log files
○ Slow queries
○ Storage engine details (sometimes)
○ Index operations
○ Sharding
■ Chunk moves
○ Elections / Replication
○ Authentication
Troubleshooting: Log File - Slow Query
2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms
Troubleshooting: Operation Profiler
● Writes slow database operations to a new MongoDB collection for analysis
○ Capped collection "system.profile" in each database, 1MB by default
○ Because the collection is capped, profile data doesn't last forever
● Support for operationProfiling data in Percona Monitoring and Management is a current roadmap goal
Troubleshooting: Operation Profiler
● Enable operationProfiling in "slowOp" mode
○ Start with a very high threshold and decrease it in steps
○ Usually 50-100ms is a good threshold
○ Enable in mongod.conf:

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
Troubleshooting: Operation Profiler
● Useful profile metrics
○ op/ns/query: type, namespace and query of a profile
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during an update
○ numYields: # of times the operation yielded for others
○ locks: detailed lock statistics
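Once profiling is enabled, the data can be queried like any other collection; a mongosh sketch (the 100ms threshold and the limit are illustrative):

```javascript
// Show the five most recent profiled operations slower than 100ms
db.system.profile.find({ millis: { $gt: 100 } })
                 .sort({ ts: -1 })
                 .limit(5)
                 .pretty();
```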
Troubleshooting: .explain()
● Shows the query explain plan for query cursors
● This will include
○ Winning plan
■ Query stages
● Query stages may include sharding info in clusters
■ Index chosen by the optimiser
○ Rejected plans
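A mongosh sketch (the items collection and itemId field are examples); the "executionStats" verbosity adds the keysExamined/docsExamined counters discussed later:

```javascript
// Explain a query, including execution counters, winning and rejected plans
db.items.find({ itemId: 123456 }).explain("executionStats");
```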
Troubleshooting: .explain() and Profiler
Troubleshooting: PSMDB AuditLog
● Free, open-source PSMDB feature
○ A MongoDB Enterprise feature elsewhere ($$$)
● Provides
○ Authentication and authorization
○ Cluster operations
○ Read and write operations
Troubleshooting: PSMDB AuditLog
● Provides
○ Schema operations
○ Custom application messages (if configured)
● Writes to BSON files on disk
○ Read the data with 'bsondump --pretty'
○ Ensure the directory is NOT world-readable!
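A sketch of enabling the audit log in mongod.conf (the path is an example; check the PSMDB documentation for the exact options your version supports):

```yaml
auditLog:
  destination: file
  format: BSON
  path: /var/lib/mongodb/auditLog.bson
```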
Troubleshooting: Cluster Metadata
● The "config" database on cluster config servers
○ Use .find() queries to view cluster metadata
● Contains
○ changelog and actionlog (3.0+): cluster operations
○ databases: sharding-enabled databases
○ collections: sharding-enabled collections
○ shards: cluster shards
○ chunks: chunk mapping/info
○ settings: sharding settings
Troubleshooting: Cluster Metadata
● Contains
○ mongos: all mongos processes (kept forever)
○ locks: internal cluster locks
○ lockpings
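A mongosh sketch run against a mongos (the mydb.items namespace is an example; note newer MongoDB versions key config.chunks differently):

```javascript
// Count chunks per shard for one sharded namespace
use config
db.chunks.aggregate([
  { $match: { ns: "mydb.items" } },
  { $group: { _id: "$shard", chunks: { $sum: 1 } } }
]);
```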
Troubleshooting: Percona PMM QAN
● Allows DBAs and developers to:
○ Analyze queries over periods of time
○ Find performance problems
○ Access database performance data securely
● Data is collected by an agent from the MongoDB Profiler (profiling must be enabled)
● Query normalization
○ ie: "{ item: 123456 }" -> "{ item: ##### }"
○ Good for reduced data exposure
● CLI alternative: the pt-mongodb-query-digest tool
Troubleshooting: Other tools
● mlogfilter
○ A useful tool for processing mongod.log files
● pt-mongodb-summary
○ Great for a high-level view of a MongoDB environment
● pt-mongodb-query-digest
○ A command-line tool similar to PMM QAN (although much simpler)
Schema Design & Workflow
Schema Design: Data Types
● Strings
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
■ "12345678" moved to an integer uses 25% less space
■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
    var newItemId = parseInt(x.itemId);
    db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});
Schema Design: Data Types
● Strings
○ Do not store dates as strings!
■ The field "2017-08-17 10:00:04 CEST" stored as a real date uses 52.5% less space!
○ Do not store booleans as strings!
■ "true" -> true = 47% less space
● DBRefs
○ DBRefs provide pointers to another document
○ DBRefs can be cross-collection
● NumberDecimal (3.4+)
○ Higher precision for floating-point/decimal numbers
Schema Design: Indexes
● MongoDB supports B-tree, text and geo indexes
● Default behaviour: collection lock until indexing completes
● { background: true }
○ Runs indexing in the background, avoiding pauses
○ Hard to monitor and troubleshoot progress
○ Unpredictable performance impact
○ Our suggestion: roll out indexes one node at a time
■ Disable replication and change the TCP port, restart
■ Apply the index
■ Re-enable replication, restore the TCP port
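A mongosh sketch of a background build (the items collection and itemId field are examples; a foreground build is the default):

```javascript
// Build an index in the background instead of holding the collection lock
db.items.createIndex({ itemId: 1 }, { background: true });
```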
Schema Design: Indexes
● Avoid drivers that auto-create indexes
○ Use real performance data to make indexing decisions; find out before production!
● Too many indexes hurt write performance for an entire collection
● Indexes have a forward or backward direction
○ Try to cover .sort() with an index and match its direction!
Schema Design: Indexes
● Compound indexes
○ Several fields supported
○ Fields can be in forward or backward direction
■ Consider any .sort() query options and match the sort direction!
○ Composite keys are read left -> right!
■ An index can be partially read
■ Left-most fields do not need to be duplicated!
■ All indexes below the first are redundant prefixes of the first index:
● {username: 1, status: 1, date: 1, count: -1}
● {username: 1, status: 1, date: 1}
● {username: 1, status: 1}
● {username: 1}
● Use db.collection.getIndexes() to view current indexes
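A mongosh sketch of the prefix rule (the users collection is hypothetical):

```javascript
// One compound index covers queries on its left-most prefixes...
db.users.createIndex({ username: 1, status: 1, date: 1, count: -1 });

// ...so these queries can all use the index above; no extra indexes needed
db.users.find({ username: "tim" });
db.users.find({ username: "tim", status: "active" }).sort({ date: 1 });
```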
Schema Design: Query Efficiency
● Query efficiency ratios
○ Index: keysExamined / nreturned
○ Document: docsExamined / nreturned
● End goal: examine only as many index keys/docs as you return!
○ Example: a query scanning 10 documents to return 1 has a ratio of 10; the ideal is 1
○ Tip: when using a covered index zero documents are fetched (docsExamined: 0)!
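The ratios can be computed directly from explain("executionStats") counters; a small JavaScript sketch (queryEfficiency is an illustrative helper, not a MongoDB API):

```javascript
// Compute query efficiency ratios from explain("executionStats") counters.
// queryEfficiency() is an illustrative helper, not part of MongoDB.
function queryEfficiency(stats) {
  return {
    index: stats.totalKeysExamined / stats.nReturned,
    document: stats.totalDocsExamined / stats.nReturned
  };
}

// A query that examined 10 keys and 10 documents to return 1 document:
var r = queryEfficiency({ totalKeysExamined: 10, totalDocsExamined: 10, nReturned: 1 });
console.log(r); // { index: 10, document: 10 } -- the ideal ratio is 1
```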
Schema Workflow: Antipatterns
● No list of fields specified in .find()
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for an application operation!
○ Covered-index operations require only the index fields to be specified
● Using $where operators
○ This executes JavaScript with a global lock
● Many $and or $or conditions
○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with
■ Data locality
■ Background summaries / views
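A mongosh sketch of a field projection (the items collection and itemId field are examples):

```javascript
// Return only the fields the application needs (second argument is the projection);
// excluding _id makes a covered read possible when { itemId: 1 } is indexed
db.items.find({ itemId: 123456 }, { itemId: 1, _id: 0 });
```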
Data Integrity
Data Integrity: Storage and Journaling
● The journal provides durability in the event of a server failure
● Changes are written ahead to the journal for each write operation
● On crash recovery, the server
○ Finds the last point of consistency to disk
○ Searches the journal file(s) for the record matching the checkpoint
○ Applies all changes in the journal since the last point of consistency
Data Integrity: Write Concern
● MongoDB replication is asynchronous
● Write concerns
○ Allow control of the data integrity of a write to a replica set
○ Write concern modes
■ "w: <num>": writes must be acknowledged by the defined number of nodes
■ "majority": writes must be acknowledged by a majority of nodes
■ "<replica set tag>": writes must be acknowledged by a member with the specified replica set tags
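A mongosh sketch (the items collection is an example; j and wtimeout are covered on the next slide):

```javascript
// Wait for a majority of nodes to acknowledge (and journal) the write,
// giving up after 5 seconds
db.items.insert(
  { itemId: 123456 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
);
```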
Data Integrity: Write Concern
● Write concerns
○ Durability
■ By default write concerns are NOT durable
■ "j: true": optionally, wait for node(s) to acknowledge journaling of the operation
■ In 3.4+ "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via the replica set configuration!
● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
Data Integrity: Replica Set Rollbacks
● Consider this when using the "w:1" write concern
○ A PRIMARY writes 10 documents with w:1 write concern to the oplog, then dies
○ Two SECONDARY nodes applied 5 and 7 of the changes written
○ The SECONDARY with 7 changes wins the PRIMARY election
○ The PRIMARY that died comes back alive
○ The old-PRIMARY node becomes RECOVERING, then SECONDARY
○ 3 documents are "rolled back" to disk
■ A BSON file is written to the 'rollback' dir on disk when a PRIMARY crashes while ahead of the SECONDARYs
■ Monitor for this file existing on disk!!
Data Integrity: Replica Set Rollbacks
● Risk○ The application and/or end-user thinks this was written!
● Majority Write Concern and correct Read Concern can avoid this!
Data Integrity: Read Concern
● New feature in MongoDB and PSMDB 3.2+
● Like write concerns, the consistency of reads can be tuned per session or operation
● Levels
○ "local": default, returns the current node's most-recent version of the data
○ "majority": reads return the most-recent version of the data that has been ack'd on a majority of nodes. Not supported on MMAPv1.
○ "linearizable" (3.4+): reads return data that reflects a "majority" read of all changes prior to the read
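A mongosh sketch of a per-operation read concern (the items collection is an example):

```javascript
// Read only majority-committed data for this cursor
db.items.find({ itemId: 123456 }).readConcern("majority");
```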
Data Integrity: Replication
● Oplog
○ Ordered by time
○ Written locally after the apply time of
■ A client API change
■ Or a replication change
○ A crashed node will resume replication using the last position from its local oplog
Data Integrity: Replication
● Size of oplog
○ Monitor this closely!
○ The length of time from start to end of the oplog affects the impact of adding new nodes
○ If a node is brought online with a backup within the window it avoids a full sync
○ If a node is brought online with a backup older than the window it will full sync!!!
● Lag
○ Due to async replication, lag is possible
○ Use of read concerns and/or write concerns can work around the logical impact of this!
Data Integrity: Disaster Recovery
● Storing data in many physical locations provides improved recovery options and integrity
● Hidden secondaries
○ Store a hidden secondary in another location
● Upload completed backups to another location
● Use a geo-distributed architecture
○ Sharding: look into tag-aware sharding
○ Replica set: members in multiple locations
Scaling Reads
Scaling Reads: Schema and Summaries
● Correct data types
○ "true" vs true
○ "123456" vs 123456
○ "2017-09-19T16:50:58.347Z" vs ISODate("2017-09-19T16:50:58.347Z")
● Predictive workflow
○ Read-heavy apps benefit from pre-computed results
○ Consider moving expensive read computation to insert/update/delete time
■ Example: move a .count() read query to a summary document; increment/decrement the summary count at write time
● External caching
○ MongoDB is fast, memcache is even faster (although very simple)
Scaling Reads: Read Preference
● Modes
○ primary (default)
○ primaryPreferred
○ secondary
○ secondaryPreferred (recommended for read scaling!)
○ nearest
● Tags
○ Select nodes based on key/value pairs (one or more)
○ Often used for
■ Datacenter awareness, eg: { "dc": "eu-east" }
■ Specific workflows, eg: analytics, BI, batch summaries, backups
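A mongosh sketch combining a mode with a tag set (the dc tag and items collection are examples):

```javascript
// Route this shell's reads to secondaries, preferring the "eu-east" datacenter
db.getMongo().setReadPref("secondaryPreferred", [ { "dc": "eu-east" } ]);
db.items.find({ itemId: 123456 });
```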
Scaling Reads: Secondary Members
● Linear* scale-out of reads, utilization of all nodes!
● Be aware
○ rs.add() of new replicas with no data triggers an initial sync from the primary!
■ This can also happen if the backup is too old
○ 50-member maximum, primary included
● Adding a replica
○ Ensure the replSetName configuration and key files match
○ Avoid an initial sync of the new member
■ Try to "seed" new replicas with an up-to-date backup
■ Let the replica replay oplogs to get in sync
Scaling Reads: Summaries/Counters/Caching
● High-read systems love summarised data
● Understand the real-world usage
● Hide the work in the insert/update request!
○ Answer the question BEFORE it is asked
○ Try to make the read request a covered or indexed read
○ Queue the insert/update to offset the write cost
● Examples:
○ A ".count()" query could instead read an incremented counter ($inc on write)
○ Complex read-time aggregations stored as TTL-expiring documents (poor man's "views")
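A mongosh sketch of the counter pattern (the comments and post_counters collections are hypothetical):

```javascript
// At write time: maintain a counter alongside the insert
db.comments.insert({ postId: 123, text: "hello" });
db.post_counters.update(
  { _id: 123 },
  { $inc: { comments: 1 } },
  { upsert: true }
);

// At read time: one indexed document read replaces a .count() scan
db.post_counters.find({ _id: 123 });
```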
Scaling Reads: Summaries/Counters/Caching
● Change streams (3.6+)
○ Allow code to watch a collection for changes, like a message queue
○ Great for writing code for summaries, counters, etc.
● Spread the workload
○ Memcached and Redis are cheaper and faster
■ Run fine on cheaper hardware
■ Very light connection cost (not 1MB+/connection)!
○ Apache Solr / Elasticsearch / Lucene
■ For complex searches (variable conditions)
■ Non-trivial text search
■ Recommendations, "Did you mean?", etc.
■ Return an _id value for an indexed read from MongoDB
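A mongosh sketch of a change stream feeding a summary counter (the comments and post_counters collections are hypothetical):

```javascript
// Watch a collection and bump a summary counter for each insert
var cursor = db.comments.watch([ { $match: { operationType: "insert" } } ]);
while (cursor.hasNext()) {
  var change = cursor.next();
  db.post_counters.update(
    { _id: change.fullDocument.postId },
    { $inc: { comments: 1 } },
    { upsert: true }
  );
}
```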
Scaling Writes (Sharding)
Sharding: Shard Key
● The shard key determines the distribution of the sharded data
● Avoid shard keys that…
○ Are monotonic / increment forever
○ Are text fields
○ Are very large fields (if possible)
○ Cause a hotspot
■ Example: 1 very large or popular user
Sharding: Deployment Methods
● Small deployment / getting started
○ Useful for an app that will scale up eventually
○ Start with 1 shard and a good shard key
■ Add shards as required, online
■ Optionally enable the balancer at low-peak times only
○ Add replica set members and/or shards as the system scales
● Chunk pre-splitting
○ Chunks that grow larger than 64MB are split by the shard primary
■ Chunk splits impact write performance!
○ Pre-creating an even distribution of chunks spanning the shard key range improves write performance greatly!
Sharding: Deployment Methods
● Chunk pre-splitting
○ Example: a shard key with possible values 1-100, pre-split across 10 shards, results in 10 pre-split chunks per shard
○ More on this topic: https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster/
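A mongosh sketch of pre-splitting a numeric shard key (the mydb.items namespace and itemId field are examples; run against a mongos):

```javascript
// Shard the collection, then pre-split its key range into even chunks
sh.enableSharding("mydb");
sh.shardCollection("mydb.items", { itemId: 1 });
for (var i = 10; i < 100; i += 10) {
  sh.splitAt("mydb.items", { itemId: i });
}
```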
DATABASE PERFORMANCE MATTERS
Questions?