Running MongoDB in Production, Part III
Tim Vaillancourt
Sr. Technical Operations Architect, Percona
{name: "tim", lastname: "vaillancourt", employer: "percona", techs: [
    "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr", "mesos",
    "kafka", "couch*", "python", "golang"
]}
`whoami`
Agenda
● Troubleshooting
● Schema
● Data Integrity
● Scaling (Reads/Writes)
Troubleshooting
"The problem with troubleshooting is trouble shoots back" ~ Unknown
Troubleshooting: Usual Suspects
● Locking
○ Collection-level locks
○ Document-level locks
○ Software mutexes/semaphores
● Limits
○ Max connections
○ Operation rate limits
○ Resource limits
● Resources
○ Lack of IOPS, RAM, CPU, network, etc.
Troubleshooting: db.currentOp()
● A function that dumps status info about running operations, plus various lock/execution details
● Only operations currently in progress are shown
● Provides an operation ID (opid), used for killing operations
● Includes
○ Original query
○ Parsed query
○ Query runtime
○ Locking details
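A minimal mongosh sketch of inspecting and then killing a long-running operation (the opid value is a placeholder taken from the currentOp output):

```javascript
// List in-progress operations that have been running longer than 5 seconds
db.currentOp({ "active": true, "secs_running": { "$gt": 5 } });

// Kill one by its opid (placeholder value; copy it from the output above)
db.killOp(12345);
```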
Troubleshooting: db.currentOp()
● Filter documents
○ { "$ownOps": true }: only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
Troubleshooting: db.stats()
● Returns
○ Document-data size (dataSize)
○ Index-data size (indexSize)
○ Real storage size (storageSize)
○ Average object size
○ Number of indexes
○ Number of objects
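For example, in mongosh (the optional scale argument reports all sizes in the given unit; here, megabytes):

```javascript
// Report database statistics with sizes scaled to megabytes
db.stats(1024 * 1024);
```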
Troubleshooting: Log File
● Interesting details are logged to the mongod/mongos log files
○ Slow queries
○ Storage engine details (sometimes)
○ Index operations
○ Sharding
■ Chunk moves
○ Elections / Replication
○ Authentication
Troubleshooting: Log File - Slow Query
2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms
Troubleshooting: Operation Profiler
● Writes slow database operations to a new MongoDB collection for analysis
○ Capped collection "system.profile" in each database, 1MB by default
○ Because the collection is capped, profile data doesn't last forever
● Support for operationProfiling data in Percona Monitoring and Management is a current roadmap goal
Troubleshooting: Operation Profiler
● Enable operationProfiling in "slowOp" mode
○ Start with a very high threshold and decrease it in steps
○ Usually 50-100ms is a good threshold
○ Enable in mongod.conf:

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
Troubleshooting: Operation Profiler
● Useful profile metrics
○ op/ns/query: type, namespace and query of a profile
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during an update
○ numYields: # of times the operation yielded for others
○ locks: detailed lock statistics
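Once profiling is enabled, the data can be queried like any other collection; a mongosh sketch (the 100ms threshold and the limit are illustrative):

```javascript
// Show the five most recent profiled operations slower than 100ms
db.system.profile.find({ millis: { $gt: 100 } })
                 .sort({ ts: -1 })
                 .limit(5)
                 .pretty();
```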
Troubleshooting: .explain()
● Shows the query explain plan for query cursors
● This will include
○ Winning plan
■ Query stages
● Query stages may include sharding info in clusters
■ Index chosen by the optimiser
○ Rejected plans
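A mongosh sketch (the items collection and itemId field are examples); the "executionStats" verbosity adds the keysExamined/docsExamined counters discussed later:

```javascript
// Explain a query, including execution counters, winning and rejected plans
db.items.find({ itemId: 123456 }).explain("executionStats");
```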
Troubleshooting: .explain() and Profiler
Troubleshooting: PSMDB AuditLog
● Free, open-source PSMDB feature
○ A MongoDB Enterprise feature elsewhere ($$$)
● Provides
○ Authentication and authorization
○ Cluster operations
○ Read and write operations
Troubleshooting: PSMDB AuditLog
● Provides
○ Schema operations
○ Custom application messages (if configured)
● Writes to BSON files on disk
○ Read the data with 'bsondump --pretty'
○ Ensure the directory is NOT world-readable!
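A sketch of enabling the audit log in mongod.conf (the path is an example; check the PSMDB documentation for the exact options your version supports):

```yaml
auditLog:
  destination: file
  format: BSON
  path: /var/lib/mongodb/auditLog.bson
```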
Troubleshooting: Cluster Metadata
● The "config" database on cluster config servers
○ Use .find() queries to view cluster metadata
● Contains
○ changelog and actionlog (3.0+): cluster operations
○ databases: sharding-enabled databases
○ collections: sharding-enabled collections
○ shards: cluster shards
○ chunks: chunk mapping/info
○ settings: sharding settings
Troubleshooting: Cluster Metadata
● Contains
○ mongos: all mongos processes (kept forever)
○ locks: internal cluster locks
○ lockpings
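A mongosh sketch run against a mongos (the mydb.items namespace is an example; note newer MongoDB versions key config.chunks differently):

```javascript
// Count chunks per shard for one sharded namespace
use config
db.chunks.aggregate([
  { $match: { ns: "mydb.items" } },
  { $group: { _id: "$shard", chunks: { $sum: 1 } } }
]);
```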
Troubleshooting: Percona PMM QAN
● Allows DBAs and developers to:
○ Analyze queries over periods of time
○ Find performance problems
○ Access database performance data securely
● Data is collected by an agent from the MongoDB Profiler (profiling must be enabled)
● Query normalization
○ ie: "{ item: 123456 }" -> "{ item: ##### }"
○ Good for reduced data exposure
● CLI alternative: the pt-mongodb-query-digest tool
Troubleshooting: Other tools
● mlogfilter
○ A useful tool for processing mongod.log files
● pt-mongodb-summary
○ Great for a high-level view of a MongoDB environment
● pt-mongodb-query-digest
○ A command-line tool similar to PMM QAN (although much simpler)
Schema Design & Workflow
Schema Design: Data Types
● Strings
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
■ "12345678" moved to an integer uses 25% less space
■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
    var newItemId = parseInt(x.itemId);
    db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});
Schema Design: Data Types
● Strings
○ Do not store dates as strings!
■ The field "2017-08-17 10:00:04 CEST" stored as a real date uses 52.5% less space!
○ Do not store booleans as strings!
■ "true" -> true = 47% less space
● DBRefs
○ DBRefs provide pointers to another document
○ DBRefs can be cross-collection
● NumberDecimal (3.4+)
○ Higher precision for floating-point/decimal numbers
Schema Design: Indexes
● MongoDB supports B-tree, text and geo indexes
● Default behaviour: collection lock until indexing completes
● { background: true }
○ Runs indexing in the background, avoiding pauses
○ Hard to monitor and troubleshoot progress
○ Unpredictable performance impact
○ Our suggestion: roll out indexes one node at a time
■ Disable replication and change the TCP port, restart
■ Apply the index
■ Re-enable replication, restore the TCP port
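A mongosh sketch of a background build (the items collection and itemId field are examples; a foreground build is the default):

```javascript
// Build an index in the background instead of holding the collection lock
db.items.createIndex({ itemId: 1 }, { background: true });
```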
Schema Design: Indexes
● Avoid drivers that auto-create indexes
○ Use real performance data to make indexing decisions; find out before production!
● Too many indexes hurt write performance for an entire collection
● Indexes have a forward or backward direction
○ Try to cover .sort() with an index and match its direction!
Schema Design: Indexes
● Compound indexes
○ Several fields supported
○ Fields can be in forward or backward direction
■ Consider any .sort() query options and match the sort direction!
○ Composite keys are read left -> right!
■ An index can be partially read
■ Left-most fields do not need to be duplicated!
■ All indexes below the first are redundant prefixes of the first index:
● {username: 1, status: 1, date: 1, count: -1}
● {username: 1, status: 1, date: 1}
● {username: 1, status: 1}
● {username: 1}
● Use db.collection.getIndexes() to view current indexes
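A mongosh sketch of the prefix rule (the users collection is hypothetical):

```javascript
// One compound index covers queries on its left-most prefixes...
db.users.createIndex({ username: 1, status: 1, date: 1, count: -1 });

// ...so these queries can all use the index above; no extra indexes needed
db.users.find({ username: "tim" });
db.users.find({ username: "tim", status: "active" }).sort({ date: 1 });
```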
Schema Design: Query Efficiency
● Query efficiency ratios
○ Index: keysExamined / nreturned
○ Document: docsExamined / nreturned
● End goal: examine only as many index keys/docs as you return!
○ Example: a query scanning 10 documents to return 1 has a ratio of 10; the ideal is 1
○ Tip: when using a covered index zero documents are fetched (docsExamined: 0)!
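The ratios can be computed directly from explain("executionStats") counters; a small JavaScript sketch (queryEfficiency is an illustrative helper, not a MongoDB API):

```javascript
// Compute query efficiency ratios from explain("executionStats") counters.
// queryEfficiency() is an illustrative helper, not part of MongoDB.
function queryEfficiency(stats) {
  return {
    index: stats.totalKeysExamined / stats.nReturned,
    document: stats.totalDocsExamined / stats.nReturned
  };
}

// A query that examined 10 keys and 10 documents to return 1 document:
var r = queryEfficiency({ totalKeysExamined: 10, totalDocsExamined: 10, nReturned: 1 });
console.log(r); // { index: 10, document: 10 } -- the ideal ratio is 1
```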
Schema Workflow: Antipatterns
● No list of fields specified in .find()
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for an application operation!
○ Covered-index operations require only the index fields to be specified
● Using $where operators
○ This executes JavaScript with a global lock
● Many $and or $or conditions
○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with
■ Data locality
■ Background summaries / views
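A mongosh sketch of a field projection (the items collection and itemId field are examples):

```javascript
// Return only the fields the application needs (second argument is the projection);
// excluding _id makes a covered read possible when { itemId: 1 } is indexed
db.items.find({ itemId: 123456 }, { itemId: 1, _id: 0 });
```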
Data Integrity
Data Integrity: Storage and Journaling
● The journal provides durability in the event of a server failure
● Changes are written ahead to the journal for each write operation
● On crash recovery, the server
○ Finds the last point of consistency to disk
○ Searches the journal file(s) for the record matching the checkpoint
○ Applies all changes in the journal since the last point of consistency
Data Integrity: Write Concern
● MongoDB replication is asynchronous
● Write concerns
○ Allow control of the data integrity of a write to a replica set
○ Write concern modes
■ "w: <num>": writes must be acknowledged by the defined number of nodes
■ "majority": writes must be acknowledged by a majority of nodes
■ "<replica set tag>": writes must be acknowledged by a member with the specified replica set tags
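A mongosh sketch (the items collection is an example; j and wtimeout are covered on the next slide):

```javascript
// Wait for a majority of nodes to acknowledge (and journal) the write,
// giving up after 5 seconds
db.items.insert(
  { itemId: 123456 },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
);
```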
Data Integrity: Write Concern
● Write concerns
○ Durability
■ By default write concerns are NOT durable
■ "j: true": optionally, wait for node(s) to acknowledge journaling of the operation
■ In 3.4+ "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via the replica set configuration!
● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
Data Integrity: Replica Set Rollbacks
● Consider this when using the "w:1" write concern
○ A PRIMARY writes 10 documents with w:1 write concern to the oplog, then dies
○ Two SECONDARY nodes applied 5 and 7 of the changes written
○ The SECONDARY with 7 changes wins the PRIMARY election
○ The PRIMARY that died comes back alive
○ The old-PRIMARY node becomes RECOVERING, then SECONDARY
○ 3 documents are "rolled back" to disk
■ A BSON file is written to the 'rollback' dir on disk when a PRIMARY crashes while ahead of the SECONDARYs
■ Monitor for this file existing on disk!!
Data Integrity: Replica Set Rollbacks
● Risk○ The application and/or end-user thinks this was written!
● Majority Write Concern and correct Read Concern can avoid this!
Data Integrity: Read Concern
● New feature in MongoDB and PSMDB 3.2+
● Like write concerns, the consistency of reads can be tuned per session or operation
● Levels
○ "local": default, returns the current node's most-recent version of the data
○ "majority": reads return the most-recent version of the data that has been ack'd on a majority of nodes. Not supported on MMAPv1.
○ "linearizable" (3.4+): reads return data that reflects a "majority" read of all changes prior to the read
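A mongosh sketch of a per-operation read concern (the items collection is an example):

```javascript
// Read only majority-committed data for this cursor
db.items.find({ itemId: 123456 }).readConcern("majority");
```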
Data Integrity: Replication
● Oplog
○ Ordered by time
○ Written locally after the apply time of
■ A client API change
■ Or a replication change
○ A crashed node will resume replication using the last position from its local oplog
Data Integrity: Replication
● Size of oplog
○ Monitor this closely!
○ The length of time from start to end of the oplog affects the impact of adding new nodes
○ If a node is brought online with a backup within the window it avoids a full sync
○ If a node is brought online with a backup older than the window it will full sync!!!
● Lag
○ Due to async replication, lag is possible
○ Use of read concerns and/or write concerns can work around the logical impact of this!
Data Integrity: Disaster Recovery
● Storing data in many physical locations provides improved recovery options and integrity
● Hidden secondaries
○ Store a hidden secondary in another location
● Upload completed backups to another location
● Use a geo-distributed architecture
○ Sharding: look into tag-aware sharding
○ Replica set: members in multiple locations
Scaling Reads
Scaling Reads: Schema and Summaries
● Correct data types
○ "true" vs true
○ "123456" vs 123456
○ "2017-09-19T16:50:58.347Z" vs ISODate("2017-09-19T16:50:58.347Z")
● Predictive workflow
○ Read-heavy apps benefit from pre-computed results
○ Consider moving expensive read computation to insert/update/delete time
■ Example: move a .count() read query to a summary document; increment/decrement the summary count at write time
● External caching
○ MongoDB is fast, memcache is even faster (although very simple)
Scaling Reads: Read Preference
● Modes
○ primary (default)
○ primaryPreferred
○ secondary
○ secondaryPreferred (recommended for read scaling!)
○ nearest
● Tags
○ Select nodes based on key/value pairs (one or more)
○ Often used for
■ Datacenter awareness, eg: { "dc": "eu-east" }
■ Specific workflows, eg: analytics, BI, batch summaries, backups
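A mongosh sketch combining a mode with a tag set (the dc tag and items collection are examples):

```javascript
// Route this shell's reads to secondaries, preferring the "eu-east" datacenter
db.getMongo().setReadPref("secondaryPreferred", [ { "dc": "eu-east" } ]);
db.items.find({ itemId: 123456 });
```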
Scaling Reads: Secondary Members
● Linear* scale-out of reads, utilization of all nodes!
● Be aware
○ rs.add() of new replicas with no data triggers an initial sync from the primary!
■ This can also happen if the backup is too old
○ 50-member maximum, primary included
● Adding a replica
○ Ensure the replSetName configuration and key files match
○ Avoid an initial sync of the new member
■ Try to "seed" new replicas with an up-to-date backup
■ Let the replica replay oplogs to get in sync
Scaling Reads: Summaries/Counters/Caching
● High-read systems love summarised data
● Understand the real-world usage
● Hide the work in the insert/update request!
○ Answer the question BEFORE it is asked
○ Try to make the read request a covered or indexed read
○ Queue the insert/update to offset the write cost
● Examples:
○ A ".count()" query could instead read an incremented counter ($inc on write)
○ Complex read-time aggregations stored as TTL-expiring documents (poor man's "views")
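A mongosh sketch of the counter pattern (the comments and post_counters collections are hypothetical):

```javascript
// At write time: maintain a counter alongside the insert
db.comments.insert({ postId: 123, text: "hello" });
db.post_counters.update(
  { _id: 123 },
  { $inc: { comments: 1 } },
  { upsert: true }
);

// At read time: one indexed document read replaces a .count() scan
db.post_counters.find({ _id: 123 });
```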
Scaling Reads: Summaries/Counters/Caching
● Change streams (3.6+)
○ Allow code to watch a collection for changes, like a message queue
○ Great for writing code for summaries, counters, etc.
● Spread the workload
○ Memcached and Redis are cheaper and faster
■ Run fine on cheaper hardware
■ Very light connection cost (not 1MB+/connection)!
○ Apache Solr / Elasticsearch / Lucene
■ For complex searches (variable conditions)
■ Non-trivial text search
■ Recommendations, "Did you mean?", etc.
■ Return an _id value for an indexed read from MongoDB
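A mongosh sketch of a change stream feeding a summary counter (the comments and post_counters collections are hypothetical):

```javascript
// Watch a collection and bump a summary counter for each insert
var cursor = db.comments.watch([ { $match: { operationType: "insert" } } ]);
while (cursor.hasNext()) {
  var change = cursor.next();
  db.post_counters.update(
    { _id: change.fullDocument.postId },
    { $inc: { comments: 1 } },
    { upsert: true }
  );
}
```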
Scaling Writes (Sharding)
Sharding: Shard Key
● The shard key determines the distribution of the sharded data
● Avoid shard keys that…
○ Are monotonic / increment forever
○ Are text fields
○ Are very large fields (if possible)
○ Cause a hotspot
■ Example: 1 very large or popular user
Sharding: Deployment Methods
● Small deployment / getting started
○ Useful for an app that will scale up eventually
○ Start with 1 shard and a good shard key
■ Add shards as required, online
■ Optionally enable the balancer at low-peak times only
○ Add replica set members and/or shards as the system scales
● Chunk pre-splitting
○ Chunks that grow larger than 64MB are split by the shard primary
■ Chunk splits impact write performance!
○ Pre-creating an even distribution of chunks spanning the shard key range improves write performance greatly!
Sharding: Deployment Methods
● Chunk pre-splitting
○ Example: a shard key with possible values 1-100, pre-split across 10 shards, results in 10 pre-split chunks per shard
○ More on this topic: https://docs.mongodb.com/manual/tutorial/create-chunks-in-sharded-cluster/
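A mongosh sketch of pre-splitting a numeric shard key (the mydb.items namespace and itemId field are examples; run against a mongos):

```javascript
// Shard the collection, then pre-split its key range into even chunks
sh.enableSharding("mydb");
sh.shardCollection("mydb.items", { itemId: 1 });
for (var i = 10; i < 100; i += 10) {
  sh.splitAt("mydb.items", { itemId: i });
}
```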
DATABASE PERFORMANCE MATTERS
Questions?