Post on 26-Apr-2020
Develop an App with MongoDB
Tim VaillancourtSoftware Engineer, Percona
{
  name: "tim",
  lastname: "vaillancourt",
  employer: "percona",
  techs: [
    "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr",
    "mesos", "kafka", "couch*", "python", "golang"
  ]
}
`whoami`
Agenda
● Security
● Schema, Performance, etc
● Aggregation Framework
● Data Integrity
● Monitoring
● Troubleshooting
● Scaling
● Elastic Deployment
● Questions?
Security
Authorization
● Use Auth with 1-user-per-App
○ Authorization is default in modern MongoDB
○ Most apps need the "readWrite" built-in role only
● Built-in Roles
○ Database User: Read or Write data from collections
■ "All Databases" or Single-database
○ Database Admin: Non-RW commands (create/drop/list/etc)
○ Backup and Restore: Run backup and restore operations
○ Cluster Admin: Add/Drop/List shards
○ Superuser/Root: All capabilities
○ User-defined roles are also possible
Encryption
● Make sure operations teams are aware of sensitive data in the app!
● MongoDB SSL / TLS Connections
○ Supported since MongoDB 2.6.x
○ Minimum of 128-bit key length for security
○ Relaxed and strict (requireSSL) modes
○ System (default) or Custom Certificate Authorities are accepted
● Encryption-at-Rest
○ Possible with:
■ MongoDB Enterprise ($$$) binaries
■ Block device encryption (See Percona Blog)
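As a sketch of the strict mode above, a mongod can be started requiring TLS for every connection (3.x-era "ssl" option names; the certificate paths are hypothetical):

```shell
# Strict mode: all clients must connect with TLS.
# The .pem paths are placeholders for your own certificates.
mongod --sslMode requireSSL \
       --sslPEMKeyFile /etc/ssl/mongodb.pem \
       --sslCAFile /etc/ssl/ca.pem
```

The relaxed modes (allowSSL, preferSSL) accept both plain and TLS connections, which is useful while migrating clients.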
Source IP Restrictions
● "authenticationRestrictions" added to db.createUser() in MongoDB 3.6
● Allows access restriction by client source IP(s) and/or IP range(s)
● Example:

db.createUser({
  user: "admin",
  pwd: "insertSecurePasswordHere",
  roles: [
    { db: "admin", role: "root" }
  ],
  authenticationRestrictions: [
    { clientSource: [ "127.0.0.1", "10.10.19.0/24" ] }
  ]
})
Schema, Performance, etc
Data Types
● Strings
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
■ "12345678" moved to an integer uses 25% less space
■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
  var newItemId = parseInt(x.itemId, 10);
  db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});
Data Types
● Strings
○ Do not store dates as strings!
■ The field "2017-08-17 10:00:04 CEST" stores in 52.5% less space as a real date!
○ Do not store booleans as strings!
■ "true" -> true = 47% less space
● DBRefs
○ DBRefs provide pointers to another document
○ DBRefs can be cross-collection
● NumberDecimal (MongoDB 3.4+)
○ Higher precision for decimal / floating-point numbers
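The string-to-native-type advice above can be sketched as a small normalizer run before insert; the document shape and field names (itemId, createdAt, available) are hypothetical:

```javascript
// Sketch: convert string-typed fields to native types before storage.
// Field names here are hypothetical examples.
function normalizeItem(doc) {
  return {
    itemId: parseInt(doc.itemId, 10),    // "123456" -> 123456 (number)
    createdAt: new Date(doc.createdAt),  // ISO string -> Date
    available: doc.available === "true", // "true"/"false" -> boolean
  };
}

const raw = { itemId: "123456", createdAt: "2017-08-17T10:00:04Z", available: "true" };
const clean = normalizeItem(raw);
// clean.itemId === 123456, clean.available === true
```

Running a pass like this once (as in the forEach example above) and fixing the application writes keeps the collection consistently typed.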
Indexes
● MongoDB supports B-tree, text and geo indexes
○ B-tree is the default
● By default, the collection is locked until indexing completes
● { background: true } Indexing
○ Runs indexing in the background, avoiding pauses
○ Hard to monitor and troubleshoot progress
○ Unpredictable performance impact
○ Our suggestion: roll out indexes one node at a time
■ Disable replication and change TCP port, restart.
■ Apply index.
■ Enable replication, restore TCP port.
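The three rollout steps above might look like this for one secondary (the mongod flags are standard; the ports, dbpath, replica set name and index are hypothetical):

```shell
# 1. Restart the node standalone: drop --replSet and move the TCP port
#    so clients and other members leave it alone.
mongod --port 27117 --dbpath /var/lib/mongodb

# 2. Build the index in the foreground via the temporary port.
mongo --port 27117 --eval 'db.getSiblingDB("shop").items.createIndex({ itemId: 1 })'

# 3. Restart with the original port and --replSet, let it catch up.
mongod --port 27017 --replSet rs0 --dbpath /var/lib/mongodb
```

Repeat per secondary, then step the primary down and do it last.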
Indexes
● Avoid drivers that auto-create indexes
○ Use real performance data to make indexing decisions; find out before Production!
● Too many indexes hurt
○ Write performance for the entire collection
○ Optimiser efficiency
○ Disk and RAM are wasted
● Indexes have a forward or backward direction
○ Try to cover .sort() with an index and match its direction!
● Indexes can be "hinted" and forced if necessary
Indexes
● The size of the indexed fields impacts the size of the index
○ A point so important, it has its own slide!
Indexes
● Compound Indexes
○ Several fields supported
○ Fields can be in forward or backward direction
■ Consider any .sort() query options and match sort direction!
○ Composite Keys are read Left -> Right!
■ Index can be partially-read
■ Left-most fields do not need to be duplicated!
■ All Indexes below are duplicates of the first index:
● {username: 1, status: 1, date: 1, count: -1}
● {username: 1, status: 1, date: 1 }
● {username: 1, status: 1 }
● {username: 1 }
● Use db.collection.getIndexes() to view current Indexes
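The left-prefix rule above can be checked mechanically. A minimal sketch, representing index specs as ordered [field, direction] pairs since key order matters:

```javascript
// Sketch: an index is redundant if it is a left-prefix (same fields,
// same order, same directions) of an existing index.
function isPrefixOf(candidate, existing) {
  if (candidate.length > existing.length) return false;
  return candidate.every(
    ([field, dir], i) => existing[i][0] === field && existing[i][1] === dir
  );
}

const full = [["username", 1], ["status", 1], ["date", 1], ["count", -1]];
isPrefixOf([["username", 1], ["status", 1]], full); // true: redundant
isPrefixOf([["status", 1]], full);                  // false: not left-most
```

A script like this over getIndexes() output is one way to spot duplicate indexes worth dropping.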
Query Efficiency
● Query Efficiency Ratios
○ Index: nreturned / keysExamined
○ Document: nreturned / docsExamined
● End goal: Examine only as many Index Keys/Docs as you return!
○ Example: a query scanning 10 documents to return 1 has efficiency 0.1
○ Tip: scanning zero docs is possible with a covered index; zero documents are fetched (docsExamined: 0)!
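The ratios above can be computed from an explain("executionStats") result. A sketch using documents-returned over keys/docs examined (so 1.0 is the ideal), with a hypothetical trimmed stats object:

```javascript
// Sketch: efficiency = documents returned per index key / document examined.
// 1.0 is the goal; a covered query examines zero documents.
function efficiency(stats) {
  return {
    index: stats.nReturned / stats.totalKeysExamined,
    document: stats.totalDocsExamined === 0
      ? 1 // covered query: no documents fetched at all
      : stats.nReturned / stats.totalDocsExamined,
  };
}

const stats = { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 10 };
efficiency(stats).document; // 0.1: scanned 10 docs to return 1
```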
Query Efficiency
● Sorting is relatively CPU intensive due to iteration
● Sharding
○ Sorting occurs on the mongos process when there is no index on the sort field
○ A mongos often has fewer resources
● Match the index direction to the sort direction, ie: 1 or -1

Index: { id: 1, cost: -1 }
Query: db.items.find({ id: 1234 }).sort({ cost: -1 })

Index: { id: 1, cost: 1 }
Query: db.items.find({ id: 1234 }).sort({ cost: -1 })
Bulk Writes
● Bulk Write operations allow many writes in a single operation
○ Available since 3.2 as the shell operation db.collection.bulkWrite()
○ Operates on a single collection
○ Can improve batch insert performance
■ Helpful for ETL jobs, import/export jobs, etc
○ Ordered Mode
■ Documents are written in order
■ An error stops the Bulk operation
○ Unordered Mode
■ Documents are written unordered
■ An error DOES NOT STOP the Bulk Operation!
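A mongo shell sketch of the unordered mode above, against a hypothetical `items` collection on a running deployment:

```javascript
// Unordered mode: operations may be reordered, and one failure
// does not stop the rest of the batch.
db.items.bulkWrite(
  [
    { insertOne: { document: { _id: 1, status: "new" } } },
    { insertOne: { document: { _id: 2, status: "new" } } },
    { updateOne: { filter: { _id: 1 }, update: { $set: { status: "done" } } } }
  ],
  { ordered: false }
)
```

With { ordered: true } (the default) the same batch stops at the first error.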
Antipatterns / Features to Avoid
● No list of fields specified in .find()
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for an application operation!
○ Covered-index operations require only the index fields to be specified
○ Example:

db.items.find({ id: 1234 }, { cost: 1, available: 1 })

● Using $where operators
○ This executes JavaScript with a global lock
Antipatterns / Features to Avoid
● Many $and or $or conditions
○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with:
■ Data locality
■ Background Summaries / Views
● .mapReduce()
○ Generally more complex code to read/maintain
○ Performs slower than the Aggregation Framework
○ Performs extraneous locking vs the Aggregation Framework
● Unordered Bulk Writes
○ Error handling can be unpredictable
Aggregation Framework
Aggregation Pipeline: .aggregate()
● Runs as a pipeline of "stages" on a MongoDB collection
○ Each stage passes its result to the next
○ Aggregates the entire collection by default
■ Add a $match stage to reduce the aggregation data
● Runs inside the MongoDB Server code
○ Much more efficient than .mapReduce() operations
● Example stages:
○ $match - only aggregate documents that match this filter (same as .find())
■ Must be the 1st stage to use indexes!
○ $group - group documents by certain conditions
■ Similar to "SELECT ... GROUP BY"
Aggregation Pipeline: .aggregate()
● Example stages:
○ $count - count the # of documents
○ $project - only output specific pieces of the data
○ $bucket and $bucketAuto - Group documents based on a specified expression and bucket boundaries
■ Useful for Faceted Search
○ $geoNear - Returns documents based on geo-proximity
○ $graphLookup - Performs a recursive search on a collection
○ $sample - Returns a random sample of documents of a specified size
○ $unwind - Unwinds arrays into many separate documents
○ $facet - Runs many aggregation pipelines within a single stage
Aggregation Pipeline: .aggregate()
● Just a few examples of operators that can be used in each stage:
○ $and / $or / $not
○ $add / $subtract / $multiply
○ $gt / $gte / $lt / $lte / $ne
○ $min / $max / $avg / $stdDevPop
○ $log / $log10
○ $sqrt
○ $floor / $ceil
○ $in (inefficient)
○ $dayOfWeek / $dayOfMonth / $dayOfYear
○ $concat / $split / ...
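Tying the stages together, a mongo shell sketch (the `orders` collection and its fields are hypothetical): $match comes first so an index can be used, then $group acts as a SQL-style GROUP BY:

```javascript
db.orders.aggregate([
  // Stage 1: filter first - only a leading $match can use indexes
  { $match: { status: "complete" } },
  // Stage 2: group by customer, like SELECT ... GROUP BY customerId
  { $group: {
      _id: "$customerId",
      total: { $sum: "$cost" },
      orders: { $sum: 1 }
  } },
  // Stage 3: biggest spenders first
  { $sort: { total: -1 } }
])
```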
Aggregation Pipeline: .aggregate()
● More on the Aggregation Pipeline:
○ https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
○ https://docs.mongodb.com/manual/reference/operator/aggregation/
○ https://www.amazon.com/MongoDB-Aggregation-Framework-Principles-Examples-ebook/dp/B00DGKGWE4
Data Integrity
Storage and Journaling
● The Journal provides durability in the event of failure of the server
● Changes are written ahead to the journal for each write operation
● On crash recovery, the server:
○ Finds the last point of consistency to disk
○ Searches the journal file(s) for the record matching the checkpoint
○ Applies all changes in the journal since the last point of consistency
Write Concern
● MongoDB Replication is Asynchronous
○ Write Concerns can simulate synchronous operations
● Write Concerns
○ Per-session (or even per-query) tunable
○ Allow control of data integrity of a write to a Replica Set
○ Write Concern Modes
■ "w: <num>" - Writes must be acknowledged by the defined number of nodes
■ "majority" - Writes must be acknowledged by a majority of nodes
■ "<replica set tag>" - Writes acknowledged by a member with the specified replica set tags
○ Journal flag
■ "j: <bool>" - Sets the requirement that the change be written to the journal
Write Concern
● Write Concerns
○ Durability
■ By default write concerns are NOT durable
■ "j: true" - Optionally, wait for node(s) to acknowledge journaling of the operation
■ In 3.4+ "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via replica set configuration!
● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
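Per-query write concern looks like this in the mongo shell (collection and fields are hypothetical; the writeConcern options are standard):

```javascript
// Wait for a majority of nodes to acknowledge AND journal the write,
// giving up after 5 seconds.
db.items.insertOne(
  { itemId: 123456, status: "new" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
```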
Read Concern
● Like write concerns, the consistency of reads can be tuned per session or operation
● Levels
○ "local" - Default, return the current node's most-recent version of the data
○ "majority" - Most recent version of the data that has been ack'd on a majority of nodes. Not supported on MMAPv1.
○ "linearizable" (3.4+) - Reads return data that reflects a "majority" read of all changes prior to the read
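Per-cursor read concern in the mongo shell (the collection is hypothetical):

```javascript
// Only return data acknowledged by a majority of replica set members.
db.items.find({ itemId: 123456 }).readConcern("majority")
```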
Replication
● Size of Oplog
○ Monitor this closely!
○ The length of time from start to end of the oplog affects the impact of adding new nodes
○ If a node is brought online with a backup within the window it avoids a full sync
○ If a node is brought online with a backup older than the window it will full sync!!!
● Lag
○ Because replication is asynchronous, lag is possible
○ Use of Read Concerns and/or Write Concerns can work around the logical impact of this!
Monitoring & Troubleshooting
Usual Suspects
● Locking
○ Collection-level locks
○ Document-level locks
○ Software mutex/semaphore
● Limits
○ Max connections
○ Operation rate limits
○ Resource limits
● Resources
○ Lack of IOPS, RAM, CPU, network, etc
db.currentOp()
● A function that dumps status info about running operations and various lock/execution details
● Only queries currently in progress are shown
● Provides the Operation ID number, used for killing ops
● Includes:
○ Original Query
○ Parsed Query
○ Query Runtime
○ Locking details
db.currentOp()
● Filter Documents
○ { "$ownOps": true } == Only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
db.currentOp()
Operation Profiler
● Writes slow database operations to a new MongoDB collection for analysis
○ Capped Collection "system.profile" in each database, default 1MB
○ The collection is capped, ie: profile data doesn't last forever
○ The slow threshold can be defined in milliseconds
■ 50-100ms is a good starting point
● Enabled Server-wide or Per-Database
● This is used by Percona Monitoring and Management
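Enabling the profiler per-database from the mongo shell, using the 50-100ms starting point suggested above:

```javascript
// Level 1: profile only operations slower than the threshold (here 100ms).
db.setProfilingLevel(1, 100)

// Inspect the most recent entries in the capped system.profile collection.
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
```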
Operation Profiler
● Useful Profile Metrics
○ op/ns/query: type, namespace and query of a profiled operation
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during an update
○ numYields: # of times the operation yielded for others
○ locks: detailed lock statistics
Troubleshooting: .explain()
● Shows the query explain plan for query cursors, before execution
● This will include:
○ Winning Plan
■ Query stages
● Query stages may include sharding info in clusters
■ Index chosen by the optimiser
○ Rejected Plans
■ Many rejected plans can be a sign of too many indexes
Troubleshooting: .explain() and Profiler
Log File - Slow Query:

2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms
Percona PMM
● Open-source monitoring from Percona!
● Based on open-source technology
○ Prometheus
○ Grafana
○ Go Language
Percona PMM
● Simple deployment
● Examples in this demo are from PMM
● Correlation of OS and DB Metrics
● 800+ OS and Database metrics per ping
Percona PMM
Using PMM in your Dev Process
1. Install PMM
a. Simple Docker-based deployment for easy install
2. Install PMM Clients/Agents on your MongoDB Host(s)
3. Develop your app
4. Visualise your database resource usage, queries, etc
5. Repeat
Percona PMM QAN
● Allows DBAs and developers to:
○ Analyze queries over periods of time
○ Find performance problems
○ Access database performance data securely
● Data is collected by the agent from the MongoDB Profiler (required)
● Query Normalization
○ ie: "{ item: 123456 }" -> "{ item: ##### }"
○ Good for reduced data exposure
● CLI alternative: the pt-mongodb-query-digest tool
Percona PMM QAN
Other tools
● mlogfilter○ A useful tool for processing mongod.log files
● pt-mongodb-summary○ Great for a high-level view of a MongoDB environment
● pt-mongodb-query-digest○ A command-line tool similar to PMM QAN (although much simpler)
Scaling
Read Preference
● Allows reads to be sent to specific nodes
● Read Preference modes
○ primary (default)
○ primaryPreferred
○ secondary
○ secondaryPreferred (recommended for Read Scaling!)
○ nearest
● Tags
○ Select nodes based on key/value pairs (one or more)
○ Often used for:
■ Datacenter awareness, eg: { "dc": "eu-east" }
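In the mongo shell, a tagged secondaryPreferred read might be set up like this (the dc tag follows the example above; the collection is hypothetical):

```javascript
// Prefer secondaries tagged as the eu-east datacenter;
// fall back to the primary if none are available.
db.getMongo().setReadPref("secondaryPreferred", [ { "dc": "eu-east" } ])
db.items.find({ itemId: 123456 })
```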
What is Sharding?
● Sharding is a MongoDB deployment style that allows linear scaling of data in MongoDB
○ Extra system components
■ A cluster-metadata replica set and special routers are added
○ Many shards can be added and removed online
○ Internal balancer migrates data to create a "balanced" state
○ Relies on a "shard key" (a field in documents) to partition the data
■ The choice of shard key is a critical decision
■ Today shard keys cannot be changed post-deployment!
● If you expect your system to scale beyond a single replica set, see:
https://www.percona.com/blog/2015/03/19/choosing-a-good-sharding-key-in-mongodb-and-mysql/
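A mongo shell sketch of sharding a collection, run via a mongos (the database, collection and shard key are hypothetical; remember the key cannot be changed later):

```javascript
// Enable sharding for the database, then partition the collection
// on a compound shard key (which must be indexed).
sh.enableSharding("shop")
db.getSiblingDB("shop").items.createIndex({ customerId: 1, itemId: 1 })
sh.shardCollection("shop.items", { customerId: 1, itemId: 1 })
```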
Elastic Deployment
MongoDB: An Elastic Database
● Scale Reads?
○ Add more nodes to your replica set (or shard)
■ More replica set members increases read capacity when using secondary reads
■ Note: replica set members may have some replication lag
● Use a read concern if lag is a concern
○ Add a caching tier (Redis/Memcached/In-application)
● Scale Writes?
○ Add more shards to your (hopefully) sharded cluster
■ Increases write AND read capacity, as well as storage space
○ Use MongoDB Bulk Writes
○ Use a queue for batching
MongoDB DNS SRV records
● New in MongoDB 3.6
● DNS SRV-record Connection String support
○ Allows apps to use a consistent DNS name to get a full list of MongoDB hosts
○ Avoids the need for application configuration changes and reload logic
○ Without DNS SRV:

mongodb://db1.example.net:27017,db2.example.net:25001,db3.example.net:25003/

○ With DNS SRV:

mongodb+srv://db1.example.net/
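Behind a mongodb+srv:// string, the driver performs a DNS SRV lookup; a sketch of inspecting the record (hostnames are hypothetical):

```shell
# The driver looks up _mongodb._tcp.<hostname> and receives one SRV
# entry (priority, weight, port, target) per MongoDB host.
dig +short SRV _mongodb._tcp.db1.example.net
```

A TXT record on the same name can also carry default connection-string options (eg: replicaSet, authSource).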
Questions?