Post on 26-Apr-2020
Develop an App with MongoDB
Tim VaillancourtSoftware Engineer, Percona
{
  name: "tim",
  lastname: "vaillancourt",
  employer: "percona",
  techs: [
    "mongodb", "mysql", "cassandra", "redis", "rabbitmq", "solr",
    "mesos", "kafka", "couch*", "python", "golang"
  ]
}
`whoami`
Agenda
● Security
● Schema, Performance, etc
● Aggregation Framework
● Data Integrity
● Monitoring
● Troubleshooting
● Scaling
● Elastic Deployment
● Questions?
Security
Authorization
● Use Auth with 1-user-per-App
○ Authorization is default in modern MongoDB
○ Most apps need the "readWrite" built-in role only
● Built-in Roles
○ Database User: Read or Write data from collections
■ "All Databases" or Single-database
○ Database Admin: Non-RW commands (create/drop/list/etc)
○ Backup and Restore: Run backup and restore operations
○ Cluster Admin: Add/Drop/List shards
○ Superuser/Root: All capabilities
○ User-defined roles are also possible
Encryption
● Make sure operations teams are aware of sensitive data in the app!
● MongoDB SSL / TLS Connections
○ Supported since MongoDB 2.6.x
○ Minimum of 128-bit key length for security
○ Relaxed and strict (requireSSL) modes
○ System (default) or Custom Certificate Authorities are accepted
● Encryption-at-Rest
○ Possible with:
■ MongoDB Enterprise ($$$) binaries
■ Block device encryption (See Percona Blog)
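As a sketch of the strict mode above, a mongod can be started requiring TLS for every connection (3.x-era "ssl" option names; the certificate paths are hypothetical):

```shell
# Strict mode: all clients must connect with TLS.
# The .pem paths are placeholders for your own certificates.
mongod --sslMode requireSSL \
       --sslPEMKeyFile /etc/ssl/mongodb.pem \
       --sslCAFile /etc/ssl/ca.pem
```

The relaxed modes (allowSSL, preferSSL) accept both plain and TLS connections, which is useful while migrating clients.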
Source IP Restrictions
● "authenticationRestrictions" added to db.createUser() in MongoDB 3.6
● Allows access restriction by client source IP(s) and/or IP range(s)
● Example:

db.createUser({
  user: "admin",
  pwd: "insertSecurePasswordHere",
  roles: [
    { db: "admin", role: "root" }
  ],
  authenticationRestrictions: [
    { clientSource: [ "127.0.0.1", "10.10.19.0/24" ] }
  ]
})
Schema, Performance, etc
Data Types
● Strings
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
■ "12345678" moved to an integer uses 25% less space
■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
  var newItemId = parseInt(x.itemId, 10);
  db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});
Data Types
● Strings
○ Do not store dates as strings!
■ The field "2017-08-17 10:00:04 CEST" stores in 52.5% less space as a real date!
○ Do not store booleans as strings!
■ "true" -> true = 47% less space
● DBRefs
○ DBRefs provide pointers to another document
○ DBRefs can be cross-collection
● NumberDecimal (MongoDB 3.4+)
○ Higher precision for decimal / floating-point numbers
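The string-to-native-type advice above can be sketched as a small normalizer run before insert; the document shape and field names (itemId, createdAt, available) are hypothetical:

```javascript
// Sketch: convert string-typed fields to native types before storage.
// Field names here are hypothetical examples.
function normalizeItem(doc) {
  return {
    itemId: parseInt(doc.itemId, 10),    // "123456" -> 123456 (number)
    createdAt: new Date(doc.createdAt),  // ISO string -> Date
    available: doc.available === "true", // "true"/"false" -> boolean
  };
}

const raw = { itemId: "123456", createdAt: "2017-08-17T10:00:04Z", available: "true" };
const clean = normalizeItem(raw);
// clean.itemId === 123456, clean.available === true
```

Running a pass like this once (as in the forEach example above) and fixing the application writes keeps the collection consistently typed.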
Indexes
● MongoDB supports B-tree, text and geo indexes
○ B-tree is the default
● By default, the collection is locked until indexing completes
● { background: true } Indexing
○ Runs indexing in the background, avoiding pauses
○ Hard to monitor and troubleshoot progress
○ Unpredictable performance impact
○ Our suggestion: roll out indexes one node at a time
■ Disable replication and change TCP port, restart.
■ Apply index.
■ Enable replication, restore TCP port.
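The three rollout steps above might look like this for one secondary (the mongod flags are standard; the ports, dbpath, replica set name and index are hypothetical):

```shell
# 1. Restart the node standalone: drop --replSet and move the TCP port
#    so clients and other members leave it alone.
mongod --port 27117 --dbpath /var/lib/mongodb

# 2. Build the index in the foreground via the temporary port.
mongo --port 27117 --eval 'db.getSiblingDB("shop").items.createIndex({ itemId: 1 })'

# 3. Restart with the original port and --replSet, let it catch up.
mongod --port 27017 --replSet rs0 --dbpath /var/lib/mongodb
```

Repeat per secondary, then step the primary down and do it last.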
Indexes
● Avoid drivers that auto-create indexes
○ Use real performance data to make indexing decisions; find out before Production!
● Too many indexes hurt
○ Write performance for the entire collection
○ Optimiser efficiency
○ Disk and RAM are wasted
● Indexes have a forward or backward direction
○ Try to cover .sort() with an index and match its direction!
● Indexes can be "hinted" and forced if necessary
Indexes
● The size of the indexed fields impacts the size of the index
○ A point so important, it has its own slide!
Indexes
● Compound Indexes
○ Several fields supported
○ Fields can be in forward or backward direction
■ Consider any .sort() query options and match sort direction!
○ Composite Keys are read Left -> Right!
■ Index can be partially-read
■ Left-most fields do not need to be duplicated!
■ All Indexes below are duplicates of the first index:
● {username: 1, status: 1, date: 1, count: -1}
● {username: 1, status: 1, date: 1 }
● {username: 1, status: 1 }
● {username: 1 }
● Use db.collection.getIndexes() to view current Indexes
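The left-prefix rule above can be checked mechanically. A minimal sketch, representing index specs as ordered [field, direction] pairs since key order matters:

```javascript
// Sketch: an index is redundant if it is a left-prefix (same fields,
// same order, same directions) of an existing index.
function isPrefixOf(candidate, existing) {
  if (candidate.length > existing.length) return false;
  return candidate.every(
    ([field, dir], i) => existing[i][0] === field && existing[i][1] === dir
  );
}

const full = [["username", 1], ["status", 1], ["date", 1], ["count", -1]];
isPrefixOf([["username", 1], ["status", 1]], full); // true: redundant
isPrefixOf([["status", 1]], full);                  // false: not left-most
```

A script like this over getIndexes() output is one way to spot duplicate indexes worth dropping.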
Query Efficiency
● Query Efficiency Ratios
○ Index: nreturned / keysExamined
○ Document: nreturned / docsExamined
● End goal: Examine only as many Index Keys/Docs as you return!
○ Example: a query scanning 10 documents to return 1 has efficiency 0.1
○ Tip: scanning zero docs is possible with a covered index; zero documents are fetched (docsExamined: 0)!
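The ratios above can be computed from an explain("executionStats") result. A sketch using documents-returned over keys/docs examined (so 1.0 is the ideal), with a hypothetical trimmed stats object:

```javascript
// Sketch: efficiency = documents returned per index key / document examined.
// 1.0 is the goal; a covered query examines zero documents.
function efficiency(stats) {
  return {
    index: stats.nReturned / stats.totalKeysExamined,
    document: stats.totalDocsExamined === 0
      ? 1 // covered query: no documents fetched at all
      : stats.nReturned / stats.totalDocsExamined,
  };
}

const stats = { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 10 };
efficiency(stats).document; // 0.1: scanned 10 docs to return 1
```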
Query Efficiency
● Sorting is relatively CPU intensive due to iteration
● Sharding
○ Sorting occurs on the mongos process when there is no index on the sort field
○ A mongos often has fewer resources
● Match the index direction to the sort direction, ie: 1 or -1

Index: { id: 1, cost: -1 }
Query: db.items.find({ id: 1234 }).sort({ cost: -1 })

Index: { id: 1, cost: 1 }
Query: db.items.find({ id: 1234 }).sort({ cost: -1 })
Bulk Writes
● Bulk Write operations allow many writes in a single operation
○ Available since 3.2 as the shell operation db.collection.bulkWrite()
○ Operates on a single collection
○ Can improve batch insert performance
■ Helpful for ETL jobs, import/export jobs, etc
○ Ordered Mode
■ Documents are written in order
■ An error stops the Bulk operation
○ Unordered Mode
■ Documents are written unordered
■ An error DOES NOT STOP the Bulk Operation!
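A mongo shell sketch of the unordered mode above, against a hypothetical `items` collection on a running deployment:

```javascript
// Unordered mode: operations may be reordered, and one failure
// does not stop the rest of the batch.
db.items.bulkWrite(
  [
    { insertOne: { document: { _id: 1, status: "new" } } },
    { insertOne: { document: { _id: 2, status: "new" } } },
    { updateOne: { filter: { _id: 1 }, update: { $set: { status: "done" } } } }
  ],
  { ordered: false }
)
```

With { ordered: true } (the default) the same batch stops at the first error.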
Antipatterns / Features to Avoid
● No list of fields specified in .find()
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for an application operation!
○ Covered-index operations require only the index fields to be specified
○ Example:

db.items.find({ id: 1234 }, { cost: 1, available: 1 })

● Using $where operators
○ This executes JavaScript with a global lock
Antipatterns / Features to Avoid
● Many $and or $or conditions
○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with:
■ Data locality
■ Background Summaries / Views
● .mapReduce()
○ Generally more complex code to read/maintain
○ Performs slower than the Aggregation Framework
○ Performs extraneous locking vs the Aggregation Framework
● Unordered Bulk Writes
○ Error handling can be unpredictable
Aggregation Framework
Aggregation Pipeline: .aggregate()
● Runs as a pipeline of "stages" on a MongoDB collection
○ Each stage passes its result to the next
○ Aggregates the entire collection by default
■ Add a $match stage to reduce the aggregation data
● Runs inside the MongoDB Server code
○ Much more efficient than .mapReduce() operations
● Example stages:
○ $match - only aggregate documents that match this filter (same as .find())
■ Must be the 1st stage to use indexes!
○ $group - group documents by certain conditions
■ Similar to "SELECT ... GROUP BY"
Aggregation Pipeline: .aggregate()
● Example stages:
○ $count - count the # of documents
○ $project - only output specific pieces of the data
○ $bucket and $bucketAuto - Group documents based on a specified expression and bucket boundaries
■ Useful for Faceted Search
○ $geoNear - Returns documents based on geo-proximity
○ $graphLookup - Performs a recursive search on a collection
○ $sample - Returns a random sample of documents of a specified size
○ $unwind - Unwinds arrays into many separate documents
○ $facet - Runs many aggregation pipelines within a single stage
Aggregation Pipeline: .aggregate()
● Just a few examples of operators that can be used in each stage:
○ $and / $or / $not
○ $add / $subtract / $multiply
○ $gt / $gte / $lt / $lte / $ne
○ $min / $max / $avg / $stdDevPop
○ $log / $log10
○ $sqrt
○ $floor / $ceil
○ $in (inefficient)
○ $dayOfWeek / $dayOfMonth / $dayOfYear
○ $concat / $split / ...
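Tying the stages together, a mongo shell sketch (the `orders` collection and its fields are hypothetical): $match comes first so an index can be used, then $group acts as a SQL-style GROUP BY:

```javascript
db.orders.aggregate([
  // Stage 1: filter first - only a leading $match can use indexes
  { $match: { status: "complete" } },
  // Stage 2: group by customer, like SELECT ... GROUP BY customerId
  { $group: {
      _id: "$customerId",
      total: { $sum: "$cost" },
      orders: { $sum: 1 }
  } },
  // Stage 3: biggest spenders first
  { $sort: { total: -1 } }
])
```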
Aggregation Pipeline: .aggregate()
● More on the Aggregation Pipeline:
○ https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
○ https://docs.mongodb.com/manual/reference/operator/aggregation/
○ https://www.amazon.com/MongoDB-Aggregation-Framework-Principles-Examples-ebook/dp/B00DGKGWE4
Data Integrity
Storage and Journaling
● The Journal provides durability in the event of failure of the server
● Changes are written ahead to the journal for each write operation
● On crash recovery, the server:
○ Finds the last point of consistency to disk
○ Searches the journal file(s) for the record matching the checkpoint
○ Applies all changes in the journal since the last point of consistency
Write Concern
● MongoDB Replication is Asynchronous
○ Write Concerns can simulate synchronous operations
● Write Concerns
○ Per-session (or even per-query) tunable
○ Allow control of data integrity of a write to a Replica Set
○ Write Concern Modes
■ "w: <num>" - Writes must be acknowledged by the defined number of nodes
■ "majority" - Writes must be acknowledged by a majority of nodes
■ "<replica set tag>" - Writes acknowledged by a member with the specified replica set tags
○ Journal flag
■ "j: <bool>" - Sets the requirement that the change be written to the journal
Write Concern
● Write Concerns
○ Durability
■ By default write concerns are NOT durable
■ "j: true" - Optionally, wait for node(s) to acknowledge journaling of the operation
■ In 3.4+ "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via replica set configuration!
● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
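Per-query write concern looks like this in the mongo shell (collection and fields are hypothetical; the writeConcern options are standard):

```javascript
// Wait for a majority of nodes to acknowledge AND journal the write,
// giving up after 5 seconds.
db.items.insertOne(
  { itemId: 123456, status: "new" },
  { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
)
```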
Read Concern
● Like write concerns, the consistency of reads can be tuned per session or operation
● Levels
○ "local" - Default, return the current node's most-recent version of the data
○ "majority" - Most recent version of the data that has been ack'd on a majority of nodes. Not supported on MMAPv1.
○ "linearizable" (3.4+) - Reads return data that reflects a "majority" read of all changes prior to the read
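Per-cursor read concern in the mongo shell (the collection is hypothetical):

```javascript
// Only return data acknowledged by a majority of replica set members.
db.items.find({ itemId: 123456 }).readConcern("majority")
```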
Replication
● Size of Oplog
○ Monitor this closely!
○ The length of time from start to end of the oplog affects the impact of adding new nodes
○ If a node is brought online with a backup within the window it avoids a full sync
○ If a node is brought online with a backup older than the window it will full sync!!!
● Lag
○ Because replication is asynchronous, lag is possible
○ Use of Read Concerns and/or Write Concerns can work around the logical impact of this!
Monitoring & Troubleshooting
Usual Suspects
● Locking
○ Collection-level locks
○ Document-level locks
○ Software mutex/semaphore
● Limits
○ Max connections
○ Operation rate limits
○ Resource limits
● Resources
○ Lack of IOPS, RAM, CPU, network, etc
db.currentOp()
● A function that dumps status info about running operations and various lock/execution details
● Only queries currently in progress are shown
● Provides the Operation ID number, used for killing ops
● Includes:
○ Original Query
○ Parsed Query
○ Query Runtime
○ Locking details
db.currentOp()
● Filter Documents
○ { "$ownOps": true } == Only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
db.currentOp()
Operation Profiler
● Writes slow database operations to a new MongoDB collection for analysis
○ Capped Collection "system.profile" in each database, default 1MB
○ The collection is capped, ie: profile data doesn't last forever
○ The slow threshold can be defined in milliseconds
■ 50-100ms is a good starting point
● Enabled Server-wide or Per-Database
● This is used by Percona Monitoring and Management
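Enabling the profiler per-database from the mongo shell, using the 50-100ms starting point suggested above:

```javascript
// Level 1: profile only operations slower than the threshold (here 100ms).
db.setProfilingLevel(1, 100)

// Inspect the most recent entries in the capped system.profile collection.
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()
```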
Operation Profiler
● Useful Profile Metrics
○ op/ns/query: type, namespace and query of a profiled operation
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during an update
○ numYields: # of times the operation yielded for others
○ locks: detailed lock statistics
Troubleshooting: .explain()
● Shows the query explain plan for query cursors, before execution
● This will include:
○ Winning Plan
■ Query stages
● Query stages may include sharding info in clusters
■ Index chosen by the optimiser
○ Rejected Plans
■ Many rejected plans can be a sign of too many indexes
Troubleshooting: .explain() and Profiler
Log File - Slow Query:

2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms
Percona PMM
● Open-source monitoring from Percona!
● Based on open-source technology
○ Prometheus
○ Grafana
○ Go Language
Percona PMM
● Simple deployment
● Examples in this demo are from PMM
● Correlation of OS and DB Metrics
● 800+ OS and Database metrics per ping
Percona PMM
Using PMM in your Dev Process
1. Install PMM
a. Simple Docker-based deployment for easy install
2. Install PMM Clients/Agents on your MongoDB Host(s)
3. Develop your app
4. Visualise your database resource usage, queries, etc
5. Repeat
Percona PMM QAN
● Allows DBAs and developers to:
○ Analyze queries over periods of time
○ Find performance problems
○ Access database performance data securely
● Data is collected by the agent from the MongoDB Profiler (required)
● Query Normalization
○ ie: "{ item: 123456 }" -> "{ item: ##### }"
○ Good for reduced data exposure
● CLI alternative: the pt-mongodb-query-digest tool
Percona PMM QAN
Other tools
● mlogfilter○ A useful tool for processing mongod.log files
● pt-mongodb-summary○ Great for a high-level view of a MongoDB environment
● pt-mongodb-query-digest○ A command-line tool similar to PMM QAN (although much simpler)
Scaling
Read Preference
● Allows reads to be sent to specific nodes
● Read Preference modes
○ primary (default)
○ primaryPreferred
○ secondary
○ secondaryPreferred (recommended for Read Scaling!)
○ nearest
● Tags
○ Select nodes based on key/value pairs (one or more)
○ Often used for:
■ Datacenter awareness, eg: { "dc": "eu-east" }
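In the mongo shell, a tagged secondaryPreferred read might be set up like this (the dc tag follows the example above; the collection is hypothetical):

```javascript
// Prefer secondaries tagged as the eu-east datacenter;
// fall back to the primary if none are available.
db.getMongo().setReadPref("secondaryPreferred", [ { "dc": "eu-east" } ])
db.items.find({ itemId: 123456 })
```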
What is Sharding?
● Sharding is a MongoDB deployment style that allows linear scaling of data in MongoDB
○ Extra system components
■ A cluster-metadata replica set and special routers are added
○ Many shards can be added and removed online
○ Internal balancer migrates data to create a "balanced" state
○ Relies on a "shard key" (a field in documents) to partition the data
■ The choice of shard key is a critical decision
■ Today shard keys cannot be changed post-deployment!
● If you expect your system to scale beyond a single replica set, see:
https://www.percona.com/blog/2015/03/19/choosing-a-good-sharding-key-in-mongodb-and-mysql/
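A mongo shell sketch of sharding a collection, run via a mongos (the database, collection and shard key are hypothetical; remember the key cannot be changed later):

```javascript
// Enable sharding for the database, then partition the collection
// on a compound shard key (which must be indexed).
sh.enableSharding("shop")
db.getSiblingDB("shop").items.createIndex({ customerId: 1, itemId: 1 })
sh.shardCollection("shop.items", { customerId: 1, itemId: 1 })
```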
Elastic Deployment
MongoDB: An Elastic Database
● Scale Reads?
○ Add more nodes to your replica set (or shard)
■ More replica set members increases read capacity when using secondary reads
■ Note: replica set members may have some replication lag
● Use a read concern if lag is a concern
○ Add a caching tier (Redis/Memcached/In-application)
● Scale Writes?
○ Add more shards to your (hopefully) sharded cluster
■ Increases write AND read capacity, as well as storage space
○ Use MongoDB Bulk Writes
○ Use a queue for batching
MongoDB DNS SRV records
● New in MongoDB 3.6
● DNS SRV-record Connection String support
○ Allows apps to use a consistent DNS name to get a full list of MongoDB hosts
○ Avoids the need for application configuration changes and reload logic
○ Without DNS SRV:

mongodb://db1.example.net:27017,db2.example.net:25001,db3.example.net:25003/

○ With DNS SRV:

mongodb+srv://db1.example.net/
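Behind a mongodb+srv:// string, the driver performs a DNS SRV lookup; a sketch of inspecting the record (hostnames are hypothetical):

```shell
# The driver looks up _mongodb._tcp.<hostname> and receives one SRV
# entry (priority, weight, port, target) per MongoDB host.
dig +short SRV _mongodb._tcp.db1.example.net
```

A TXT record on the same name can also carry default connection-string options (eg: replicaSet, authSource).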
Questions?