Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Advanced MongoDB for Development, Deployment and Operation - The Sequel
Daniel Coupal
Technical Services Engineer, Palo Alto, CA
#MongoDB
Silicon Valley Code Camp 2015
• Making you successful in developing, deploying and operating an application with MongoDB
• I do expect you to know the basics of MongoDB.
• …even better if you already have an application about to be deployed or deployed
This presentation is about …
I hope you walk out of this presentation and make at least one change in your application, deployment, configuration, etc. that will prevent one issue from happening.
My Goal
1. The Story of MongoDB
2. The Story of your Application
Chapter 1: Prototype and Development
Chapter 2: Deployment
Chapter 3: Operation
3. Wrapping up
4. Q&A
Agenda
1. The Story of MongoDB
The Sun was shining on the land of the Oracle …
Once upon a time
Then came the Web
We’re gonna need a bigger database
• Originally, 10gen
– Founded in 2007
– Released MongoDB 1.0 in 2009
• MongoDB Inc
– Since 2013
– Acquired WiredTiger in 2014
• MongoDB
– Open source software
– Contributions on tools, drivers, …
– Most popular NoSQL database
MongoDB - Timeline
MongoDB - Company Overview
• 450+ employees
• 1,000+ customers
• Over $300 million in funding
• 30+ offices around the world
Positions open in Palo Alto, Austin and NYC
• http://www.mongodb.com/careers/positions
Technical service engineers in Palo Alto
• MongoDB
• MongoDB Tools
• Proactive support
MongoDB - We hire!
2. The Story of your Application
1. Schema, schema, schema!
2. Incorporate testability in your application
3. Think about data sizing and growth
4. What happens when a failure is returned by the database?
5. Index correctly
6. Performance Tuning
Chapter 1 - Prototype and Development
• Relational world
1. Model your data
2. Write the application against your data
• NoSQL world
1. Define what you want to do with the data
– What are your queries?
2. Model your data
Schema, schema, schema
• Test Driven Development
• Ask yourself: how can I test that this piece is working?
• TIP: MongoDB does not need a schema and it creates databases and collections (tables) on the fly
– Incorporate username, hostname, timestamps in database names for unit tests
Incorporate testability in the application
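A minimal sketch of the tip above, in Python, assuming each unit-test run picks its own database name. The function name unit_test_db_name and its parameters are hypothetical; only the idea of combining username, hostname and timestamp comes from the slide.

```python
import os
import re
import socket
import time


def unit_test_db_name(prefix="test", user=None, host=None, now=None):
    """Return a database name unique to a user, a host and a point in time.

    The parameters are hypothetical conveniences that make the function testable;
    by default the values come from the environment and the system clock (UTC).
    """
    user = user or os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"
    host = (host or socket.gethostname()).split(".")[0]  # short host name only
    stamp = time.strftime("%Y%m%d_%H%M%S", now or time.gmtime())
    # Keep only letters, digits and underscores: characters such as '.', '/',
    # spaces and '$' are not allowed in database names.
    parts = [re.sub(r"[^A-Za-z0-9_]", "_", p) for p in (prefix, user, host, stamp)]
    name = "_".join(parts)
    if len(name) >= 64:  # database names must be shorter than 64 characters
        raise ValueError("database name too long: " + name)
    return name


if __name__ == "__main__":
    print(unit_test_db_name("orders"))
```

Since MongoDB creates the database on first write, a test can use the returned name directly and drop the database in its teardown.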
• How much data will you have initially?
• How will your data set grow over time?
• How big is your working set?
• Will you be loading huge bulk inserts, or have a constant stream of writes?
• How many reads and writes will you need to service per second?
• What is the peak load you need to provision for?
Think about data sizing and growth
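The questions above can be turned into a back-of-the-envelope estimate. The sketch below is one rough way to do it in Python; the function names, the 80% headroom default and the example figures are assumptions for illustration, and the estimate ignores indexes, storage overhead and compression.

```python
def projected_size_gb(initial_docs, avg_doc_bytes, new_docs_per_day, days):
    """Estimate the raw document volume in GiB after `days` of steady growth."""
    total_docs = initial_docs + new_docs_per_day * days
    return total_docs * avg_doc_bytes / 1024 ** 3


def working_set_fits(working_set_gb, ram_gb, headroom=0.8):
    """True if the working set fits in the share of RAM kept for the database.

    `headroom` is a hypothetical safety margin, not a MongoDB setting.
    """
    return working_set_gb <= ram_gb * headroom


if __name__ == "__main__":
    # Hypothetical workload: 10 million documents of 2 KiB, growing by 50,000 a day.
    size = projected_size_gb(10_000_000, 2048, 50_000, days=365)
    print(f"projected raw data after one year: {size:.1f} GiB")
    print("working set of 12 GiB fits in 16 GiB of RAM:", working_set_fits(12, 16))
```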
• Good model and understanding of latencies, write concerns
• Catch exceptions
• Retries
• …
What happens when a failure is returned by the database?
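One way to combine "catch exceptions" and "retries" is a small wrapper with exponential backoff. This is a sketch, not a driver feature: TransientDatabaseError is a hypothetical stand-in for whatever retryable exception your driver raises, and with_retries and its parameters are invented for the example.

```python
import time


class TransientDatabaseError(Exception):
    """Hypothetical stand-in for a driver's retryable error (e.g. a failover)."""


def with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once the attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDatabaseError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_write():
        # Simulated operation that fails twice before succeeding.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientDatabaseError("simulated failover")
        return "written"

    print(with_retries(flaky_write, attempts=5))
```

Blind retries are only safe for idempotent operations; retrying an increment, for example, can apply it twice if the first attempt actually reached the server.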
• More than 50% of customer issues
• Collection Scan
– Very bad if you have a large collection
– One of the main performance issues seen in our customers’ applications
– Can be identified in the logs with the ‘nscannedObjects’ attribute on slow queries
• Watch out for updates to the application, since new queries may not be covered by existing indexes
Index correctly
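As a rough illustration of reading the ‘nscannedObjects’ attribute, the Python sketch below compares it with ‘nreturned’ on a slow-query log line. The sample line is hand-written for the example rather than copied from a real server, the exact log layout varies between MongoDB versions, and scan_ratio is a hypothetical helper; mtools, covered later, does this kind of analysis properly.

```python
import re

# Matches "name:number" pairs for the two counters of interest.
FIELD = re.compile(r"\b(nscannedObjects|nreturned):(\d+)")


def scan_ratio(log_line):
    """Return documents scanned per document returned, or None if not logged.

    A large ratio suggests a collection scan or a poorly matching index.
    """
    fields = {name: int(value) for name, value in FIELD.findall(log_line)}
    if "nscannedObjects" not in fields:
        return None
    return fields["nscannedObjects"] / max(fields.get("nreturned", 0), 1)


if __name__ == "__main__":
    # Hand-written sample line, for illustration only.
    sample = ('query shop.orders query: { status: "open" } '
              'nscanned:100000 nscannedObjects:100000 nreturned:2 1523ms')
    print(scan_ratio(sample))
```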
1. Assess the problem and establish acceptable behavior
2. Measure the current performance
3. Find the bottleneck*
4. Remove the bottleneck
5. Re-test to confirm
6. Repeat
* - (This is often the hard part)
(Adapted from http://en.wikipedia.org/wiki/Performance_tuning )
Performance Tuning
1. Deployment topology
2. Have a test/staging environment
– Track slow queries and collection scans
3. MongoDB production notes
– http://docs.mongodb.org/manual/administration/production-notes
4. Storage considerations
5. Host considerations
Chapter 2 - Deploy
• Sharding or not?
• 3 data nodes per replica set or 2 data nodes + arbiter?
• Many Data Centers or availability zones
• What is important for you?
– Durability of writes
– Performance
=> can be chosen per operation
Deployment topology
• Best if its capacity matches the production deployment
– Otherwise, if prod is 20 shards x 3 nodes, you can have 2 x 3 nodes, or 20 x 1 node
• Data size should be representative
– Start with simulated data
– Use backup of production data
• Disable table/collection scans or scan the logs for them
Have a test/staging environment
• Most important documentation of MongoDB
– http://docs.mongodb.org/v3.0/administration/production-notes/
• Security checklist
– Authentication, limit network exposure, … audit system activity
• Allocate Sufficient RAM and CPU
• MongoDB and NUMA Hardware
• Platform Specific Considerations
– Turn off atime for the storage volume containing the database files.
– Set the file descriptor limit, -n, and the user process limit (ulimit), -u, above 20,000
– MongoDB on Virtual Environments
MongoDB Production notes
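The ulimit advice above can be checked from a script. The sketch below uses Python's standard resource module (Unix only) and assumes the 20,000 threshold from the slide; limit_ok and check_current_process are hypothetical names. It inspects the limits of the Python process itself, so it should be run as the user that starts mongod to be meaningful.

```python
import resource

MINIMUM = 20000  # threshold quoted on the slide


def limit_ok(soft_limit, minimum=MINIMUM):
    """True if a soft limit is unlimited or above `minimum`."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit > minimum


def check_current_process():
    """Check the soft limits of this process for open files (-n) and processes (-u)."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return {"-n": limit_ok(nofile_soft), "-u": limit_ok(nproc_soft)}


if __name__ == "__main__":
    for flag, ok in check_current_process().items():
        print(f"ulimit {flag}: {'ok' if ok else 'too low'}")
```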
• RAID
=> 0+1 or None
• HDD or SSD
=> SSD, if budget permits
• NAS, SAN or Direct Attached?
=> Direct Attached; good news, those are the cheapest!
• File System type
– MMAPv1 => ext4
– WiredTiger => xfs
• Settings
=> ReadAhead
Storage considerations
• CPU power?
– MMAPv1 => no
– WiredTiger => yes, needed for compression
• RAM?
– Yes! RAM is orders of magnitude faster than disk
Host considerations
1. Monitor
2. Upgrade
3. Backup
4. Troubleshoot
Chapter 3 - Operation
“Shit will happen!”
• Are you prepared?
• Have backups?
• Have a good picture of your “normal state”
Disaster will strike
• iostat, top, vmstat, sar
• mongostat, mongotop
• CloudManager/OpsManager Monitoring – plus Munin extensions
Monitor
• Within a major version, each new minor version keeps the same binary format, protocol, etc.
• Major versions have upgrade and downgrade paths
• CloudManager and OpsManager Automation handles automatic upgrades
Upgrade
                                        Mongodump  File system  CloudManager Backup  OpsManager Backup
Initial complexity                      Medium     High         Low                  High
System overhead                         High       Low          Low                  Medium
Point in time recovery of replica set   No *       No *         Yes                  Yes
Consistent snapshot of sharded system   No *       No *         Yes                  Yes
Scalable                                No         Yes          Yes                  Yes
Restore time                            Slow       Fast         Medium               Medium
Comparing MongoDB backup approaches
* Possible, but you need to write the tools and go through a lot of testing
• mtools – https://github.com/rueckstiess/mtools/wiki
(just Google: github mtools)
Troubleshoot
namespace                  pattern                                      count  min (ms)  max (ms)  mean (ms)  sum (ms)
serverside.scrum_master    {"datetime_used": {"$ne": 1}}                   20     15753     17083      16434    328692
serverside.django_session  {"_id": 1}                                     562       101      1512        317    178168
serverside.user            {"_types": 1, "emails.email": 1}               804       101      1262        201    162311
local.slaves               {"_id": 1, "host": 1, "ns": 1}                 131       101      1048        310     40738
serverside.email_alerts    {"_types": 1, "email": 1, "pp_user_id": 1}      13       153     11639       2465     32053
serverside.sign_up         {"_id": 1}                                      77       103       843        269     20761
serverside.user_credits    {"_id": 1}                                       6       204       900        369      2218
serverside.counters        {"_id": 1, "_types": 1}                          8       121       500        263      2111
serverside.auth_sessions   {"session_key": 1}                               7       111       684        277      1940
serverside.credit_card     {"_id": 1}                                       5       145       764        368      1840
serverside.email_alerts    {"_types": 1, "request_code": 1}                 6       143       459        277      1663
serverside.user            {"_id": 1, "_types": 1}                          5       153       427        320      1601
serverside.user            {"emails.email": 1}                              2       218       422        320       640
serverside.user            {"_id": 1}                                       2       139       278        208       417
serverside.auth_sessions   {"session_endtime": 1, "session_userid": 1}      1       244       244        244       244
serverside.game_level      {"_id": 1}                                       1       104       104        104       104
Troubleshoot – Slow Queries
• Interactive graph!
Troubleshoot – Slow Queries Plot
3. Wrapping up
1. Missing indexes
2. Not testing before deploying application changes
3. OS settings
4. Appropriate schema
5. Hardware
6. Not seeking help early enough
Common Mistakes
• MongoDB on-line presentations: mongodb.com/presentations
• Free MongoDB classes: university.mongodb.com
• mtools to analyze MongoDB logs: github.com/rueckstiess/mtools
• CloudManager: cloud.mongodb.com
Free On-Line Resources
• MongoDB Support
– 24x7 support
– The sun never sets on the MongoDB Customer Support Team
• MongoDB Consulting Days
• MongoDB World (@NYC on June 28-29, 2016)
• MongoDB Days (@SanJose on Dec 3, 2015)
Resources
• Use available resources
• Testing – Plan for it, plan resources for it, and do it in a test or staging environment before deploying
Summary
I hope you walk out of this presentation and make at least one change in your application, deployment, configuration, etc. that will prevent one issue from happening.
Take away
4. Q&A
Positions open in Palo Alto, Austin and NYC
• http://www.mongodb.com/careers
MongoDB for Giant Ideas