Post on 10-Jun-2020
Scaling MongoDB: Avoiding Common Pitfalls
Jon Tobin Senior Systems Engineer Jon.Tobin@percona.com @jontobs www.linkedin.com/in/jonathanetobin
www.percona.com
Agenda
• Document Design
• Data Management
• Replication & Failover
• Sharding
Document Design
• We’re Not in SQL Anymore
• Embed vs Reference
• The Gray Area
Document Design
• Most common source of confusion & substandard performance
• Dynamic schemas are meant for agility (flexibility)
  • No substitute for careful design
  • You have to live with your decisions
• Consider design carefully
  • NoSQL for a reason
  • Denormalization is common
• Flexibility means that there are often no right or wrong answers
  • Only the correct design for your use case
• MongoDB is not always the correct solution
  • Consider mixing databases to maximize strengths
QnD Doc Design Pointers
Embed
• Query performance priority
• Fields are fairly static
• Size of doc can be reasonably determined
• Eventual consistency acceptable

{
  "_id" : ObjectId("53d98f1…"),
  "firstName" : "Jonathan",
  "lastName" : "Tobin",
  "year" : 3,
  "classes" : [
    { "class" : "Calc 101", "credits" : 3 },
    { -etc- }
  ]
}

Reference
• Insert performance priority
• Updates are common
• Immediate consistency necessary
• Field size can’t be determined

{
  "_id" : ObjectId("53d98f1…"),
  "firstName" : "Jonathan",
  "lastName" : "Tobin",
  "year" : 3,
  "classes" : [
    ObjectId(<of_class_1>),
    ObjectId(<of_class_2>),
    ObjectId(<of_class_3>)
  ]
}
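The trade-off between the two layouts can be sketched with plain in-memory objects. This is only an illustration, not MongoDB code: the `classes` map stands in for a second collection, and the ids `c1`/`c2` and the "Physics 101" entry are made up for the example.

```javascript
// Toy stand-in for a separate "classes" collection (illustrative data only).
const classes = new Map([
  ["c1", { _id: "c1", class: "Calc 101", credits: 3 }],
  ["c2", { _id: "c2", class: "Physics 101", credits: 4 }],
]);

// Embedded design: class details live inside the student document.
const embeddedStudent = {
  firstName: "Jonathan",
  lastName: "Tobin",
  year: 3,
  classes: [
    { class: "Calc 101", credits: 3 },
    { class: "Physics 101", credits: 4 },
  ],
};

// Referenced design: the student document stores only class ids.
const referencedStudent = {
  firstName: "Jonathan",
  lastName: "Tobin",
  year: 3,
  classes: ["c1", "c2"],
};

// Embedded: one document read is enough to answer the question.
function creditsEmbedded(student) {
  return student.classes.reduce((sum, c) => sum + c.credits, 0);
}

// Referenced: every id has to be resolved against the other collection
// (an additional query in a real deployment).
function creditsReferenced(student, classCollection) {
  let lookups = 0;
  let credits = 0;
  for (const id of student.classes) {
    lookups += 1;
    credits += classCollection.get(id).credits;
  }
  return { credits, lookups };
}

console.log(creditsEmbedded(embeddedStudent));
console.log(creditsReferenced(referencedStudent, classes));
```

The flip side is updates: changing the credits of one class touches a single entry in the referenced layout, but every student document that embeds a copy in the embedded layout.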
Finding Middle Ground
• Embed fields that are often fetched
  • If they don’t grow boundlessly
• Limit growing keys to 1 per doc
  • Move to last key
• Reference fields that are volatile
  • Or are occasionally queried
• Atomicity can be achieved @ single doc level
  • Take care in design
  • Try TokuMX (ACID & MVCC)
• Index judiciously
  • Re-evaluate often
• Store relevant data
  • Archive old data (when possible)
  • Or delete
Doc Design – Useful Resources
• MongoDB: The Definitive Guide - On Amazon
  • A little outdated, but still useful
• MongoDB Doc Design
• MongoDB Use Cases - Design
Data Management
• Why & How
• Details & Uses
• Other Options
Data Management
Why?
• Ephemeral data
  • Logs
  • Sensor readings
• Space management
• Performance??

• WiredTiger Options
• TTL
• Capped Collection
• Partitioned Collections (TokuMX)
• “Ghetto Partitioning”
Capped Collections

Good
• Easy to use
  • db.createCollection("test", { capped : true, size : <size_in_bytes>, max : <opt_max_no_of_docs> })
• Automatically drop documents
  • Based on insertion order
• Easy to tail

Bad
• No sharding
• Limited ability to update docs
• No deletes
• Unpredictable performance @ scale
  • Especially queries
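The "automatically drop documents, based on insertion order" behavior can be pictured with a small toy model. This is not MongoDB code: `CappedModel` is a hypothetical class, and the byte count is a crude `JSON.stringify` estimate rather than real BSON sizing.

```javascript
// Toy model of capped-collection eviction (hypothetical, not MongoDB code).
class CappedModel {
  constructor({ size, max = Infinity }) {
    this.size = size; // byte budget, playing the role of the `size` option
    this.max = max;   // optional document-count limit, like `max`
    this.docs = [];   // kept in insertion order
    this.bytes = 0;
  }

  insert(doc) {
    // Crude size estimate; real collections measure BSON size (assumption).
    const docBytes = JSON.stringify(doc).length;
    this.docs.push({ doc, docBytes });
    this.bytes += docBytes;
    // Evict the oldest documents until both limits hold again,
    // always keeping at least the document just inserted.
    while (
      this.docs.length > 1 &&
      (this.bytes > this.size || this.docs.length > this.max)
    ) {
      this.bytes -= this.docs.shift().docBytes;
    }
  }

  contents() {
    return this.docs.map((entry) => entry.doc);
  }
}

const capped = new CappedModel({ size: 10000, max: 3 });
for (let n = 1; n <= 5; n++) capped.insert({ n });
console.log(capped.contents());
```

Because eviction only ever removes from the front, the application never chooses which documents go, which matches the "no deletes" limitation listed above.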
TTL Internals
• Additional Index
  • Single key only (a Date field)
• Uses a background thread on every RS member
  • Runs EVERY 60 seconds
  • Query full collection by TTL
  • Deletes every doc “>” expireAfterSeconds older than current server time
  • Single doc deletes
• Docs are only deleted from primary
  • Thread still runs on every secondary
Time to Live (TTL) Index
Good
• Easy to use
  • db.coll.createIndex( { "modDate" : 1 }, { expireAfterSeconds : 600 } )
• Automatic expiration of docs

Bad
• Query & single doc deletes
• No guarantees

Math
1000 inserts/sec * 600 (expireAfterSeconds) = 600,000 single doc deletes!!
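The arithmetic on this slide can be written out as a small helper. It is a back-of-the-envelope sketch that assumes a constant insert rate; the function and field names are made up for illustration, and the 60-second sweep interval is the one quoted on the TTL Internals slide.

```javascript
// Back-of-the-envelope TTL delete load (plain arithmetic, no MongoDB calls).
function ttlDeleteLoad({ insertsPerSec, expireAfterSeconds, sweepIntervalSec = 60 }) {
  return {
    // Documents that expire over one full expireAfterSeconds window;
    // each one is removed by its own single-document delete.
    docsPerExpiryWindow: insertsPerSec * expireAfterSeconds,
    // Documents that become eligible between two passes of the TTL thread.
    deletesPerSweep: insertsPerSec * sweepIntervalSec,
  };
}

const load = ttlDeleteLoad({ insertsPerSec: 1000, expireAfterSeconds: 600 });
console.log(load);
```

Under these assumptions the 600,000 figure is the number of documents that pass through the TTL window, while each 60-second pass has roughly 60,000 single-document deletes to work through.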
Other Options

• Partitioned Collections
  • TokuMX v1.5 and later
  • Splits collection into separate files based on key
  • Like partitioned tables in MySQL
  • Deletes are “batched” and lightning fast
• “Ghetto Partitioning”
  • Separate your data into collections based on time
  • Difficult to stitch collections together
  • Need to keep track of collections in application
  • Dropping a collection is very efficient (relatively speaking)
TTL Tips
• When adding TTL to existing collection
  • Be careful of first run!!!
• Monitor disk utilization closely
  1. Increase resource pool
  2. Shard
• Keep an eye on fragmentation
• Consider using an expireAt field instead
  • Will set the clock time expiration instead
  • Can use low workload period for deletions
• Check out preso by Kim Wilkins from ObjectRocket
  • In “Useful Resources” slide (hidden)
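One way to use the expireAt idea is to compute the timestamp in the application so that it falls in a quiet period. The sketch below assumes a TTL index created with `db.coll.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })`, so each document expires at the clock time stored in its `expireAt` field. The helper name `expireAtOffPeak` and the 03:00 UTC off-peak hour are assumptions for the example.

```javascript
// Pick an expireAt timestamp: at least `retentionSec` after creation,
// pushed forward to the next occurrence of an off-peak hour (UTC).
function expireAtOffPeak(createdAt, retentionSec, offPeakHourUtc) {
  const earliest = new Date(createdAt.getTime() + retentionSec * 1000);
  const candidate = new Date(Date.UTC(
    earliest.getUTCFullYear(),
    earliest.getUTCMonth(),
    earliest.getUTCDate(),
    offPeakHourUtc, 0, 0, 0
  ));
  if (candidate < earliest) {
    // Today's off-peak hour is already too early; use tomorrow's.
    candidate.setUTCDate(candidate.getUTCDate() + 1);
  }
  return candidate;
}

// Example document carrying its own expiration time.
const created = new Date("2020-06-10T14:30:00Z");
const doc = {
  modDate: created,
  expireAt: expireAtOffPeak(created, 600, 3),
};
console.log(doc.expireAt.toISOString());
```

Bunching expirations into one window concentrates the deletes, so the window needs to be long enough for the TTL thread to get through them.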
TTL - Useful Resources

• Kimberly Wilkins (ObjectRocket) Math is Hard: TTL Configuration & Considerations
• MongoDB Indexing Whitepaper
• Open TTL Jira Ticket
• MongoDB TTL Tutorial
• MongoDB Google Group Discussion - Partitioning
• TokuMX Partitioned Collections
Durability & High Availability
• Life of a Write
• Durability
• Replication’s Effects
• Considerations Along Your Journey
Life of a Write
1. Send by application
2. Receipt by mongod
3. Application in memory
4. Application to journal (if enabled)
5. Write to data file
6. Application to replication journal (oplog)
7. writeConcern determines “strength of guarantee” (more to come)
   • Default “acknowledged” (in memory)
8. Acknowledgement of write sent to host (depends on writeConcern)
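How writeConcern sets the "strength of guarantee" can be sketched as a check against a snapshot of the replica set. This is a simplified toy model, not server logic: the `members` array and its `applied`/`journaled` flags are hypothetical, and arbiters and non-voting members are ignored. The `w` and `j` fields themselves are standard write concern options.

```javascript
// Toy check: has a write reached the level a given writeConcern asks for?
// `members` is a hypothetical snapshot, one entry per data-bearing member.
function isSatisfied(writeConcern, members) {
  const { w = 1, j = false } = writeConcern;
  // With j: true, only members that have journaled the write count.
  const acked = members.filter((m) => m.applied && (!j || m.journaled)).length;
  // "majority" means more than half of the members in this simplified model.
  const needed = w === "majority" ? Math.floor(members.length / 2) + 1 : w;
  return acked >= needed;
}

// Primary has applied and journaled; one secondary has applied only;
// the other secondary has not seen the write yet.
const members = [
  { applied: true, journaled: true },
  { applied: true, journaled: false },
  { applied: false, journaled: false },
];

console.log(isSatisfied({ w: 1 }, members));
console.log(isSatisfied({ w: "majority" }, members));
console.log(isSatisfied({ w: "majority", j: true }, members));
```

The stricter the concern, the later step 8 can happen, which is the throughput cost of a stronger guarantee.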
Replication

• Every shard consists of a replica set
• Each replica set has a Primary & (n) Secondaries or Arbiters (no data)
• Secondaries read the Primary’s replication log (oplog) and apply it asynchronously
Replication Lag

• Secondaries may lag behind the primary
• rs.status() reports the current replica set status and replication information
• Will affect:
  • Durability
  • Elections
  • Write throughput (depending on writeConcern)

Why?
• Primary has concurrency. Replication is single threaded
• High resource utilization on secondary
  • Likely I/O bound
• Network latency
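Lag per secondary can be estimated from the `members` array that `rs.status()` returns, by comparing each secondary's `optimeDate` with the primary's. The sketch below runs against a hand-written object shaped like that output; the host names and timestamps are invented for the example.

```javascript
// Estimate replication lag (seconds) per secondary from rs.status()-shaped data.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary right now, e.g. mid-election
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      // Date subtraction yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  }
  return lag;
}

// Hand-written sample in the shape of rs.status() (hypothetical hosts).
const sampleStatus = {
  members: [
    { name: "db1:27017", stateStr: "PRIMARY", optimeDate: new Date("2020-06-10T12:00:10Z") },
    { name: "db2:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-06-10T12:00:05Z") },
    { name: "db3:27017", stateStr: "SECONDARY", optimeDate: new Date("2020-06-10T12:00:10Z") },
  ],
};

console.log(replicationLagSeconds(sampleStatus));
```

Sampling this periodically gives a trend, which is more useful than a single reading when looking for the cause of lag.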
Concerns, Tips & Tricks
• Lag affects everything we’ve talked about
  • 1000 inserts/sec + 5 sec lag + election
  • 5000 inserts rolled back
• Determine its source and attempt to eliminate
  • Most common cause is single threaded replication
  • Double check network latency between hosts
• Solution: micro-sharding
  • One server – multiple mongods
  • Parallelize replication
• TokuMX – Read Free Replication
  • Use Fractal Tree to speed up replay
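The rollback figure above is simple arithmetic: writes the old primary accepted during the lag window never reached the member that wins the election. A minimal sketch, assuming a constant insert rate and a hypothetical helper name:

```javascript
// Rough upper bound on writes a failover could roll back: everything the
// old primary accepted that the new primary had not yet replicated.
function writesAtRisk(insertsPerSec, lagSeconds) {
  return insertsPerSec * lagSeconds;
}

const atRisk = writesAtRisk(1000, 5);
console.log(atRisk);
```

Halving the lag halves the exposure, which is why the bullets above focus on finding and removing the source of lag.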
Replication – Useful Resources

• K. Chodorow Election Internals Blogs
• MongoDB Replication Whitepaper
• MongoDB Troubleshooting Replica Sets
Sharding & Balancing
• Setup Tips
• Shard Keys
• A Look Into the Balancer
Setup Tips & Gotchas
• Make use of multiple config servers
  • Three is a safe bet
  • Back them up regularly
• One mongos per app server
• When converting from single RS
  • Point all app servers to mongos
  • Prevent app servers from connecting directly to RS
• Chunk size determines “efficiency” of balancing
  • Also affects frequency of splitting overweight chunks
• Make sure config servers stay up
  • One server down prevents shard reconfig ops
    • Balancing
    • Splitting
• Use DNS CNAMEs to refer to config servers
  • Otherwise restart every mongod & mongos on rename
Shard Keys
• For insert speed:
  • High-entropy shard key (mostly random).
  • Balances load across all shards.
  • Avoid migrations, too expensive in MongoDB.
  • “Scatter” is good.
• For query speed:
  • Low-entropy shard key (mostly sequential).
  • Range queries should only hit 1 shard.
  • “Scatter” is bad.
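The insert-versus-query tension can be seen in a toy placement model. It assumes four shards that split a 32-bit key space into equal ranges, and uses a multiplicative hash as a stand-in for a high-entropy key. None of this is MongoDB's actual chunking or hashing; all names are illustrative.

```javascript
const SHARDS = 4;
const KEY_SPACE = 2 ** 32;

// Range-based placement: the key space is cut into SHARDS equal ranges.
function shardFor(shardKey) {
  return Math.floor(shardKey / (KEY_SPACE / SHARDS));
}

// Low-entropy key: the value itself (e.g. a counter or timestamp).
const sequentialKey = (id) => id;

// High-entropy key: a scrambled version of the value
// (multiplicative hash, used here only as an illustration).
const scrambledKey = (id) => Math.imul(id, 2654435761) >>> 0;

// How many shards hold documents whose ids fall in [from, to)?
function shardsTouched(keyFn, from, to) {
  const touched = new Set();
  for (let id = from; id < to; id++) touched.add(shardFor(keyFn(id)));
  return touched.size;
}

// 100 consecutive ids: one batch of fresh inserts, or one range query.
console.log(shardsTouched(sequentialKey, 1000, 1100));
console.log(shardsTouched(scrambledKey, 1000, 1100));
```

With the sequential key the batch lands on a single shard, which is ideal for a range query and a hot spot for inserts. With the scrambled key the same ids spread over every shard, which is the reverse trade.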
Balancing
• Main problem: random workload
  • Everything relies on shard (secondary) key
  • Documents aren’t stored in chunk (shard key) order
• Three biggest operations are high activity
  1. Range query on shard key for chunk
  2. Donation to recipient shard
  3. Range delete on shard key for donating shard

Take away:
• You have to balance
  • Don’t put it off for long periods
  • Schedule balancing during low impact periods
  • Manually balance
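Scheduling the balancer for low-impact periods is done with an `activeWindow` on the `balancer` document in `config.settings`. The sketch below shows the shape of that document as a plain object, plus a hypothetical helper for checking whether a given time falls inside a window that crosses midnight. The 23:00 to 06:00 window is an assumed example, not a recommendation from the talk.

```javascript
// Shape of the balancer window document stored in config.settings
// (the start/stop values here are example values).
const balancerSettings = {
  _id: "balancer",
  activeWindow: { start: "23:00", stop: "06:00" },
};

// Convert "HH:MM" to minutes since midnight.
const toMinutes = (hhmm) => {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
};

// True if `hhmm` falls inside the window; handles windows that cross midnight.
function inActiveWindow(hhmm, { start, stop }) {
  const t = toMinutes(hhmm);
  const s = toMinutes(start);
  const e = toMinutes(stop);
  return s <= e ? t >= s && t < e : t >= s || t < e;
}

console.log(inActiveWindow("02:15", balancerSettings.activeWindow));
console.log(inActiveWindow("12:00", balancerSettings.activeWindow));
```

A helper like this is handy in monitoring scripts that want to flag migrations observed outside the intended window.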
Sharding Resources
• MongoDB Balancing Process
• MongoDB Balancing Thresholds
• MongoDB: The Definitive Guide - On Amazon
  • A little outdated, but still useful
• TokuMX Balancing Improvements & Benchmarks
• Managing the Balancer
Use Discount Code “WebinarPLAM”
for an additional 40 Euros off standard registration rates!