Walking the Walk: Developing the MongoDB Backup Service with MongoDB
TRANSCRIPT
![Page 1: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/1.jpg)
Walking the Walk: Developing the MongoDB Backup Service With MongoDB
Steve Briskin
Engineer, Cloud Team, 10gen
![Page 2: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/2.jpg)
Agenda
• Intro: The Project
• How the backup service was built
  – Keeping State
  – Storage of Oplog Documents
  – De-duped Snapshot Storage
• Q&A
![Page 3: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/3.jpg)
The Project
• Started in December 2011 – 1 person
• 3 Engineers + PM & Manager by June 2012
• Private Beta – September 2012
• Limited Release – April 2013
• 6 Engineers (and hiring) + PM & Manager – Now
• Agile Principles
![Page 4: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/4.jpg)
Data Flow
[Diagram labels. Customer: Sharded Cluster (Replica Set 1, Replica Set 2, Replica Set 3, Replica Set 4), Backup Agent. 10gen: Backup Ingestion, Main DB, Backup Daemon(s), BRS Daemon, Reconstructed Replica Sets (RS1, RS2, RS3, RS4), Block Store.]
1. Configuration
2. Initial Sync
3. Oplog Data
4. Save Sync/Oplog Data
5. Reconstruct Replica Set
6. Persist Snapshot
7. Retrieve Snapshot
8. SCP Data Files
![Page 5: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/5.jpg)
How We Built It (Iteratively)
![Page 6: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/6.jpg)
Keeping State – First Version
• One document per replica set being backed up
{
    _id : ObjectId("5194ecde036446e958b9df9b"),
    groupId : "Customer Group",
    replicaSet : "ReplSet Name",
    broken : false,
    workingOn : "Initial Sync",
    numOplogs : NumberInt(100),
    head : Timestamp(1370982242, 1),
    lastOplog : Timestamp(1370982243, 1),
    lastSnapshot : Timestamp(1370981940, 1),
    machine : "backup1.10gen.com"
}
![Page 7: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/7.jpg)
Keeping State – Current Version
• More fields, Nested Documents. Still No Joins.
{
    _id : ObjectId("5194ecde036446e958b9df9b"),
    groupId : "Customer Group",
    replicaSet : "ReplSet Name",
    broken : false,
    workingOn : {…},
    head : { ts : Timestamp(1370982242, 1), hash : 49238479326510 },
    lastOplog : { ts : Timestamp(1370982243, 1), hash : 93408342387492 },
    numOplogs : NumberLong(9400),
    oplogNamespace : "CustomerGroup.oplogs_ReplSetName",
    lastSnapshot : Timestamp(1370981940, 1),
    nextSnapshot : Timestamp(1371003540, 1),
    schedule : {
        reference : 13709812343,
        rules : [{…}, {…}]
    },
    machine : "backup1.10gen.com"
}
Changes from the first version:
• Simple Value -> Nested Document (workingOn, head, lastOplog)
• Integer -> Long (numOplogs)
• Complex, Nested Document (schedule)
![Page 8: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/8.jpg)
Imitating a Secondary: Capturing and storing the oplog
![Page 9: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/9.jpg)
Capture Oplog
• Use replication oplog to capture activity
• Oplog is a Capped Collection
  – local.oplog.rs
  – We can tail Capped Collections
• Strategy
  – Tail the Oplog
  – Read 10 MB of Data
  – Compress and Send to 10gen
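The strategy above can be sketched as a small batching step. This is only an illustration, not the agent's actual code: `compressed_batches` is a hypothetical name, zlib stands in for whatever compression the agent uses, and the entries are assumed to arrive already serialized as BSON (which is length-prefixed, so concatenated entries can be split apart again after decompression).

```python
import zlib
from typing import Iterable, Iterator

BATCH_BYTES = 10 * 1024 * 1024  # the 10 MB batch size named on the slide


def compressed_batches(entries: Iterable[bytes],
                       batch_bytes: int = BATCH_BYTES) -> Iterator[bytes]:
    """Group serialized oplog entries into batches of roughly batch_bytes
    and yield each batch compressed (zlib is an assumption)."""
    buffer = bytearray()
    for entry in entries:
        buffer += entry
        if len(buffer) >= batch_bytes:
            yield zlib.compress(bytes(buffer))
            buffer.clear()
    if buffer:
        # Flush whatever is left when the input ends.
        yield zlib.compress(bytes(buffer))
```

In a real agent the `entries` iterable would be fed by a cursor tailing `local.oplog.rs`, and each yielded batch would be sent to the ingestion service.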
![Page 10: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/10.jpg)
Store Oplog – First Version
• Single Capped Collection
• Pros
  – Easy
• Cons
  – Doesn’t scale!
  – Customers will have an impact on each other
![Page 11: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/11.jpg)
Store Oplog – Good Version
• DB per customer and Collection per replica set
• TTL Index for cleanup
• Pros
  – Logical and Physical separation of customer data
  – Can scale quickly and easily
  – Configurable by end user
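A minimal sketch of this layout, under stated assumptions: the naming follows the `oplogNamespace : "CustomerGroup.oplogs_ReplSetName"` value in the state document, stripping spaces is a guess at how that name is derived, and the `created` field and one-day retention are hypothetical. `ensure_ttl_index` accepts any object exposing PyMongo's `Collection.create_index(keys, **kwargs)`; `expireAfterSeconds` is the standard option that turns an index on a date field into a TTL index.

```python
from typing import Any, Tuple


def oplog_namespace(group: str, replica_set: str) -> Tuple[str, str]:
    """Return (database, collection): one DB per customer group,
    one collection per replica set. Space stripping is an assumption."""
    database = group.replace(" ", "")
    collection = "oplogs_" + replica_set.replace(" ", "")
    return database, collection


def ensure_ttl_index(collection: Any, field: str = "created",
                     ttl_seconds: int = 24 * 3600) -> None:
    """Create a TTL index so old oplog documents are removed automatically.
    `field` must hold a date; its name here is hypothetical."""
    collection.create_index([(field, 1)], expireAfterSeconds=ttl_seconds)
```

With PyMongo this would be used roughly as `db_name, coll_name = oplog_namespace(...)` followed by `ensure_ttl_index(client[db_name][coll_name])`.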
![Page 12: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/12.jpg)
Storing the Snapshots
![Page 13: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/13.jpg)
Storage – First Version
• Archive and Compress MongoDB data files
• Scatter archives across machines
  – Pros
    • Fast and Easy
  – Cons
    • No Redundancy, Hard to Scale, Wastes Space
[Diagram: archives scattered across machines]
• Machine 1: Snapshot_1.tar.gz, Snapshot_4.tar.gz
• Machine 2: Snapshot_2.tar.gz, Snapshot_5.tar.gz
• Machine 3: Snapshot_3.tar.gz, Snapshot_6.tar.gz
![Page 14: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/14.jpg)
Goal 1: De-Duplicated Storage
• Observation
  – Data change is low and localized
  – Data is compressible
• Huge benefits in de-duplicating
[Diagram: two consecutive 100GB snapshots and the space each occupies once stored]
• Worst Case (0% de-dupe, no compression): 100GB + 100GB
• Best Case (100% de-dupe, 10x compression): 10GB + 0GB
• Typical Case (90% de-dupe, 3x compression): 33GB + 3GB
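The figures in the diagram can be reproduced with simple arithmetic, assuming it shows two consecutive 100GB snapshots, the first stored in full after compression and the second storing only its non-duplicate blocks. `stored_sizes_gb` is a hypothetical helper written for this check.

```python
from typing import Tuple


def stored_sizes_gb(snapshot_gb: float, dedupe_ratio: float,
                    compression: float) -> Tuple[float, float]:
    """Return the stored size of a first and a second snapshot.

    dedupe_ratio: fraction of the second snapshot already in the store.
    compression:  compression factor (1 means no compression).
    """
    first = snapshot_gb / compression
    second = snapshot_gb * (1 - dedupe_ratio) / compression
    return first, second


worst = stored_sizes_gb(100, 0.0, 1)     # no de-dupe, no compression
best = stored_sizes_gb(100, 1.0, 10)     # full de-dupe, 10x compression
typical = stored_sizes_gb(100, 0.9, 3)   # 90% de-dupe, 3x compression
```

Rounded to whole gigabytes, the typical case gives about 33GB for the first snapshot and about 3GB for the second.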
![Page 15: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/15.jpg)
Goal 2: Redundancy and Scalability
• Require High Availability & Redundancy
  – MongoDB Replication!
• Require Ability to Scale
  – MongoDB Sharding!
![Page 16: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/16.jpg)
Block Store
db_file.0
• SHA-256 Hash = "de23425..", Data = BinData[……]
• SHA-256 Hash = "3af37..", Data = BinData[……]
• SHA-256 Hash = "e721ac..", Data = BinData[……]
![Page 17: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/17.jpg)
Block Store
• File reference
![Page 18: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/18.jpg)
Block Store Internals
Files Collection
{
    _id : ObjectId("5194ece0036446e958b9dfa1"),
    filename : "db_file.0",
    size : NumberLong(786432),
    blocks : [
        { hash : "de2f256064….", size : 96 },      // SHA-256 Hash
        { hash : "47a9834f23….", size : 32121 },
        ….
    ]
}
Blocks Collection
{
    _id : "de2f256064a0af797747c2b9755dcb9f3df0de4f489eac731c23ae9ca9cc31",      // SHA-256 Hash
    bytes : BinData(0,"H4sIAAAAAAAAAO3BAQEAAACAkP6v7ggKAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAauuOl9cAAAEA"),
    zippedSize : 96,
    size : 65536
}
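The two collections can be imitated with plain Python dictionaries to show how content addressing yields de-duplication. This is a sketch under assumptions, not the service's code: `put_file` is a hypothetical name, the 65536-byte block size is taken from `size : 65536` in the sample block document, gzip is inferred from the `H4sI` prefix of the sample `BinData` (the base64 form of the gzip header), and hashing the uncompressed block is a guess. The per-block `size` in the file document mirrors the sample, where it matches `zippedSize`.

```python
import gzip
import hashlib
from typing import Dict, List

BLOCK_SIZE = 65536  # from `size : 65536` in the sample block document


def put_file(filename: str, data: bytes, blocks: Dict[str, dict]) -> dict:
    """Split data into blocks, store each distinct block once in `blocks`
    (keyed by SHA-256 hex digest), and return a Files-style document."""
    refs: List[dict] = []
    for offset in range(0, len(data), BLOCK_SIZE):
        chunk = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in blocks:
            # De-duplication: identical content is only stored the first time.
            zipped = gzip.compress(chunk)
            blocks[digest] = {
                "_id": digest,
                "bytes": zipped,
                "zippedSize": len(zipped),
                "size": len(chunk),
            }
        refs.append({"hash": digest, "size": blocks[digest]["zippedSize"]})
    return {"filename": filename, "size": len(data), "blocks": refs}
```

Because the block `_id` is the hash itself, a second snapshot whose data files are mostly unchanged adds new file documents but very few new block documents.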
![Page 19: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/19.jpg)
Putting the file back together
• For each file
  – For each block
    • Retrieve block
    • Uncompress
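The loop above can be sketched as follows, again with a dictionary standing in for the Blocks collection. `restore_file` is a hypothetical name, gzip is an assumption based on the sample block, and the hash check is an addition for illustration that the slide does not mention.

```python
import gzip
import hashlib
from typing import Dict


def restore_file(file_doc: dict, blocks: Dict[str, dict]) -> bytes:
    """Rebuild a file by fetching and uncompressing its blocks in order."""
    out = bytearray()
    for ref in file_doc["blocks"]:
        block = blocks[ref["hash"]]              # retrieve block
        chunk = gzip.decompress(block["bytes"])  # uncompress
        # Extra integrity check (not from the slides): content must match its hash.
        if hashlib.sha256(chunk).hexdigest() != ref["hash"]:
            raise ValueError("block content does not match hash " + ref["hash"])
        out += chunk
    return bytes(out)
```

The order of the `blocks` array in the file document is what determines the byte order of the restored file, so the same block can appear in several positions or in several files.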
![Page 20: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/20.jpg)
Block Store Garbage Collection
• 1st Attempt
  – Reference counting
  – Slow and non-parallelizable
• 2nd Attempt
  – Mark and Sweep
  – Parallelizable
  – Requires more space
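A minimal in-memory sketch of the mark-and-sweep idea, assuming file documents shaped like the Files collection and a dictionary standing in for the Blocks collection. The function names are hypothetical and this is not the service's implementation.

```python
from typing import Dict, Iterable, List, Set


def mark(files: Iterable[dict]) -> Set[str]:
    """Mark phase: collect the hash of every block still referenced by a file."""
    live: Set[str] = set()
    for file_doc in files:
        for ref in file_doc["blocks"]:
            live.add(ref["hash"])
    return live


def sweep(blocks: Dict[str, dict], live: Set[str]) -> int:
    """Sweep phase: delete blocks that no file references; return the count."""
    dead: List[str] = [h for h in blocks if h not in live]
    for h in dead:
        del blocks[h]
    return len(dead)
```

In this sketch the mark phase can be split across workers, each scanning a subset of files, with the resulting sets merged before the sweep. The set of live hashes is the extra space that reference counting does not need.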
![Page 21: Walking the Walk: Developing the MongoDB Backup Service with MongoDB](https://reader038.vdocument.in/reader038/viewer/2022103114/555159f1b4c905a8768b4b7b/html5/thumbnails/21.jpg)
Q&A