mongodb 2.8 replication internals: fitting it all together
DESCRIPTION
MongoDB replication internal architecture for 2.8 Abstract: Replication in MongoDB requires deep integration with almost every part of the codebase, and has important hooks in various systems like storage, indexing, command processing and querying. Most of the replication components have seen a major overhaul recently in order to make further improvements. In this talk we will address what those pieces are, how they interact, and interesting choices made during their design. In this talk we get into the interaction of the replication protocols, commands really, writes and write concern enforcement, consensus (elections/ leader/follower/ majority) behaviors, and down into the depths of oplog generation and application on replicas. While a large part of the talk will be a technical overview of the big pieces we will dive into many important areas in order to ensure better understanding. The audience will be able to greatly affect which areas we focus on during the session, so come with ideas and a focus.TRANSCRIPT
![Page 1: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/1.jpg)
Replication
InternalsFitting Everything Together
![Page 2: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/2.jpg)
2.8, Refactored
● Architecture as of 2.8
● Unit testable; more, and faster, cpp tests
● Many changes (heartbeats, locking, future)
● Interop with 2.6
● Larger replica sets
![Page 3: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/3.jpg)
Large Blocks
● Topology Manager (state machine)
● Replication Coordinator (repl facade)
● Applier (replicate/apply oplog)
● Executor (network, heartbeats, serialization)
● Commands (re-config, init, status, etc)
● External (writes, storage, query, commands)
![Page 4: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/4.jpg)
Blocks
Applier
Topology Manager
Replication
Coordinator
Oplog
CFG
CM
Ds
Write
s
Qu
ery
Executo
r
![Page 5: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/5.jpg)
Blocks
Applier
Topology Manager
Replication
Coordinator
Oplog
CFG
CM
Ds
Write
s
Qu
ery
Executo
r
![Page 6: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/6.jpg)
Topology
● Maintains Authoritative Stateo Heartbeat, ping, member state
o Roles and transitions
● Contains Decision Logic
● Unit Testable
● Serial AccessTopology Manager
CFG
![Page 7: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/7.jpg)
Examples
● updateConfig
● prepare*Response for commands
● getSyncSource, *
● setFollowerMode (state)
● processHeartbeat
● prepareHeartbeatResponse
![Page 8: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/8.jpg)
PrepareHeartbeatResponseStatus TopologyCoordinatorImpl::prepareHeartbeatResponse(...) {
// Check error conditions, then set response fields …
response->setElectable(!_getMyUnelectableReason(...));
response->setHbMsg(_getHbmsg(...));
response->setTime(...);
response->setOpTime(lastOpApplied);
if (!_syncSource) {
response->setSyncingTo(_syncSource); }
… topology_coordinator_impl.cpp:628
![Page 9: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/9.jpg)
Failover Scenario
Heart
beats P
S
S
Health Check (rsHB)Active Primary
![Page 10: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/10.jpg)
Failover Scenario
Heart
beats P
S
S
Active PrimaryP
Failed Primary
![Page 11: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/11.jpg)
Failover Scenario
Heart
beats Failed
P
S
Health Check (rsHB)
![Page 12: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/12.jpg)
Blocks
Applier
Topology Manager
Replication
Coordinator
Oplog
CFG
CM
Ds
Write
s
Qu
ery
Executo
r
![Page 13: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/13.jpg)
Replications Coordinator
● Interface to other subsystems
● Uses executor to scheduleo Commands
o Elections, Initiate, Reconfig
o Role/State Changes
● Unit Testableo With help, requires mocking out bridge for
subsystems
Replication
Coordinator
![Page 14: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/14.jpg)
Blocks
ApplierReplication
Coordinator
OplogC
MD
s
Write
s
Qu
ery
Executo
r
Topology ManagerCFG
![Page 15: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/15.jpg)
Examples
● process*Response for commands
● awaitReplication* (for writes or migration)
● isReplEnabled
● canAcceptWrites*
![Page 16: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/16.jpg)
Accepting writesstatic bool checkIsMasterForDatabase(const std::string& db, ...) {
if (!getReplicationCoordinator()->canAcceptWritesForDatabase(db)){
errorDetail->setErrCode(ErrorCodes::NotMaster);
errorDetail->setErrMessage("Not primary while writing to " + ns);
return false;
}
return true;
}
![Page 17: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/17.jpg)
Blocks
Applier
Topology Manager
Replication
Coordinator
Oplog
CFG
CM
Ds
Write
s
Qu
ery
Executo
r
![Page 18: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/18.jpg)
Applier
● Reads from *upstream* oplog
● Applier operations transformations
● Mostly unchanged since 2.4
● Includes UpdatePosition commands
Applier
![Page 19: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/19.jpg)
Read + Apply Decoupled
● Background oplog reader thread (net)
● Pool of oplog applier threads (by collection)
Repl Source
Applier
Pool
Buffer
DB4
DB3
DB1 DB2
Local Oplog
Network
![Page 20: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/20.jpg)
Replication Operations
oplog entry (fields):
o = update, o2 = query
{ "ns" : "test.tags",
"op" : "u", "v" : 2, "ts": ...,
"o2" : { "_id" : 1 },
"o" : { "$set" : { "tags.4" : "e" } } }
![Page 21: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/21.jpg)
Blocks
Applier
Topology Manager
Replication
Coordinator
Oplog
CFG
CM
Ds
Write
s
Qu
ery
Executo
r
![Page 22: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/22.jpg)
Executor
● Serializes access to Topology state
● Serializes global state changes wrt db writes
● Processes network requests in IO pool
● Supports event/signal notification
![Page 23: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/23.jpg)
Write Request
● Sent by user
● Interpreted by command subsystem
● Checked by replication coordinator
● Executed
● Idempotent entry recorded in oplog
● ~ Replicated
● ~ Possibly verified during user write request
![Page 24: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/24.jpg)
Write Request
ApplierReplication
Coordinator
OplogC
MD
s
Write
s
Qu
ery
Executo
r
Topology ManagerCFG
![Page 25: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/25.jpg)
● Topology Manager (state machine)
● Replication Coordinator (repl facade)
● Applier (replicate/apply oplog)
● Executor (network, heartbeats, serialization)
● Commands (re-config, init, status, etc)
● External (writes, storage, query, commands)
![Page 26: MongoDB 2.8 Replication Internals: Fitting it all together](https://reader034.vdocument.in/reader034/viewer/2022042817/559c1d481a28abc7298b459c/html5/thumbnails/26.jpg)
Thanks
Questions?