How Sitecore depends on MongoDB for scalability and performance, and what it can teach you
Antonios Giannopoulos
Database Administrator, ObjectRocket
Grant Killian
Sitecore Architect, Rackspace
Percona Live 2017
Agenda
We are going to discuss:
Key terms
- Introduction to Sitecore
- Introduction to MongoDB
Best Practices for MongoDB with Sitecore
Scaling Sitecore
Benchmarks
Who We Are
Antonios Giannopoulos
Database Administrator w/ ObjectRocket
Grant Killian
Sitecore Architect w/ Rackspace
Sitecore MVP
Sitecore ♥ MongoDB because . . .
● Unstructured document model is a better fit for
Sitecore analytics vs traditional database rows
● ∞ scalability
● Introduces key flexibility to the system
○ HTTP Session state
○ Optional repository for other Sitecore modules
○ 100% replacement for SQL Server (experimental)
■ $$$
MongoDB replica-set
A group of mongod processes that maintain the same dataset
Replica sets provide:
- Redundancy
- High availability
- Scaling
MongoDB replica-set
Consists of at least 3 nodes
- Up to 50 nodes in 3.0 and higher
- 12 on previous versions
A replica-set node may be either:
- Primary
- Secondary
- Arbiter
MongoDB replica-set
Asynchronous replication
- Delay between PRI and SECs
- SECs pull and apply operations
Automatic failover
- If a PRI fails a SEC takes its place
MongoDB replica-set
Best Practices
- Odd number of members
- Use same server specs
- Reliable network connections
- Adjust the oplog accordingly
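The odd-number recommendation follows from elections needing a strict majority of voting members. A minimal sketch of that arithmetic is below; the function name is illustrative and not a MongoDB API.

```javascript
// Illustrative helper (not part of MongoDB): majority arithmetic for a
// replica set with `votingMembers` voting nodes.
function electionMath(votingMembers) {
  if (!Number.isInteger(votingMembers) || votingMembers < 1) {
    throw new RangeError("votingMembers must be a positive integer");
  }
  // A primary can only be elected by more than half of the voting members.
  const majority = Math.floor(votingMembers / 2) + 1;
  // Number of members that can be lost while a majority is still reachable.
  return { majority, faultTolerance: votingMembers - majority };
}

for (const n of [3, 4, 5]) {
  console.log(n, electionMath(n));
}
```

In this model a 4-member set tolerates one failure, the same as a 3-member set, which is why the extra even member adds cost without adding availability.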
MongoDB Sharded Clusters
Consists of:
Mongos:
- It’s a statement (query) router
- Connection interface for the driver; makes sharding transparent
Config Servers: Hold cluster metadata (the location of the data)
Shards: Each contains a subset of the sharded data
MongoDB Sharded Clusters
Best Practices
- Deploy shards as replica-sets
- Reliable network connections
- But most important… pick a good shard key
Undoing a shard key choice might require downtime
MongoDB Sharded Clusters
What makes a good shard key:
- High cardinality
- No null values
- Immutable field(s)
- No monotonically increasing fields
- Even read/write distribution
- Even data distribution
- Read targeting/locality
Most importantly, choose a shard key according to your application requirements
MongoDB Storage Engines
MongoDB version 3.0 and higher supports:
- MMAPv1
- WiredTiger
- RocksDB (Percona Server)
- In Memory (Percona Server)
- Fractal Tree (Percona Server)
Sitecore MongoDB Databases
1. Analytics - customer visit metrics (IP address, browser, pages…)
2. Tracking_contact - contact processing
3. Tracking_history - history worker queue for full rebuilds
4. Tracking_live - task queue for real-time processing
5. Private_session - “classic” HTTP session state
6. Shared_session - meta HTTP session state for contacts (engagement state for lifetime of interactions…)
Scaling Sitecore – Separate Workloads
Move each Sitecore database to a separate instance
Sitecore uses a different connection string per database:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />
connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />
Instances can be optimized according to their workload
Scaling Sitecore – Polyglot
Use a different storage engine per database:
- Different instances
- Sharded clusters, different storage engines per shard
Percona In-memory storage engine is a good fit for _sessions
- Based on the in-memory storage engine used in MongoDB Enterprise Edition
- _sessions data are not persistent
Scaling Sitecore - Sharding
What to shard:
- Large collections for capacity
- Busy collections for load distribution
How to pick a shard key:
- Collect a representative statement sample and identify statement patterns
- Pick a shard key that scales the workload/statements
- Meet sharding constraints
Scaling Sitecore - Sharding
From Sitecore documentation: “Sitecore calculates diskspace sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the diskspace”
Sharding interaction and contact for capacity.
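The sizing rule quoted above can be turned into a rough estimate. The sketch below assumes the quoted figures (5KB per interaction, 2.5KB per identified contact, together 80% of disk space); the function name and the sample volumes are hypothetical.

```javascript
// Rough disk estimate from the quoted Sitecore sizing rule.
// `coreFraction` is the share of disk used by interactions plus contacts.
function estimateAnalyticsDiskKB(interactions, contacts, coreFraction = 0.8) {
  const interactionKB = interactions * 5;   // 5KB per interaction
  const contactKB = contacts * 2.5;         // 2.5KB per identified contact
  const coreKB = interactionKB + contactKB;
  return { interactionKB, contactKB, coreKB, totalKB: coreKB / coreFraction };
}

// Hypothetical volumes: 1,000,000 interactions and 200,000 contacts.
const estimate = estimateAnalyticsDiskKB(1000000, 200000);
console.log(estimate);
console.log((estimate.totalKB / (1024 * 1024)).toFixed(2) + " GB");
```

With these sample volumes, interactions account for most of the projected space, which is consistent with treating that collection as the first candidate for sharding.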
Scaling Sitecore - Sharding
Collection Interaction
Receives: Inserts, Queries and Updates
Read/Write Ratio: 60-40
Updates are using the _id
Queries are using:
"_id, ContactId": 80%
"ContactId, _t": 5%
"ContactId, ContactVisitIndex": 15%
Scaling Sitecore - Sharding
Collection Interaction
Recommended shard key is _id:1 or _id:hashed
- Scales the vast majority of statements
- But… a few scatter-gather queries (around 20%)
{ContactId:1} is also decent, but:
- Updates on sharded collections MUST use the shard key (or {multi:true}); _id is an exception to that rule
- _id is generated by the application, not the driver
- Potential for jumbo chunks
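The "around 20%" scatter-gather figure can be reproduced from the query mix on the previous slide. The sketch below is a simplified model, not how mongos is implemented: it counts a query as targeted when it filters on every field of the candidate shard key, and it assumes equality filters.

```javascript
// Query mix for the Interaction collection, taken from the slide above.
const queryMix = [
  { fields: ["_id", "ContactId"], share: 80 },
  { fields: ["ContactId", "_t"], share: 5 },
  { fields: ["ContactId", "ContactVisitIndex"], share: 15 },
];

// Percentage of queries that can be routed to a single shard for a given
// shard key; the remainder is broadcast to all shards (scatter-gather).
function targetedShare(mix, shardKeyFields) {
  return mix
    .filter((q) => shardKeyFields.every((f) => q.fields.includes(f)))
    .reduce((sum, q) => sum + q.share, 0);
}

for (const key of [["_id"], ["ContactId"]]) {
  const targeted = targetedShare(queryMix, key);
  console.log(key.join(","), "targeted:", targeted, "scatter-gather:", 100 - targeted);
}
```

The model only covers reads. ContactId looks better on the query mix alone, but the updates are issued by _id, which is the trade-off the slide calls out.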
Scaling Sitecore - Sharding
Collection Interaction
Choose your shard key according to your engine
- MMAP: _id:1 or _id:hashed
- WiredTiger: _id:1 or _id:hashed or ContactId:1
Sitecore may optimize sharding by including ContactId on the updates
Scaling Sitecore - Sharding
Collection Contacts
Receives: Inserts, Queries and Updates
Read/Write Ratio: 80-20
Updates are using the _id
Queries are using the _id (with additional fields)
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection Devices
Recommended shard key is _id:1 or _id:hashed
Collection ClassificationsMap
Recommended shard key is _id:1 or _id:hashed
Collection KeyBehaviorCache
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
Collection GeoIps
Recommended shard key is _id:1 or _id:hashed
Collection OperationStatuses
Recommended shard key is _id:1 or _id:hashed
Collection ReferringSites
Recommended shard key is _id:1 or _id:hashed
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
Client-generated _id values are monotonically increasing, so “hashed” is added for randomness
Sitecore's _id is a .NET UUID (Universally Unique Identifier) stored as the BinData datatype
Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")
Scaling Sitecore - Sharding
{_id:1} vs {_id:hashed}
You may use the uuidhelpers.js utility to convert _id to UUID
Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js
>doc = db.test.findOne()
{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }
>doc._id.toCSUUID()
CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
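Outside the mongo shell, the same conversion can be sketched by hand. The function below is an illustration, not the uuidhelpers.js code: it assumes Node.js (for Buffer) and the legacy C# driver layout for BinData subtype 3, where the first three UUID groups are stored little-endian. The function name is hypothetical.

```javascript
// Convert the base64 payload of a BinData(3, ...) value written by the
// legacy C# driver into the canonical UUID string.
function legacyCSharpUuidFromBase64(b64) {
  const bytes = Buffer.from(b64, "base64");
  if (bytes.length !== 16) {
    throw new RangeError("expected 16 bytes, got " + bytes.length);
  }
  // Hex-encode bytes[start, end), optionally reversing the byte order.
  const hex = (start, end, reverse) => {
    const part = Array.from(bytes.subarray(start, end));
    if (reverse) part.reverse();
    return part.map((b) => b.toString(16).padStart(2, "0")).join("");
  };
  return [
    hex(0, 4, true),    // first group: 4 bytes, reversed
    hex(4, 6, true),    // second group: 2 bytes, reversed
    hex(6, 8, true),    // third group: 2 bytes, reversed
    hex(8, 10, false),  // remaining groups keep their byte order
    hex(10, 16, false),
  ].join("-");
}

console.log(legacyCSharpUuidFromBase64("1eDJ1NXU8EeiD5a6WJtxbA=="));
// d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The sample input is the _id from the slide, and the result matches the CSUUID value printed by toCSUUID().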
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Using numInitialChunks allows you to pre-split and distribute empty chunks.
- Avoid chunk splits
- Avoid chunk moves
db.adminCommand( { shardCollection: <collection>, key: {_id:"hashed"}, numInitialChunks: <number> } )
where <number> is less than 8192 per shard.
Scaling Sitecore - Sharding
Use {_id:"hashed"} when you have an empty collection
Define numInitialChunks:
Size = Collection size (in MB) / 32
Count = Number of documents / 125000
Limit = Number of shards * 8192
numInitialChunks = Min(Max(Size, Count), Limit)
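The formula above can be written as a small helper. This is a sketch: rounding up to whole chunks is an assumption added here, and the function and parameter names are illustrative.

```javascript
// numInitialChunks = Min(Max(Size, Count), Limit), per the slide.
// Inputs are the expected size and document count of the collection once
// loaded; rounding up to whole chunks is an assumption of this sketch.
function numInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const size = Math.ceil(collectionSizeMB / 32);
  const count = Math.ceil(documentCount / 125000);
  const limit = shardCount * 8192;
  return Math.min(Math.max(size, count), limit);
}

// Hypothetical example: 100 GB (102400 MB), 10 million documents, 3 shards.
console.log(numInitialChunks(102400, 10000000, 3)); // 3200
```

Here the size term (3200) dominates the count term (80) and stays under the limit (24576), so 3200 would be passed as numInitialChunks to shardCollection.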
Scaling Sitecore - Sharding
Move Primary
Move each Sitecore database to a different shard:
(analytics, tracking_live …)
db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )
Requires downtime for live databases
Scaling Sitecore – Secondary Reads
You can configure Secondary Reads from the driver (secondary or secondaryPreferred)
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />
In 3.4 maxStalenessSeconds was introduced to control stale reads
It specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
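As an illustration of combining the two options, the sketch below appends readPreference and maxStalenessSeconds to a connection URI. The helper name and the host are hypothetical; the 90-second floor reflects the minimum value MongoDB accepts for maxStalenessSeconds.

```javascript
// Hypothetical helper: append secondary-read options to a MongoDB URI.
function withSecondaryReads(baseUri, maxStalenessSeconds) {
  // Values below 90 seconds are rejected by MongoDB drivers.
  if (!Number.isInteger(maxStalenessSeconds) || maxStalenessSeconds < 90) {
    throw new RangeError("maxStalenessSeconds must be an integer >= 90");
  }
  const separator = baseUri.includes("?") ? "&" : "?";
  return (
    baseUri + separator +
    "readPreference=secondaryPreferred" +
    "&maxStalenessSeconds=" + maxStalenessSeconds
  );
}

// Placeholder host and database name.
console.log(withSecondaryReads("mongodb://mongo01.example.net:27017/analytics", 120));
```

When the resulting URI is placed inside an XML connectionString attribute, the & separator has to be written as &amp;.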
Scaling Sitecore – Secondary Reads
Use ReplicaSet Tags to target reads:
- Direct reads to specific replica set nodes
- Reduces availability
conf = rs.conf();
conf.members[0].tags = {"db": "analytics"}
rs.reconfig(conf)
Set readPreferenceTags on the connection string:
connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreferenceTags=db:analytics" />
Order matters when setting multiple tags
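Why order matters can be shown with a simplified model of tag-set matching: tag sets are tried in order, and the first one that matches at least one member decides the candidates. The sketch below ignores read preference mode, latency and staleness, and the hosts and the "session" tag are hypothetical.

```javascript
// Simplified model of how a driver applies an ordered list of tag sets.
function selectByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    // A member matches when it carries every key/value pair of the tag set.
    // An empty tag set {} matches any member.
    const matches = members.filter((m) =>
      Object.entries(tagSet).every(([key, value]) => m.tags[key] === value)
    );
    if (matches.length > 0) return matches;
  }
  return []; // no tag set matched any member
}

const members = [
  { host: "mongo01", tags: { db: "analytics" } },
  { host: "mongo02", tags: { db: "session" } },
  { host: "mongo03", tags: {} },
];

// Prefer the analytics-tagged node, fall back to any node.
console.log(selectByTagSets(members, [{ db: "analytics" }, {}]).map((m) => m.host));
// Reversed order: the empty tag set matches first, so the tag is never used.
console.log(selectByTagSets(members, [{}, { db: "analytics" }]).map((m) => m.host));
```

Listing an empty tag set last is one way to limit the availability cost noted above, since reads can still land somewhere if the tagged node is down.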
Scaling Sitecore – Multi Region
Challenges:
- Direct reads to the closest node
- Direct writes to the closest node
- Single database entity for reporting
- Minimum complexity
Scaling Sitecore – Multi Region
Replica Set:
- Target reads using the nearest read preference
- Target reads using region based tags
- Writes must go to the Primary
- Requires at least one secondary per region
Scaling Sitecore – Multi Region
Sharded cluster:
- Target reads using the nearest read preference
- Target reads using region based tags
- Requires at least one secondary per region
- Writes must go to the Primaries
- Tags or Zones are based on shard key ranges
- Add location to shard key as prefix – change the source code
Scaling Sitecore – Multi Region
Mongo to Mongo connector:
- Creates a pipeline from one MongoDB cluster to another MongoDB cluster
- Reads and replicates oplog operations
- Easy deployment
mongo-connector -m <name:port> -t <name:port> -d <database>
Scaling Sitecore – Connector
Example insert and the oplog entry it produces:
db.foo.insert({a:1})
db.foo.insert({_id:1, a:1})
{ "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }
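To make the replication idea concrete, here is a toy replay of an insert entry against an in-memory target. It is not mongo-connector's implementation: the target is modelled as a Map of namespace to Map of _id to document, and only the "op", "ns" and "o" fields shown above are used.

```javascript
// Toy replay of an oplog insert entry against an in-memory "cluster".
function applyOplogEntry(target, entry) {
  if (entry.op !== "i") return false; // this sketch replays inserts only
  if (!target.has(entry.ns)) target.set(entry.ns, new Map());
  const collection = target.get(entry.ns);
  // Keyed by _id, so replaying the same entry twice leaves one document.
  collection.set(entry.o._id, { ...entry.o });
  return true;
}

const target = new Map();
const entry = { v: 2, op: "i", ns: "foo.foo", o: { _id: 1, a: 1 } };
applyOplogEntry(target, entry);
applyOplogEntry(target, entry); // replayed, e.g. after a restart
console.log(target.get("foo.foo").size);   // 1
console.log(target.get("foo.foo").get(1)); // { _id: 1, a: 1 }
```

Because the oplog entry carries the _id chosen on the source, the copy on the target keeps the same key, which is what lets both regions report on one logical dataset.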
Benchmarks
Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)
Results: WiredTiger is 9.5% faster
Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})
Results: WiredTiger is 9.4% faster
So what?
- Evaluate your MongoDB architecture to determine if it
would benefit from scaling
- If scaling is in order, consider this talk as a
reference
- Recognize how MongoDB’s versatility makes it
relevant to a wide variety of applications
What's next?
- Test MongoRocks (Percona Server) against Sitecore
- Test In-Memory (Percona Server) for sessions or
cache(s)
- Expand sharding recommendations on add-ons
- Evaluate other Sitecore modules for suitability with
MongoDB
- Re-invent our benchmarks
Questions?
Thank you!
@iamantonios
@sitecoreagent