how to scale mongodb - dataops barcelona · how to scale mongodb ... • what is ha, how mongodb...

June-22-2018 How to Scale MongoDB


Page 1: How to Scale MongoDB - DataOps Barcelona · How to Scale MongoDB ... • What is HA, how MongoDB achieves HA • Replica set, components, deployment topologies • Scaling (Vertical

June-22-2018

How to Scale MongoDB


About me

● Location: Skopje, Republic of Macedonia
● Education: MSc, Software Engineering
● Experience:
○ Lead Database Consultant (since 2016)
○ Database Consultant (2012 - 2016)
○ Web Developer, DBA (2007 - 2012)
● Certifications: C100DBA - MongoDB certified DBA (since 2016)

https://mk.linkedin.com/in/igorle @igorle

© 2018 Pythian. Confidential


Overview

• What is HA, how MongoDB achieves HA
• Replica set, components, deployment topologies
• Scaling (Vertical vs Horizontal)
• What is a sharded cluster in MongoDB
• Cluster components - shards, config servers, mongos
• Shard keys and chunks
• Hashed vs range based sharding
• Choosing shard key
• Q&A


HA


What is HA?

• High availability refers to systems that are durable and likely to operate continuously without failure for a long time
• A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working

[Diagram: Application → Database; is each one a SPOF?]


What is HA?

• HA on the application layer vs the database layer
• Splitting write and read operations

[Diagram: multiple Application nodes behind a load balancer; writes go to the Primary database, reads go to the Replicas (through a load balancer?)]


What is HA?

• Application node failure
• Database node failure
• Failover (elect new Primary node)

[Diagram: multiple Application nodes behind a load balancer; writes go to the Primary database, reads go to the Replicas (through a load balancer?)]


HA with MongoDB - replica sets

• Group of mongod processes that maintain the same data set
• Redundancy and high availability
• Increased read capacity (scaling reads)
• Automatic failover


Replication concept

1. Write operations go to the Primary node
2. All changes are recorded in the operations log (oplog)
3. Asynchronous replication to the Secondaries
4. Secondaries copy the Primary oplog
5. A Secondary can use another Secondary as its sync source*

*settings.chainingAllowed (true by default)

[Diagram: the five numbered steps, from the write on the Primary (1) and its oplog (2) to the Secondaries (3, 4) and a chained Secondary (5)]
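The five replication steps can be illustrated with a toy model. This is only a sketch in plain JavaScript, not how mongod is implemented: the `Member` class, its `write`, `apply` and `syncFrom` methods, and the integer `ts` counter are all hypothetical names introduced for illustration.

```javascript
// Toy model of oplog-based replication; NOT MongoDB's actual implementation.
class Member {
  constructor(name) {
    this.name = name;
    this.data = new Map(); // _id -> document
    this.oplog = [];       // ordered list of applied operations
  }

  // Steps 1-2: a write on the primary changes the data and is recorded in the oplog.
  write(id, doc) {
    this.apply({ ts: this.oplog.length + 1, id, doc });
  }

  // Apply one operation locally and keep it in the local oplog.
  apply(op) {
    this.data.set(op.id, op.doc);
    this.oplog.push(op);
  }

  // Steps 3-5: pull every entry newer than the last one applied locally.
  // The sync source may be the primary or, with chaining, another secondary.
  syncFrom(source) {
    const lastTs = this.oplog.length ? this.oplog[this.oplog.length - 1].ts : 0;
    for (const op of source.oplog) {
      if (op.ts > lastTs) this.apply(op);
    }
  }
}

const primary = new Member("primary");
const secondaryA = new Member("secondaryA");
const secondaryB = new Member("secondaryB");

primary.write(1, { user: "ana" });
primary.write(2, { user: "bo" });

secondaryA.syncFrom(primary);    // replicates from the primary
secondaryB.syncFrom(secondaryA); // chained: replicates from another secondary

console.log(secondaryB.data.get(2));
```

Because replication is asynchronous, a secondary that has not yet called `syncFrom` simply lags behind; nothing in the write path waits for it.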


Configuration options

• Up to 50 members per replica set (at most 7 voting members)
• Arbiter node
• Priority 0 node
• Hidden node
• Delayed node


Arbiter node

• Does not hold a copy of the data
• Votes in elections

arbiterOnly : true
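An arbiter is typically added to bring the number of voters to an odd count. The arithmetic below is a small sketch of why that matters, assuming a primary is elected by a strict majority of voting members; the helper names `majority` and `faultTolerance` are made up for this example.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be lost while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(`${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
```

Going from 2 to 3 voters raises the tolerated failures from 0 to 1, while going from 3 to 4 adds nothing, which is why two data-bearing members plus one arbiter is a common minimal layout.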


Priority 0 node

Priority - floating point (i.e. decimal) number between 0 and 1000

• Cannot become primary, cannot trigger election
• Visible to application

Secondary
priority : 0


Hidden node

• Not visible to application
• Never becomes primary
• Use cases
○ reporting
○ backups

Secondary
hidden : true
priority : 0


Delayed node

• Must be priority 0 member
• Should be hidden member (not mandatory)
• Mainly used for backups
• Recovery in case of human error

Secondary
slaveDelay : 3600
priority : 0
hidden : true
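Put together, the member options from the last few slides could appear in one replica set configuration document, roughly as sketched below. The set name `rs0` and all host names are hypothetical, and making the delayed member non-voting is a choice made here to keep the voter count odd. The field `slaveDelay` matches the slides; MongoDB 5.0 and later call it `secondaryDelaySecs`.

```javascript
// Hypothetical replica set config; set name and hosts are placeholders.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.net:27017" },                 // regular member
    { _id: 1, host: "db2.example.net:27017" },                 // regular member
    { _id: 2, host: "db3.example.net:27017", priority: 0 },    // can never become primary
    { _id: 3, host: "db4.example.net:27017", priority: 0, hidden: true }, // reporting / backups
    { _id: 4, host: "db5.example.net:27017", priority: 0, hidden: true,
      slaveDelay: 3600, votes: 0 },                            // one hour behind, non-voting
    { _id: 5, host: "arb1.example.net:27017", arbiterOnly: true }, // votes, holds no data
  ],
};

// Members vote unless votes is explicitly set to 0.
function countVoters(cfg) {
  return cfg.members.filter((m) => m.votes !== 0).length;
}

console.log("voting members:", countVoters(config));

// In the mongo shell, connected to the first member, this document would be
// passed to rs.initiate(config).
```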


Scaling


Scaling

• Working set - amount of memory that a process requires in a given time interval
• Time for scaling?


Scaling

● Vertical scaling
○ Adding more power (CPU, RAM, DISK) to an existing machine
○ Might require downtime while scaling up
● Horizontal scaling
○ Adding more machines into your pool of resources
○ Partitioning the data on multiple machines
○ Parallelizing the processing
○ More complex to implement and maintain


Scaling (vertical)

[Diagram: a 1TB data set held on a single server]


Scaling (horizontal) - Sharding

[Diagram: a 1TB data set split across Shard 1 to Shard 4, 256GB each]


Scaling costs

* https://www.amazon.com/dp/0596102356


Scaling with MongoDB


Scaling (sharding) with MongoDB

Shard


Sharding with MongoDB

• Shard/Replica set (subset of the sharded data)
• Config servers (metadata and config settings)
• mongos (query router, cluster interface)

mongos> sh.addShard("shardN")

[Diagram: shards 1, 2, ... N-1, N]


Shards

• Contains a subset of the sharded data
• Replica set for redundancy and HA
• Primary shard - holds the non-sharded collections of a database
• --shardsvr option, or sharding.clusterRole: shardsvr in the config file (default port 27018)


Config servers

• Store the metadata for the sharded cluster in the config database
• Authentication configuration information in the admin database
• Config servers as replica set only (>= 3.4)
• Hold the balancer on the Primary node (>= 3.4)
• --configsvr option, or sharding.clusterRole: configsvr in the config file (default port 27019)


mongos

• Caches metadata from config servers
• Routes queries to shards
• No persistent state
• Updates cache on metadata changes
• Holds balancer (mongodb <= 3.2)
• mongos version 3.4 cannot connect to earlier mongod version


Sharding collection

• Enable sharding on database "users"
sh.enableSharding("users")

• Shard collection history
sh.shardCollection("users.history", { user_id : 1 } )

• Shard key - indexed key that exists in every document in the collection
○ range based
sh.shardCollection("users.history", { user_id : 1 } )
○ hash based
sh.shardCollection( "users.history", { user_id : "hashed" } )
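The two calls can be wrapped into one helper, sketched below. The function name `shardUserHistory` and its `hashed` option are hypothetical; `sh.enableSharding` and `sh.shardCollection` are the shell helpers shown on the slide. The `sh` object is passed in as a parameter so the function can also be exercised with a stub outside the shell.

```javascript
// Enable sharding on the "users" database and shard "users.history" on user_id.
// `sh` is the mongo shell's sharding helper, connected through a mongos.
function shardUserHistory(sh, { hashed = false } = {}) {
  sh.enableSharding("users");
  // 1 requests range based sharding, "hashed" requests hash based sharding.
  const key = { user_id: hashed ? "hashed" : 1 };
  return sh.shardCollection("users.history", key);
}

// In the mongo shell:
//   shardUserHistory(sh);                   // range based
//   shardUserHistory(sh, { hashed: true }); // hash based
```

A collection is sharded once, so only one of the two variants applies to a given collection. If the collection already contains data, the index on the shard key has to exist before `sh.shardCollection` is called.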


Ranged sharding

• Dividing data into contiguous ranges determined by the shard key values
• Documents with “close” shard key values are likely to be in the same chunk or shard
• Query Isolation - more likely to target single shard for range queries
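Query isolation under ranged sharding can be pictured with a small routing sketch. The chunk boundaries, shard names and both functions are invented for illustration, and `-Infinity` / `Infinity` stand in for the minimum and maximum key bounds; this is not the mongos routing code.

```javascript
// Each chunk covers [min, max): inclusive lower bound, exclusive upper bound.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shardA" },
  { min: 1000, max: 2000, shard: "shardB" },
  { min: 2000, max: Infinity, shard: "shardC" },
];

// Route a single shard key value to the shard that owns its chunk.
function shardForKey(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Route a range query (from and to inclusive) to every shard whose chunks overlap it.
function shardsForRange(from, to) {
  const hit = chunks.filter((c) => c.min <= to && c.max > from).map((c) => c.shard);
  return [...new Set(hit)];
}

console.log(shardForKey(1000));         // boundary value belongs to the upper chunk
console.log(shardsForRange(1200, 1300)); // narrow range stays on one shard
console.log(shardsForRange(900, 1100));  // range crossing a boundary touches two shards
```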


Hashed sharding

• Uses a hashed index of a single field as the shard key to partition data
• More even data distribution at the cost of reducing Query Isolation
• Applications do not need to compute hashes
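The trade-off can be seen by placing the most recent values of an ever-increasing key under both schemes. This is a simplified Node.js sketch with several assumptions: four shards, ranged chunks of 250 keys each, a stand-in hash built from the first 4 bytes of MD5, and placement by modulo. MongoDB itself assigns ranges of hashed values to chunks rather than taking a modulo.

```javascript
const crypto = require("crypto");

const SHARDS = 4;

// Stand-in hash (assumption): first 4 bytes of MD5, read as an unsigned integer.
function hashKey(key) {
  return crypto.createHash("md5").update(String(key)).digest().readUInt32BE(0);
}

// Hashed placement, simplified to a modulo over the shard count.
function hashedShard(key) {
  return hashKey(key) % SHARDS;
}

// Ranged placement for keys 0..999 split into four equal ranges of 250 keys.
function rangedShard(key) {
  return Math.min(Math.floor(key / 250), SHARDS - 1);
}

// Count how many keys land on each shard under a placement function.
function countPerShard(keys, place) {
  const counts = new Array(SHARDS).fill(0);
  for (const k of keys) counts[place(k)] += 1;
  return counts;
}

// The 100 most recent inserts of a monotonically increasing key.
const recentKeys = Array.from({ length: 100 }, (_, i) => 900 + i);

console.log("ranged:", countPerShard(recentKeys, rangedShard));
console.log("hashed:", countPerShard(recentKeys, hashedShard));
```

With ranged placement every recent key lands on the last shard, while the hashed placement spreads them out. The cost is that neighbouring keys no longer live together, so a range query has to visit several shards.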


Shard keys and chunks

Choosing Shard key

• Large Shard Key Cardinality
• Low Shard Key Frequency
• Non-Monotonically changing shard keys (ranged or hashed sharding)

Chunks

• A contiguous range of shard key values within a particular shard
• Inclusive lower and exclusive upper range based on shard key
• Default size of 64MB ([1 .. 1024] MB)
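Cardinality and frequency can be estimated from a sample of documents before committing to a key. The helper below is a minimal sketch: `keyStats`, the sample documents and the `country` field are made up for illustration, and a real check would run against a representative sample of the collection.

```javascript
// For a candidate shard key field, report how many distinct values it has
// (cardinality) and what share of documents the most common value holds (frequency).
function keyStats(docs, field) {
  const freq = new Map();
  for (const doc of docs) {
    const value = doc[field];
    freq.set(value, (freq.get(value) || 0) + 1);
  }
  return {
    cardinality: freq.size,
    maxShare: Math.max(...freq.values()) / docs.length,
  };
}

// Hypothetical sample documents.
const sample = [
  { user_id: 1, country: "ES" },
  { user_id: 2, country: "ES" },
  { user_id: 3, country: "ES" },
  { user_id: 4, country: "MK" },
];

console.log("user_id:", keyStats(sample, "user_id"));
console.log("country:", keyStats(sample, "country"));
```

Here `user_id` has high cardinality and low frequency, while `country` concentrates three quarters of the documents on one value, which would end up in a single chunk range that cannot be split further.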


Shard key cardinality


Shard key frequency


Monotonically changing shard key


Choosing Shard key

Analyse your query patterns before making any decision (mtools - mloginfo)

● users
○ Querying data by username (unique key?). Consider {username} as shard key
● user history
○ Querying data by user_id. Consider {user_id} as shard key
○ Querying data by user_id, date_created. Consider {user_id, date_created} as shard key
● views
○ Write intensive workload. created_on (timestamp). Consider hashed sharding on the timestamp field {created_on}


Shard keys limitations

• Must be an ascending indexed key or indexed compound key that exists in every document in the collection
• Cannot be a multikey index, a text index or a geospatial index
• Cannot exceed 512 bytes
• Immutable key - cannot be updated/changed
• Update operations that affect a single document must include the shard key or the _id field
• A collection cannot be sharded if unique indexes on other fields exist
• No second unique index is possible if the shard key is a unique index


Summary

• Replica set with an odd number of voting members
• Hidden or Delayed member for dedicated functions (reporting, backups …)
• If the dataset fits on a single server, keep an unsharded deployment
• Horizontal scaling - shards running as replica sets
• Shard keys are immutable, with a max size of 512 bytes
• Shard keys must exist in every document in the collection
• Ranged sharding may not distribute the data evenly
• Hashed sharding distributes the data randomly
• Sharding adds complexity for maintenance (backups and upgrades)


Questions?


We’re Hiring!

https://www.pythian.com/careers/
