2011 mongo fr - scaling with mongodb

Upload: antoinegirbal

Post on 10-May-2015

DESCRIPTION

For applications that outgrow the resources of a single database server, MongoDB can convert to a sharded cluster, automatically managing failover and balancing of nodes, with few or no changes to the original application code. This talk starts by discussing when to shard and continues on to describe MongoDB's sharding architecture. We'll describe how to configure a sharded cluster and provide several example topologies. We'll also give some advice on schema design for sharding and how to pick the best shard key.

TRANSCRIPT

Page 1: 2011 mongo FR - scaling with mongodb

Eliot Horowitz @eliothorowitz

MongoUK, March 21, 2011

Scaling with MongoDB

Page 2: 2011 mongo FR - scaling with mongodb

Scaling

•Storage needs only go up

•Operations/sec only go up

•Complexity only goes up

Page 3: 2011 mongo FR - scaling with mongodb

Horizontal Scaling

•Vertical scaling is limited

•Hard to scale vertically in the cloud

•Can scale wider than higher

Page 4: 2011 mongo FR - scaling with mongodb

Read Scaling

•One master at any time

•Programmer determines if read hits master or a slave

•Pro: easy to set up, can scale reads very well

•Con: reads are inconsistent on a slave

•Writes don’t scale

Page 5: 2011 mongo FR - scaling with mongodb

One Master, Many Slaves

•Custom Master/Slave setup

•Have as many slaves as you want

•Can put them local to application servers

•Good for 90+% read heavy applications (Wikipedia)

Page 6: 2011 mongo FR - scaling with mongodb

Replica Sets

•High Availability Cluster

•One master at any time, up to 6 slaves

•A slave is automatically promoted to master on failure

•Drivers support auto routing of reads to slaves if programmer allows

•Good for applications that need high write availability but mostly reads (Commenting System)

Page 7: 2011 mongo FR - scaling with mongodb

Sharding

•Many masters, even more slaves

•Can scale in two dimensions

•Add shards for write and data size scaling

•Add slaves for inconsistent read scaling and redundancy

Page 8: 2011 mongo FR - scaling with mongodb

Sharding Basics

•Data is split up into chunks

•Shard: a replica set that holds a portion of the data

•Config servers: store metadata about the system

•mongos: routers that direct and merge requests

Page 9: 2011 mongo FR - scaling with mongodb

Architecture

[Diagram: several clients connect to one or more mongos routers. The mongos routers forward requests to the shards, each a group of mongod processes, and consult the config servers, which are also mongod processes.]

Page 10: 2011 mongo FR - scaling with mongodb

Common Setup

•A common setup is 3 shards with 3 servers per shard: 3 masters, 6 slaves

•Can add sharding later to an existing replica set with no down time

•Can have sharded and non-sharded collections

Page 11: 2011 mongo FR - scaling with mongodb

Range Based

•collection is broken into chunks by range

•chunks default to 64 MB or 100,000 objects

MIN  MAX  LOCATION
A    F    shard1
F    M    shard1
M    R    shard2
R    Z    shard3
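
The chunk table above can be read as a sorted lookup structure. Here is a minimal sketch of how a router might map a shard key to a shard, assuming each chunk covers the half-open range from MIN (inclusive) to MAX (exclusive). The names `CHUNKS` and `find_shard` are hypothetical and are not MongoDB APIs.

```python
from bisect import bisect_right

# Chunk table from the slide: (min, max, shard), sorted by min.
CHUNKS = [
    ("A", "F", "shard1"),
    ("F", "M", "shard1"),
    ("M", "R", "shard2"),
    ("R", "Z", "shard3"),
]

def find_shard(key, chunks=CHUNKS):
    """Return the shard whose chunk range [min, max) contains key."""
    mins = [c[0] for c in chunks]
    # Index of the last chunk whose min is <= key.
    i = bisect_right(mins, key) - 1
    if i < 0 or not (chunks[i][0] <= key < chunks[i][1]):
        raise KeyError(f"no chunk covers {key!r}")
    return chunks[i][2]
```

For example, `find_shard("Mongo")` falls in the M to R chunk and returns `"shard2"`.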

Page 12: 2011 mongo FR - scaling with mongodb

Config Servers

•3 of them

•changes are made with 2 phase commit

•if any are down, metadata goes read only

•system is online as long as 1 of the 3 is up

Page 13: 2011 mongo FR - scaling with mongodb

mongos

•Sharding Router

•Acts just like a mongod to clients

•Can have 1 or as many as you want

•Can run on appserver so no extra network traffic

•Caches metadata from config servers

Page 14: 2011 mongo FR - scaling with mongodb

Writes

•Inserts: require shard key, routed

•Removes: routed and/or scattered

•Updates: routed or scattered

Page 15: 2011 mongo FR - scaling with mongodb

Queries

•By shard key: routed

•sorted by shard key: routed in order

•by non shard key: scatter gather

•sorted by non shard key: distributed merge sort
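
The last two cases can be sketched in a few lines. This is a toy model and not how mongos is implemented: the `SHARDS` contents, the field names and both function names are hypothetical. Scatter-gather asks every shard and concatenates the matches. The distributed merge sort lets each shard sort its own documents and then merges the already sorted streams.

```python
import heapq

# Hypothetical per-shard contents, sharded by name.
SHARDS = {
    "shard1": [{"name": "alice", "age": 41}, {"name": "bob", "age": 25}],
    "shard2": [{"name": "nina", "age": 33}, {"name": "oscar", "age": 19}],
    "shard3": [{"name": "ruth", "age": 25}, {"name": "sam", "age": 52}],
}

def scatter_gather(predicate):
    """Query by non shard key: ask every shard and concatenate the matches."""
    results = []
    for docs in SHARDS.values():
        results.extend(d for d in docs if predicate(d))
    return results

def merge_sorted(field):
    """Sort by non shard key: each shard sorts locally, the router merges."""
    per_shard = [sorted(docs, key=lambda d: d[field]) for docs in SHARDS.values()]
    # heapq.merge only ever compares the head of each sorted stream.
    return list(heapq.merge(*per_shard, key=lambda d: d[field]))
```

The merge step never re-sorts the full result set, which is why the router stays cheap even when every shard contributes documents.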

Page 16: 2011 mongo FR - scaling with mongodb

Splitting

•Take a chunk and split it in 2

•Splits on the median value

•Splits only change metadata; the data itself is not moved
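
A minimal sketch of a median split, assuming a chunk is a `(min, max, shard)` tuple with a half-open range. The function name and the sample keys are hypothetical. Only the chunk table changes, and both halves stay on the same shard.

```python
def split_chunk(chunk, keys):
    """Split one chunk in two at the median of the shard keys it holds.

    Returns a list with either the two new chunks or the unchanged chunk.
    """
    lo, hi, shard = chunk
    inside = sorted(k for k in keys if lo <= k < hi)
    if len(inside) < 2:
        return [chunk]  # nothing to split
    median = inside[len(inside) // 2]
    if median == lo:
        return [chunk]  # splitting here would create an empty range
    # Metadata-only change: same shard on both sides.
    return [(lo, median, shard), (median, hi, shard)]
```

With keys `["B", "D", "G", "K", "S"]`, splitting `("A", "Z", "shard1")` yields the ranges A to G and G to Z, both still on shard1.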

Page 17: 2011 mongo FR - scaling with mongodb

Splitting

T1
MIN  MAX  LOCATION
A    Z    shard1

T2
MIN  MAX  LOCATION
A    G    shard1
G    Z    shard1

T3
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard1

Page 18: 2011 mongo FR - scaling with mongodb

Balancing

•Moves chunks from one shard to another

•Done online while system is running

•Balancing runs in the background

Page 19: 2011 mongo FR - scaling with mongodb

Migrating

T3
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard1

T4
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard2

T5
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard2
S    Z    shard2
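
The T3 to T5 sequence can be reproduced with a toy balancer. This is a sketch under stated assumptions, not MongoDB's actual balancing policy: the `threshold` value, the rule of moving the donor's top-most range, and the name `balance_step` are all hypothetical.

```python
from collections import Counter

def balance_step(chunks, shards, threshold=2):
    """Move one chunk from the fullest shard to the emptiest one.

    chunks is a list of [min, max, shard] lists and is modified in place.
    Returns the moved chunk, or None if the shards are close enough.
    """
    counts = Counter({s: 0 for s in shards})
    counts.update(c[2] for c in chunks)
    donor = max(shards, key=lambda s: counts[s])
    receiver = min(shards, key=lambda s: counts[s])
    if counts[donor] - counts[receiver] < threshold:
        return None
    # Assumption: migrate the donor's highest range, as the slide tables do.
    chunk = max((c for c in chunks if c[2] == donor), key=lambda c: c[0])
    chunk[2] = receiver
    return chunk
```

Starting from the T3 table with shards `["shard1", "shard2"]`, the first call moves S to Z (T4), the second moves G to S (T5), and the third returns `None` because each shard then holds two chunks.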

Page 20: 2011 mongo FR - scaling with mongodb

Choosing a Shard Key

•Shard key determines how data is partitioned

•Hard to change

•Most important performance decision

Page 21: 2011 mongo FR - scaling with mongodb

Use Case: User Profiles

{ email : "[email protected]" ,
  addresses : [ { state : "NY" } ]
}

•Shard by email

•Lookup by email hits 1 node

•Index on { "addresses.state" : 1 }

Page 22: 2011 mongo FR - scaling with mongodb

Use Case: Activity Stream

{ user_id : XXX, event_id : YYY , data : ZZZ }

•Shard by user_id

•Looking up an activity stream hits 1 node

•Writing events is distributed

•Index on { "event_id" : 1 } for deletes

Page 23: 2011 mongo FR - scaling with mongodb

Use Case: Photos

{ photo_id : ???? , data : <binary> }

What’s the right key?

•auto increment

•MD5( data )

•now() + MD5(data)

•month() + MD5(data)
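
One way to compare the first two candidates is to count where new inserts land once the existing keys have been cut into ranges. The sketch below is a simulation only: the chunk count, the `photo-N` payloads standing in for real image data, and all function names are assumptions made for illustration.

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def chunk_hits(existing_keys, new_keys, n_chunks=4):
    """Cut the existing keys into n_chunks equal-sized ranges, then count
    how many of the new keys fall into each range."""
    ordered = sorted(existing_keys)
    step = len(ordered) // n_chunks
    bounds = [ordered[i * step] for i in range(1, n_chunks)]  # interior split points
    hits = Counter(bisect_right(bounds, k) for k in new_keys)
    return [hits[i] for i in range(n_chunks)]

def md5_key(n):
    # Stand-in for MD5(data): hash a hypothetical payload derived from n.
    return hashlib.md5(f"photo-{n}".encode()).hexdigest()

old_ids, new_ids = range(1000), range(1000, 1100)
auto_increment = chunk_hits(old_ids, new_ids)
hashed = chunk_hits([md5_key(n) for n in old_ids], [md5_key(n) for n in new_ids])
```

Here `auto_increment` is `[0, 0, 0, 100]`: every new photo lands in the top range, so a single shard takes all the writes. The hashed keys spread across the ranges instead, at the cost of losing any ordering by insertion time. The `now()` and `month()` prefixes sit between those two extremes.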

Page 24: 2011 mongo FR - scaling with mongodb

Use Case: Logging

{ machine : "app.foo.com" , app : "apache" ,
  when : "2010-12-02:11:33:14" , data : XXX }

Possible Shard keys

•{ machine : 1 }

•{ when : 1 }

•{ machine : 1 , app : 1 }

•{ app : 1 }
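
A quick way to compare these candidates is to count how many distinct key values each one produces, since a chunk that holds a single key value cannot be split any further. The sample below is hypothetical (3 machines, 2 apps, 60 seconds of logs) and only illustrates the counting.

```python
# Hypothetical sample: 3 machines, 2 apps, one log line per second for a minute.
logs = [
    {"machine": f"app{m}.foo.com", "app": app, "when": f"2010-12-02:11:33:{s:02d}"}
    for m in range(3)
    for app in ("apache", "mysql")
    for s in range(60)
]

def distinct_values(docs, fields):
    """Number of distinct shard key values for a (possibly compound) key."""
    return len({tuple(d[f] for f in fields) for d in docs})

candidates = [("machine",), ("when",), ("machine", "app"), ("app",)]
cardinality = {fields: distinct_values(logs, fields) for fields in candidates}
```

In this sample `{ app : 1 }` has only 2 distinct values and `{ machine : 1 }` has 3, which caps the number of useful chunks. `{ when : 1 }` has the most values, but it always grows, so new inserts would keep landing in the highest range.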

Page 25: 2011 mongo FR - scaling with mongodb

Download MongoDB: http://www.mongodb.org

and let us know what you think: @eliothorowitz @mongodb

10gen is hiring! http://www.10gen.com/jobs