Sharding in MongoDB Days 2013

Posted on 09-May-2015


DESCRIPTION

Sharding presentation used at the MongoDB Days 2013 conferences in North America: Seattle, Chicago,

TRANSCRIPT

Introduction To Sharding

J. Randall Hunt

Hackathoner, MongoDB

@jrhunt, randall@mongodb.com

#MongoDBDays Chicago

In Today's Talk

• What? Why? When?

• How?

• What's happening behind the scenes?

What Is Sharding?

This is a picture of my cat.

This is a picture of ~100 cats.

http://a1.s6img.com/cdn/0011/p/3123272_8220815_lz.jpg

This is a cat trying to find a home

[Diagram: a webserver and a single mongod]

100 cats trying to find a home.

[Diagram: a webserver and a single mongod]

(not to scale)

Scale Up?

Data Store Scalability

• Custom Hardware

• Custom Software

In the past you've had two options for achieving data store scalability: 1) custom hardware (Oracle?) or 2) custom software (Google, Facebook). These were custom because the problems were not yet common enough. The number of people on the internet 10 years ago was incredibly small compared to the number of people who will be using web services 10 years from now.

Scale Out?


The MongoDB Sharding Solution

• Automatically partition your data

• Worry about failover at the partition layer

• Application independent

• Free and open source

Why Do I Shard?

Input/Output

Your input/output exceeds the capacity of a single node or replica set.

This is not easy to do!

Working Set Exceeds Physical Memory

[Diagram, built up over several slides: RAM filling with Data, then Indexes, Sorts, and Aggregations, until the working set no longer fits in physical memory]
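As a rough back-of-envelope illustration of this slide (a sketch, not a sizing formula): approximate the working set as hot data plus indexes plus memory for sorts and aggregations, and compare it with the RAM available per node. The function names, the 80% headroom factor, and all sizes below are hypothetical; real memory use depends on access patterns and the storage engine.

```javascript
// Hypothetical back-of-envelope estimate; all sizes are in GB and illustrative.
function workingSetGB({ hotDataGB, indexGB, sortAggGB }) {
  // Working set approximated as hot data + indexes + sort/aggregation memory.
  return hotDataGB + indexGB + sortAggGB;
}

function shardsNeeded(workingSet, ramPerNodeGB, headroom = 0.8) {
  // Assumption: only ~80% of RAM is usable; the rest goes to the OS, connections, etc.
  const usableGB = ramPerNodeGB * headroom;
  return Math.ceil(workingSet / usableGB);
}

const ws = workingSetGB({ hotDataGB: 90, indexGB: 24, sortAggGB: 6 });
console.log(ws);                   // 120
console.log(shardsNeeded(ws, 64)); // 3: a single 64 GB node cannot hold it
```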

How Does Sharding Work?

MongoDB's Sharding Infrastructure

[Diagram, built up over several slides: an app server talking to a single mongod; the data spread across several mongods, each acting as a shard; a mongos router placed between the app server and the shards; and finally a config server (mongod --configsvr) that the mongos consults]

Terminology

• Shards

• Chunks

• Config Servers

• mongos

A shard is a server, or a group of servers, that holds chunks of data split up according to a shard key; each shard holds a subset of a collection's data. A chunk is a group of documents falling in a particular range of the shard key, and it can be moved logically from server to server. Config servers hold information about where chunks live. mongos is the router and balancer: it communicates with the config servers and figures out how to direct your query intelligently.
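To make the terminology concrete, here is a simplified model of the chunk metadata config servers hold and the lookup a router performs with it. It is a sketch under stated assumptions: chunk ranges are half-open `[min, max)`, `-Infinity` and `Infinity` stand in for MongoDB's `MinKey` and `MaxKey`, string keys compare with plain JavaScript `<`, and `shardFor` is a hypothetical helper, not a mongos API.

```javascript
// Hypothetical, simplified model of config-server chunk metadata.
// Each chunk owns the half-open shard-key range [min, max) and lives on one shard.
// -Infinity / Infinity stand in for MongoDB's MinKey / MaxKey.
const chunks = [
  { min: -Infinity, max: "g", shard: "shard0000" },
  { min: "g", max: "p", shard: "shard0001" },
  { min: "p", max: Infinity, shard: "shard0002" },
];

// Compare two keys, treating -Infinity and Infinity as the ends of the key space.
function cmp(a, b) {
  if (a === b) return 0;
  if (a === -Infinity || b === Infinity) return -1;
  if (a === Infinity || b === -Infinity) return 1;
  return a < b ? -1 : 1;
}

// Router-style lookup (hypothetical helper): find the chunk whose range contains the key.
function shardFor(key, chunkTable) {
  const chunk = chunkTable.find(c => cmp(c.min, key) <= 0 && cmp(key, c.max) < 0);
  return chunk ? chunk.shard : null;
}

console.log(shardFor("fluffy", chunks));   // shard0000
console.log(shardFor("garfield", chunks)); // shard0001
console.log(shardFor("tabby", chunks));    // shard0002
```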

What exactly is a shard?

• A shard is a node of the cluster

• Can be a single mongod or an entire replica set

[Diagram: a shard is either a replica set (Primary, Secondary, Secondary) or a single mongod]

Now what do shards hold? Chunks, which are partitions of your data that live in certain ranges.

Partitioning

• User defines a shard key or uses hash-based sharding

• Shard key defines a range of data

• The key space is like points on a line

• A range is a segment of that line

[Diagram: the key space drawn as a line running from -∞ to +∞]

Remember interval notation?
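Interval notation is the right mental model: each chunk covers a half-open range `[min, max)` that includes its lower bound and excludes its upper bound, and together the chunks must cover the whole key line with no gaps or overlaps. The sketch below checks that property for numeric keys; `tilesKeySpace` and `inRange` are hypothetical helpers for illustration only.

```javascript
// Hypothetical helpers. Each chunk is a half-open interval [min, max): min is included, max is not.
function inRange(key, { min, max }) {
  return min <= key && key < max;
}

// True if the ranges cover (-Infinity, +Infinity) with no gaps and no overlaps.
function tilesKeySpace(ranges) {
  if (ranges.length === 0) return false;
  const sorted = [...ranges].sort((a, b) => a.min - b.min);
  if (sorted[0].min !== -Infinity) return false;
  if (sorted[sorted.length - 1].max !== Infinity) return false;
  for (let i = 0; i < sorted.length; i++) {
    if (!(sorted[i].min < sorted[i].max)) return false;             // empty or inverted range
    if (i > 0 && sorted[i].min !== sorted[i - 1].max) return false; // gap or overlap
  }
  return true;
}

const ranges = [
  { min: -Infinity, max: 0 },
  { min: 0, max: 100 },
  { min: 100, max: Infinity },
];
console.log(tilesKeySpace(ranges));   // true
console.log(inRange(100, ranges[1])); // false: 100 belongs to the next chunk
```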

Data Distribution

Initially a single chunk

Default Max Chunk Size: 64 MB

MongoDB will automatically split and migrate chunks as they reach the max size
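The automatic split can be sketched as follows. This is a simplified model, not MongoDB's split-point algorithm: a chunk is split at the median key of its documents once its total size exceeds the 64 MB default, and a chunk whose documents all share one key cannot be split at all (the case MongoDB calls a jumbo chunk). `maybeSplit` and the `{ key, bytes }` document shape are assumptions; migrating the halves to other shards is the balancer's job and is not modeled here.

```javascript
// Simplified, hypothetical auto-split sketch (not MongoDB's actual split-point logic).
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // the 64 MB default from the slide

// chunk = { min, max, docs: [{ key, bytes }] } with numeric keys.
function maybeSplit(chunk, maxBytes = MAX_CHUNK_BYTES) {
  const total = chunk.docs.reduce((sum, d) => sum + d.bytes, 0);
  if (total <= maxBytes) return [chunk]; // small enough: leave it alone

  const keys = chunk.docs.map(d => d.key).sort((a, b) => a - b);
  let splitKey = keys[Math.floor(keys.length / 2)]; // median key
  if (splitKey === keys[0]) {
    // Median equals the lowest key: try the next distinct key instead.
    splitKey = keys.find(k => k > keys[0]);
    // Every document shares one key value, so no split point exists.
    if (splitKey === undefined) return [chunk];
  }
  return [
    { min: chunk.min, max: splitKey, docs: chunk.docs.filter(d => d.key < splitKey) },
    { min: splitKey, max: chunk.max, docs: chunk.docs.filter(d => d.key >= splitKey) },
  ];
}
```

For example, a chunk over `[-Infinity, Infinity)` holding keys 1 through 4 at 10 bytes each, checked against a 25-byte limit, splits at key 3 into `[-Infinity, 3)` and `[3, Infinity)`.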

[Diagram: a config server, three mongos routers, and two shards (Shard 1 and Shard 2, each a mongod)]

Shards and Shard Keys

Chunks!

Shard Keys!

What is a config server?

• A config server is for storing shard meta-data

• It stores chunk ranges and locations

• Run with 3 in production!

[Diagram: a single config server, or three config servers]

This is not a replica set; the three servers are purely for failover purposes. Pro tip: use CNAMEs to identify them.

What is a mongos?

• Acts as a router / balancer for queries and ops

• No local data (persists all info to the config servers)

• Can run with just one or many

[Diagram: app servers connecting to the cluster through one or several mongos routers]

MongoDB's Sharding Infrastructure

[Diagram: three app servers, each paired with a mongos; the mongos routers connect to three config servers and three shards]

Get Started With Sharding?

1. Choose a shard key (we'll talk about this later)

2. Start config servers

3. Turn on sharding

4. Profit.

Mechanics of Sharding

Oh hey there, devops!

Start the Configuration Server

mongod --configsvr

Starts a configuration server on the default port (27019)

[Diagram: a config server]

Start the mongos router

mongos --configdb catconf.mongodb.com:27019

[Diagram: a config server and a mongos]

Start the mongod

mongod --shardsvr

Starts a mongod on the default shard port (27018). The shard is not yet connected to the rest of the cluster. It could already have been part of the cluster.

[Diagram: a config server, a mongos, and a mongod shard]

Add the Shard

On mongos:

sh.addShard('cat1.mongodb.com:27018')

For a replica set:

sh.addShard('<rsname>/<seedlist>')


Check that everything is working!

[mongos] admin> db.runCommand({ listshards: 1 })
{
  "shards": [
    { "_id": "shard0000", "host": "cat1.mongodb.com:27018" }
  ],
  "ok": 1
}


Now enable sharding

• Enable sharding on a database: sh.enableSharding("<dbname>")

• Shard a collection (with a key): sh.shardCollection("<dbname>.cats", {"name": 1})

• Use a compound shard key to avoid duplicate shard key values: sh.shardCollection("<dbname>.cats", {"name": 1, "uniqueid": 1})
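Why a compound key helps: with `{"name": 1}` alone, every cat named "Fluffy" has the same shard key value, so those documents can never be spread across chunks. Adding `uniqueid` gives each document a distinct value, ordered by `name` first and `uniqueid` second. The comparator below is a hypothetical sketch of that ordering for same-typed string and number fields; MongoDB's real BSON comparison rules across types are more involved.

```javascript
// Hypothetical sketch: ordering under a compound shard key { name: 1, uniqueid: 1 }.
// Fields are compared in key order; later fields only break ties.
// Assumes each field holds same-typed values (all strings or all numbers).
function compareCompound(a, b, fields = ["name", "uniqueid"]) {
  for (const f of fields) {
    if (a[f] < b[f]) return -1;
    if (a[f] > b[f]) return 1;
  }
  return 0;
}

const cats = [
  { name: "Fluffy", uniqueid: 3 },
  { name: "Boots", uniqueid: 7 },
  { name: "Fluffy", uniqueid: 1 },
];
cats.sort(compareCompound);
console.log(cats.map(c => `${c.name}/${c.uniqueid}`).join(", "));
// Boots/7, Fluffy/1, Fluffy/3
```

With `name` alone the two Fluffy documents compare equal and must share a chunk; with the compound key, a split point such as `{ name: "Fluffy", uniqueid: 2 }` can separate them.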

Tag Aware Sharding

• Total control over the distribution of your data!

• Tag a range of shard keys: sh.addTagRange(<collection>,<min>,<max>,<tag>)

• Tag a shard: sh.addShardTag("shard0000","NYC")
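Conceptually, tag-aware sharding constrains placement: a chunk that falls inside a tagged range (set with `sh.addTagRange`) belongs on shards carrying that tag (set with `sh.addShardTag`), while untagged ranges can go anywhere. The sketch models that rule with numeric, zip-code-like keys; the ranges, tags, shard names, and `allowedShards` helper are all hypothetical, and in a real cluster the balancer does the enforcing.

```javascript
// Hypothetical tag setup, analogous to sh.addTagRange(...) and sh.addShardTag(...).
const tagRanges = [
  { min: 10000, max: 15000, tag: "NYC" }, // half-open [min, max)
  { min: 90000, max: 96700, tag: "SFO" },
];
const shardTags = {
  shard0000: ["NYC"],
  shard0001: ["SFO"],
  shard0002: [],
};

// A chunk [min, max) inside a tagged range may only live on shards with that tag;
// chunks outside every tagged range may live on any shard.
// This sketch does not handle chunks that straddle a tag boundary.
function allowedShards(chunk, ranges, tagsByShard) {
  const allShards = Object.keys(tagsByShard);
  const range = ranges.find(r => r.min <= chunk.min && chunk.max <= r.max);
  if (!range) return allShards;
  return allShards.filter(s => tagsByShard[s].includes(range.tag));
}

console.log(allowedShards({ min: 10001, max: 10100 }, tagRanges, shardTags)); // shard0000 only
console.log(allowedShards({ min: 50000, max: 60000 }, tagRanges, shardTags)); // all three shards
```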

The Balancer

• Ensures even distribution of chunks across the cluster

• Transparent to driver and application

• Very tunable, but the defaults are often sensible

Try to minimize clock skew with ntpd.
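A single balancing round can be sketched like this: compare chunk counts per shard and, if the gap between the most- and least-loaded shards reaches a threshold, move one chunk between them. The thresholds (2, 4, or 8 chunks, depending on how many chunks the collection has) follow what older MongoDB documentation describes; treat them, and the hypothetical `planMigration` helper, as assumptions rather than the balancer's exact logic.

```javascript
// Assumed migration thresholds, modeled on older MongoDB documentation.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// Hypothetical helper. chunkCounts: { shardName: numberOfChunks }.
// Returns one proposed migration, or null if the cluster is balanced enough.
function planMigration(chunkCounts) {
  const entries = Object.entries(chunkCounts);
  if (entries.length < 2) return null;
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  let from = entries[0]; // most-loaded shard seen so far
  let to = entries[0];   // least-loaded shard seen so far
  for (const e of entries) {
    if (e[1] > from[1]) from = e;
    if (e[1] < to[1]) to = e;
  }
  if (from[1] - to[1] < migrationThreshold(total)) return null;
  return { from: from[0], to: to[0] };
}

console.log(planMigration({ shardA: 5, shardB: 1, shardC: 3 })); // move one chunk shardA -> shardB
console.log(planMigration({ shardA: 3, shardB: 2 }));            // null: within threshold
```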

Routing Requests

(Oh hi there, application developers!)

Cluster Request Routing

Scatter Gather or Targeted

Choose your own adventure!
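The choice mongos makes can be sketched as a simple rule: if the query pins the shard key to one value, look up the owning chunk and send the query to that single shard; otherwise, send it to every shard. This hypothetical `routeQuery` only recognises numeric equality on a single-field shard key (the `catId` field is made up); a real mongos can also target range queries on the shard key to a subset of shards.

```javascript
// Hypothetical chunk layout on a numeric shard key "catId"; ranges are [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0002" },
];

function routeQuery(query, shardKeyField, chunkTable) {
  const allShards = [...new Set(chunkTable.map(c => c.shard))];
  const value = query[shardKeyField];
  // Only a plain numeric equality on the shard key is treated as routable here;
  // operator objects such as { $gt: 5 } and missing keys fall back to scatter-gather.
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { type: "scatter-gather", shards: allShards };
  }
  const owner = chunkTable.find(c => c.min <= value && value < c.max);
  return { type: "targeted", shards: [owner.shard] };
}

console.log(routeQuery({ catId: 42 }, "catId", chunks));   // targeted: shard0000
console.log(routeQuery({ name: "Tom" }, "catId", chunks)); // scatter-gather: all three shards
```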

Targeted Query

[Diagram: one mongos in front of three shards]

1. Routable request received

2. Request routed to appropriate shard

3. Shard returns results

4. mongos returns results to client

Non-targeted queries

[Diagram: one mongos in front of three shards]

1. Request received

2. Farm request out to all shards

3. Shards return results to mongos

4. mongos returns results to client
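For a sorted scatter-gather query, each shard sorts its own results and the router merges the streams. The k-way merge below sketches that gather step over in-memory arrays; `mergeSorted` is a hypothetical helper, and a real mongos works with cursors and batches rather than complete arrays.

```javascript
// Hypothetical gather step: each inner array is one shard's results,
// already sorted ascending by `key`; merge them into one sorted list.
function mergeSorted(shardResults, key) {
  const pos = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (pos[i] >= shardResults[i].length) continue; // shard exhausted
      if (best === -1 || shardResults[i][pos[i]][key] < shardResults[best][pos[best]][key]) {
        best = i;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][pos[best]]);
    pos[best] += 1;
  }
}

const fromShards = [
  [{ age: 1 }, { age: 4 }],
  [{ age: 2 }],
  [{ age: 3 }, { age: 5 }],
];
console.log(mergeSorted(fromShards, "age").map(d => d.age)); // [ 1, 2, 3, 4, 5 ]
```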

Choosing A Shard Key

Things to remember!

• Shard Key is immutable

• Shard key values are immutable

• Shard key must be indexed

• It is limited to 512 bytes in size

• Try to choose a field used in queries

• Only the shard key can be guaranteed unique across shards

Your shard key should not be monotonically increasing!
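A quick simulation shows why: with a monotonically increasing key (a timestamp or an auto-incrementing id, say), every new value is larger than all existing ones, so every insert lands in the chunk that ends at the top of the key space and on the one shard that owns it; even after that chunk splits, the new top chunk keeps taking all the writes. The chunk layout and the `insertCounts` helper are hypothetical, and the evenly spread keys stand in for a key with better write distribution.

```javascript
// Hypothetical layout: three chunks on a numeric shard key; ranges are [min, max).
const chunkLayout = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 2000, shard: "shard0001" },
  { min: 2000, max: Infinity, shard: "shard0002" },
];

// Count how many inserts each shard receives for a sequence of new keys.
function insertCounts(keys) {
  const counts = {};
  for (const k of keys) {
    const { shard } = chunkLayout.find(c => c.min <= k && k < c.max);
    counts[shard] = (counts[shard] || 0) + 1;
  }
  return counts;
}

// Monotonically increasing ids: each one is larger than anything stored so far.
const increasing = Array.from({ length: 1000 }, (_, i) => 3000 + i);
console.log(insertCounts(increasing)); // every insert goes to shard0002

// Keys spread across the key space.
const spread = Array.from({ length: 1000 }, (_, i) => i * 3);
console.log(insertCounts(spread)); // roughly a third per shard
```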

How to choose your key?

• Cardinality

• Write Distribution

• Query Isolation

• Reliability

• Index Locality

Cardinality: can your data be broken down enough? Query isolation: can queries be targeted to a specific shard? Reliability: what happens when a shard goes down?

A good shard key can:

• Optimize routing

• Minimize (unnecessary) traffic

• Allow the best scaling

Consider pre-splitting. No unique indexes unless they are part of the shard key. Geo keys cannot be part of a shard key; $near won't work, but the $geo commands work fine.
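Cardinality puts a hard ceiling on splitting: all documents that share a shard key value must stay in the same chunk, so a key with N distinct values can yield at most N non-empty chunks no matter how large the collection grows. The hypothetical `maxChunks` helper below just counts distinct values to show the effect; the `color` and `uniqueid` fields are made up for the example.

```javascript
// Hypothetical helper: all documents with the same shard key value must share a
// chunk, so the number of distinct values caps the number of non-empty chunks.
function maxChunks(docs, shardKeyField) {
  return new Set(docs.map(d => d[shardKeyField])).size;
}

const cats = [
  { color: "black", uniqueid: 1 },
  { color: "black", uniqueid: 2 },
  { color: "orange", uniqueid: 3 },
  { color: "black", uniqueid: 4 },
];
console.log(maxChunks(cats, "color"));    // 2: low cardinality
console.log(maxChunks(cats, "uniqueid")); // 4: one per document
```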

Thanks!

• What's Next?

• Resources: https://education.mongodb.com/ and https://www.mongodb.com/presentations

• Me: @jrhunt, randall@mongodb.com

In summary (and this is not a sales pitch): lots of other databases out there have sharding and replication, but not many of them provide the granularity of control you need for your applications while maintaining sensible defaults.
