MySQL Conference 2011 -- The Secret Sauce of Sharding -- Ryan Thiessen

The Secret Sauce of Sharding

Ryan Thiessen Database Operations April 2011

Agenda

1 Sharding 101

2 Bad Sharding

3 Facebook’s Universal Database

4 Re-Sharding

5 Operational Implications

Sharding 101

Bad news: there is no single way to shard

▪  What is the secret sauce of anything?

▪  Some basic building blocks

▪  More about what NOT to do rather than a specific recipe

▪  Wide variation in implementation

Why not to shard your data

▪  Can’t do JOINs inside the RDBMS across shards

▪  Data denormalization has drawbacks

▪  Redundant storage

▪  Chore to keep everything in sync

▪  Ops & Maintenance is harder

▪  Schema changes are more difficult

▪  Monitoring challenges

▪  You don’t do this because it’s cool, but because you have to

Why to shard your data

▪  Because you have to

▪  Doing joins outside of the RDBMS isn’t that bad

▪  Less contention on hot tables

▪  Continue using commodity hardware

▪  Single instance failure affects only a small proportion of users

Basic building blocks of good sharding

▪  Shard uniformity

▪  SKU, schema, queries

▪  Organize shards according to data access patterns

▪  Picking the right key to shard on

▪  Ability to grow, re-shard and shed load quickly

▪  Achieve operational efficiencies of scale

Bad Sharding

“Sharding” by application (Bad Sharding)

▪  Example: each application gets its own database

▪  Result:

▪  Data distribution is non-uniform, massive hot spots

▪  Every data access pattern is unique

▪  Very little efficiency of scale

[Diagram: six separate databases, one per application: Commerce, User, Logging, Customer, Sales, Config]

Fixed hashing (Bad Sharding)

▪  Example: you have X instances

▪  Hashing algorithm splits data evenly across each

▪  Result:

▪  Unbalanced load, hot spots

▪  What to do about data growth?

▪  How do you re-shard and/or shed load?
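To make the growth problem concrete, here is a minimal sketch of what happens when keys are hashed straight onto physical instances and the instance count changes. The function names and the key range are illustrative assumptions, not something from the talk.

```python
def instance_for(key: int, num_instances: int) -> int:
    """Fixed hashing: the key maps directly to a physical instance."""
    return key % num_instances


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose instance changes when the instance count changes."""
    keys = list(keys)
    moved = sum(
        1 for k in keys
        if instance_for(k, old_count) != instance_for(k, new_count)
    )
    return moved / len(keys)


if __name__ == "__main__":
    # Growing from 4 to 5 instances relocates most of the data.
    print(fraction_moved(range(100_000), 4, 5))
```

With plain modulo hashing, adding a single instance forces most keys to move, which is why re-sharding under this scheme is so expensive.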

Hyper-sharding (Bad Sharding)

▪  Example: hash keys randomly across all instances, without any grouping

▪  Result:

▪  Every fetch has to touch many shards to fulfill a request

▪  Request latency becomes the max() of all shard latencies

▪  A single shard’s availability issue affects every request
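The two fan-out effects above can be put into numbers with a small sketch. It assumes shard failures are independent, which is a simplification, and the function names are hypothetical.

```python
def request_latency_ms(shard_latencies_ms):
    """A fan-out request finishes only when its slowest shard responds."""
    return max(shard_latencies_ms)


def request_success_probability(per_shard_availability: float,
                                shards_touched: int) -> float:
    """Probability that every touched shard is up, assuming independent failures."""
    return per_shard_availability ** shards_touched


if __name__ == "__main__":
    # One slow shard dominates the whole request.
    print(request_latency_ms([3, 5, 120, 4]))
    # Touching 100 shards that are each 99.9% available.
    print(request_success_probability(0.999, 100))
```

Under these assumptions, a request that touches 100 shards succeeds only about 90% of the time even though each shard is up 99.9% of the time.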

How to choose a good shard key?

▪  Understand how your applications will access your data

▪  Be careful of data distribution

▪  Example: user ID

▪  Example: time grouping

▪  Example: random sharding

▪  TL;DR: use the same methodology as picking a partition key

Facebook’s Universal Database

Multiple shards per physical host (Facebook UDB)

▪  Multiple database shards per MySQL instance

▪  Multiple MySQL instances per host on different ports

▪  Every shard has an identical schema

▪  This enables web scale

Hashing (Facebook UDB)

▪  Group related objects together

▪  Collocate most user data on a single shard

▪  If an application has related objects, group them together

▪  When referring to objects in a remote shard, store a reference to the object in both shards

▪  Multiple logical hashing schemes can co-exist over the same set of physical hosts

Shard management service (Facebook UDB)

▪  Methods:

▪  Map object IDs to logical (shard) IDs – procedural (simple hash)

▪  Map shard IDs to physical instances – manual

▪  Use Thrift to access these methods from any language

▪  Distribute shard metadata close to apps to reduce request latency

▪  Extremely read heavy

▪  Updated relatively infrequently
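The two mappings can be sketched as a small in-memory directory. This is only an illustration of the split between a procedural hash and a manually maintained placement table; the class name, the method names and the shard count of 40,000 (borrowed from the modulus in the fetch example) are assumptions, and the real service is accessed over Thrift rather than as a local object.

```python
from dataclasses import dataclass, field

NUM_SHARDS = 40_000  # assumption: fixed number of logical shards


@dataclass
class ShardDirectory:
    # shard ID -> list of "host:port" strings, maintained manually by operators
    placement: dict = field(default_factory=dict)

    def shard_for(self, object_id: int) -> int:
        """Procedural step: a simple hash from object ID to logical shard ID."""
        return object_id % NUM_SHARDS

    def instances_for(self, shard_id: int) -> list:
        """Manual step: look up which physical instances currently host the shard."""
        return self.placement[shard_id]

    def move_shard(self, shard_id: int, instances: list) -> None:
        """Re-pointing a shard changes placement only; object IDs never change."""
        self.placement[shard_id] = list(instances)
```

Because the first mapping never changes and the second is small and rarely updated, the placement table is a natural candidate for caching close to the applications.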

Example: fetching data from a shard (Facebook UDB)

▪  Example: application request to get data for object ID 12345678901

▪  Call a function: 12345678901 % 40000 => maps to shard 38901

▪  Resolve shard ID 38901 to physical instances

▪  The application is in region B and only needs to read, so prefer returning a connection to shard 38901 on instance db983:3307

Instance     Repl Type   Region   Enabled
db243:3306   master      A        enabled
db533:3308   replica     A        enabled
db874:3306   replica     B        disabled
db983:3307   replica     B        enabled
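The selection step in this example can be sketched as follows. The table rows come from the slide; the `Instance` type, the `pick_instance` function and the exact preference order are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    address: str
    repl_type: str  # "master" or "replica"
    region: str
    enabled: bool


# Physical instances for shard 38901, copied from the table above.
SHARD_38901 = [
    Instance("db243:3306", "master", "A", True),
    Instance("db533:3308", "replica", "A", True),
    Instance("db874:3306", "replica", "B", False),
    Instance("db983:3307", "replica", "B", True),
]


def pick_instance(instances, region: str, read_only: bool) -> Optional[Instance]:
    """Pick an enabled instance: writes go to the master, reads prefer local replicas."""
    live = [i for i in instances if i.enabled]
    if not read_only:
        return next((i for i in live if i.repl_type == "master"), None)
    # Assumed preference order: same-region replica, same-region master,
    # remote replica, remote master.
    ranked = sorted(
        live,
        key=lambda i: (i.region != region, i.repl_type != "replica"),
    )
    return ranked[0] if ranked else None


if __name__ == "__main__":
    shard_id = 12345678901 % 40000
    print(shard_id, pick_instance(SHARD_38901, region="B", read_only=True).address)
```

The disabled replica db874:3306 is skipped even though it is in the caller's region, so the read lands on db983:3307.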

Adding nodes (Facebook UDB)

▪  New user pools

▪  List(s) of shard IDs where new objects go

▪  Reverse the hashing function: generate an object ID that maps to one of the shards in the new ID pool

▪  Usually new instances to add more overall capacity to the tier

▪  Can be existing instances to get more utilization

1.  App requests storage on new node

2.  Get list of available shards, pick one

3.  Generate ID which maps to that shard

4.  Connect to the selected shard, save object
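One way to "reverse" a modulo hash is to build the ID from a counter and the chosen shard, as in the sketch below. It assumes the `object_id % NUM_SHARDS` mapping from the fetch example; the counter is a stand-in for whatever unique-ID allocator a real system would use, and all names are hypothetical.

```python
import itertools
import random

NUM_SHARDS = 40_000  # assumption: same modulus as the fetch example

_sequence = itertools.count(1)  # stand-in for a real unique-ID allocator


def new_object_id(pool):
    """Pick a shard from the new-object pool and build an ID that hashes to it."""
    shard_id = random.choice(pool)
    # (n * NUM_SHARDS + shard_id) % NUM_SHARDS == shard_id for any integer n
    object_id = next(_sequence) * NUM_SHARDS + shard_id
    return object_id, shard_id


if __name__ == "__main__":
    pool = [38901, 38902]  # shard IDs currently accepting new objects
    object_id, shard_id = new_object_id(pool)
    print(object_id, shard_id, object_id % NUM_SHARDS == shard_id)
```

Steering new objects this way adds capacity without moving any existing data, because old IDs keep hashing to their original shards.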

Re-Sharding

The Easy Way: shedding load (Re-Sharding)

▪  Split off logical dbs from a single MySQL instance

Before the split:

Host1:3306   ShardA, ShardB, ShardC, ShardD
Host2:3306   ShardA, ShardB, ShardC, ShardD

After the split:

Host1:3306   ShardA, ShardC
Host2:3306   ShardB, ShardD

Split:

1.  Block writes

2.  Break replication from Host1->Host2

3.  Drop databases

4.  Reconfigure Shard Manager to point to new instances

5.  Re-enable writes

▪  Splitting off instances running on different ports is easier
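The bookkeeping behind steps 3 and 4 can be sketched as a planning function: given the shards on a replicated pair, decide which host keeps each shard and which copies get dropped. The function name and the alternating assignment are assumptions; the talk does not say how shards are divided.

```python
def split_plan(shards, old_master, new_master):
    """Alternate shards between the two hosts; return (placement, databases to drop)."""
    placement = {}
    to_drop = {old_master: [], new_master: []}
    for index, shard in enumerate(sorted(shards)):
        keeper = old_master if index % 2 == 0 else new_master
        loser = new_master if keeper == old_master else old_master
        placement[shard] = keeper      # step 4: what the shard manager should say
        to_drop[loser].append(shard)   # step 3: redundant copy to drop
    return placement, to_drop


if __name__ == "__main__":
    placement, to_drop = split_plan(
        ["ShardA", "ShardB", "ShardC", "ShardD"], "Host1:3306", "Host2:3306"
    )
    print(placement)
    print(to_drop)
```

Run on the four shards above, this reproduces the layout on the slide: Host1 keeps ShardA and ShardC, Host2 keeps ShardB and ShardD.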

The Hard Way: double-write data (Re-Sharding)

1.  Create new layout on all new instances

2.  On each new write, store in both places

3.  Separate process to backfill from the legacy storage

4.  Switch over reads to the new storage

5.  Monitor the old storage for reads

6.  Stop double-writes, drop old tables

▪  This is I/O intensive and painful, but very possible
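The six steps can be modelled with a toy wrapper in which two dictionaries stand in for the legacy and new storage. This is a sketch of the control flow only, under the assumption that a backfill must never overwrite a row that a double-write has already stored; the class and attribute names are hypothetical.

```python
class DoubleWriteStore:
    """Toy migration wrapper around two dict-like stores (old and new layout)."""

    def __init__(self, old: dict, new: dict):
        self.old = old
        self.new = new
        self.read_from_new = False   # flipped at step 4
        self.double_write = True     # cleared at step 6
        self.old_reads = 0           # step 5: should stop growing after the switch

    def write(self, key, value):
        """Step 2: every new write goes to the new layout and, for now, the old one."""
        self.new[key] = value
        if self.double_write:
            self.old[key] = value

    def backfill(self):
        """Step 3: copy legacy rows without clobbering double-written rows."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def read(self, key):
        if self.read_from_new:
            return self.new[key]
        self.old_reads += 1
        return self.old[key]
```

A typical sequence is: wrap both stores, let writes flow, run `backfill()`, set `read_from_new = True`, watch `old_reads`, and finally set `double_write = False` before dropping the old tables.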

Operational Implications

Everything is harder (Operational Implications of sharding)

▪  Monitoring is harder

▪  Schema changes are harder

▪  Upgrades are harder

▪  Backups and restores are harder

▪  Etc. Seriously.

▪  “This will probably never happen” will probably happen

▪  90% of your time can be spent on 10% of the shards (or less)

Top-N monitoring (Operational Implications)

▪  Problems with individual shards can get lost in the aggregate or mean

▪  Look at the worst “offenders”, identify outliers

▪  pmysql is an excellent tool for doing this quickly

$ cat hosts.txt | pmysql 'show status like "threads_running"' | sort -k3 -n | tail -n20
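The same idea in a few lines of Python, for cases where the per-host values are already collected: compare the fleet mean with the worst offenders. The host names and numbers below are made up for illustration, and `top_n` is a hypothetical helper, not part of pmysql.

```python
import heapq
from statistics import mean


def top_n(samples: dict, n: int = 20):
    """Return the n (host, value) pairs with the highest values, worst first."""
    return heapq.nlargest(n, samples.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up threads_running samples: 99 healthy hosts and one outlier.
    samples = {f"db{i:03d}:3306": 2 for i in range(1, 100)}
    samples["db100:3306"] = 150
    print(mean(samples.values()))  # the aggregate looks healthy
    print(top_n(samples, 3))       # the outlier is obvious
```

In this made-up fleet the mean stays under 4 threads running, while the top of the list immediately shows the one instance at 150.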

Uniformity of shards (Operational Implications)

▪  Every shard should have the same schema

▪  Keep the SKUs, configurations, etc. as consistent as possible

▪  Don’t scale shards by migrating the worst to better hardware

▪  Ops will have to keep track of this in the future

Application gating (Operational Implications)

▪  Very easy for a bad application to consume all shard resources

▪  Limit per-shard concurrency for each application

▪  User limits are OK

▪  Admission control is better

▪  Log failures at both client and server levels
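A client-side form of admission control can be sketched as a counter of in-flight queries per (application, shard) pair that rejects work over a limit instead of queueing it. This is an illustrative assumption about how such a gate might look, not a description of Facebook's implementation; `ShardGate` and its members are hypothetical names.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class ShardGate:
    """Caps in-flight queries per (application, shard) pair."""

    def __init__(self, limit: int):
        self.limit = limit
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)
        self.rejected = defaultdict(int)  # client-side record of refused requests

    @contextmanager
    def admit(self, app: str, shard_id: int):
        key = (app, shard_id)
        with self._lock:
            if self._in_flight[key] >= self.limit:
                self.rejected[key] += 1
                raise RuntimeError(f"{app} is over its limit on shard {shard_id}")
            self._in_flight[key] += 1
        try:
            yield  # the caller runs its query here
        finally:
            with self._lock:
                self._in_flight[key] -= 1
```

A caller would wrap each query in `with gate.admit("app_name", shard_id):`, so one misbehaving application exhausts only its own budget on a shard and the rejection counts are available for logging.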

The Good News: efficiencies of scale (Operational Implications)

▪  The problems are hard, but there are solutions

▪  Fixing the problems of the worst shards usually also benefits the median shards

▪  Loss of a single shard is not the end of your website

▪  Easy to safely test changes on a small subset

▪  Automation and tooling mean the team can debug and fix problems with high parallelism

(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved.