Scaling Dropbox - QCon SF 11/08/2016 - qconsf.com

Post on 13-Feb-2017


Scaling Dropbox

PRESLAV LE, NOVEMBER 7TH, 2016

block.dropbox.com

Zone (west)

Zone (east)

Zone (central)


Fear of the unknown

MEMORY LEAK

SYNCHRONIZATION EVENT

Success story

TODAY’S TALK

• 2012

• SCALING CHALLENGES

• 2016

• Q&A

PRESLAV LE

• At Dropbox since 2013

• Projects: Magic Pocket, Infrastructure Performance, Traffic team

FILE, SYNC & SHARE

500 MILLION USERS

2012

KMOD ON SCALE

Architecture diagram: clients, LB, nginx, metaserver, blockserver, notification server, async processing, Memcached, DB, S3; regions labeled AWS and Dropbox’s datacenters

BLOCK DATA IN S3

Diagram: same architecture, AWS region called out

METADATA IN MYSQL

Diagram: same architecture, Dropbox’s datacenters called out

1. FETCH METADATA

Diagram: same architecture; components called out: clients, LB, metaserver, Memcached, DB

2. DOWNLOAD BLOCKS

Diagram: same architecture; components called out: clients, LB, blockserver, S3

3. WAIT FOR NOTIFICATIONS

Diagram: same architecture; components called out: clients, notification server, metaserver

PYTHON EVERYWHERE

Diagram: same architecture (clients, LB, nginx, metaserver, blockserver, notification server, async processing, Memcached, DB, S3)

CLUSTER ISOLATION

Diagram (Dropbox’s datacenters): meta-client, meta-web, meta-api, meta-mobile

Scaling Databases

Scaling as Organization

Scaling Software

Managing Complexity

SCALING DATABASES

Diagram: metaserver and Memcached with one MySQL master and two replicas; then shard0 through shardN, each with a master and two replicas

HORIZONTAL SCALING

Diagram: many metaservers and shard0 through shardN, each shard with a master and two replicas
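The slides show data spread across shard0 through shardN but do not say how a row is assigned to a shard. The sketch below illustrates one common approach, hash-modulo routing. It is an assumption made for illustration, not a description of Dropbox's scheme. The function name `shard_index` and the key format are hypothetical.

```python
import hashlib


def shard_index(key: str, num_shards: int) -> int:
    """Map a key to a shard number in [0, num_shards).

    Assumption: hash-modulo routing. The slides do not specify the
    actual routing scheme.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash so every server computes the same shard for a key.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical key; any stable string identifier would do.
print(shard_index("user:12345", 16))
```

A stable hash matters here: Python's built-in `hash()` is randomized per process for strings, so different servers would disagree on the shard.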

CONNECTIONS

Diagram: many metaservers and shard0 through shardN, each shard with a master and two replicas

SQL PROXY

Diagram: many metaservers, SQL Proxy instances, and shard0 through shardN, each shard with a master and two replicas
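The CONNECTIONS and SQL PROXY slides suggest that a proxy tier keeps the number of database connections manageable. The arithmetic below is a rough sketch of that idea under two assumptions not stated in the slides: one connection per server pair, and every server in one tier connecting to every server in the next. The fleet sizes are hypothetical.

```python
def direct_connections(num_metaservers: int, num_shards: int) -> int:
    """Total connections if every metaserver connects to every shard."""
    return num_metaservers * num_shards


def proxied_connections(num_metaservers: int, num_proxies: int,
                        num_shards: int) -> dict:
    """Connections per tier if metaservers only talk to proxies,
    and only proxies talk to shards."""
    return {
        "metaserver_to_proxy": num_metaservers * num_proxies,
        "proxy_to_shard": num_proxies * num_shards,
    }


# Hypothetical fleet sizes, chosen only for illustration.
print(direct_connections(1000, 100))
print(proxied_connections(1000, 10, 100))
```

Under these assumptions, the number of connections each shard has to hold depends on the number of proxies instead of the number of metaservers.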

Scaling as Organization

Scaling Software Managing Complexity

Scaling Databases

GLOBAL DATABASE

AVAILABILITY ISSUES

PLAYBOOK

1. Check for ongoing deployments or newly enabled features

2. Check for recently started background jobs

3. DBA oncall, please help!

Dropbox grew from 100 to 500 employees

• Slow queries would adversely impact performance across the board

• More features => Managing more independent MySQL

• Reactively (re)sharding individual databases as they hit capacity

• Impacted developer productivity

SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY

2013 — Present

SHARDING AND CACHING BEHIND THE SCENES

ENTITIES AND ASSOCIATIONS

FIRST GO SERVICE

Scaling Software

Scaling as Organization

Managing Complexity

Scaling Databases

PERFECT STORM

SHARDING

PHOTO ALBUMS

TEAM ADMIN CONSOLE

REQUEST FANOUT

request

GLOBAL ID

Layout: Colocation ID (8 bytes), Counter (8 bytes)

• Colocation ID: Identifies a shard

• Counter: Unique ID within the shard
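The global ID layout above (an 8-byte colocation ID followed by an 8-byte counter) can be sketched as a 16-byte value. The byte order, the function names, and the modulo mapping from colocation ID to shard are assumptions made for illustration. The slides only state the two fields and their sizes.

```python
import struct
from typing import Tuple

# Assumed encoding: two unsigned 64-bit integers, big-endian.
_GLOBAL_ID = struct.Struct(">QQ")


def pack_global_id(colocation_id: int, counter: int) -> bytes:
    """Build a 16-byte global ID: colocation ID, then counter."""
    return _GLOBAL_ID.pack(colocation_id, counter)


def unpack_global_id(global_id: bytes) -> Tuple[int, int]:
    """Split a 16-byte global ID into (colocation_id, counter)."""
    return _GLOBAL_ID.unpack(global_id)


def shard_for(global_id: bytes, num_shards: int) -> int:
    """Hypothetical routing: only the colocation ID picks the shard,
    so IDs sharing a colocation ID always land on the same shard."""
    colocation_id, _counter = unpack_global_id(global_id)
    return colocation_id % num_shards


album = pack_global_id(colocation_id=7, counter=1)
photo = pack_global_id(colocation_id=7, counter=2)
print(shard_for(album, 4) == shard_for(photo, 4))
```

Because the counter plays no part in routing, objects created under the same colocation ID can be read together from one shard instead of fanning out across many.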

Lack of colocation also hurts performance

NEW SERVICE: FILE JOURNAL

Diagram: metaservers, File Journal instances, and shard0 through shardN, each shard with a master and two replicas

SHARD FAILURE

Diagram: metaservers, File Journal instances, and shard0 through shardN; the shard1 master is called out

SHARDING (PART II)

LONG TIMEOUTS

Diagram: metaservers, File Journal instances, and shard0 through shardN; the shard1 master is called out

RUN OUT OF WORKERS

Diagram: metaservers, File Journal instances, and shard0 through shardN; the shard1 master and the File Journal instances are called out

CASCADING FAILURE

Diagram: metaservers, File Journal instances, and shard0 through shardN; the shard1 master, the File Journal instances, and the metaservers are called out

Limit resources dedicated to processing a single shard

SHARD ISOLATION
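"Limit resources dedicated to processing a single shard" can be sketched as a per-shard cap on in-flight requests, so that one slow shard cannot occupy every worker. This is a minimal illustration under assumed names (`ShardLimiter`, `try_acquire`, `release`), not File Journal's actual mechanism.

```python
import threading
from collections import defaultdict


class ShardLimiter:
    """Caps the number of requests in flight for any single shard."""

    def __init__(self, max_per_shard: int):
        self._max = max_per_shard
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)

    def try_acquire(self, shard: int) -> bool:
        """Reserve a slot for the shard, or return False to fail fast."""
        with self._lock:
            if self._in_flight[shard] >= self._max:
                return False
            self._in_flight[shard] += 1
            return True

    def release(self, shard: int) -> None:
        """Give the slot back once the request finishes."""
        with self._lock:
            self._in_flight[shard] -= 1


limiter = ShardLimiter(max_per_shard=2)
print(limiter.try_acquire(1), limiter.try_acquire(1))  # two slots taken
print(limiter.try_acquire(1))  # shard 1 is at its cap
print(limiter.try_acquire(0))  # other shards are unaffected
```

Rejecting early when a shard is at its cap keeps workers free for healthy shards, which is the failure mode the preceding slides (long timeouts, running out of workers, cascading failure) describe.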

Managing Complexity

Scaling as Organization

Scaling Software

Scaling Databases

500PB+ user block data

3+ geographic regions

500+ million users

MAGIC POCKET BLOCK STORAGE SYSTEM

Zone (west)

Zone (east)

Zone (central)

Diagram labels: put, get

☺ simple

☹ complicated!

PYTHON, GO & RUST

https://blogs.dropbox.com/tech/

2016

Architecture diagram: meta-client, meta-web, meta-api, meta-mobile, File Journal, Search, Auth service, Block Routing, Edgestore, Presence & Notifications, Cape…, blockserver, Magic Pocket, Block service, Riviera, Thumbnail service

HOW TO PREVENT CASCADING FAILURE?

Diagram: the same services as in the 2016 architecture, with Search called out

BANDAID: PER ROUTE ISOLATION

QUEUE PRIORITIZATION
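The slides name queue prioritization without describing how Bandaid implements it. As a rough sketch of the general idea only, the queue below serves lower-numbered priorities first and keeps arrival order within a priority. The class name, the priority values, and the route strings are hypothetical.

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Serves the most important pending request first.

    Lower numbers mean higher priority. Requests with equal priority
    are served in arrival order.
    """

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker for equal priorities

    def push(self, priority: int, request: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def pop(self) -> str:
        _priority, _order, request = heapq.heappop(self._heap)
        return request

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityRequestQueue()
queue.push(2, "/thumbnail/a")  # hypothetical low-priority route
queue.push(0, "/sync/a")       # hypothetical high-priority route
queue.push(2, "/thumbnail/b")
print(queue.pop())
```

Under load, a scheme like this lets critical routes drain first while less important traffic waits, which complements per-route isolation.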

Partition & Isolate (data or services)

cluster isolation: Metaserver

data model isolation: Edgestore

shard isolation: File Journal

region isolation: Magic Pocket

route isolation: Bandaid

Isolation
