scaling dropbox - qconf sf 11/08/2016 - qconsf.com
Post on 13-Feb-2017
Scaling Dropbox
PRESLAV LE, NOVEMBER 7TH, 2016
Fear of the unknown
MEMORY LEAK
SYNCHRONIZATION EVENT
Success story
TODAY’S TALK
• 2012
• SCALING CHALLENGES
• 2016
• Q&A
PRESLAV LE
• At Dropbox since 2013
• Projects: Magic Pocket, Infrastructure Performance, Traffic team
FILE, SYNC & SHARE
500 MILLION USERS
2012
[Architecture diagram: clients, load balancers (LB), nginx, metaservers, blockservers, notification server, Memcached, DB, S3, and async processing, split between AWS and Dropbox’s datacenters]
BLOCK DATA IN S3
[Architecture diagram, with the AWS side highlighted]
METADATA IN MYSQL
[Architecture diagram, with Dropbox’s datacenters highlighted]
1. FETCH METADATA
[Architecture diagram, highlighting clients, LB, metaserver, Memcached, and DB]
2. DOWNLOAD BLOCKS
[Architecture diagram, highlighting clients, LB, blockserver, and S3]
3. WAIT FOR NOTIFICATIONS
[Architecture diagram, highlighting clients, notification server, and metaserver]
PYTHON EVERYWHERE
[Architecture diagram, with Dropbox’s datacenters highlighted]
CLUSTER ISOLATION
[Diagram: separate clusters labeled meta-client, meta-web, meta-api, and meta-mobile]
Scaling Databases
Scaling as Organization
Scaling Software
Managing Complexity
SCALING DATABASES
[Diagram: metaserver and Memcached with one MySQL master and two replicas; then shard0 … shardN, each a master with two replicas]
HORIZONTAL SCALING
[Diagram: many metaservers in front of shard0 … shardN, each a master with two replicas]
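The slides show the shard layout but not how a request finds its shard. A minimal sketch of one common approach follows, assuming a stable hash of a key picks the shard, writes go to the master, and reads rotate across replicas. The `Shard` type, the host names, and the CRC32 choice are illustrative assumptions, not details from the talk.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class Shard:
    master: str
    replicas: List[str]


def pick_shard(key: str, shards: List[Shard]) -> Shard:
    # Assumption: CRC32 modulo the shard count. CRC32 is stable across
    # processes, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def pick_host(key: str, shards: List[Shard], write: bool, attempt: int = 0) -> str:
    shard = pick_shard(key, shards)
    if write or not shard.replicas:
        return shard.master  # writes always go to the shard's master
    # Reads rotate over replicas; `attempt` lets a retry try the next one.
    return shard.replicas[attempt % len(shard.replicas)]


# Hypothetical three-shard layout mirroring the diagram labels.
shards = [
    Shard(f"shard{i}master", [f"shard{i}replica-a", f"shard{i}replica-b"])
    for i in range(3)
]
```

Note that plain modulo hashing moves most keys when the shard count changes, which is one reason reactive resharding is painful.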
CONNECTIONS
[Diagram: many metaservers connecting to shard0 … shardN, each a master with two replicas]
SQL PROXY
[Diagram: many metaservers, a layer of SQL Proxy instances, and shard0 … shardN, each a master with two replicas]
Scaling as Organization
Scaling Software
Managing Complexity
Scaling Databases
GLOBAL DATABASE
AVAILABILITY ISSUES
PLAYBOOK
1. Check for ongoing deployments or newly enabled features
2. Check for recently started background jobs
3. DBA oncall, please help!
Dropbox grew from 100 to 500 employees
• Slow queries would adversely impact performance across the board
• More features => Managing more independent MySQL databases
• Reactively (re)sharding individual databases as they hit capacity
• Impacted developer productivity
SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY
2013 to present
SHARDING AND CACHING BEHIND THE SCENES
ENTITIES AND ASSOCIATIONS
FIRST GO SERVICE
Scaling Software
Scaling as Organization
Managing Complexity
Scaling Databases
PERFECT STORM
SHARDING
PHOTO ALBUMS
TEAM ADMIN CONSOLE
REQUEST FANOUT
[Diagram: a single request fanning out]
GLOBAL ID
Colocation ID (8 bytes) | Counter (8 bytes)
• Colocation ID: Identifies a shard
• Counter: Unique ID within the shard
Lack of colocation also hurts performance
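The global ID layout can be sketched as two 8-byte fields packed back to back. This is a minimal illustration, assuming unsigned big-endian integers with the colocation ID first. The byte order, the function names, and the modulo mapping from colocation ID to shard are assumptions, since the slide gives only the field names and sizes.

```python
import struct
from typing import Tuple

# Assumed wire layout: two unsigned 64-bit big-endian integers (16 bytes).
_GLOBAL_ID = struct.Struct(">QQ")


def pack_global_id(colocation_id: int, counter: int) -> bytes:
    """Combine the colocation ID and the per-shard counter into one ID."""
    return _GLOBAL_ID.pack(colocation_id, counter)


def unpack_global_id(raw: bytes) -> Tuple[int, int]:
    """Split a 16-byte global ID back into (colocation_id, counter)."""
    return _GLOBAL_ID.unpack(raw)


def shard_for(raw: bytes, num_shards: int) -> int:
    # Hypothetical mapping: the talk says only that the colocation ID
    # identifies a shard, not how that lookup is done.
    colocation_id, _ = unpack_global_id(raw)
    return colocation_id % num_shards
```

Under this sketch, two objects created with the same colocation ID always resolve to the same shard, so a request that touches both avoids fanning out across shards.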
NEW SERVICE: FILE JOURNAL
[Diagram: metaservers, File Journal instances, and shard0 … shardN, each a master with two replicas]
SHARD FAILURE
[Diagram: metaservers, File Journal instances, and shard0 … shardN, with the shard1 master highlighted]
SHARDING (PART II)
LONG TIMEOUTS
[Diagram: metaservers, File Journal instances, and shard0 … shardN, with the shard1 master highlighted]
RUN OUT OF WORKERS
[Diagram: metaservers, File Journal instances, and shard0 … shardN, with the shard1 master and the File Journal instances highlighted]
CASCADING FAILURE
[Diagram: metaservers, File Journal instances, and shard0 … shardN, with the shard1 master, the File Journal instances, and the metaservers highlighted]
Limit resources dedicated to processing a single shard
SHARD ISOLATION
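One way to limit the resources dedicated to a single shard is a per-shard cap on in-flight requests, so a slow shard cannot occupy every worker. The sketch below assumes a thread-based server. The `ShardLimiter` class, the `handle` helper, and the fail-fast policy are illustrative assumptions, not the File Journal implementation.

```python
import threading
from collections import defaultdict


class ShardLimiter:
    """Caps concurrent requests per shard (a bulkhead)."""

    def __init__(self, max_in_flight_per_shard: int):
        self._max = max_in_flight_per_shard
        self._lock = threading.Lock()
        self._in_flight = defaultdict(int)

    def try_acquire(self, shard: str) -> bool:
        with self._lock:
            if self._in_flight[shard] >= self._max:
                return False  # this shard has used up its budget
            self._in_flight[shard] += 1
            return True

    def release(self, shard: str) -> None:
        with self._lock:
            if self._in_flight[shard] > 0:
                self._in_flight[shard] -= 1


def handle(limiter: ShardLimiter, shard: str, query):
    # Fail fast instead of queueing behind a shard that may be down.
    if not limiter.try_acquire(shard):
        raise RuntimeError(f"{shard} is over its worker budget")
    try:
        return query()
    finally:
        limiter.release(shard)
```

With a budget of, say, 2 workers per shard, requests for a failed shard1 are rejected once 2 are stuck on long timeouts, while requests for shard0 keep being served.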
Managing Complexity
Scaling as Organization
Scaling Software
Scaling Databases
500PB+ user block data
3+ geographic regions
500+ million users
MAGIC POCKET BLOCK STORAGE SYSTEM
[Diagram: Zone (west), Zone (central), and Zone (east), with put and get requests]
[Diagram: one component labeled "simple" ☺, the others labeled "complicated!" ☹]
PYTHON, GO & RUST
https://blogs.dropbox.com/tech/
2016
[Architecture diagram: meta-client, meta-web, meta-api, and meta-mobile clusters; File Journal; Search; Auth service; Block Routing; Edgestore; Presence & Notifications; Cape; …; blockserver; Magic Pocket; Block service; Riviera; Thumbnail service]
HOW TO PREVENT CASCADING FAILURE?
[Architecture diagram of the 2016 services, with Search highlighted]
BANDAID: PER ROUTE ISOLATION
QUEUE PRIORITIZATION
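Per-route isolation and queue prioritization can be illustrated together with a small request queue that caps how many requests each route may have waiting and serves lower priority numbers first. This is a sketch of the general idea only. The `RouteQueue` class, the route names, and the priority values are hypothetical and do not describe Bandaid's actual design.

```python
import heapq
import itertools
from collections import defaultdict


class RouteQueue:
    """Priority queue with a per-route cap on queued requests."""

    def __init__(self, max_queued_per_route: int):
        self._max = max_queued_per_route
        self._heap = []
        self._queued = defaultdict(int)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def offer(self, route: str, priority: int, request) -> bool:
        if self._queued[route] >= self._max:
            return False  # shed load for this route only
        heapq.heappush(self._heap, (priority, next(self._seq), route, request))
        self._queued[route] += 1
        return True

    def take(self):
        """Return (route, request) for the most urgent entry, or None."""
        if not self._heap:
            return None
        _, _, route, request = heapq.heappop(self._heap)
        self._queued[route] -= 1
        return route, request


q = RouteQueue(max_queued_per_route=2)
q.offer("/search", 5, "s1")
q.offer("/search", 5, "s2")
rejected = not q.offer("/search", 5, "s3")  # the /search route is full
q.offer("/sync", 0, "f1")                   # another route is unaffected
```

In this example an overloaded `/search` route starts rejecting its own requests, while a `/sync` request is still accepted and, having the lower priority number, is taken from the queue first.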
Partition & Isolate (data or services)
cluster isolation: Metaserver
data model isolation: Edgestore
shard isolation: File Journal
region isolation: Magic Pocket
route isolation: Bandaid
Isolation