Scaling Instagram AirBnB Tech Talk 2012 Mike Krieger Instagram

Upload: iammutex

Post on 16-May-2015


DESCRIPTION

Instagram scalability in practice

TRANSCRIPT

Page 1: Scaling Instagram

Scaling Instagram

AirBnB Tech Talk 2012

Mike Krieger, Instagram

Page 2: Scaling Instagram

me

- Co-founder, Instagram

- Previously: UX & Front-end @ Meebo

- Stanford HCI BS/MS

- @mikeyk on everything

Page 3: Scaling Instagram
Page 4: Scaling Instagram
Page 5: Scaling Instagram
Page 6: Scaling Instagram

communicating and sharing in the real world

Page 7: Scaling Instagram

30+ million users in less than 2 years

Page 8: Scaling Instagram

the story of how we scaled it

Page 9: Scaling Instagram

a brief tangent

Page 10: Scaling Instagram

the beginning

Page 11: Scaling Instagram


Page 12: Scaling Instagram

2 product guys

Page 13: Scaling Instagram

no real back-end experience

Page 14: Scaling Instagram

analytics & python @ meebo

Page 15: Scaling Instagram

CouchDB

Page 16: Scaling Instagram

CrimeDesk SF

Page 17: Scaling Instagram
Page 18: Scaling Instagram

let’s get hacking

Page 19: Scaling Instagram

good components in place early on

Page 20: Scaling Instagram

...but were hosted on a single machine

somewhere in LA

Page 21: Scaling Instagram
Page 22: Scaling Instagram

less powerful than my MacBook Pro

Page 23: Scaling Instagram

okay, we launched. now what?

Page 24: Scaling Instagram

25k signups in the first day

Page 25: Scaling Instagram

everything is on fire!

Page 26: Scaling Instagram

best & worst day of our lives so far

Page 27: Scaling Instagram

load was through the roof

Page 28: Scaling Instagram

first culprit?

Page 29: Scaling Instagram
Page 30: Scaling Instagram

favicon.ico

Page 31: Scaling Instagram

404-ing on Django, causing tons of errors

Page 32: Scaling Instagram

lesson #1: don’t forget your favicon

Page 33: Scaling Instagram

real lesson #1: most of your initial scaling problems won’t be glamorous

Page 34: Scaling Instagram

favicon

Page 35: Scaling Instagram

ulimit -n

Page 36: Scaling Instagram

memcached -t 4

Page 37: Scaling Instagram

prefork/postfork

Page 38: Scaling Instagram

friday rolls around

Page 39: Scaling Instagram

not slowing down

Page 40: Scaling Instagram

let’s move to EC2.

Page 41: Scaling Instagram
Page 42: Scaling Instagram
Page 43: Scaling Instagram

scaling = replacing all components of a car

while driving it at 100mph

Page 44: Scaling Instagram

since...

Page 45: Scaling Instagram

“canonical [architecture] of an early stage startup in this era.” (HighScalability.com)

Page 46: Scaling Instagram

Nginx & Redis & Postgres & Django.

Page 47: Scaling Instagram

Nginx & HAProxy & Redis & Memcached & Postgres & Gearman & Django.

Page 48: Scaling Instagram

24h Ops

Page 49: Scaling Instagram
Page 50: Scaling Instagram
Page 51: Scaling Instagram

our philosophy

Page 52: Scaling Instagram

1 simplicity

Page 53: Scaling Instagram

2 optimize for minimal operational

burden

Page 54: Scaling Instagram

3 instrument everything

Page 55: Scaling Instagram

walkthrough:
1 scaling the database
2 choosing technology
3 staying nimble
4 scaling for android

Page 56: Scaling Instagram

1 scaling the db

Page 57: Scaling Instagram

early days

Page 58: Scaling Instagram

django ORM, postgresql

Page 59: Scaling Instagram

why pg? postgis.

Page 60: Scaling Instagram

moved db to its own machine

Page 61: Scaling Instagram

but photos kept growing and growing...

Page 62: Scaling Instagram

...and only 68GB of RAM on biggest machine in EC2

Page 63: Scaling Instagram

so what now?

Page 64: Scaling Instagram

vertical partitioning

Page 65: Scaling Instagram

django db routers make it pretty easy

Page 66: Scaling Instagram

def db_for_read(self, model, **hints):
    # app_label lives on the model's _meta options
    if model._meta.app_label == 'photos':
        return 'photodb'

Page 67: Scaling Instagram

...once you untangle all your foreign key

relationships
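A fuller version of the router from the slide above might look like the sketch below. Django routers are plain classes, and `db_for_read`, `db_for_write` and `allow_relation` are the real hook names. The class name `PhotoRouter`, the `accounts` app label and the stand-in model objects are assumptions made so the sketch runs without Django installed. Refusing relations that cross the split is one way to see why the foreign keys have to be untangled first.

```python
from types import SimpleNamespace


class PhotoRouter:
    """Send every model in the 'photos' app to the 'photodb' alias."""

    photo_apps = {"photos"}

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.photo_apps:
            return "photodb"
        return None  # no opinion: Django falls back to the default database

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)

    def allow_relation(self, obj1, obj2, **hints):
        # Allow a relation only when both objects sit on the same side of the split.
        a = obj1._meta.app_label in self.photo_apps
        b = obj2._meta.app_label in self.photo_apps
        return a == b


# Hypothetical stand-ins for model classes (only _meta.app_label is used).
Photo = SimpleNamespace(_meta=SimpleNamespace(app_label="photos"))
User = SimpleNamespace(_meta=SimpleNamespace(app_label="accounts"))

router = PhotoRouter()
print(router.db_for_read(Photo))           # photodb
print(router.db_for_read(User))            # None
print(router.allow_relation(Photo, User))  # False
```

In a real project the class would be listed in the `DATABASE_ROUTERS` setting, and `photodb` would be an alias defined in `DATABASES`.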

Page 68: Scaling Instagram

a few months later...

Page 69: Scaling Instagram

photosdb > 60GB

Page 70: Scaling Instagram

what now?

Page 71: Scaling Instagram

horizontal partitioning!

Page 72: Scaling Instagram

aka: sharding

Page 73: Scaling Instagram

“surely we’ll have hired someone experienced before we actually need

to shard”

Page 74: Scaling Instagram

you don’t get to choose when scaling challenges

come up

Page 75: Scaling Instagram

evaluated solutions

Page 76: Scaling Instagram

at the time, none were up to task of being our

primary DB

Page 77: Scaling Instagram

did in Postgres itself

Page 78: Scaling Instagram

what’s painful about sharding?

Page 79: Scaling Instagram

1 data retrieval

Page 80: Scaling Instagram

hard to know what your primary access patterns will be w/out any usage

Page 81: Scaling Instagram

in most cases, user ID

Page 82: Scaling Instagram

2 what happens if one of your shards

gets too big?

Page 83: Scaling Instagram

in range-based schemes (like MongoDB), you split

Page 84: Scaling Instagram

A-H: shard0
I-Z: shard1

Page 85: Scaling Instagram

A-D: shard0
E-H: shard2
I-P: shard1
Q-Z: shard3

Page 86: Scaling Instagram

downsides (especially on EC2): disk IO

Page 87: Scaling Instagram

instead, we pre-split

Page 88: Scaling Instagram

many many many (thousands) of logical

shards

Page 89: Scaling Instagram

that map to fewer physical ones

Page 90: Scaling Instagram

// 8 logical shards on 2 machines

user_id % 8 = logical shard

logical shards -> physical shard map

{ 0: A, 1: A, 2: A, 3: A, 4: B, 5: B, 6: B, 7: B}

Page 91: Scaling Instagram

// 8 logical shards on 4 machines (was 2)

user_id % 8 = logical shard

logical shards -> physical shard map

{ 0: A, 1: A, 2: C, 3: C, 4: B, 5: B, 6: D, 7: D}
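The two maps above can be expressed as a minimal Python sketch. The function names and the sample user ID are hypothetical; the modulus and both maps are taken from the slides. The point it illustrates is that a user's logical shard never changes, only the machine that the logical shard maps to.

```python
NUM_LOGICAL_SHARDS = 8

# Logical shard -> physical machine, as on the 2-machine slide.
TWO_MACHINES = {0: "A", 1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B", 7: "B"}
# After adding machines C and D, only the map changes.
FOUR_MACHINES = {0: "A", 1: "A", 2: "C", 3: "C", 4: "B", 5: "B", 6: "D", 7: "D"}


def logical_shard(user_id):
    """The logical shard is a pure function of the user ID."""
    return user_id % NUM_LOGICAL_SHARDS


def machine_for(user_id, shard_map):
    """Look up which physical machine currently holds the user's shard."""
    return shard_map[logical_shard(user_id)]


user_id = 1234  # hypothetical user
print(logical_shard(user_id))               # 2
print(machine_for(user_id, TWO_MACHINES))   # A
print(machine_for(user_id, FOUR_MACHINES))  # C
```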

Page 92: Scaling Instagram

little known but awesome PG feature: schemas

Page 93: Scaling Instagram

not “columns” schema

Page 94: Scaling Instagram

- database:
  - schema:
    - table:
      - columns

Page 95: Scaling Instagram

machineA:
  shard0 photos_by_user
  shard1 photos_by_user
  shard2 photos_by_user
  shard3 photos_by_user

Page 96: Scaling Instagram

machineA:
  shard0 photos_by_user
  shard1 photos_by_user
  shard2 photos_by_user
  shard3 photos_by_user

machineA’:
  shard0 photos_by_user
  shard1 photos_by_user
  shard2 photos_by_user
  shard3 photos_by_user

Page 97: Scaling Instagram

machineA:
  shard0 photos_by_user
  shard1 photos_by_user

machineC:
  shard2 photos_by_user
  shard3 photos_by_user

Page 98: Scaling Instagram

can do this as long as you have more logical shards than physical

ones
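As a rough sketch of how one Postgres schema per logical shard could look from application code: the helpers below build schema-qualified SQL strings. The shard count of 4 matches the machineA slides, but the column list, function names and query shape are assumptions for illustration, not the schema Instagram used.

```python
NUM_LOGICAL_SHARDS = 4  # the slides show shard0..shard3 on machineA


def schema_for(user_id):
    """Name of the Postgres schema holding this user's logical shard."""
    return "shard%d" % (user_id % NUM_LOGICAL_SHARDS)


def create_shard_sql(shard):
    """DDL for one logical shard; the columns are a guess for illustration."""
    return (
        "CREATE SCHEMA shard%d;\n"
        "CREATE TABLE shard%d.photos_by_user "
        "(user_id bigint, photo_id bigint);" % (shard, shard)
    )


def photos_query(user_id):
    """Schema-qualified query; user_id stays a driver-side placeholder."""
    return (
        "SELECT photo_id FROM %s.photos_by_user WHERE user_id = %%s"
        % schema_for(user_id)
    )


print(create_shard_sql(0))
print(photos_query(6))
```

Because every shard has the same table name inside its own schema, moving a shard to another machine leaves the queries unchanged; only the connection the query is sent to differs.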

Page 99: Scaling Instagram

lesson: take tech/tools you know and try first to adapt them into a simple

solution

Page 100: Scaling Instagram

2 which tools where?

Page 101: Scaling Instagram

where to cache / otherwise denormalize

data

Page 102: Scaling Instagram

we <3 redis

Page 103: Scaling Instagram

what happens when a user posts a photo?

Page 104: Scaling Instagram

1 user uploads photo with (optional) caption

and location

Page 105: Scaling Instagram

2 synchronous write to the media database for

that user

Page 106: Scaling Instagram

3 queues!

Page 107: Scaling Instagram

3a if geotagged, async worker POSTs to Solr

Page 108: Scaling Instagram

3b follower delivery

Page 109: Scaling Instagram

can’t have every user who loads their timeline look up everyone they follow and then their photos

Page 110: Scaling Instagram

instead, everyone gets their own list in Redis

Page 111: Scaling Instagram

media ID is pushed onto a list for every person

who’s following this user

Page 112: Scaling Instagram

Redis is awesome for this; rapid insert, rapid

subsets

Page 113: Scaling Instagram

when time to render a feed, we take small # of IDs, go look up info in

memcached
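The delivery and rendering steps above can be sketched as follows. To keep the example runnable without a server, `FakeRedis` is a small in-memory stand-in that mimics the Redis list commands LPUSH, LTRIM and LRANGE (inclusive stop index), and a plain dict stands in for memcached. The `feed:<id>` key format, the feed cap and the function names are all assumptions.

```python
from collections import defaultdict

FEED_LENGTH = 100  # hypothetical cap that keeps each list bounded


class FakeRedis:
    """In-memory stand-in for the three list commands used here."""

    def __init__(self):
        self.lists = defaultdict(list)

    def lpush(self, key, value):
        self.lists[key].insert(0, value)

    def ltrim(self, key, start, stop):
        # Like Redis, stop is inclusive (non-negative indexes only here).
        self.lists[key] = self.lists[key][start:stop + 1]

    def lrange(self, key, start, stop):
        return self.lists[key][start:stop + 1]


def deliver(r, media_id, follower_ids):
    """Push a new media ID onto the feed list of every follower."""
    for follower_id in follower_ids:
        key = "feed:%d" % follower_id
        r.lpush(key, media_id)
        r.ltrim(key, 0, FEED_LENGTH - 1)


def render_feed(r, user_id, media_cache, count=3):
    """Take a small number of IDs, then look up the details in the cache."""
    ids = r.lrange("feed:%d" % user_id, 0, count - 1)
    return [media_cache[i] for i in ids if i in media_cache]


r = FakeRedis()
cache = {101: "sunset", 102: "coffee"}
deliver(r, 101, [1, 2])
deliver(r, 102, [2])
print(render_feed(r, 2, cache))  # ['coffee', 'sunset']
```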

Page 114: Scaling Instagram

Redis is great for...

Page 115: Scaling Instagram

data structures that are relatively bounded

Page 116: Scaling Instagram

(don’t tie yourself to a solution where your in-memory DB is your main data store)

Page 117: Scaling Instagram

caching complex objects where you want to do more than GET

Page 118: Scaling Instagram

ex: counting, sub-ranges, testing membership

Page 119: Scaling Instagram

especially when Taylor Swift posts live from the

CMAs

Page 120: Scaling Instagram

follow graph

Page 121: Scaling Instagram

v1: simple DB table (source_id, target_id, status)

Page 122: Scaling Instagram

who do I follow? who follows me?

do I follow X? does X follow me?
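The v1 table and the four questions it has to answer can be sketched with Python's built-in `sqlite3` standing in for Postgres. The table name, the `'active'` status value and the second index are assumptions; only the three column names come from the slide.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE follows ("
    " source_id INTEGER, target_id INTEGER, status TEXT,"
    " PRIMARY KEY (source_id, target_id))"
)
# Second index so "who follows me?" does not scan the whole table.
db.execute("CREATE INDEX follows_by_target ON follows (target_id, source_id)")
db.executemany(
    "INSERT INTO follows VALUES (?, ?, 'active')",
    [(1, 2), (1, 3), (2, 1)],
)


def following(user_id):
    """Who do I follow?"""
    rows = db.execute(
        "SELECT target_id FROM follows WHERE source_id = ?"
        " AND status = 'active' ORDER BY target_id", (user_id,))
    return [row[0] for row in rows]


def followers(user_id):
    """Who follows me?"""
    rows = db.execute(
        "SELECT source_id FROM follows WHERE target_id = ?"
        " AND status = 'active' ORDER BY source_id", (user_id,))
    return [row[0] for row in rows]


def follows(source_id, target_id):
    """Do I follow X? Swap the arguments to ask: does X follow me?"""
    row = db.execute(
        "SELECT 1 FROM follows WHERE source_id = ? AND target_id = ?"
        " AND status = 'active'", (source_id, target_id)).fetchone()
    return row is not None


print(following(1))   # [2, 3]
print(followers(1))   # [2]
print(follows(1, 2))  # True
```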

Page 123: Scaling Instagram

DB was busy, so we started storing parallel

version in Redis

Page 124: Scaling Instagram

follow_all(300 item list)

Page 125: Scaling Instagram

inconsistency

Page 126: Scaling Instagram

extra logic

Page 127: Scaling Instagram

so much extra logic

Page 128: Scaling Instagram

exposing your support team to the idea of cache invalidation

Page 129: Scaling Instagram
Page 130: Scaling Instagram

redesign took a page from twitter’s book

Page 131: Scaling Instagram

PG can handle tens of thousands of requests, very light memcached

caching

Page 132: Scaling Instagram

two takeaways

Page 133: Scaling Instagram

1 have a versatile complement to your core data storage (like Redis)

Page 134: Scaling Instagram

2 try not to have two tools trying to do the

same job

Page 135: Scaling Instagram

3 staying nimble

Page 136: Scaling Instagram

2010: 2 engineers

Page 137: Scaling Instagram

2011: 3 engineers

Page 138: Scaling Instagram

2012: 5 engineers

Page 139: Scaling Instagram

scarcity -> focus

Page 140: Scaling Instagram

engineer solutions that you’re not constantly returning to because

they broke

Page 141: Scaling Instagram

1 extensive unit-tests and functional tests

Page 142: Scaling Instagram

2 keep it DRY

Page 143: Scaling Instagram

3 loose coupling using notifications / signals

Page 144: Scaling Instagram

4 do most of our work in Python, drop to C when

necessary

Page 145: Scaling Instagram

5 frequent code reviews, pull requests to keep things in the ‘shared

brain’

Page 146: Scaling Instagram

6 extensive monitoring

Page 147: Scaling Instagram

munin

Page 148: Scaling Instagram

statsd

Page 149: Scaling Instagram
Page 150: Scaling Instagram

“how is the system right now?”

Page 151: Scaling Instagram

“how does this compare to historical trends?”
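For a sense of how cheap it is to instrument everything with statsd, here is a minimal client sketch. The wire format (`name:value|c` for counters, `name:value|ms` for timers, sent over UDP) and the default port 8125 are statsd's own; the host, the metric names and the helper names are assumptions.

```python
import socket

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed host; 8125 is statsd's default port


def counter(name, value=1):
    """Format a counter increment in the statsd line protocol."""
    return "%s:%d|c" % (name, value)


def timer(name, millis):
    """Format a timing sample, in milliseconds."""
    return "%s:%d|ms" % (name, millis)


def send(metric, addr=STATSD_ADDR):
    """Fire-and-forget UDP: a missing or slow statsd never blocks a request."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(metric.encode("ascii"), addr)
    finally:
        sock.close()


print(counter("signups"))
print(timer("feed.render", 42))
```

Calling `send(counter("signups"))` in the signup path, for example, would be enough to answer both questions on the slides for that one metric, since statsd aggregates the samples and a graphing backend keeps the history.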

Page 152: Scaling Instagram

4 scaling for android

Page 153: Scaling Instagram

1 million new users in 12 hours

Page 154: Scaling Instagram

great tools that enable easy read scalability

Page 155: Scaling Instagram

redis: slaveof <host> <port>

Page 156: Scaling Instagram

our Redis framework assumes 0+ read slaves
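One way to read "assumes 0+ read slaves" is sketched below: a wrapper that sends writes to the master and reads to a replica if any are configured, falling back to the master when the list is empty. The class and method names are hypothetical, and plain strings stand in for real connections; this is not Instagram's framework.

```python
import random


class ReplicatedClient:
    """Route writes to the master and reads to a replica when one exists."""

    def __init__(self, master, read_slaves=()):
        self.master = master
        self.read_slaves = list(read_slaves)  # zero or more

    def for_write(self):
        return self.master

    def for_read(self):
        # With no replicas configured, reads simply go to the master.
        if self.read_slaves:
            return random.choice(self.read_slaves)
        return self.master


solo = ReplicatedClient("master")
scaled = ReplicatedClient("master", ["slave1", "slave2"])
print(solo.for_read())     # master
print(scaled.for_write())  # master
```

With this shape, adding read capacity is a configuration change: start a replica with `slaveof`, add it to the list, and call sites stay the same.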

Page 157: Scaling Instagram

tight iteration loops

Page 158: Scaling Instagram

statsd & pgfouine

Page 159: Scaling Instagram

know where you can shed load if needed

Page 160: Scaling Instagram

(e.g. shorter feeds)
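Shedding load through shorter feeds could be as simple as the sketch below. The two feed lengths, the load threshold and the idea of a single "current load" number are all assumptions made for illustration.

```python
NORMAL_FEED_LENGTH = 30    # hypothetical
DEGRADED_FEED_LENGTH = 10  # hypothetical
LOAD_THRESHOLD = 0.8       # hypothetical fraction of capacity


def feed_length(current_load):
    """Serve a shorter feed once load crosses the threshold."""
    if current_load >= LOAD_THRESHOLD:
        return DEGRADED_FEED_LENGTH
    return NORMAL_FEED_LENGTH


def trim_feed(media_ids, current_load):
    """Cut the list of media IDs before any per-item lookups happen."""
    return media_ids[:feed_length(current_load)]


ids = list(range(50))
print(len(trim_feed(ids, 0.5)))  # 30
print(len(trim_feed(ids, 0.9)))  # 10
```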

Page 161: Scaling Instagram

if you’re tempted to reinvent the wheel...

Page 162: Scaling Instagram

don’t.

Page 163: Scaling Instagram

“our app servers sometimes kernel panic

under load”

Page 164: Scaling Instagram

...

Page 165: Scaling Instagram

“what if we write a monitoring daemon...”

Page 166: Scaling Instagram

wait! this is exactly what HAProxy is great at

Page 167: Scaling Instagram

surround yourself with awesome advisors

Page 168: Scaling Instagram

culture of openness around engineering

Page 169: Scaling Instagram

give back; e.g. node2dm

Page 170: Scaling Instagram

focus on making what you have better

Page 171: Scaling Instagram

“fast, beautiful photo sharing”

Page 172: Scaling Instagram

“can we make all of our requests take 50% of the time?”

Page 173: Scaling Instagram

staying nimble = remind yourself of what’s

important

Page 174: Scaling Instagram

your users around the world don’t care that you

wrote your own DB

Page 175: Scaling Instagram

wrapping up

Page 176: Scaling Instagram

unprecedented times

Page 177: Scaling Instagram

2 backend engineers can scale a system to

30+ million users

Page 178: Scaling Instagram

key word = simplicity

Page 179: Scaling Instagram

cleanest solution with the fewest moving parts possible

Page 180: Scaling Instagram

don’t over-optimize or expect to know ahead of time how site will scale

Page 181: Scaling Instagram

don’t think “someone else will join & take care

of this”

Page 182: Scaling Instagram

will happen sooner than you think; surround yourself with great

advisors

Page 183: Scaling Instagram

when adding software to stack: only if you have to, optimizing for operational

simplicity

Page 184: Scaling Instagram

few, if any, unsolvable scaling challenges for a

social startup

Page 185: Scaling Instagram

have fun