super sizing youtube with python
DESCRIPTION
by Mike Solomon. See more scalability tales at: http://rapd.wordpress.com
TRANSCRIPT
Super-sizing YouTube with Python
Mike Solomon
this is about scaling a web application
there are a lot of things left out - mostly mistakes and implementation details
this may generate more questions than it answers
my goal is to give you ideas for solving your own problems
Welcome
this is the core of scalability
systems change over time, so will your architecture
impossible to predict the optimal approach
start simple
aim for local maxima
python enables flexibility
Architecture
web boxes do everything
servlets, images, thumbnails, search
shoehorn everything into Apache, MySQL
very simple
this survives longer than you'd think
YouTube's Early Days
Early Web Stack
circa January ’06
hw load balancer
db master
db replicas
mod_python
httpd
biz logic
servlets
templates
db objects
thumbnails
search
really small team
we ♥ python
logical separation in code
discipline and honor - not linguistically enforced (don’t waste time writing code to restrict people)*
grown by systematically removing bottlenecks
easy to know when something is a `win`
Early Key Factors in Engineering
user demand can grow 50% in a day
removing one bottleneck can immediately reveal another (usually more heinous)
replace and migrate components as they become problems
good (python) components make this easy
obviously, pick your battles
Running Without Tripping
minimize dependencies*
accept some latency
localize failures - don’t let them spread
you are only down if it looks like you are
applies to both systems and software
Good Components (Hypothetical)
more efficient resource utilization via specialized deployment
balance based on CPU, RAM, network and disk usage patterns
overlay orthogonal loads
disjoint tasks running on the same physical hardware
Balance Machine Resources
move from mod_python to mod_fastcgi
move thumbnails to their own machines
move search to a remote service running on separate machines
run transcoder processes on video servers
do more with the same hardware
Migratory Patterns of the Norwegian Blue
if you have a relational database, it will be abused
difficult to track the true source
series of object proxies for DB-API enable logging
encode a portion of call stack as a query comment* (more about this later)
SQL Shenanigans
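A minimal sketch of the proxy idea above, assuming a thin wrapper around any DB-API cursor. The names `TracingCursor` and `caller_comment` are hypothetical, and sqlite3 stands in for MySQL only so the example runs on its own. Each query gets a leading comment naming the calling frames, so a slow query in the database log points back at the code that issued it.

```python
import sqlite3
import traceback


def caller_comment(depth=3, skip=2):
    """Return a SQL comment naming the innermost application frames."""
    # extract_stack() lists frames oldest-first and ends with this function.
    # Drop this helper and the proxy method (skip), keep the last `depth`.
    frames = traceback.extract_stack()[:-skip][-depth:]
    trail = " <- ".join(f"{f.name}:{f.lineno}" for f in reversed(frames))
    # Guard against a frame name closing the comment early.
    return "/* " + trail.replace("*/", "") + " */"


class TracingCursor:
    """Hypothetical proxy: tags and logs every query, delegates the rest."""

    def __init__(self, cursor, log):
        self._cursor = cursor
        self._log = log

    def execute(self, sql, params=()):
        tagged = f"{caller_comment()} {sql}"
        self._log.append(tagged)
        return self._cursor.execute(tagged, params)

    def __getattr__(self, name):
        # fetchone, fetchall, close, ... pass straight through.
        return getattr(self._cursor, name)


def load_user(cursor, username):
    # Application code is unchanged; it just receives the proxy.
    cursor.execute("SELECT id FROM users WHERE username = ?", (username,))
    return cursor.fetchone()
```

The log entry for a call to `load_user` starts with `/* load_user:<line> <- ...`, which is usually enough to find the offending call site.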
take pressure off of relational db
can save additional resources if your objects require significant computation to set up
memcached makes a good home for this
need good client to make this into a truly useful service ‡
pools and better failure handling
Object Caching
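A sketch of object caching with failure handling, under stated assumptions: `DictCache` is an in-memory stand-in for a memcached client, `BrokenCache` simulates an unreachable server, and `cached_object` is a hypothetical helper. None of these names come from the talk. The point is that a cache failure falls back to rebuilding the object instead of failing the request.

```python
import pickle


class DictCache:
    """In-memory stand-in for a memcached client (hypothetical interface)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class BrokenCache:
    """Simulates an unreachable cache server."""

    def get(self, key):
        raise ConnectionError("cache down")

    def set(self, key, value):
        raise ConnectionError("cache down")


def cached_object(cache, key, build):
    """Return the object for `key`, building and caching it on a miss."""
    try:
        blob = cache.get(key)
    except ConnectionError:
        # Localize the failure: a dead cache costs latency, not an outage.
        return build()
    if blob is not None:
        return pickle.loads(blob)
    obj = build()  # the expensive part: queries plus object setup
    try:
        cache.set(key, pickle.dumps(obj))
    except ConnectionError:
        pass  # best effort; the caller still gets a fresh object
    return obj
```

A production client would also want connection pooling, timeouts, and expiry, which this sketch leaves out.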
fast vs fast enough
strive for machine efficiency - don't obsess
be scientific - collect data and understand it
can yield some surprising results
don't assume code optimization techniques from another language are relevant
just like carpentry, measure twice cut once
Software Optimization
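One way to "collect data" before optimizing is the standard library's `timeit` module. The sketch below compares two ways of building a string; the functions and the `measure` helper are illustrative assumptions, not code from the talk. Which one wins depends on the interpreter and the input, which is the reason to measure rather than carry over habits from another language.

```python
import timeit


def concat_loop(parts):
    # Repeated += on a string.
    out = ""
    for p in parts:
        out += p
    return out


def concat_join(parts):
    # Single join over the whole list.
    return "".join(parts)


def measure(func, arg, repeat=5, number=200):
    """Best-of-`repeat` wall time in seconds for `number` calls."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    parts = ["x"] * 1000
    for func in (concat_loop, concat_join):
        print(f"{func.__name__}: {measure(func, parts):.6f}s")
```

Taking the minimum of several repeats reduces noise from other processes on the machine.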
pure python HMAC was 40% of web cpu
write a few lines of C
threaded comments fiasco
overly complex algorithm to compute the display object tree
simplify query, simplify algorithm
Python Optimization
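The talk does not show the simplified comment algorithm. As an illustration only, here is one simple approach: fetch the comments as flat rows with a single query and build the display tree in one pass. The row shape `(comment_id, parent_id, text)` and the function name are assumptions.

```python
from collections import defaultdict


def build_comment_tree(rows):
    """Turn flat (comment_id, parent_id, text) rows into nested dicts.

    Each node shares its "replies" list with the children index, so a
    single pass is enough regardless of row order. Siblings keep the
    order in which their rows arrive.
    """
    children = defaultdict(list)
    for comment_id, parent_id, text in rows:
        node = {
            "id": comment_id,
            "text": text,
            "replies": children[comment_id],
        }
        children[parent_id].append(node)
    # Top-level comments have parent_id None. Comments whose parent
    # never appears in `rows` are left out of the result.
    return children[None]
```

The work is linear in the number of rows, and the query behind it needs no joins.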
psyco - specializing compiler for Python
'hot' functions are psyco-ized
there is a 'context switch' penalty so you need to experiment to see if it helps
previous threaded comments algorithm
-closure +psyco = 400% boost
Python Optimization
pruned all the obvious leaf services
dynamic web requests are one `service`
web service is easy to scale, so it stresses out other resources - probably a DB
DBs are hard(er) to scale
tricks of escalating cleverness‡
eventually, no cards left to play
Reasonable Efficiency
pretty much have to go horizontal
choose your partition plan carefully
understand your data access patterns
what queries do you run most often?
do you have joins?
do you need transactional consistency? why?
does an 'entity' emerge?
Scaling MySQL
entities are 'transactional'
allow joins across properties of an entity
entities are migratory
cross entity is more complicated
weaken guarantees to make it easier
minimize activity by design
Partition By Entity
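A minimal sketch of entity partitioning with a lookup directory, assuming each entity (for example, a user and everything the user owns) lives on exactly one shard. `LookupService`, its methods, and the least-loaded placement policy are hypothetical; the talk only states that entities are transactional and migratory.

```python
class LookupService:
    """Hypothetical directory mapping an entity id to its owning shard."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._owner = {}

    def assign(self, entity_id):
        # Place a new entity on the least-loaded shard (placeholder policy).
        counts = {shard: 0 for shard in self._shards}
        for shard in self._owner.values():
            counts[shard] += 1
        shard = min(self._shards, key=lambda s: counts[s])
        self._owner[entity_id] = shard
        return shard

    def shard_for(self, entity_id):
        # Every query for this entity goes to a single shard, so joins
        # and transactions within the entity keep working.
        return self._owner[entity_id]

    def migrate(self, entity_id, new_shard):
        # Entities are migratory: once the rows are copied, moving an
        # entity is a directory update.
        if new_shard not in self._shards:
            raise ValueError(f"unknown shard: {new_shard}")
        self._owner[entity_id] = new_shard
```

A directory costs one extra lookup per request, but unlike a fixed hash it allows individual entities to be moved between shards.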
connection and transaction management
lookup service
query factory
minimalist table abstraction
ORM can be (is?) evil
make common behaviors simple, while leaving some transparency to the actual database
EMD, a TLA not an ORM!
apply this fundamental change to a large and growing site
make it relatively painless with python
multiple inheritance
decorators
AST plugins for validation and testing
Seismic Retrofit
all the scale-aware code nicely opaque to application developers
base use cases are painless
User.select_by_username(db_context, username)
Video.select_by_id(db_context, video_id)
Video.select_by_user_id(db_context, user_id)
Resulting API
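A sketch of how a call such as `Video.select_by_user_id(db_context, user_id)` might hide shard routing from application code. Only the call signature comes from the slides; the `DbContext` internals, the table layout, and the use of sqlite3 (with its `?` placeholders) in place of MySQL are assumptions made so the example runs on its own.

```python
import sqlite3


class DbContext:
    """Hypothetical context: owns connections and the entity lookup."""

    def __init__(self, connections, user_shard):
        self.connections = connections  # shard name -> DB-API connection
        self.user_shard = user_shard    # user_id -> shard name

    def connection_for_user(self, user_id):
        return self.connections[self.user_shard[user_id]]


class Video:
    table = "videos"
    columns = ("id", "user_id", "title")

    def __init__(self, id, user_id, title):
        self.id = id
        self.user_id = user_id
        self.title = title

    @classmethod
    def select_by_user_id(cls, db_context, user_id):
        # The caller never sees which shard was chosen.
        conn = db_context.connection_for_user(user_id)
        sql = (
            f"SELECT {', '.join(cls.columns)} "
            f"FROM {cls.table} WHERE user_id = ?"
        )
        return [cls(*row) for row in conn.execute(sql, (user_id,))]
```

The SQL stays visible in one small place, in keeping with a minimalist table abstraction rather than a full ORM.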
hijack mysql replication to partition on the fly while the live site is running
all DML gets tagged with an entity id
read master binlog and selectively replay it into a set of new mini-masters
update lookup service to point to new resources
Bulk Entity Migration
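The selective replay step can be sketched as a filter over tagged statements. This is a simulation under loud assumptions: the real MySQL binlog is a binary format, the `/* entity_id=N */` tag syntax is invented here, and `replay` and its parameters are hypothetical names.

```python
import re

# Assumed tag format; the talk only says DML is tagged with an entity id.
ENTITY_TAG = re.compile(r"/\*\s*entity_id=(\d+)\s*\*/")


def replay(binlog, moving, apply):
    """Replay only the statements whose entity is being migrated.

    binlog: iterable of SQL strings tagged like '/* entity_id=7 */ UPDATE ...'
    moving: dict of entity_id -> name of the new mini-master
    apply:  callable(target, statement) that executes on the target
    Returns the number of statements replayed.
    """
    replayed = 0
    for statement in binlog:
        match = ENTITY_TAG.search(statement)
        if match is None:
            continue  # untagged statements are skipped in this sketch
        target = moving.get(int(match.group(1)))
        if target is not None:
            apply(target, statement)
            replayed += 1
    return replayed
```

Once a mini-master has caught up with the source, the lookup service can be updated to point the migrated entities at it.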
Recurring Themes
the elegance of simplicity
take reliable open software and customize it
`pythonic veneer`
DIY - filing a ticket for a bugfix doesn’t give me a warm feeling - take matters into your own hands*