myYearbook.com Architecture
Lessons Learned from the Trials of Scaling a High Traffic Website
• Founded in 2005 • 3rd Largest Social Network in the United States
• Teenage Demographic • 60+ Employees
January 2007 • 100M Pageviews • 1 Database Server • 1 Web Application Server
• Daily issues with load and site availability
September 2008 • 2.5B Pageviews • 30 Database Servers • 120 Web Application Servers
• 99.94% Uptime as measured by pingdom.com
Key Architecture Components • PHP5, APC • Apache httpd • PostgreSQL • Memcached
• Apache ActiveMQ
• Lighttpd • Isilon IQ Clustered NAS • Message Systems eCelerity
• Subversion
Web Application Architecture • 2005-2007: Monolithic Code Base • 2008: Migrating to a Service-Oriented Architecture – Applications get their own resources – Loosely Coupled architecture
• MVC Application using XSLT
Web ApplicaOon Architecture • Why SOA?
– Monolithic app wastes hardware
– Cross Data-Center Operations
– Selective Maintenance
Scaling Postgres: Rules for Scaling
1. Plan for Growth 2. Know the internals 3. Bigger Hardware is Better
Our Postgres Scaling History • Quarter 1, 2007 – Monolithic database with one schema, many complex joins and poor optimization
– No plan for growth – No DBA
Our Postgres Scaling History • Quarter 3, 2008 – Horizontal “Sharded” Data – Vertical Partitioning – 5000 Connections/sec Avg
Scaling Postgres: Lessons Learned • Scaling web servers means many database connections, so we needed pooling – Started with pgPool, moved to pgBouncer
• Started with Slony replicating read-only slaves – High IO/CPU Overhead
Scaling Postgres: Lessons Learned • Began scaling vertically by separating application data onto different database servers and removing read-only slaves
• Needed a few small tables replicated that could be slightly inaccurate and eventually consistent (BASE)
Scaling Postgres: Lessons Learned • Enter plProxy – Database partitioning language from Skype, built on PostgreSQL functions
– Trigger-based plProxy functions replicate needed tables without the queue overhead
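pgBouncer pools connections outside the application: many web-server clients share a small number of real PostgreSQL connections. The Python sketch below only illustrates that idea in-process, assuming a hypothetical `fake_connect()` in place of a real driver's connect call; it is not how pgBouncer itself is implemented (pgBouncer works at the wire-protocol level and offers session, transaction and statement pooling modes).

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Toy fixed-size pool: many clients share a few server connections."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        # Wait for an idle server connection instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next client


opened = []  # every "server connection" ever created
served = []  # one entry per completed request


def fake_connect():
    # Hypothetical stand-in for a real database driver's connect() call.
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=4)


def handle_request():
    with pool.connection() as conn:
        served.append(conn)


threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{len(served)} requests served over {len(opened)} server connections")
```

The point of the design is that connection count on the database stays fixed no matter how many web servers are added; extra clients wait briefly for an idle connection rather than forcing PostgreSQL to fork another backend.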
– NOT TRANSACTION SAFE
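Why "NOT TRANSACTION SAFE"? A trigger that writes to another server commits that remote write in a separate transaction, so rolling back the local transaction cannot undo it. The sketch below reproduces the effect with Python's built-in `sqlite3`, assuming two in-memory databases as stand-ins for two PostgreSQL servers and a hypothetical Python function `replicate_user` playing the role of the plProxy call fired from the trigger.

```python
import sqlite3

# Two independent in-memory databases stand in for two PostgreSQL servers.
local = sqlite3.connect(":memory:")
remote = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
for db in (local, remote):
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def replicate_user(user_id, name):
    # Writes to the other "server" in its own auto-committed transaction,
    # so it is NOT undone if the caller's transaction later rolls back.
    remote.execute("INSERT OR REPLACE INTO users VALUES (?, ?)", (user_id, name))


# Expose the Python function to SQL and fire it from an AFTER INSERT trigger
# (hypothetical names; a plProxy function plays this role in PostgreSQL).
local.create_function("replicate_user", 2, replicate_user)
local.execute(
    """CREATE TRIGGER users_replicate AFTER INSERT ON users
       BEGIN
           SELECT replicate_user(NEW.id, NEW.name);
       END"""
)

local.execute("INSERT INTO users VALUES (1, 'alice')")
local.rollback()  # undo the local insert...

local_rows = local.execute("SELECT COUNT(*) FROM users").fetchone()[0]
remote_rows = remote.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(local_rows, remote_rows)  # ...but the remote copy was already committed
```

This is acceptable only for data that may be "slightly inaccurate and eventually consistent", as the slide on BASE tables puts it.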
Scaling Postgres: Lessons Learned • Standard Use of plProxy – Horizontal partitioning of data by ID across multiple servers
– Example: Messaging System • 8 Servers store actual partitioned message data • Rule #1 – Plan for Growth
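To our understanding, PL/Proxy picks a partition by masking a hash value with `partition_count - 1`, which is why it expects a power-of-two partition count. The sketch below shows that routing rule under simplifying assumptions: it masks the numeric ID directly instead of hashing it, and `PARTITIONS = 8` mirrors the eight message servers on the slide.

```python
PARTITIONS = 8  # must be a power of two for the bitmask below


def partition_for(message_owner_id: int, partitions: int = PARTITIONS) -> int:
    """Route an ID to a partition by keeping its low bits."""
    if partitions <= 0 or partitions & (partitions - 1):
        raise ValueError("partition count must be a power of two")
    return message_owner_id & (partitions - 1)


# Hypothetical layout: 8 logical partitions currently living on 4 hosts.
# Planning for growth means moving partitions to new hosts later,
# not re-hashing every row.
PARTITION_HOSTS = {p: f"msgdb{p % 4}" for p in range(PARTITIONS)}

print(partition_for(13), PARTITION_HOSTS[partition_for(13)])
```

A useful property of the power-of-two scheme: when the partition count doubles from 8 to 16, a row in partition `p` can only land in `p` or `p + 8`, so each existing partition splits cleanly in two.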
Scaling Postgres: Lessons Learned • Knowing internals – pg_catalog
• pg_stat_user_tables • pg_stat_user_indexes
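The statistics views named above are plain SQL-queryable tables. The sketch below, assuming a DB-API driver such as psycopg2 supplies the rows (`cursor.execute(...)` then `cursor.fetchall()`), flags two common findings: indexes that have never been scanned and tables read mostly by sequential scan. The helper names and the `ratio` threshold are illustrative choices, not part of PostgreSQL.

```python
UNUSED_INDEXES_SQL = """
SELECT relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 ORDER BY idx_scan
"""

SEQ_SCANS_SQL = """
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables
"""


def unused_indexes(rows):
    """rows: (relname, indexrelname, idx_scan) tuples from UNUSED_INDEXES_SQL."""
    return [(table, index) for table, index, scans in rows if scans == 0]


def seq_scan_heavy(rows, ratio=1.0):
    """Tables whose sequential scans exceed `ratio` times their index scans."""
    flagged = []
    for table, seq_scan, idx_scan in rows:
        idx_scan = idx_scan or 0  # NULL when the table has no indexes
        if seq_scan > ratio * idx_scan:
            flagged.append(table)
    return flagged


# Example with hand-written rows (a live database would supply these).
index_rows = [("users", "users_pkey", 120), ("users", "users_email_idx", 0)]
table_rows = [("users", 10, 100), ("messages", 500, 3)]
print(unused_indexes(index_rows), seq_scan_heavy(table_rows))
```

The counters are cumulative since the last statistics reset, and an index with zero scans may still be needed to enforce uniqueness, so treat these results as leads to investigate rather than drop lists.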
Scaling Postgres: Knowing Internals
Scaling Postgres: Lessons Learned • Database Ecosystem
– Performance Factors • Index bloat • Usage changes
– Abuse • Cache utilization contention
Scaling Postgres: Lessons Learned • Bigger is Better
– More RAM – More Disks
– Faster and More CPU
Scaling Postgres: Lessons Learned • Scaling Across CPU Cores
• PostgreSQL Scales to 32 Cores
• Extensive Benchmarking @ MYB
Before and After Upgrade
Scaling Postgres: Future Plans • More Partitioning
• SOA Data Distribution – Golconde
• Python Based • Apache ActiveMQ
Apache ActiveMQ • Java-based Message Broker software
• Client language neutral • Implements JMS 1.1, Stomp, XMPP, REST and Others
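"Client language neutral" works because protocols such as STOMP are simple text frames: a command line, `key:value` header lines, a blank line, the body, and a terminating NUL byte. The sketch below builds such a frame; the queue name `/queue/image.resize` is hypothetical, and a real session would first send a `CONNECT` frame and wait for the broker's `CONNECTED` reply.

```python
def stomp_frame(command, headers, body=""):
    """Build a STOMP frame: command, header lines, blank line, body, NUL."""
    payload = body.encode("utf-8")
    # content-length lets the broker read bodies that might contain NUL bytes.
    headers = {**headers, "content-length": len(payload)}
    head = "\n".join([command] + [f"{key}:{value}" for key, value in headers.items()])
    return head.encode("utf-8") + b"\n\n" + payload + b"\x00"


# A PHP, Python or Java client could all emit exactly these bytes.
frame = stomp_frame("SEND", {"destination": "/queue/image.resize"}, '{"photo_id": 7}')
print(frame)
```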
ActiveMQ @ myYearbook.com Out-of-band Processing • Uploaded content processing
– Image Resize – Content analysis (R&D) – Anti-Virus Scans
• Comment and Message processing – Spam Processing
• Email spooling from web application
• Anywhere else it makes sense
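Out-of-band processing means the web request only enqueues work and returns, while separate workers consume the queue. The sketch below models that pattern in one process, assuming Python's `queue.Queue` as a stand-in for an ActiveMQ queue and hypothetical handlers `resize_image` and `check_spam`; in production the workers would be separate servers subscribed to the broker.

```python
import queue
import threading

jobs = queue.Queue()  # stands in for an ActiveMQ queue
results = []
results_lock = threading.Lock()


def resize_image(payload):
    # Hypothetical stand-in for real image resizing work.
    return f"resized photo {payload['photo_id']}"


def check_spam(payload):
    # Hypothetical toy rule; a real spam check would be far more involved.
    return "spam" if "free money" in payload["text"].lower() else "ok"


HANDLERS = {"image.resize": resize_image, "message.spam": check_spam}


def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:  # sentinel: shut this worker down
                return
            kind, payload = job
            outcome = HANDLERS[kind](payload)
            with results_lock:
                results.append((kind, outcome))
        finally:
            jobs.task_done()


def handle_upload(photo_id):
    # The web request only enqueues the job and returns immediately.
    jobs.put(("image.resize", {"photo_id": photo_id}))
    return "queued"


workers = [threading.Thread(target=worker) for _ in range(2)]
for t in workers:
    t.start()

handle_upload(7)
jobs.put(("message.spam", {"text": "See you at practice"}))
jobs.join()  # wait until both jobs are processed
for _ in workers:
    jobs.put(None)
for t in workers:
    t.join()
print(sorted(results))
```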
Targeted Workload • Message Queues let the right server do the job • Better distribution of CPU-intensive tasks without negatively impacting the user experience
• Clusterable, Scalable
Memcached: Key for Success • Valuable Scaling Tool – Over 250k get requests per second during peak – Over 750GB of cached data – Easy to Deploy – The more distributed the cache, the smaller the impact of any single cache failure: more boxes are better than fewer
Memcached: Potential Problems • Large-scale implementations can have some hidden problems – Lots of network traffic – Data that does not partition or distribute evenly
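Why do more boxes make failures less painful? With consistent hashing, which many memcached clients use (often called ketama-style hashing), losing one of N servers only remaps the keys that lived on that server, roughly 1/N of them. The sketch below is a generic hash ring, not necessarily the client myYearbook.com used; the MD5 hash, 100 virtual points per server and the `cacheN` host names are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: each server owns many points on a circle."""

    def __init__(self, servers, replicas=100):
        self._ring = []  # sorted (point, server) pairs
        for server in servers:
            for i in range(replicas):
                self._ring.append((self._hash(f"{server}#{i}"), server))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def server_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = [f"cache{i}" for i in range(10)]  # hypothetical host names
ring = HashRing(servers)
after_failure = HashRing(servers[:-1])  # cache9 goes down
keys = [f"user:{i}" for i in range(5000)]
moved = [k for k in keys if ring.server_for(k) != after_failure.server_for(k)]
print(f"{len(moved) / len(keys):.1%} of keys remapped")
```

Only keys that were on the failed server move; with naive `hash(key) % N` placement, removing one server would instead remap most keys and empty most of the cache at once.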
• What to do for data that is not evenly distributed? – Implemented a round‐robin cluster of memcache servers that contain the same data
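For hot keys that would overload whichever single server they hash to, the slide describes a group of memcached servers that all hold the same data, read round-robin. A minimal sketch, assuming plain dicts as stand-ins for memcached client connections and a hypothetical `ReplicatedCacheGroup` class: writes go to every node, and each read starts at the next node in turn, falling back to the others on a miss.

```python
import itertools


class ReplicatedCacheGroup:
    """Every node holds the same data; reads rotate across nodes."""

    def __init__(self, nodes):
        self._nodes = list(nodes)  # dicts stand in for memcached clients
        self._next = itertools.cycle(range(len(self._nodes)))

    def set(self, key, value):
        # Write-all: every node must hold a copy for round-robin reads to work.
        for node in self._nodes:
            node[key] = value

    def get(self, key):
        # Start at the next node in rotation, spreading a hot key's read load;
        # fall back to the other nodes if this one misses.
        start = next(self._next)
        count = len(self._nodes)
        for offset in range(count):
            value = self._nodes[(start + offset) % count].get(key)
            if value is not None:
                return value
        return None


group = ReplicatedCacheGroup([{}, {}, {}])
group.set("hot:leaderboard", "[...]")
print([group.get("hot:leaderboard") for _ in range(3)])
```

The trade-off is write amplification: each `set` costs one round trip per node, which is why this suits a small set of read-heavy keys rather than the whole cache.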
Research and Development • Copyr
– Copy-on-Write Filesystem Replication
• Framewerk
– PHP5 OO Development Framework
• Golconde
– Queue Based Data Distribution for PostgreSQL
• Lightr
– PHP5 XMPP Class Library
• mod_xsltd
– Lighttpd XSL Transformation module
• Playr
– PostgreSQL Log Replay
• Staplr
– STAtistical Package Logically engineered Right
Tools for Success • Operations Portal – Executive-Level Overview of Operational Status and Production Change Log
• Staplr – Trending & Analytics System
OperaOons Portal
Trending and Analysis: Staplr • Version 0.6
– PHP Based – Process forking – Shelled RRD Commands
• Version 2.0 – Python Based – Threaded – Python wrappers to librrd
Trending and Analysis: Staplr • Polls for:
– Apache httpd – Apache ActiveMQ – lighttpd – memcached – MySQL – pgBouncer – PostgreSQL – SNMP Data
• APC, Isilon, F5, Xiotech, Others
– SysStat
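Staplr 2.0 is described as threaded and Python-based. As one hedged illustration of a single poller, the sketch below reads memcached's text-protocol `stats` output (lines of `STAT <name> <value>` ending with `END`) and polls several hosts concurrently with a thread pool. The function names are hypothetical, and the `fetch` parameter lets a fake stand in for the network; storing the samples in RRD files is left out.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def fetch_memcached_stats(host, port=11211, timeout=2.0):
    """Send the text-protocol 'stats' command and read until END."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", "replace")


def parse_stats(text):
    """Turn 'STAT name value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def poll_all(hosts, fetch=fetch_memcached_stats, workers=8):
    # Poll hosts concurrently so one slow box does not stall the whole cycle.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(fetch, hosts))
    return {host: parse_stats(text) for host, text in zip(hosts, texts)}


sample = "STAT get_hits 10\r\nSTAT version 1.2.6\r\nEND\r\n"
print(parse_stats(sample))
```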
Questions?