MySQL and Search at Craigslist
Jeremy Zawodny
http://craigslist.org/
http://jeremy.zawodny.com/blog/
Who Am I?
Creator and co-author of High Performance MySQL
Creator of mytop
Perl Hacker
MySQL Geek
Craigslist Engineer (as of July, 2008)
MySQL, Data, Search, Perl
Ex-Yahoo (Perl, MySQL, Search, Web Services)
What is Craigslist?
Local Classifieds
Jobs, Housing, Autos, Goods, Services
~500 cities world-wide
Free
Except for jobs in ~18 cities and brokered apartments in NYC
Over 20B pageviews/month
50M monthly users
50+ countries, multiple languages
40+M ads/month, 10+M images
What is Craigslist?
Forums
100M posts
100s of forums
Technical and other Challenges
High ad churn rate
Post half-life can be short
Growth
High traffic volume
Back-end tools and data analysis needs
Growth
Need to archive postings... forever!
100s of millions, searchable
Internationalization and UTF-8
Technical and other Challenges
Small Team
Fires take priority
Infrastructure gets creaky
Organic code and schema growth over years
Growth
Lack of abstractions
Too much embedded SQL in code
Documentation vs. Institutional Knowledge
Why do we have things configured like this?
Goals
Use Open Source
Keep infrastructure small and simple
Lower power is good!
Efficiency all around
Do more with less
Keep site easy and approachable
Don't overload with features
People are easily confused
Craigslist Internals Overview
Load Balancer
Read Proxy Array / Write Proxy Array (Perl + memcached)
Web Read Array (Apache 1.3 + mod_perl)
Object Cache (Perl + memcached)
Read DB Cluster (MySQL 5.0.xx)
Search Cluster (Sphinx)
Not included: user db, image db, async tasks, email, accounting, internal tools, and more!
Vertical Partitioning: Roles
Roles: Users, Classifieds, Forums, Stats, Archive
Sub-roles by query type: Write, Read, Long, Trash
Vertical Partitioning
Different roles have different access patterns
Sub-roles based on query type
Easier to manage and scale
Logical, self-contained data
Servers may not need to be as big/fast/expensive
Difficult to do retroactively
Various named db handles in code
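As a rough sketch only (role names, hosts, and DSNs are invented here, not Craigslist's actual code), the "named db handles" pattern in Perl/DBI might look like this:

# Sketch only: hypothetical role-based handle lookup.
use strict;
use warnings;
use DBI;

my %dsn_for_role = (
    classifieds_write => 'DBI:mysql:database=cl;host=classifieds-master',
    classifieds_read  => 'DBI:mysql:database=cl;host=classifieds-slave',
    forums_read       => 'DBI:mysql:database=forums;host=forums-slave',
    archive_read      => 'DBI:mysql:database=archive;host=archive-slave',
);

my %handle_cache;

# Application code asks for a handle by role instead of embedding host details.
sub dbh_for {
    my ($role) = @_;
    return $handle_cache{$role} ||= DBI->connect(
        $dsn_for_role{$role}, 'app_user', 'secret',
        { RaiseError => 1, AutoCommit => 1 },
    );
}

my $dbh = dbh_for('classifieds_read');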
Horizontal Partitioning: Hydra
A client layer maps each request onto one of N clusters: cluster_01, cluster_02, cluster_03, ... cluster_N
Horizontal Partitioning: Hydra
Need to retrofit a lot of code
Need non-blocking Perl MySQL client
Wrapped http://code.google.com/p/perl-mysql-async/
Eventually can size DB boxes based on price/power and adjust mapping function(s)
Choose hardware first
Make the db fit
Archiving lets us age a cluster instead of migrating its data to a new one.
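The talk doesn't spell out the mapping function itself, so this is purely a hypothetical sketch of deriving a cluster name from a posting id:

# Sketch only: a hypothetical Hydra-style mapping function. The real mapping
# function(s) are meant to be adjustable as hardware changes; this just shows
# the shape of the idea.
use strict;
use warnings;

my @clusters = map { sprintf 'cluster_%02d', $_ } 1 .. 16;

sub cluster_for_posting {
    my ($posting_id) = @_;
    return $clusters[ $posting_id % @clusters ];
}

print cluster_for_posting(123_456_789), "\n";    # prints "cluster_06" here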
Search Evolution
Problem: Users want to find stuff.
Solution: Use MySQL Full Text.
...time passes...
Problem: MySQL Full Text Doesn't Scale!
Solution: Use Sphinx.
...time passes...
Problem: Sphinx doesn't scale!
Solution: Patch Sphinx.
MySQL Full-Text Problems
Hitting invisible limits
CPU not pegged, Memory available
Disk I/O not unreasonable
Locking / Mutex contention? Probably.
MyISAM has occasional crashing / corruption
5 clusters of 5 machines
Partitioning based on city and category
All hand balanced and high-maintenance
~30M queries/day
Close to limits
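For context, the kind of query this tier served looks roughly like the following; host, table, and column names are invented, assuming one MyISAM table per city/category partition:

# Sketch only: the general shape of a MyISAM full-text query.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:mysql:database=cl;host=ft-slave', 'app_user', 'secret',
                       { RaiseError => 1 });

my $rows = $dbh->selectall_arrayref(q{
    SELECT posting_id, title
    FROM   posts_sfbay_housing
    WHERE  MATCH (title, body) AGAINST (? IN BOOLEAN MODE)
    ORDER  BY posted_date DESC
    LIMIT  100
}, { Slice => {} }, '+sunny +studio');

print scalar(@$rows), " matches\n";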
Sphinx: My First CL Project
Sphinx is designed for text search
Fast and lean C++ code
Forking model scales well on multi-core
Control over indexing, weighting, etc.
Also spent some time looking at Apache Solr
Search Implementation Details
Partitioning based on cities (each has a numeric id)
Attributes vs. Keywords
Persistent Connections
Custom client and server modifications
Minimal stopword List
Partition into 2 clusters (1 master, 4 slaves)
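A hedged illustration of a city-filtered attribute query, shown here with the Sphinx::Search CPAN client rather than the custom client mentioned above; the index name, attribute name, and city id are invented:

# Sketch only: keyword query plus a numeric city attribute filter.
use strict;
use warnings;
use Sphinx::Search;

my $sph = Sphinx::Search->new;
$sph->SetServer('localhost', 9312);      # searchd host/port as configured
$sph->SetFilter('city_id', [27]);        # keep only one city's postings
$sph->SetLimits(0, 100);

my $results = $sph->Query('road bike', 'posts_index');
for my $match (@{ $results->{matches} || [] }) {
    print "doc=$match->{doc}\n";
}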
Sphinx Incremental Indexing
Re-index every N minutes
Use main + delta strategy
Adopted as: index + today + delta
One set per city (~500 * 3)
Slaves handle live queries, update via rsync
Need lots of FDs
Use all 4 cores to index
Every night, perform daily merge
Generate config files via Perl
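A minimal sketch of emitting the per-city "main + today + delta" index sections (~500 cities x 3 parts) from Perl; paths and naming are invented, and a real sphinx.conf also needs matching source definitions, indexer/searchd sections, and charset settings:

# Sketch only: generating index sections, in the spirit of "config files via Perl".
use strict;
use warnings;

my @city_ids = (1 .. 500);

open my $fh, '>', 'sphinx.conf.generated' or die $!;
for my $city (@city_ids) {
    for my $part (qw(main today delta)) {
        print {$fh} <<"CONF";
index posts_${city}_${part}
{
    source = src_posts_${city}_${part}
    path   = /var/data/sphinx/posts_${city}_${part}
}

CONF
    }
}
close $fh;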
Sphinx Issues
Merge bugs [fixed]
File descriptor corruption [fixed]
Persistent connections [fixed]
Overhead of fork() was substantial in our testing
200 queries/sec vs. 1,000 queries/sec per box
Missing attribute updates [unreported]
Bogus docids in responses
We need to upgrade to latest Sphinx soon
Andrew and team have been excellent!
Search Project Results
From 25 MySQL Boxes to 10 Sphinx
Lots more headroom!
New Features
Nearby Search
No seizing or locking issues
1,000+ qps during peak w/room to grow
50M queries per day w/steady growth
Cluster partitioning built but not needed (yet?)
Better separation of code
Sphinx Wishlist
Efficient delete handling (kill lists)
Non-fatal missing indexes
Index dump tool
Live document add/change/delete
Built-in replication
Stats and counters
Text attributes
Protocol checksum
Data Archiving, Replication, Indexes
Problem: We want to keep everything.
Solution: Archive to an archive cluster.
Problem: Archiving is too painful. Index updates are expensive! Slaves affected.
Solution: Archive with home-grown eventually consistent replication.
Data Archiving: Out-of-Band (OOB) Replication
Eventual Consistency
Master process
SET SQL_LOG_BIN=0
Select expired IDs
Export records from live master
Import records into archive master
Delete expired from live master
Add IDs to list
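A simplified sketch of that master process as straight DBI calls; hosts, table names, and the expired-id list table are invented for illustration:

# Sketch only: one master-side archiving pass.
use strict;
use warnings;
use DBI;

my $live    = DBI->connect('DBI:mysql:database=cl;host=live-master',
                           'archiver', 'secret', { RaiseError => 1 });
my $archive = DBI->connect('DBI:mysql:database=cl;host=archive-master',
                           'archiver', 'secret', { RaiseError => 1 });

# Keep the archiving work out of the binlog; slaves get cleaned up out of band.
$live->do('SET SQL_LOG_BIN = 0');

# 1. Find a batch of expired postings on the live master.
my $ids = $live->selectcol_arrayref(
    'SELECT posting_id FROM posts WHERE expires_at < NOW() LIMIT 1000');
exit 0 unless @$ids;
my $in = join ',', ('?') x @$ids;

# 2. Export the records and import them into the archive master.
my $rows = $live->selectall_arrayref(
    "SELECT posting_id, title, body FROM posts WHERE posting_id IN ($in)",
    { Slice => {} }, @$ids);
my $ins = $archive->prepare(
    'INSERT INTO posts_archive (posting_id, title, body) VALUES (?, ?, ?)');
$ins->execute(@{$_}{qw(posting_id title body)}) for @$rows;

# 3. Delete the expired rows from the live master.
$live->do("DELETE FROM posts WHERE posting_id IN ($in)", undef, @$ids);

# 4. Record the ids (expired_ids has an auto-increment seq column) so each
#    slave's cleanup process can replay the deletes later.
my $log = $live->prepare('INSERT INTO expired_ids (posting_id) VALUES (?)');
$log->execute($_) for @$ids;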
Data Archiving: OOB Replication
Slave process
One per MySQL slave
Throttled to minimize impact
State kept on slave
Clone friendly
Simple logic
Select expired IDs added since my sequence number
Delete expired records
Update local last seen sequence number
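And a sketch of the per-slave process; names and the state file are invented, and where the expired-id list actually lives is an assumption (here the slave reads it from the live master and deletes locally):

# Sketch only: one pass of the per-slave cleanup. The last-seen sequence number
# lives on the slave itself, so a freshly cloned slave picks up where its copy
# left off.
use strict;
use warnings;
use DBI;

my $master = DBI->connect('DBI:mysql:database=cl;host=live-master',
                          'archiver', 'secret', { RaiseError => 1 });
my $local  = DBI->connect('DBI:mysql:database=cl;host=localhost',
                          'archiver', 'secret', { RaiseError => 1 });

# State kept on the slave (clone friendly).
my $state_file = '/var/run/cl-archive-slave.seq';
my $seq = 0;
if (-e $state_file) {
    open my $in, '<', $state_file or die $!;
    $seq = <$in> + 0;
}

# Select expired ids added since my sequence number.
my $expired = $master->selectall_arrayref(
    'SELECT seq, posting_id FROM expired_ids WHERE seq > ? ORDER BY seq LIMIT 500',
    { Slice => {} }, $seq);

for my $row (@$expired) {
    $local->do('DELETE FROM posts WHERE posting_id = ?', undef, $row->{posting_id});
    $seq = $row->{seq};
    select undef, undef, undef, 0.05;    # crude throttle to minimize impact
}

# Update the local last-seen sequence number.
open my $out, '>', $state_file or die $!;
print {$out} $seq;
close $out;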
Long Term Data Archiving
Schema coupling is bad
ALTER TABLE takes forever
Lots of NULLs flying around
CouchDB or similar long-term?
Schema-free feels like a good fit
Tested some home grown solutions already
Separate storage and indexing?
Indexing with Sphinx?
Drizzle, XtraDB, Future Stuff
CouchDB looks very interesting. Maybe for archive?
XtraDB / InnoDB plugin
Better concurrency
Better tuning of InnoDB internals
libdrizzle + Perl
DBI/DBD may not fit an async model well
Can talk to both MySQL and Drizzle!
Oracle buying Sun?!?!
We're Hiring!
Work in San Francisco
Flexible, Small Company
Excellent Benefits
Help Millions of People Every Week
We Need Perl/MySQL Hackers
Come Help us Scale and Grow
Questions?