Migrating to Riak at Shareaholic
Robby Grossman, Shareaholic's Tech Lead, spoke at the first Boston Riak Meetup on August 30, 2012. These are his slides.
Agenda
Shareaholic: Product & Tech
Why Riak: The Search for a Big Data Store
Transitioning to Riak
Riak Use Cases
Deploying to EC2
What’s Shareaholic?
Browser Tools
Sharing Buttons
Recommendations
Social Analytics
Monthly @ Shareaholic
Thousands of developers hitting API
Hundreds of thousands of publishers
Tens of millions of shares & clicks
Hundreds of millions of pageviews & events
Tech @ Shareaholic
JRuby on Rails (via Torquebox)
MySQL (Master, Read Slave)
Amazon Elastic MapReduce (hosted Hadoop)
Redis
Formerly Mongo, Now Riak
Why Not Mongo?
Working set needs to fit in memory
Global write lock blocks all queries, despite not having transactions/joins
Standbys not “hot”
Why Riak?
Next @ Shareaholic
Options:
HBase
Cassandra
Riak
Goals:
Linear scalability
Full-text search
Flexible indexing
Easier Devops
HBase
Pros
Battle tested
High performance
Cons
Complex Architecture
SPOFs
Requires Hive for Indexing/Querying
Expensive to deploy at small scale
Cassandra
Pros
Native secondary indices
Linear scalability
Tunable CAP
Cons
Known users all domain experts
Search requires Lucene
Heavy Weight MapReduce
Riak
Pros
Operationally simpler
Linear scalability
Integrated search
Secondary indices
Tunable CAP
Vector clocks solve time-sync problems
Cons
Multi-data center replication requires Enterprise product
leveldb puts high strain on CPU
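The "vector clocks solve time-sync problems" point above can be illustrated with a small sketch. A vector clock orders versions by per-actor update counters rather than wall-clock time, so clock skew between servers cannot make a newer write look older. The dict representation and the names `descends` and `compare` are assumptions made for illustration; this is not Riak's internal format.

```python
def descends(a, b):
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two vector clocks (dicts mapping actor -> update counter)."""
    if a == b:
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # neither saw the other: a conflict to resolve


v1 = {"client_x": 1}
v2 = {"client_x": 2}                  # client_x updated v1
v3 = {"client_x": 1, "client_y": 1}   # client_y updated v1 independently

print(compare(v2, v1))
print(compare(v2, v3))
```

Concurrent versions are the case that timestamps alone would silently mis-order; with vector clocks both values can be kept and handed to the application to reconcile.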
From Mongo to Riak
Migration Goals
No time where database goes “offline”
Product parity throughout migration
Migration Process
1. App writes to Mongo and Riak
2. Verify data integrity
3. Import historical data
4. App reads from Riak
5. Decommission Mongo
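The five steps above can be sketched as a dual-write wrapper. This is a minimal illustration, not Shareaholic's actual code: plain dicts stand in for Mongo and Riak, and `DualWriteStore`, `mismatched_keys`, and `backfill` are hypothetical names.

```python
class DualWriteStore:
    """Hypothetical wrapper used while both databases are live."""

    def __init__(self, old, new, read_from_new=False):
        self.old = old                      # stand-in for Mongo
        self.new = new                      # stand-in for Riak
        self.read_from_new = read_from_new

    def put(self, key, value):
        # Step 1: every write goes to both stores.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        # Step 4: flip read_from_new once the new store is verified.
        source = self.new if self.read_from_new else self.old
        return source.get(key)


def mismatched_keys(old, new):
    """Step 2: keys whose values differ or are missing in the new store."""
    return sorted(k for k in old if new.get(k) != old[k])


def backfill(old, new):
    """Step 3: copy historical records the new store never received."""
    for key, value in old.items():
        new.setdefault(key, value)


mongo = {"link:1": {"url": "http://example.com/a"}}  # pre-existing data
riak = {}
store = DualWriteStore(mongo, riak)

store.put("link:2", {"url": "http://example.com/b"})
print(mismatched_keys(mongo, riak))  # historical key not yet imported
backfill(mongo, riak)
print(mismatched_keys(mongo, riak))
store.read_from_new = True           # step 4; step 5 is removing `old`
print(store.get("link:1"))
```

Because reads stay on the old store until the integrity check comes back clean, the application never sees a window where data is missing, which matches the two migration goals.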
Use Cases
Share API
Save shared content
Uses MapReduce to populate user dashboard
Recommendations
Sets of related pages
Generated on-demand
Publisher Analytics
Generated nightly via Hadoop
Typical stored “document” (JSON)
80 KB to 1 MB
Riak Successes
MapReduce
Handy for querying
Runs at “web page speed”.
Easy to re-reduce for complex queries
Easy to test via curl
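On the "easy to re-reduce" point: a Riak reduce phase may be invoked more than once, with the output of earlier invocations fed back in as input, so a reduce function has to give the same answer however its input is batched. The sketch below simulates that locally in Python; `reduce_in_batches` is a hypothetical helper, not part of any Riak client.

```python
def reduce_min(values):
    """Return a one-element list so the output can be fed back in as input."""
    return [min(values)] if values else []


def naive_count(values):
    """Counts inputs; NOT safe to re-reduce, shown for contrast."""
    return [len(values)]


def reduce_in_batches(reduce_fn, values, batch_size):
    """Apply reduce_fn to each batch, then re-reduce the partial results."""
    partials = []
    for start in range(0, len(values), batch_size):
        partials.extend(reduce_fn(values[start:start + batch_size]))
    return reduce_fn(partials)


user_ids = [13, 2, 9, 5]

# A minimum survives re-reduction: both calls agree.
print(reduce_min(user_ids), reduce_in_batches(reduce_min, user_ids, 2))

# A naive count does not: batching changes the answer.
print(naive_count(user_ids), reduce_in_batches(naive_count, user_ids, 2))
```

Functions such as min, max, and sum compose this way, which is why they are straightforward to chain into more complex queries.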
Tunable CAP @ Shareaholic
Replication: primary/secondary authority
Read failure tolerance: speed/consistency
Write failure tolerance
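The trade-offs on this slide come down to three numbers in a Dynamo-style store: N replicas per key, R replicas that must answer a read, and W replicas that must acknowledge a write. A rough sketch of the arithmetic follows; `quorum_profile` is a hypothetical helper, and it ignores details such as fallback nodes, so treat it as a simplification.

```python
def quorum_profile(n, r, w):
    """Summarize what an (N, R, W) choice tolerates under strict quorums."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return {
        # Replicas that can be down while reads still succeed.
        "read_failures_tolerated": n - r,
        # Replicas that can be down while writes still succeed.
        "write_failures_tolerated": n - w,
        # R + W > N means every read set overlaps the latest write set.
        "reads_overlap_latest_write": r + w > n,
    }


print(quorum_profile(n=3, r=2, w=2))  # balanced: overlap, one failure each
print(quorum_profile(n=3, r=1, w=1))  # fastest: no overlap guarantee
```

Lower R and W favor speed and availability; higher values favor consistency, which is the speed/consistency dial the slide refers to.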
Full Text Search
Built on Lucene
Make user content searchable
Make arbitrary keys queryable
“Just turn it on”
Hiccup: corrupt merge indexes
Query Example
# Map each link shared in a 60-second window to its user_id, then take the
# minimum, e.g. [[2],[5],[9],[13]] => [[2]].
curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {
      "bucket": "links",
      "query": "timestamp:[1346350877 TO 1346350937}"
    },
    "query": [
      {"map": {"language": "javascript",
               "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
      {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
    ]
  }'
Who’s our oldest user who’s shared something in the last minute?
[[2197]]
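JSON has no comment syntax, so one way to keep annotations out of the payload is to build the job body in code. The sketch below assembles the same job as the query example and, separately, simulates the map and reduce phases on made-up documents. `build_job`, `simulate`, and the sample data are hypothetical; nothing here contacts a Riak node.

```python
import json

MAP_SOURCE = (
    "function(riakObject) {"
    " return [[Riak.mapValuesJson(riakObject)[0].user_id]];"
    " }"
)


def build_job(bucket, start_ts, end_ts):
    """Assemble the MapReduce job from the slide as a Python dict."""
    return {
        "inputs": {
            "bucket": bucket,
            # Search query copied from the slide, including its bracket style.
            "query": "timestamp:[%d TO %d}" % (start_ts, end_ts),
        },
        "query": [
            {"map": {"language": "javascript", "source": MAP_SOURCE}},
            {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}},
        ],
    }


def simulate(docs, start_ts, end_ts):
    """Local stand-in: map each in-window doc to [user_id], then take the min."""
    mapped = [[d["user_id"]] for d in docs if start_ts <= d["timestamp"] < end_ts]
    return [min(mapped)] if mapped else []


body = json.dumps(build_job("links", 1346350877, 1346350937))
print(body)  # this string is what would be POSTed to /mapred

sample_docs = [  # invented data for illustration only
    {"user_id": 9, "timestamp": 1346350880},
    {"user_id": 2, "timestamp": 1346350900},
    {"user_id": 1, "timestamp": 1346350000},  # outside the window
]
print(simulate(sample_docs, 1346350877, 1346350937))
```

The simulation treats the lowest `user_id` as the "oldest user", which is the assumption the slide's question relies on.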
Riak on EC2
In a Nutshell
EC2 specs poorly proportioned for leveldb
Multiple AZs in one location works well
Scale vertically for better latency & consistency
Scale horizontally for more throughput/$
Benchmarks
Top Graph: c1.medium (1.7 GB RAM, 5 ECU)
Middle: m1.large (7.5 GB RAM, 4 ECU)
Bottom: cc1.4xlarge (23 GB RAM, 33.5 ECU)
Throughput
Latency (Typical)
Latency (Worst Case)
Calculations
c1.medium (1.7 GB RAM, 5 ECU): 1758 IOPS/$-hr; worst 1% of queries: 300ms/800ms
m1.large (7.5 GB RAM, 4 ECU): 1167 IOPS/$-hr; worst 1% of queries: 110ms/200ms
cc1.4xlarge (23 GB RAM, 33.5 ECU): 872 IOPS/$-hr; worst 1% of queries: 47ms/139ms
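To make the cost/latency trade-off in these figures concrete, the sketch below expresses each instance type relative to cc1.4xlarge. The numbers are copied from the slide. The slide does not say what the two latency figures in each pair measure, so the code only compares the first of each pair; `relative_to` is a hypothetical helper.

```python
benchmarks = {
    # instance: (ops per dollar-hour, (first, second) worst-1% latency in ms)
    "c1.medium": (1758, (300, 800)),
    "m1.large": (1167, (110, 200)),
    "cc1.4xlarge": (872, (47, 139)),
}


def relative_to(baseline, data):
    """Return (throughput-per-dollar ratio, first-latency ratio) per instance."""
    base_ops, base_lat = data[baseline]
    return {
        name: (round(ops / base_ops, 2), round(lat[0] / base_lat[0], 1))
        for name, (ops, lat) in data.items()
    }


for name, (ops_ratio, lat_ratio) in relative_to("cc1.4xlarge", benchmarks).items():
    print(f"{name}: {ops_ratio}x ops per $-hr, {lat_ratio}x worst-1% latency")
```

On these numbers the smallest instance delivers roughly twice the operations per dollar-hour of the largest, at about six times the worst-case latency, which is the "scale horizontally for throughput/$, vertically for latency" conclusion.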
Benchmark Takeaways
You can’t go “by spec”
IO is limiting factor
RAM never limiting factor for 1% of keyspace to be in memory
Fin. Questions?
Thanks:
Tom Santero
Justin Sheehy
Ryan Zezeski
Reid Draper
#freenode riak crew
We’re Hiring!
Robby Grossman
@freerobby