riak perf wins
DESCRIPTION
How the team at Clipboard.com got more than 100x better search performance with some simple changes to Riak Search.
TRANSCRIPT
![Page 1: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/1.jpg)
Riak Search Performance Wins
How we got > 100x improvement in query throughput
Gary Flake
![Page 2: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/2.jpg)
Demo
Introduction
![Page 3: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/3.jpg)
Architecture
• riak-01, riak-02, riak-03, riak-04, riak-05
• web-01, web-02, web-03 (Node.js + Nginx)
• cache-01, cache-02, cache-03
• redis-01, redis-02
• thumb-01, thumb-02
• job-01, job-02
• admin-01
![Page 4: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/4.jpg)
Riak
An awesome NoSQL data store:
• Super easy to scale up AND down
• Fault tolerant – no SPoF
• Flexible schema
• Full-text search out of the box
• Can be fixed and improved in Erlang (the Basho folks awesomely take our commits)
![Page 5: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/5.jpg)
Riak – Basics
• Data in Riak is grouped into buckets (effectively namespaces)
• Basic operations are: get, save, delete, search, map, reduce
• Eventual consistency managed through N, R, and W bucket parameters.
• Everything we put in Riak is JSON
• We talk to Riak through the excellent riak-js node library by Francisco Treacy
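The N, R, and W parameters interact in a simple way: with N replicas per object, a read that waits for R replies and a write that waits for W acknowledgements are guaranteed to share at least one replica when R + W > N. A minimal sketch of that arithmetic follows; the function name is ours for illustration and is not part of Riak or riak-js.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum must intersect every write quorum.

    n: replicas per object, r: replies required for a read,
    w: acknowledgements required for a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    # Pigeonhole argument: r + w > n means the two sets cannot be disjoint.
    return r + w > n


if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print(f"N={n} R={r} W={w} overlap={quorums_overlap(n, r, w)}")
```

Lower R and W trade that overlap for latency, which is where eventual consistency comes from.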
![Page 6: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/6.jpg)
Data Model – Clips
annotation
title
author
ctime
tags
domain
mentions
![Page 7: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/7.jpg)
Data Model - Clips
Clips are the gateway to all of our data
Clip
Blob (key: abc): <html> … </html>
Comment Cache (key: abc): comments on clip ‘abc’
“F1rst”
“Nice clip yo!”
“Saw this on Reddit…”
![Page 8: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/8.jpg)
Other Buckets
• Users
• Blobs
• Comments
• Templates
• Counts
• Search Caches
• Transactions
![Page 9: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/9.jpg)
Riak Search
• Gets many things out of Riak by something other than the primary key.
• You specify a schema (the types of the fields within a JSON object).
• Works great but with one big gotcha:
– The index uses term-based partitioning instead of document-based partitioning
– Implication: joins + sort + pagination suck
– We know how to work around this
![Page 10: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/10.jpg)
Riak Search – Querying
• Query syntax based on Lucene
• Basic Query
text:funny
• Compound Query
login:greg OR (login:gary AND tags:riak)
• Range Query
ctime:[98685879630026 TO 98686484430026]
![Page 11: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/11.jpg)
Clipboard App Flow
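Participants: Client, node.js, Riak
1. Client: go to clipboard.com/home
2. node.js to Riak: search clips bucket, query = login:greg
3. Riak to node.js, node.js to Client: top 20 results
4. Client: start rendering
5. (For each clip) Client to node.js: API request for blob
6. node.js to Riak: GET from blobs bucket
7. node.js: return blob to client
8. Client: render blob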
![Page 12: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/12.jpg)
Clipboard Queries
(Search)
login:greg
mentions:greg
ctime:[98685879630026 TO 98686484430026]
![Page 13: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/13.jpg)
Clipboard Queries cont.
(Search)
login:greg AND tags:riak
login:greg AND text:node AND text:javascript
![Page 14: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/14.jpg)
Uh oh
(Search)
login:greg AND private:false
login:greg AND text:iPhone
Matches 20% of all clips!
Matches only my clips
![Page 15: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/15.jpg)
Index Partitioning Schemes
![Page 16: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/16.jpg)
Doc Partition Query Processing
1. x AND y (sort z, start = 990, count = 10)
2. On each node:
   1. Perform x AND y
   2. Sort on z
   3. Slice [ 0 .. 1000 ]
   4. Send to aggregator
3. On aggregator:
   1. Merge all results (N x 1000)
   2. Slice [ 990 .. 1000 ]
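The document-partitioned steps can be sketched in a few lines. This is an illustrative simulation only, with nodes modeled as in-memory lists and documents as dicts holding a `terms` set and a sort field; none of these names come from a real search engine.

```python
import heapq


def doc_partition_query(nodes, x, y, sort_key, start, count):
    """Simulate x AND y with sort and pagination over document partitions.

    nodes: list of partitions, each a list of docs (dicts with a "terms"
    set and a sortable field named by sort_key).
    """
    limit = start + count
    partials = []
    for docs in nodes:
        # Each node evaluates the AND locally, sorts, and keeps only the
        # first `limit` hits, so at most N x limit results are shipped.
        hits = [d for d in docs if x in d["terms"] and y in d["terms"]]
        hits.sort(key=lambda d: d[sort_key])
        partials.append(hits[:limit])
    # The aggregator merges the already-sorted partial lists and slices.
    merged = list(heapq.merge(*partials, key=lambda d: d[sort_key]))
    return merged[start:limit]


if __name__ == "__main__":
    nodes = [[], [], []]
    for i in range(30):
        terms = {"x", "y"} if i % 2 == 0 else {"x"}
        nodes[i % 3].append({"terms": terms, "z": i})
    page = doc_partition_query(nodes, "x", "y", "z", start=5, count=3)
    print([d["z"] for d in page])
```

The key property is that every node can prune before sending, because it holds complete documents.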
![Page 17: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/17.jpg)
Term Partition Query Processing
1. x AND y (sort z, start = 990, count = 10)
2. On x node: search for x (and send all)
3. On y node: search for y (and send all)
4. On aggregator:
   1. Do x AND y
   2. Sort on z
   3. Slice to [ 990 .. 1000 ]
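For contrast, here is a sketch of the term-partitioned flow. The index is modeled as a dict from term to a set of document ids, standing in for one node per term; the `shipped` counter is our own addition to make the network cost visible.

```python
def term_partition_query(index, docs, x, y, sort_key, start, count):
    """Simulate x AND y with sort and pagination over term partitions.

    index: term -> set of doc ids (each term lives on its own node).
    docs:  doc id -> dict containing the sort field.
    Returns (page of doc ids, number of postings sent to the aggregator).
    """
    # Neither term node knows about the other term, so each one has to
    # send its full posting list to the aggregator.
    x_ids = index.get(x, set())
    y_ids = index.get(y, set())
    shipped = len(x_ids) + len(y_ids)

    # Only the aggregator can intersect, sort, and slice.
    matches = x_ids & y_ids
    ordered = sorted(matches, key=lambda doc_id: docs[doc_id][sort_key])
    return ordered[start:start + count], shipped


if __name__ == "__main__":
    docs = {i: {"ctime": 1000 - i} for i in range(1, 1001)}
    index = {"login:greg": {1, 2, 3}, "private:false": set(range(1, 1001))}
    page, shipped = term_partition_query(
        index, docs, "login:greg", "private:false", "ctime", 0, 2
    )
    print(page, shipped)
```

Three documents match, yet every posting for the common term still crosses the network.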
![Page 18: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/18.jpg)
Riak Search Issues
1. For any single term, all results must be sent back to the aggregator.
2. Performs sort and slice in the wrong order (slices, then sorts).
3. ANDs take time O(MAX(|x|, |y|)) instead of O(MIN(|x|, |y|)).
4. All matches must be read to get the sort field.
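Two of these issues are easy to see in miniature. The sketch below is not Riak code: it shows that sorting and slicing do not commute, so only sort-then-slice returns a correct page, and that an intersection can cost time proportional to the smaller list by probing the larger one as a hash set.

```python
def sort_then_slice(hits, start, count):
    """Correct pagination: order the full result set, then take the page."""
    return sorted(hits)[start:start + count]


def slice_then_sort(hits, start, count):
    """Wrong order: takes an arbitrary page, then sorts only that page."""
    return sorted(hits[start:start + count])


def intersect_small_first(a: set, b: set) -> set:
    """Intersect by iterating the smaller set and probing the larger one.

    With hash-set membership tests this does about min(|a|, |b|) lookups.
    """
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    return {doc_id for doc_id in small if doc_id in large}


if __name__ == "__main__":
    hits = [5, 3, 9, 1, 7]
    print(sort_then_slice(hits, 0, 2))
    print(slice_then_sort(hits, 0, 2))
    print(intersect_small_first({1, 2}, set(range(2, 1000))))
```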
![Page 19: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/19.jpg)
Riak Search Fixes
1. Inline fields for short and common attributes.
2. Dynamic fields for precomputed ANDs.
3. PRESORT option for sorting without document reads.
![Page 20: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/20.jpg)
Inline Fields
Nifty feature added recently to Riak Search
Fields only used to prune result set can be made inline for a big perf win
Normal query applied first – then results filtered quickly with inline “filter” query
High storage cost – only viable for small fields!
(Search)
![Page 21: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/21.jpg)
Riak Search – Inline Fields cont.
login:greg AND private:false
becomes
Query: login:greg
Filter Query: private:false
private:false is efficiently applied only to results of login:greg. Hooray!
(Search)
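A rough model of why the filter is cheap: if the small field's value is stored alongside each posting of the main term, the filter can be checked while walking those postings, and the huge posting list for private:false is never fetched. The data layout below is an assumption made for illustration, not Riak Search's actual storage format.

```python
def search_with_filter(index, query_term, filter_field, filter_value):
    """Return doc ids matching query_term whose inline field passes the filter.

    index: term -> list of (doc_id, inline_fields) pairs, where
    inline_fields is a small dict stored with the posting itself.
    """
    return [
        doc_id
        for doc_id, inline in index.get(query_term, [])
        if inline.get(filter_field) == filter_value
    ]


if __name__ == "__main__":
    index = {
        "login:greg": [
            ("clip-a", {"private": "false"}),
            ("clip-b", {"private": "true"}),
            ("clip-c", {"private": "false"}),
        ]
    }
    print(search_with_filter(index, "login:greg", "private", "false"))
```

Copying the inline value into every posting is also where the storage cost comes from.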
![Page 22: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/22.jpg)
Fixing ANDs
But what about login:greg AND text:iPhone?
text field is too large to inline!
We had to get creative.
(Search)
![Page 23: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/23.jpg)
Dynamic Fields
Our Solution: Create a new field - text_u
(u for user)
The field name carries the user’s name in place of u, so each user gets their own text field
In greg’s clip
text:iPhone text_greg:iPhone
In bob’s clip
text:iPhone text_bob:iPhone
(Search)
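At index time this amounts to writing the text twice, once under the shared field and once under a per-user field. A minimal sketch, assuming clips are dicts with `login` and `text` keys (the helper names are hypothetical):

```python
def index_fields(clip: dict) -> dict:
    """Return the fields to index for a clip, adding a per-user text field."""
    fields = {"login": clip["login"], "text": clip["text"]}
    # Precomputed AND: text_<login> only ever contains this user's words.
    fields[f"text_{clip['login']}"] = clip["text"]
    return fields


def user_text_query(login: str, word: str) -> str:
    """Rewrite `login:<login> AND text:<word>` as a single-term query."""
    return f"text_{login}:{word}"


if __name__ == "__main__":
    print(index_fields({"login": "greg", "text": "iPhone"}))
    print(user_text_query("greg", "iPhone"))
```

The rewritten query touches one short posting list instead of intersecting a short list with a very long one.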
![Page 24: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/24.jpg)
Presort on Keys
• Our addition to Riak code base.
• Does sort before slice
• If PRESORT=key, then never reads the docs
• Tremendous win (> 100x compared to M/R approaches)
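The idea behind sorting on keys can be shown without any Riak code: if the key itself begins with a fixed-width timestamp, ordering the matching keys as plain strings orders the clips by time, and only the final page of documents ever needs to be fetched. The function below is an illustrative stand-in, not the PRESORT implementation.

```python
def presort_by_key(matching_keys, start, count, newest_first=True):
    """Order matching keys lexicographically, then slice out one page.

    No document is read: the sort uses only the keys returned by the index.
    """
    ordered = sorted(matching_keys, reverse=newest_first)
    return ordered[start:start + count]


if __name__ == "__main__":
    keys = ["0003bbbb", "0001aaaa", "0002cccc"]
    print(presort_by_key(keys, start=0, count=2))
```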
![Page 25: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/25.jpg)
Clip Keys
<Time (ms)><User (guid)><SHA1 of Value>
• Base-64 encode each component
• Only use first 4 characters of user & content
• Only 16 bytes
Collisions? 1 in 17M, and only if two users clip the same thing at the same time.
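A sketch of how such a key could be built. Several details here are assumptions rather than facts from the slides: the time component is given 8 characters (16 minus the two 4-character parts), the user is a UUID, and the 64-character alphabet is listed in ascending ASCII order so that string order matches numeric order. Standard Base64 does not have that last property, so a sort-friendly alphabet is swapped in.

```python
import hashlib
import uuid

# Assumption: 64 symbols in ascending ASCII order ('-' < digits < 'A'-'Z' < '_' < 'a'-'z').
ALPHABET = "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"


def encode_fixed(value: int, width: int) -> str:
    """Encode a non-negative integer as exactly `width` base-64 digits."""
    chars = []
    for _ in range(width):
        value, digit = divmod(value, 64)
        chars.append(ALPHABET[digit])
    if value:
        raise ValueError("value does not fit in the requested width")
    return "".join(reversed(chars))


def clip_key(time_ms: int, user: uuid.UUID, content: bytes) -> str:
    """Build a 16-character key: 8 chars of time, 4 of user, 4 of SHA-1."""
    time_part = encode_fixed(time_ms, 8)
    # 3 bytes are 24 bits, which is exactly 4 base-64 digits.
    user_part = encode_fixed(int.from_bytes(user.bytes[:3], "big"), 4)
    digest = hashlib.sha1(content).digest()
    sha_part = encode_fixed(int.from_bytes(digest[:3], "big"), 4)
    return time_part + user_part + sha_part


if __name__ == "__main__":
    user = uuid.uuid4()
    print(clip_key(1_300_000_000_000, user, b"<html>...</html>"))
```

Four base-64 digits give 64^4 = 16,777,216 possible values, which is where a figure of roughly 1 in 17M comes from.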
![Page 26: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/26.jpg)
Our Query Processing
1. w AND (x AND y) (sort z, start = 990, count = 10)
2. On w_x node: search and send w_x
3. On w_y node: search and send all w_y
4. On aggregator:
   1. Do w_x AND w_y
   2. Sort on z
   3. Slice to [ 990 .. 1000 ]
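To see what the composite terms buy, the sketch below builds both a plain term index and one keyed by owner plus term, then counts how many postings each query would send to the aggregator. It assumes w is the clip owner and that document keys sort in the desired order; the data layout and names are illustrative only.

```python
from collections import defaultdict


def build_indexes(docs):
    """docs: key -> (owner, set of terms). Returns (plain, per_owner) indexes."""
    plain, per_owner = defaultdict(set), defaultdict(set)
    for key, (owner, terms) in docs.items():
        for term in terms:
            plain[term].add(key)
            # Composite term such as "greg_x": a precomputed owner AND term.
            per_owner[f"{owner}_{term}"].add(key)
    return plain, per_owner


def query(index, terms, start, count):
    """AND the given terms; return (page of keys, postings shipped)."""
    postings = [index.get(t, set()) for t in terms]
    shipped = sum(len(p) for p in postings)
    matches = set.intersection(*postings) if postings else set()
    # Keys are assumed to imply sort order, so no documents are read.
    return sorted(matches)[start:start + count], shipped


if __name__ == "__main__":
    docs = {f"b{i:03d}": ("bob", {"x", "y"}) for i in range(100)}
    docs["g1"] = ("greg", {"x", "y"})
    docs["g2"] = ("greg", {"x", "y"})
    plain, per_owner = build_indexes(docs)
    print(query(plain, ["x", "y"], 0, 10)[1])
    print(query(per_owner, ["greg_x", "greg_y"], 0, 10))
```

The aggregator still does the AND, sort, and slice, but on lists sized by one user's clips rather than by the whole corpus.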
![Page 27: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/27.jpg)
Summary
• Use inline fields for short and common bits
• Use dynamic fields for prebuilt ANDs
• Use keys that imply sort order
• Use same techniques for pagination
• Our approach yields search throughput that is 100x better than out of the box (and the gap widens as you scale out).
![Page 28: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/28.jpg)
Questions?
![Page 29: Riak perf wins](https://reader034.vdocument.in/reader034/viewer/2022051412/54b7601b4a7959f9168b46a3/html5/thumbnails/29.jpg)
We’re hiring!
www.clipboard.com/register
Invitation Code: just4u
www.clipboard.com/jobs
Or talk to us right now!
Thanks!