![Page 1: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/1.jpg)
Failing Fast with Redis-backed Bloom Filters
• Christopher Curtin
• Head of Technical Research
• @ChrisCurtin
![Page 2: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/2.jpg)
About Me
25+ years in technology
Head of Technical Research at Silverpop, an IBM Company (14+ years at Silverpop)
Built a SaaS platform before the term ‘SaaS’ was being used
Prior to Silverpop: real-time control systems, factory automation and warehouse management
Always looking for technologies and algorithms to help with our challenges
![Page 3: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/3.jpg)
Silverpop Open Positions
Technical Lead
Senior Engineer
Architect
Automation Engineers
![Page 4: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/4.jpg)
Agenda
Redis
Bloom Filters
Failing Fast
![Page 5: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/5.jpg)
Agenda
Redis
– What it is
– Why we started looking at using it
– Basics
– Concurrency
– Operational Considerations
– Challenges
![Page 6: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/6.jpg)
Redis – What is it?
From redis.io:
"Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs."
![Page 7: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/7.jpg)
Hyper-what-what?
HyperLogLog
Approximation technique for counting distinct entries in a set.
Very small memory footprint for rough approximations (Redis uses at most 12 KB per key, with a standard error of 0.81%)
Nice – but too much loss for what we need
![Page 8: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/8.jpg)
Features
• Unlike typical key-value stores, you can send commands that edit the value on the server, instead of reading it back to the client, updating it and pushing it to the server again
• Pub/sub
• TTL on keys
• Clustering and automatic fail-over
• Lua scripting
• Client libraries for just about any language you can think of
![Page 9: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/9.jpg)
So why did we start looking at NoSQL?
“For the cost of an Oracle Enterprise license I can give you 64 cores and 3 TB of memory”
![Page 10: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/10.jpg)
![Page 11: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/11.jpg)
Redis Basics
In-memory-only key-value store
Single threaded. Yes, single threaded
No paging, no reading from disk
CS 101 data structures and operations
Tens of millions of keys aren't a big deal
The amount of RAM defines how big the store can get
![Page 12: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/12.jpg)
Basic Data Types
Strings
Hashes
Lists
Sets and Sorted Sets
CS 101 ...
![Page 13: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/13.jpg)
Hashes
- collection of key-value pairs under a single name
- useful for storing related data under a common name
- values can only be strings or numbers; no hash of lists
http://redis.io/commands/hget
![Page 14: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/14.jpg)
Sets and Sorted Sets
Buckets of values with very fast membership look-up
No duplicates allowed
Sorted Sets have scores to make them sortable
– Automatically keeps them in order for fast 'top x' look ups
http://redis.io/commands/zadd
http://redis.io/commands/zrange
![Page 15: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/15.jpg)
Lists
Most interesting due to how operations are applied to the remote store
Unbounded (except by memory)
Atomic operations between lists (pop from one, push to another)
CS 101: lpush, rpush, lpop, lrange etc.
Advanced: blocking pops
http://redis.io/commands/rpush
http://redis.io/commands/rpoplpush
![Page 16: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/16.jpg)
Concurrency
Single threaded
Each operation can work on one or two keys, atomically
Pipelines send a batch of commands in a single server request, executed in sequence (wrap the batch in MULTI/EXEC when other clients' commands must not interleave)
Pipelines do not allow for logic between commands
Lua scripts allow for logic between commands
BE CAREFUL with Lua: a running script blocks all other clients!
![Page 17: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/17.jpg)
Pipeline Java Example
BloomFilterRedis.java line 43
![Page 18: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/18.jpg)
Lua Example
Lua-scripts example
![Page 19: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/19.jpg)
Operational Information
Persistence can be 'none', journal (AOF) or point in time (RDB)
Optional Master/Slave replication
Home-grown HA platform (Sentinel)
Common deployment model is lots of instances per machine
Millions of keys get hard to manage – build 'directory' hashes to make it easier for operations to find keys to look at
![Page 20: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/20.jpg)
Challenges with Redis
Key explosion – single name space
Lua scripts can block all other users
Pipelines can block all other users
No nested data types (I want a hash of lists!)
Without name spaces, be cautious about how you define key names
![Page 21: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/21.jpg)
Concurrency Demo – JMS replacement
Client submits a request to the queue (LPUSH)
Consumer application polls for work when worker is available (RPOPLPUSH)
Worker executes the task assigned to it
When worker is done, its list is removed
Lather, Rinse, Repeat
(We provide a hash of workers for Operations to query for monitoring)
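The hand-off between the request queue and a worker's own list can be sketched without a Redis server. This is only an in-memory simulation: the class and method names are hypothetical, each `Deque` stands in for a Redis list, and `synchronized` stands in for the atomicity that the single-threaded server gives LPUSH and RPOPLPUSH.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// In-memory stand-in for the queue pattern; not a Redis client.
final class WorkQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();    // shared request list
    private final Deque<String> inProgress = new ArrayDeque<>(); // one worker's own list

    // Mirrors LPUSH: new requests go on the head of the shared list.
    synchronized void submit(String task) {
        pending.addFirst(task);
    }

    // Mirrors RPOPLPUSH: take the oldest request from the tail of the shared
    // list and put it on the worker's list in one step, so it is never lost.
    synchronized String claim() {
        String task = pending.pollLast();
        if (task != null) {
            inProgress.addFirst(task);
        }
        return task; // null when there is no work
    }

    // Mirrors removing the worker's list once the task has finished.
    synchronized void done() {
        inProgress.clear();
    }

    synchronized int pendingCount() { return pending.size(); }

    synchronized int inProgressCount() { return inProgress.size(); }
}
```

Because a claimed task sits on the worker's list until `done()` is called, an operator can always see what each worker is doing, and a crashed worker leaves its task behind where it can be found.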
![Page 22: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/22.jpg)
Agenda
Bloom Filters
– What they are
– Why we started looking at using them
– Basics
– False Positives
– Example Uses
– Why not do this in a database?
![Page 23: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/23.jpg)
Bloom Filters
From Wikipedia (Don't tell my kid's teacher!)
"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate"
![Page 24: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/24.jpg)
Hashing
Apply 'x' hash functions to the key to be stored/queried
Each function returns the position of one bit to set (or check) in the bitset
Formulas determine how big to make the bitset and how many hash functions to use for your expected number of keys and acceptable error level
http://hur.st/bloomfilter?n=4&p=1.0E-20
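The sizing arithmetic behind calculators like the one linked above uses the standard Bloom filter formulas: for n expected keys and a target false-positive probability p, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. A minimal sketch, with hypothetical class and method names:

```java
// Standard Bloom filter sizing formulas; names are illustrative only.
final class BloomSizing {
    // Bits needed for n expected keys at false-positive probability p.
    static long optimalBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    // Number of hash functions that minimises the error for m bits and n keys.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

For one million keys at a 1% error level this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions.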
![Page 25: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/25.jpg)
Example
![Page 26: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/26.jpg)
False Positives
Perfect hash functions aren't worth the cost to develop
Sometimes all the bits for a key have already been set by other keys
Make sure you understand the business impact of a false positive
Remember, never a false negative
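A small in-memory filter makes the guarantee concrete. This is a sketch under stated assumptions, not the code from the talk: it derives the k bit positions from `String.hashCode()` with a 64-bit mixing step and double hashing, which is a simplification (strings that share a `hashCode` are indistinguishable to it).

```java
import java.util.BitSet;

// Minimal in-memory Bloom filter; the hashing scheme is an assumption of this sketch.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;   // m: number of bits
    private final int hashes; // k: bit positions per key

    SimpleBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Spreads the 32-bit hashCode over 64 bits so two halves can be used.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Double hashing: position i is (h1 + i * h2) mod m.
    private int index(String key, int i) {
        long h = mix(key.hashCode());
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return (int) Math.floorMod(h1 + (long) i * h2, (long) size);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    // false means "definitely never added"; true means "probably added".
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Every key that was added has all of its bits set, so `mightContain` can never answer false for it. A key that was never added can still answer true when other keys happen to have set the same bits.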
![Page 27: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/27.jpg)
Creation
Libraries are available for every language I looked up (even JavaScript)
Some are built in memory, for a single process/JVM to use
Read-only filters (ad networks) are built using Hadoop and loaded into memory
In memory is great for lots of reads, single process/JVM etc.
But ...
![Page 28: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/28.jpg)
Updates
Updating a 16 MB structure in memory and persisting it to disk is expensive
8 bits change and you write 16 MB!!!!!! (DBAs will love you …)
![Page 29: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/29.jpg)
Deletes
Not possible in a regular Bloom Filter – how would you know what bits are used by other keys?
Counting Bloom Filters keep a small counter (3-4 bits) in place of each bit in the bitmap; 'delete' decrements the counters for the key
Not as space friendly any more …
Instead, consider having bloom filters based around the lifetime of the data to be queried
– For a filter 'visited in the last 4 hours' have 4 filters and age the oldest out (TTL in Redis maybe ...)
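The counting variant can be sketched as follows. To keep the code short it stores one byte per slot and saturates at 15 to mimic a 4-bit counter; a space-conscious implementation would pack the counters. The class name and the hashing scheme (`String.hashCode()` plus double hashing) are assumptions of this sketch.

```java
// Counting Bloom filter sketch: a small counter per slot makes deletes possible.
final class CountingBloomFilter {
    private static final int MAX = 15; // mimics a 4-bit counter
    private final byte[] counters;
    private final int hashes;

    CountingBloomFilter(int size, int hashes) {
        this.counters = new byte[size];
        this.hashes = hashes;
    }

    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    private int index(String key, int i) {
        long h = mix(key.hashCode());
        return (int) Math.floorMod((int) h + (long) i * (int) (h >>> 32), (long) counters.length);
    }

    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            int idx = index(key, i);
            if (counters[idx] < MAX) {
                counters[idx]++; // saturated counters stay at MAX
            }
        }
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (counters[index(key, i)] == 0) {
                return false;
            }
        }
        return true;
    }

    // Only call for keys that were really added; removing a false positive
    // would decrement counters that belong to other keys.
    void remove(String key) {
        if (!mightContain(key)) {
            return;
        }
        for (int i = 0; i < hashes; i++) {
            int idx = index(key, i);
            if (counters[idx] > 0 && counters[idx] < MAX) {
                counters[idx]--; // a saturated counter can no longer be trusted, so leave it
            }
        }
    }
}
```

Even with packed 4-bit counters the structure is four times the size of a plain bitset, which is why the time-windowed filters suggested above are often the cheaper option.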
![Page 30: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/30.jpg)
Issue: Persistence
Load a 16 MB filter from the database to check 6 bits?
Worse: update 6 bits in a 16 MB filter
DBAs will not be happy
– Undo/redo
– SGA misses, page faults
– Backups, replication traffic etc.
![Page 31: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/31.jpg)
Why were we interested in Bloom Filters?
Found a lot of places where we went to the database only to find the data didn't exist
Found lots of places where we want to know if a user DIDN'T do something
![Page 32: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/32.jpg)
Persistent Bloom Filters
We needed persistent Bloom Filters for lots of user stories
Found Orestes-BloomFilter on GitHub, which used Redis as a store, and enhanced it
Added population filters
Fixed a few bugs
Did a pull request and it was accepted!
![Page 33: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/33.jpg)
Benefits
Filters are stored in Redis
• Only SETBIT/GETBIT calls to the server
Reads and updates of the filter from a set of application servers
Persistence has a cost, but a fraction of the RDBMS costs
Can load a BF created offline and begin using it
![Page 34: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/34.jpg)
Remember “For the cost of an Oracle License”
Thousands of filters
Dozens of Redis instances
TTL on a Redis key makes cleanup of old filters trivial
![Page 35: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/35.jpg)
Population Bloom Filters
Unique need we had
Users access the system frequently, but I really only need to count them once per month for billing
Tens of thousands of clients; Finance wants the monthly report in seconds
Logic is simple: if any bits weren't set for the key (user id), increment the counter
Note: there are mathematical methods of estimating a BF population but we needed a better error rate
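The counting rule fits in a few lines. The sketch below is an in-memory illustration with hypothetical names, not the Orestes-BloomFilter code; it reuses the simplified `String.hashCode()` plus double-hashing scheme as an assumption.

```java
import java.util.BitSet;

// Bloom filter that also keeps a count of keys seen for the first time.
final class PopulationBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;
    private long population = 0;

    PopulationBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    private int index(String key, int i) {
        long h = mix(key.hashCode());
        return (int) Math.floorMod((int) h + (long) i * (int) (h >>> 32), (long) size);
    }

    // Sets the key's bits. If any of them was still clear the key is new,
    // so the population counter goes up by one and the method returns true.
    boolean add(String key) {
        boolean isNew = false;
        for (int i = 0; i < hashes; i++) {
            int idx = index(key, i);
            if (!bits.get(idx)) {
                bits.set(idx);
                isNew = true;
            }
        }
        if (isNew) {
            population++;
        }
        return isNew;
    }

    long population() {
        return population;
    }
}
```

A false positive makes the counter undercount by one and it never overcounts, so the filter's error level bounds how far the reported population can fall short.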
![Page 36: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/36.jpg)
Example Uses of Bloom Filters
Web cache – what URLs are already in the cache on another server?
P2P networks – what node contains which part of the file?
Databases
– Do keys exist in this page? If not, don't load the page
– HBase uses them to detect which blocks do not have the data (HDFS is write-once)
– Many RDBMS use them internally to 'fail fast' and not load pages into memory
– Sadly, no RDBMS or NoSQL I know of offers them as user data types
![Page 37: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/37.jpg)
Example Uses of Bloom Filters
Ad networks (old way ...)
– Big Hadoop job hourly/nightly to determine which ads to show based on prior behavior
– Load the filter into a common storage (disk usually)
– Ad servers load all the filters into memory and query for your cookie id to see what to show you
![Page 38: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/38.jpg)
Examples of Redis-backed Bloom Filters
Has the user been here this month? If not, show them a message. A false positive doesn't matter
Whitelist vs. blacklist for IPs
– Known bad IPs in the filter
– Upon login check the filter. Not found: log in. Found: check the DB to validate the bad IP.
– A false positive leads to a DB query that returns false, but should be rare
• Ad networks (real-time BF updates based on what you searched on)
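The blacklist flow above can be sketched with two stand-ins: a `Predicate` playing the role of the Bloom filter's "might contain" check and a `Set` playing the role of the database table. Both, along with the class name, are assumptions made so the example runs on its own.

```java
import java.util.Set;
import java.util.function.Predicate;

// Fail-fast lookup: only pay for the database when the filter says "maybe".
final class FailFastBlacklist {
    private final Predicate<String> filter; // stand-in for the Bloom filter
    private final Set<String> database;     // stand-in for the authoritative table
    private int databaseLookups = 0;

    FailFastBlacklist(Predicate<String> filter, Set<String> database) {
        this.filter = filter;
        this.database = database;
    }

    boolean isBlocked(String ip) {
        if (!filter.test(ip)) {
            return false; // definite miss: no false negatives, so skip the database
        }
        databaseLookups++; // possible hit: confirm, since it may be a false positive
        return database.contains(ip);
    }

    int databaseLookups() {
        return databaseLookups;
    }
}
```

Most logins come from addresses that are not in the filter, so they never touch the database. Only real blacklist entries and the occasional false positive cost a query.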
![Page 39: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/39.jpg)
Client-side Joins
Most NoSQL stores don't support joins
Architecture may have data across multiple stores
Keep a Population Bloom Filter by day of unique users in a data source
When needing to join, load the smallest data source as the driver and query the other sources in order of size
If queries are time based and filters are available for the time, looking up key matches can be very fast
![Page 40: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/40.jpg)
Agenda
Fail Fast
– What it is
– Redis-backed Bloom Filters
– Examples
![Page 41: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/41.jpg)
Fail Fast
The ability to quickly know to NOT do something expensive
Example: Black-list of IPs
Think about ways to NOT do some work
Cost of Redis servers is much less than an RDBMS license or the cost of a good DB server with storage!
![Page 42: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/42.jpg)
Hammer Time
![Page 43: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/43.jpg)
Be careful
Sometimes the cost of building and maintaining the structures outweighs the benefit
Convoluted designs to avoid the database
Collect metrics on 'hits' to see if they are of any benefit (CodaHale)
![Page 44: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/44.jpg)
Example (naive)
Build a BF for ads shown to a user (hash on user id and ad id)
When the user visits, hash their user id and the top ad to display this hour and set the bits in the BF
If any were not set, the Population count is incremented and you display the ad
If already set, move to the next most important ad.
You now know total unique views by ad by hour
Can do total gross with a Redis Hash too!
![Page 45: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/45.jpg)
Example – smarter
Hash the top 10 ad ids with the user id and request them in parallel (Pipeline)
Check the return to see which ones aren't set, submit an update request and set the population
2 round trips to check 10 ads.
(Can also do this in Lua in 1 round trip)
![Page 46: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/46.jpg)
Example – part 2
Same idea as before, but build the bloom filter for each hour
When user visits, query last 6 filters in parallel (pipeline!) to see if they've seen the ad(s).
Redis TTL on the hourly filter will drop it automatically when it becomes too old
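The hourly-filter idea can be sketched in memory as a short queue of bitsets. Here `rotate()` plays the role that the Redis key TTL plays in the slides by dropping the oldest window; the class name, the explicit rotation call and the hashing scheme (`String.hashCode()` plus double hashing) are all assumptions of this sketch.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

// One small Bloom filter per time window; old windows age out as a whole.
final class RollingBloomFilter {
    private final Deque<BitSet> windows = new ArrayDeque<>(); // newest first
    private final int maxWindows;
    private final int size;
    private final int hashes;

    RollingBloomFilter(int maxWindows, int size, int hashes) {
        this.maxWindows = maxWindows;
        this.size = size;
        this.hashes = hashes;
        rotate(); // open the first window
    }

    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    private int index(String key, int i) {
        long h = mix(key.hashCode());
        return (int) Math.floorMod((int) h + (long) i * (int) (h >>> 32), (long) size);
    }

    // Start a new window (e.g. a new hour) and forget the oldest one if needed.
    void rotate() {
        windows.addFirst(new BitSet(size));
        if (windows.size() > maxWindows) {
            windows.removeLast();
        }
    }

    // New keys always go into the current (newest) window.
    void add(String key) {
        BitSet current = windows.peekFirst();
        for (int i = 0; i < hashes; i++) {
            current.set(index(key, i));
        }
    }

    // True if any retained window may contain the key.
    boolean mightContain(String key) {
        for (BitSet window : windows) {
            boolean allSet = true;
            for (int i = 0; i < hashes && allSet; i++) {
                allSet = window.get(index(key, i));
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }
}
```

Nothing is ever deleted bit by bit: a key simply stops matching once every window it was written to has aged out.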
![Page 47: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/47.jpg)
Example 3
Collect lots of data about users (such as virtual cows, farm land, chickens etc.)
Run a predictive model on the data and identify which special offers to show when the user visits again. Store user ids in a Bloom Filter
Load the BF into Redis
Query each time the user logs in and display appropriate offer
No massive database insert/updates to flag who should see it
False positive isn't too bad
![Page 48: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/48.jpg)
Example 4 – Query optimization
Client-side joins
Ask the Bloom Filter if the user has performed the action (filters for hour, day, week of year etc.)
If not, don't even call the data source
May need to read some extra data due to 'in the last 11 days', but asking the BF and being told 'no' prevents ANY data source resources from being used
What if the BF is lost? Rebuild it from the base events (Hadoop!)
![Page 49: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/49.jpg)
Conclusion
Redis is a very fast, very simple and very powerful name-value store (“data structure server”)
Bloom Filters have lots of applications when you want to quickly look up if one of millions of 'things' happened
Redis-backed BloomFilters make updatable bloom filters trivial to use
Think about what you need to know to NOT do an expensive operation
Fail fast
![Page 50: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/50.jpg)
References
redis.io
http://en.wikipedia.org/wiki/Bloom_filter
http://hur.st/bloomfilter?n=4&p=1.0E-20
https://github.com/Baqend/Orestes-Bloomfilter
http://www.slideshare.net/chriscurtin
@ChrisCurtin on twitter
github.com/chriscurtin
![Page 51: Redis and Bloom Filters - Atlanta Java Users Group 9/2014](https://reader034.vdocument.in/reader034/viewer/2022042606/547e8075b47959c0508b4b85/html5/thumbnails/51.jpg)
Questions?