Facebook Style Notifications Using HBase and Event Streams
DESCRIPTION
This talk is about building a low-latency, near real-time Notifications platform for serving user intent using an event-based architecture, Complex Event Processing and a data store like HBase. It also covers how millisecond response times are achieved when accessing data from 100 million rows by interpreting change from immutable events and organizing data as LSM trees.
TRANSCRIPT
github.com/regunathb RegunathB
Serving User Intent (eCommerce)
• Mass targeted (low relevance)
  – User Intent captured from: Browse, Buy, Register
• Quantified, time-bound (improved relevance)
  – User Intent derived from: Category Affinity, Recommendations
Serving User Intent (social)
Image Source : http://allfacebook.com/
• Near real-time
  – Quick updates about friends' actions that most affect you
• Relevant Actions
  – Likes, Comments etc.
• Personalized
  – Content only from social circle
• Non-invasive
  – Users therefore tolerate less relevant content as compared to email
Notifications on Flipkart
User Intent derived from:
• Search, Browse
• Add to Wish List
• Add to Cart
• Checkout/Buy
Price Drop Notification
(Figure: price of iPhone 5C over time, with instants t0, t1, t2 and prices 42K, 44K, 39K)
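A price drop can be read off a history of immutable price events by comparing the price in effect when the user showed interest with the current price. The sketch below is only an illustration: the function names, the integer timestamps and the assignment of prices to instants are assumptions, since the slide does not say which price belongs to which instant.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical immutable price events for one product: (time, price).
PriceEvent = Tuple[int, int]


def price_at(events: List[PriceEvent], t: int) -> Optional[int]:
    """Return the price in effect at time t, or None if no event is at or before t."""
    times = [ts for ts, _ in events]
    i = bisect_right(times, t)
    return events[i - 1][1] if i > 0 else None


def price_drop(events: List[PriceEvent], intent_time: int, now: int) -> Optional[int]:
    """Return how much the price fell between intent_time and now, or None."""
    then, current = price_at(events, intent_time), price_at(events, now)
    if then is None or current is None or current >= then:
        return None
    return then - current


# Illustrative history, sorted by time (assumed mapping of prices to t0, t1, t2).
events = [(0, 44_000), (1, 42_000), (2, 39_000)]
drop = price_drop(events, intent_time=0, now=2)
```

Because events are never updated in place, the same history can be replayed later to re-derive the result.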
Solution 1 : Generate Notifications on Demand
(Figure. Steps: Gather User Intent; Retrieve Current, Past Data; Create Notifications on Visits. Stores: Intents Data store, Product Data store.)
• Pros
  – Perceived optimal resource utilization
• Cons
  – Gathering, Processing and Serving coupled
  – Read path is computationally expensive
  – High latency
  – Need versioning support on Product data
  – Repeated computations
Solution 2 : Pre-create in Real-time, Serve on Demand
What Leads to a Notification?
Intent (interest expressed by the user) ⋂ Event (price changes) => Notification
(Intersection of millions of User Intent and Product Changes)
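The intersection above can be sketched as a join between intents and change events on the product identifier. This is a minimal in-memory illustration, not the platform's implementation: the `Intent`, `ChangeEvent` and `Notification` types and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Intent(NamedTuple):
    user_id: str
    product_id: str


class ChangeEvent(NamedTuple):
    product_id: str
    kind: str  # for example "PRICE_DROP"


class Notification(NamedTuple):
    user_id: str
    product_id: str
    kind: str


def index_intents(intents: Iterable[Intent]) -> Dict[str, Set[str]]:
    """Pivot intents by product so each change event needs a single lookup."""
    by_product: Dict[str, Set[str]] = defaultdict(set)
    for intent in intents:
        by_product[intent.product_id].add(intent.user_id)
    return by_product


def match(by_product: Dict[str, Set[str]],
          events: Iterable[ChangeEvent]) -> List[Notification]:
    """Emit one notification per (interested user, change event) pair."""
    out: List[Notification] = []
    for event in events:
        for user_id in sorted(by_product.get(event.product_id, ())):
            out.append(Notification(user_id, event.product_id, event.kind))
    return out


intents = [
    Intent("USERID_A", "MOBDSGU2ZMDYENQ"),
    Intent("USERID_B", "MOBDSGU2ZMDYENQ"),
    Intent("USERID_B", "MOBDQ9VXXXX6NF8V"),
]
changes = [
    ChangeEvent("MOBDSGU2ZMDYENQ", "PRICE_DROP"),
    ChangeEvent("MOBDP6W6MCUWCF", "PRICE_DROP"),  # nobody expressed interest
]
notifications = match(index_intents(intents), changes)
```

Change events for products that no user has shown interest in produce nothing, which is why filtering early keeps the downstream work small.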
Pre-create, Serve on Demand
(Figure. Systems: Intent Capturing System, Event Processing System, Notification Serving System. Streams: Intent Event Stream, Change Event Stream (Product changes), Notifications. Store: Intents, Notifications. Operations: append, create, update, expire, read. Phases: Event based Pre-processing, Near real-time Serving.)
SEDA, Filtering using CEP
(Figure. Inputs: Intents, Product changes. Steps: Extract unique interests, intermediate stages, CEP Engine, Filtered event processing. Outputs: Facts, Notifications.)
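The staged flow on this slide can be approximated with threads connected by queues, one thread per stage. The sketch below is an assumption-laden stand-in: it uses Python's standard `queue` and `threading` modules rather than RabbitMQ or Mule, and a plain set-membership test in place of a CEP engine such as Esper. All function names are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional, Tuple

STOP = object()  # sentinel that tells a stage to shut down


def stage(handler: Callable, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Run one stage: take an event, apply handler, forward non-None results."""
    def run() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown to the next stage
                return
            result = handler(item)
            if result is not None:
                outbox.put(result)
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


def run_pipeline(intents: List[Tuple[str, str]],
                 changes: List[Tuple[str, int]]) -> List[dict]:
    # Extract unique interests: the set of products at least one user cares about.
    interesting = {product for _user, product in intents}

    def filter_event(change: Tuple[str, int]) -> Optional[Tuple[str, int]]:
        product, _price = change
        return change if product in interesting else None

    def to_fact(change: Tuple[str, int]) -> dict:
        product, price = change
        return {"product": product, "price": price}

    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [stage(filter_event, q_in, q_mid), stage(to_fact, q_mid, q_out)]
    for change in changes:
        q_in.put(change)
    q_in.put(STOP)

    facts: List[dict] = []
    while True:
        item = q_out.get()
        if item is STOP:
            break
        facts.append(item)
    for worker in workers:
        worker.join()
    return facts


facts = run_pipeline(
    intents=[("USERID_A", "P1"), ("USERID_B", "P1"), ("USERID_B", "P2")],
    changes=[("P1", 39_000), ("P3", 100), ("P2", 500)],
)
```

Each stage only sees its own queue, so stages can be scaled or replaced independently, which is the property the SEDA style is chosen for.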
The Data Store
• Store large sets of data
  – Products (P): 10s M
  – Users (U): 10s M
  – Activity (I = U × P): 100s M
  – Events/day (E = P + U): 10s M
  – Notifications (N = E ⋂ I): >100 M (in total)
• High write throughput
• High read throughput for sets of data
  – Intents: user pivoted, Facts: product pivoted
• Low latency reads
  – Notifications: user pivoted, ordered by recency
The Data Store - HBase
U:USERID_A:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDSGU2ZMDYENQ
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
U:USERID_B:TIMESTAMP:PRICE_DROP:MOBDP6W6MCUWCF
U:USERID_C:TIMESTAMP:PRICE_DROP:MOBDQ9VXXXX6NF8V
LSM Tree
Row key design for Notifications table
Image Sources : http://blog.sematext.com/, http://dailyjs.com/
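The row keys above place all notifications of a user next to each other, so one prefix scan returns them. The sketch below emulates that over a sorted Python list. Storing a reversed, zero-padded timestamp so that newest rows sort first is an assumption made here; the slide only shows a TIMESTAMP placeholder. `row_key` and `scan_prefix` are hypothetical helpers, not HBase API calls.

```python
from bisect import bisect_left
from typing import List

MAX_TS = 2**63 - 1  # assumption: timestamps fit in a signed 64-bit long


def row_key(user_id: str, ts_millis: int, kind: str, product_id: str) -> str:
    """Build U:<user>:<reversed timestamp>:<kind>:<product>."""
    reversed_ts = MAX_TS - ts_millis  # newer events get smaller values
    return f"U:{user_id}:{reversed_ts:019d}:{kind}:{product_id}"


def scan_prefix(sorted_keys: List[str], prefix: str, limit: int) -> List[str]:
    """Emulate a prefix scan over lexicographically sorted row keys."""
    start = bisect_left(sorted_keys, prefix)
    out: List[str] = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(out) == limit:
            break
        out.append(key)
    return out


keys = sorted([
    row_key("USERID_A", 1500, "PRICE_DROP", "MOBDSGU2ZMDYENQ"),
    row_key("USERID_B", 1000, "PRICE_DROP", "MOBDSGU2ZMDYENQ"),
    row_key("USERID_B", 2000, "PRICE_DROP", "MOBDQ9VXXXX6NF8V"),
    row_key("USERID_C", 1200, "PRICE_DROP", "MOBDQ9VXXXX6NF8V"),
])
latest_for_b = scan_prefix(keys, "U:USERID_B:", limit=10)
```

The trailing colon in the prefix keeps a user whose ID is a prefix of another user's ID from picking up the other user's rows.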
• Benefits of keeping related data together
  – Minimize disk seek for rows read
  – Rows may be returned from Block cache, MemStore
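To make the LSM idea concrete, here is a toy store with an in-memory write buffer that is flushed into immutable sorted segments. It is a teaching sketch under simplifying assumptions: there is no write-ahead log, block cache or compaction, and the class and its `flush_threshold` parameter are hypothetical rather than anything HBase exposes.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM store: a mutable buffer in memory plus immutable sorted segments."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.memstore: Dict[str, str] = {}
        self.segments: List[List[Tuple[str, str]]] = []  # oldest first
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        """Writes only touch memory; a full buffer is flushed as one segment."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the buffer out as a sorted, immutable segment."""
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        """Check memory first, then segments from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            i = bisect_left(segment, (key,))  # binary search on the sorted keys
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


db = TinyLSM(flush_threshold=2)
db.put("a", "1")
db.put("b", "2")  # buffer is full, so it is flushed to a segment
db.put("a", "3")  # newer value stays in memory and shadows the flushed one
```

Recently written rows are answered from memory without touching a segment, which mirrors the point above about rows being returned from the MemStore.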
(Figure. Systems: Intent Capturing System, Event Processing System, Notification Serving System. Store: HBase (Intents, Notifications). Input: Product changes. Operations: append, create, update, expire, read. Phases: Event based Pre-processing, Near real-time Serving.)
Tech Stack
(Figure. Components: Trooper Batch; W3 via Phantom; Trooper SEDA (RabbitMQ, Mule), CEP (Esper); Phantom; Flipcast; Ceryx; Tomcat; CDN; Memcached.)
Tech Stack
• Phantom – Reverse proxy for latency sensitive user actions
• Trooper Batch – Cron jobs
• Trooper SEDA – Distributed, Event processing
• FlipCast – Platform agnostic multi-cast notifications
• RabbitMQ – Integration, Work distribution
• Esper – Complex Event Processing (Filtering/Matching)
• HBase – Data store
• Tomcat – REST services container for Notifications
• Ceryx – Target Group generation, User preferences
Legend: Flipkart OSS, Public domain OSS, Closed source
Operating Notifications
A/B framework
Phantom: Intent Capture
Phantom: Serve Notifications
Trooper Batch : Jobs
• Monitoring consoles
  – RabbitMQ queues
  – FQ service
  – Graphite
  – Nagios
  – Omniture tracking
  – Trooper SEDA & Batch consoles
Tweeple Reactions
Recap
• Pros
  – Low latency read-path, resilience to failure (ok to not show notifications for some users)
  – Scales well (LSM trees, KV store, SEDA, CDN for images)
  – Immutable Facts and Change Events stored in an append-only data store provide the ability to re-compute notifications
• Cons
  – Consistency challenges
    • HBase has strong consistency (single write master) but Notification source data can change, leading to Eventual Consistency
  – Pre-creating Notifications that may never be seen (cost of storage)
References
• HBase: The Definitive Guide (http://www.flipkart.com/hbase-definitive-guide/p/itmd36cuhzdfq4za?pid=DGBDTYAYB3PNSGYN)
• Block cache 101 (http://hortonworks.com/blog/hbase-blockcache-101/)
• Trooper (https://github.com/regunathb/Trooper)
• Flipkart Phantom (https://github.com/Flipkart/phantom)
• Facebook messages & HBase (http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase)