Case Study: Real-time Analytics with Druid
![Page 1: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/1.jpg)
Case Study: Real-time Analytics With Druid
Salil Kalia, Tech Lead, TO THE NEW Digital
![Page 2: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/2.jpg)
About the Presenter
• Over 10 years in the software industry
• Working with TO THE NEW Digital since 2009
• Mainly uses the Java/Groovy/Grails ecosystems for development
• Has worked in the digital marketing domain for the last few years
• Cassandra certified trainer
• Loves traveling and exploring new places
![Page 3: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/3.jpg)
Agenda
Understanding the use-case
• Ad workflow
• Our use case
Experiments with technologies
• Redis
• Cassandra
Introduction to Druid
• Architecture
• Druid in production
• Demo
![Page 4: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/4.jpg)
Understanding the use-case
![Page 5: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/5.jpg)
Understanding The Ad Workflow
Diagram labels: USER, Web Page Request, PUBLISHER SERVER, Ad Request, AD EXCHANGE, Ad-Content, AD AGENCY-1, AD AGENCY-2, AD AGENCY-3.
![Page 6: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/6.jpg)
Examples From Our Use Case
• How many times has a video been viewed?
• How many times has a video been viewed in a particular time-span?
• How many times has a video been viewed in a particular time-span at a particular site?
• How many times has a video been viewed in a particular time-span at a particular site in a particular country?
• How many times has a video been viewed in a particular time-span at a particular site in a particular country on a particular device?
![Page 7: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/7.jpg)
Video Events For The Analysis
• LOAD
• START
• PLAYING
• VIEW
• STOP / PAUSE
• FINISH
![Page 8: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/8.jpg)
Event Data (Sample)
| Timestamp | Ad | Site | Advertiser | Event | Action |
| --- | --- | --- | --- | --- | --- |
| 2011-01-01T01:01:27Z | 123 | abc.com | Brand X | Player | Load |
| 2011-01-01T01:01:33Z | 234 | abcd.com | Brand Y | Player | Load |
| 2011-01-01T01:01:40Z | 123 | abc.com | Brand X | Player | Start |
| 2011-01-01T01:01:45Z | 123 | abc.com | Brand X | Player | Playing |
| 2011-01-01T01:01:50Z | 123 | abc.com | Brand Y | Player | Playing |
| 2011-01-01T01:01:51Z | 123 | abc.com | Brand X | Player | Stop |
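The drill-down questions from the use case (count an event, then narrow by time-span, then by site, and so on) can be sketched against the six sample rows above. This is only a minimal in-memory illustration, not the production pipeline: the `VideoEvent` class and the `count_events` helper are hypothetical names, and country and device are omitted because the sample data does not include them.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class VideoEvent:
    """One row of the sample event data (hypothetical record type)."""
    timestamp: datetime
    ad: str
    site: str
    advertiser: str
    event: str
    action: str


def parse_ts(text: str) -> datetime:
    """Parse a UTC timestamp such as 2011-01-01T01:01:27Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# The six sample rows from the slide, copied as-is.
SAMPLE = [
    VideoEvent(parse_ts("2011-01-01T01:01:27Z"), "123", "abc.com", "Brand X", "Player", "Load"),
    VideoEvent(parse_ts("2011-01-01T01:01:33Z"), "234", "abcd.com", "Brand Y", "Player", "Load"),
    VideoEvent(parse_ts("2011-01-01T01:01:40Z"), "123", "abc.com", "Brand X", "Player", "Start"),
    VideoEvent(parse_ts("2011-01-01T01:01:45Z"), "123", "abc.com", "Brand X", "Player", "Playing"),
    VideoEvent(parse_ts("2011-01-01T01:01:50Z"), "123", "abc.com", "Brand Y", "Player", "Playing"),
    VideoEvent(parse_ts("2011-01-01T01:01:51Z"), "123", "abc.com", "Brand X", "Player", "Stop"),
]


def count_events(
    events: Iterable[VideoEvent],
    action: str,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    site: Optional[str] = None,
) -> int:
    """Count events with the given action, optionally narrowed by a
    half-open time-span [start, end) and by site."""
    total = 0
    for e in events:
        if e.action != action:
            continue
        if start is not None and e.timestamp < start:
            continue
        if end is not None and e.timestamp >= end:
            continue
        if site is not None and e.site != site:
            continue
        total += 1
    return total


if __name__ == "__main__":
    print(count_events(SAMPLE, "Load"))
    print(count_events(SAMPLE, "Playing", site="abc.com"))
```

Each extra question in the list adds one more filter. That is manageable here but, at high event volume, it is the kind of ad-hoc filtering that calls for a store built for it.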
![Page 9: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/9.jpg)
What Is Analytics?
Processing HISTORICAL data to:
• Understand potential trends
• Analyze the effects of certain decisions or events
• Evaluate the performance of a system
• Make better business decisions
![Page 10: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/10.jpg)
What Is Real-time Analytics?
![Page 11: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/11.jpg)
Why (We Need) Real-time Analytics?
• Understand the real-time performance
• Control the velocity
• Avoid over serving
• Avoid under serving
• Control the targeting
![Page 12: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/12.jpg)
Recap – Things We Understood
• How the ad-tech works (in general)
• Our use-case
• Different video player events
• We are expecting a huge amount of data coming at a very high velocity.
![Page 13: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/13.jpg)
Experiments with technologies
![Page 14: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/14.jpg)
Why We Picked Redis
• Great buzz in the market
• Highly scalable
• Easy to set up, configure and use
• We were not very clear about our use-case
![Page 15: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/15.jpg)
Realizations From Redis
• Not a good fit for (big) time-series data
• Persistence is another issue: we can’t afford to lose data
• A huge variety of keys ended up all over the place
• Complexity in the application-side code kept increasing
![Page 16: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/16.jpg)
Working With Cassandra
• Very good support for time-series data
• Extremely good at writing data at very high speed
• Very easy to scale horizontally
• Supports aggregations through Counters
![Page 17: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/17.jpg)
Writing into Cassandra
Diagram labels: AD PLAYER, ANALYTICS SERVER, CASSANDRA.
![Page 18: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/18.jpg)
Reading from Cassandra
Diagram labels: CAMPAIGN MANAGER, ANALYTICS SERVER, CASSANDRA.
![Page 19: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/19.jpg)
What Didn’t Work With Cassandra
• Inconsistent results
• Unreliable counters
• No support for ad-hoc queries
• Nodes crashed very frequently
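The slides do not say why the counters were unreliable. One commonly cited reason for counter drift in distributed stores is that an increment is not idempotent: if the write is applied but the acknowledgement times out, a client that retries counts the same event twice. The toy simulation below illustrates only that general effect. `FlakyCounterStore` and `record_views` are hypothetical names, and nothing here talks to Cassandra.

```python
import random


class FlakyCounterStore:
    """Toy stand-in for a remote counter: every increment is applied,
    but the acknowledgement is sometimes lost (simulated timeout)."""

    def __init__(self, ack_loss_rate: float, seed: int = 0) -> None:
        self.value = 0
        self._ack_loss_rate = ack_loss_rate
        self._rng = random.Random(seed)

    def increment(self) -> None:
        self.value += 1  # the write lands on the "server"
        if self._rng.random() < self._ack_loss_rate:
            # The client never hears back, so it cannot know the write landed.
            raise TimeoutError("acknowledgement lost")


def record_views(store: FlakyCounterStore, n_events: int, max_retries: int = 3) -> None:
    """Record n_events views, retrying on timeout as a naive client would."""
    for _ in range(n_events):
        for _attempt in range(max_retries + 1):
            try:
                store.increment()
                break
            except TimeoutError:
                continue  # retry: this is where double counting happens


if __name__ == "__main__":
    store = FlakyCounterStore(ack_loss_rate=0.2)
    record_views(store, 1000)
    print("events sent: 1000, counter value:", store.value)
```

With a non-zero acknowledgement loss rate the counter ends up above the number of events actually sent. Keeping the raw events and aggregating them at query time avoids that failure mode.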
![Page 20: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/20.jpg)
Crossroads: What Next?
• Third-party tools on top of Cassandra for better consistency
• DataStax Enterprise edition
• Taking a deeper dive into Cassandra to reconfigure the whole architecture and setup
• Switching to a different technology
![Page 21: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/21.jpg)
Understanding Druid
![Page 22: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/22.jpg)
About Druid (http://druid.io)
• An open-source analytics data store
• Supports streaming data ingestion
• Flexible filters for ad-hoc queries
• Fast aggregations with sub-second queries
• Distributed, shared-nothing architecture
• Easily scalable
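As a rough sketch of what "flexible filters for ad-hoc queries" can look like, the helper below builds a Druid native `timeseries` query as a plain Python dictionary, ready to be serialized to JSON. The data source name `video_events` and the dimension names `action`, `site`, `country` and `device` are assumptions for illustration; the slides do not show the actual schema or queries.

```python
import json
from typing import Dict


def build_count_query(data_source: str, interval: str, filters: Dict[str, str]) -> dict:
    """Build a Druid native timeseries query that counts rows matching
    every dimension=value pair in `filters` within the given interval."""
    query = {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",      # one result bucket for the whole interval
        "intervals": [interval],   # ISO-8601 interval, "start/end"
        # Note: "count" counts stored rows. If rollup is enabled at ingestion,
        # a longSum over an ingested count metric is needed instead.
        "aggregations": [{"type": "count", "name": "events"}],
    }
    fields = [
        {"type": "selector", "dimension": dim, "value": value}
        for dim, value in filters.items()
    ]
    if len(fields) == 1:
        query["filter"] = fields[0]
    elif fields:
        query["filter"] = {"type": "and", "fields": fields}
    return query


if __name__ == "__main__":
    # Hypothetical data source and dimension names.
    query = build_count_query(
        "video_events",
        "2011-01-01T00:00:00Z/2011-01-02T00:00:00Z",
        {"action": "View", "site": "abc.com", "country": "IN", "device": "mobile"},
    )
    print(json.dumps(query, indent=2))
```

The resulting JSON would typically be sent with an HTTP POST to a broker node's `/druid/v2/` endpoint. Adding or dropping a dimension from `filters` answers the next drill-down question without any change to how the data is stored.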
![Page 23: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/23.jpg)
Setting Up Druid In Production
Diagram labels: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID CLUSTER, CASSANDRA.
![Page 24: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/24.jpg)
Druid’s Reliability Check
Diagram labels: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID CLUSTER, RAW FILE CONSUMER, RAW FILES, Job To Test Druid’s Integrity.
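The slide names the components of the reliability check but not how the job compared the data. One plausible approach, sketched here under that assumption, is to count events per hour and ad in the raw files and compare those counts with the counts Druid returns for the same keys. The tab-separated raw-file layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (hour bucket such as "2011-01-01T01", ad id)


def count_raw_events(lines: Iterable[str]) -> Counter:
    """Count events per (hour, ad) from raw lines.

    Assumed layout (hypothetical): tab-separated fields with the ISO-8601
    timestamp first and the ad id second.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        timestamp, ad, *_rest = line.split("\t")
        counts[(timestamp[:13], ad)] += 1  # first 13 chars = date plus hour
    return counts


def find_mismatches(raw: Dict[Key, int], druid: Dict[Key, int]) -> Dict[Key, Tuple[int, int]]:
    """Return {key: (raw_count, druid_count)} for every key where they differ."""
    mismatches = {}
    for key in raw.keys() | druid.keys():
        raw_count = raw.get(key, 0)
        druid_count = druid.get(key, 0)
        if raw_count != druid_count:
            mismatches[key] = (raw_count, druid_count)
    return mismatches


if __name__ == "__main__":
    raw_lines = [
        "2011-01-01T01:01:27Z\t123\tabc.com\tBrand X\tPlayer\tLoad",
        "2011-01-01T01:01:40Z\t123\tabc.com\tBrand X\tPlayer\tStart",
    ]
    # In a real job these counts would come from a Druid query.
    druid_counts = {("2011-01-01T01", "123"): 2}
    print(find_mismatches(count_raw_events(raw_lines), druid_counts))
```

An empty result means both paths agree for every key. Any entry points at a specific hour and ad to investigate.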
![Page 25: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/25.jpg)
A Quick Demo
![Page 26: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/26.jpg)
![Page 27: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/27.jpg)
Druid Architecture
Diagram labels:
• Druid Nodes: REALTIME NODES, COORDINATOR NODES, HISTORICAL NODES, BROKER NODES
• External Dependencies: DEEP STORAGE, ZOOKEEPER, MYSQL
• Inputs: Streaming Data, Client Queries
• Flows: Queries, MetaData, Data/Segments
![Page 28: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/28.jpg)
Druid Data Ingestion
Diagram: same components as the Druid Architecture slide (REALTIME, COORDINATOR, HISTORICAL and BROKER NODES; DEEP STORAGE, ZOOKEEPER, MYSQL).
![Page 29: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/29.jpg)
Druid Data Ingestion (Our System)
Diagram labels: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID Real-time Node.
![Page 30: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/30.jpg)
Druid Data Retrieval
Diagram: same components as the Druid Architecture slide (REALTIME, COORDINATOR, HISTORICAL and BROKER NODES; DEEP STORAGE, ZOOKEEPER, MYSQL).
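In Druid, client queries go to the broker nodes, which forward them to the real-time and historical nodes holding the relevant data and merge the partial results. The toy function below shows only that merge step for simple per-key counts. It is an illustration of the scatter-gather idea, not Druid code, and the node results in the usage example are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def merge_partial_counts(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Merge per-node partial counts into one result, as a broker would
    when each node has aggregated only the segments it serves."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


if __name__ == "__main__":
    # Made-up partial results keyed by site.
    historical_node = {"abc.com": 120, "abcd.com": 45}
    realtime_node = {"abc.com": 7}
    print(merge_partial_counts([historical_node, realtime_node]))
```

Counts and sums merge this way because they are additive, which is part of why such aggregations stay fast when the data is spread over many nodes.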
![Page 31: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/31.jpg)
Coordinator Nodes
Diagram: same components as the Druid Architecture slide (REALTIME, COORDINATOR, HISTORICAL and BROKER NODES; DEEP STORAGE, ZOOKEEPER, MYSQL).
![Page 32: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/32.jpg)
Druid Data Segment Propagation
Diagram: the Druid Architecture components without BROKER NODES and Client Queries (REALTIME, COORDINATOR and HISTORICAL NODES; DEEP STORAGE, ZOOKEEPER, MYSQL; Streaming Data).
![Page 33: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/33.jpg)
Our Production Stats
• Over 200 million events per day ingested into the Druid cluster
• 4 boxes with 8 cores, 64GB RAM, 1TB SSD
• 2 coordinator nodes (only one master)
• 2 real-time nodes
• 4 historical nodes (on each box)
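To put the daily volume in perspective, the quick calculation below converts events per day into an average per-second rate. It assumes traffic is spread evenly over the day, which real ad traffic is not, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average ingestion rate if events were spread evenly over one day."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = average_events_per_second(200_000_000)
    print(f"about {rate:.0f} events per second on average")
```

Since the slide says "over 200 million", the resulting figure of roughly 2,300 events per second is a lower bound on the average.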
![Page 34: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/34.jpg)
Companies Using Druid
![Page 35: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/35.jpg)
Questions?
![Page 36: Case Study: Realtime Analytics with Druid](https://reader035.vdocument.in/reader035/viewer/2022081507/587069691a28ab48378b5c27/html5/thumbnails/36.jpg)