One Billion Rows Per Second: Analytics for the Digital Media Markets
One Billion Rows Per Second: Analytics for the Digital Media Markets. XLDB, October 19, 2011. Michael Driscoll, Co-Founder & CTO (@medriscoll). Taming the Inferno of the Online Ad Markets: billions of microtransactions per day; dozens of publisher, advertiser, and audience attributes.
![Page 1: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/1.jpg)
One Billion Rows Per Second: Analytics for the Digital Media Markets
XLDB, October 19, 2011
MICHAEL DRISCOLL, CO-FOUNDER & CTO
@medriscoll
![Page 2: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/2.jpg)
Taming the Inferno of the Online Ad Markets
• billions of microtransactions per day
• dozens of publisher, advertiser, & audience attributes
![Page 3: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/3.jpg)
Goal: Fast Analytics Over 100s of Terabytes
![Page 4: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/4.jpg)
Goal: Fast Analytics Over 100s of Terabytes
• data crunched in minutes
• queries in seconds
• dashboard
• database
• ingestion
![Page 5: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/5.jpg)
Solution 1: MPP Database
• data crunched in minutes
• queries in minutes
• dashboard
• database: MPP Database
• ingestion: Hadoop
![Page 6: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/6.jpg)
Solution 2: HBase
• data crunched in hours
• queries in seconds
• dashboard
• database: HBase
• ingestion: Hadoop
![Page 7: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/7.jpg)
Solution 3: Do It Ourselves: Druid
• data crunched in minutes
• queries in seconds
• dashboard
• database: Druid
• ingestion: Hadoop
![Page 8: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/8.jpg)
Four Principles of Druid’s Performance at Scale
• SUMMARIZE
• DISTRIBUTE
• PARALLELIZE
• STORE IN-MEMORY
• 100x smaller vs raw data
• 100x throughput vs a single node (with 100 cores)
• 100x faster vs disk
= 10^6 factor speed-up
Druid can filter and aggregate over 1 billion rows per second on a 50-core cluster, or 20 million rows per core per second.
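The numbers on this slide can be checked with a short sketch. It assumes the three 100x factors compound multiplicatively, which the "= 10^6" on the slide implies. The `rollup` helper and the sample ad events are hypothetical illustrations of what SUMMARIZE means in general (collapsing raw events into one row per combination of attributes); they are not Druid's API or its actual implementation.

```python
from collections import defaultdict
from math import prod


def combined_speedup(factors):
    """Multiply speed-up factors, assuming they compound independently."""
    return prod(factors)


def rows_per_core(rows_per_second, cores):
    """Per-core scan rate implied by a cluster-wide scan rate."""
    return rows_per_second // cores


def rollup(events, dims, metric):
    """Summarize raw events: one output row per distinct combination of
    `dims`, with `metric` summed. Hypothetical helper for illustration."""
    totals = defaultdict(int)
    for event in events:
        key = tuple(event[d] for d in dims)
        totals[key] += event[metric]
    return dict(totals)


# Hypothetical raw ad events (made-up publishers and advertisers).
events = [
    {"publisher": "a.com", "advertiser": "x", "impressions": 1},
    {"publisher": "a.com", "advertiser": "x", "impressions": 1},
    {"publisher": "b.com", "advertiser": "x", "impressions": 1},
    {"publisher": "a.com", "advertiser": "y", "impressions": 1},
]

summary = rollup(events, ["publisher", "advertiser"], "impressions")

print(combined_speedup([100, 100, 100]))  # 100 * 100 * 100
print(rows_per_core(1_000_000_000, 50))   # 1 billion rows/s over 50 cores
print(summary)                            # 4 raw events become 3 summary rows
```

In this toy case the roll-up only shrinks four events to three rows; the slide's 100x figure would depend on how many raw events share the same attribute values in real data.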
![Page 9: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/9.jpg)
Consequences of Druid: Faster Queries
photo credit: tonylanciabeta http://www.flickr.com/photos/tonysphotos/3305157904/sizes/o/in/photostream/
![Page 10: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/10.jpg)
Consequences of Druid: Fresher Data
photo credit: Lars P. http://www.flickr.com/photos/lars_p/4911238308/sizes/o/in/photostream/
![Page 11: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/11.jpg)
Consequences of Druid: Scalable in the Cloud
photo credit: MonkeyAt Large http://www.flickr.com/photos/monkeyatlarge/16645379/sizes/l/in/photostream/
![Page 12: One Billion Rows Per Second: Analytics for the Digital Media Markets](https://reader034.vdocument.in/reader034/viewer/2022051420/5681601c550346895dcf1b20/html5/thumbnails/12.jpg)
One Billion Rows Per Second: Analytics for the Digital Media Markets
QUESTIONS? CONTACT ME AT [email protected]
MICHAEL DRISCOLL, CO-FOUNDER & CTO
@medriscoll