Gregorry Letribot - Druid at Criteo - NoSQL Matters 2015
Once upon a time, in an SQL Galaxy..
« Guys, database whatever_DB contains 3B rows
Disks are full, it needs a purge
Server will be reinstalled and will host only the last 30 days »
« Well, talk with product »
Summing up
Food for BI
Interactive, sub-second insights
Arbitrarily drill into data
Scalability, availability…
Columnar Store
Only reads relevant data
Iphone Google Computer 0.1€ 08:12:37
Android Yahoo Cloth 0.2€ 08:12:38
Select sum(cost) where Device = Iphone
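To illustrate why a columnar store only reads relevant data, here is a minimal Python sketch using the two rows from the slide. The column names other than device and cost (publisher, category, time) are hypothetical labels, and the dict-of-lists layout is an assumption for illustration, not Druid's actual storage format.

```python
# Row-oriented layout: a scan has to walk through every field of every row.
rows = [
    ("Iphone", "Google", "Computer", 0.1, "08:12:37"),
    ("Android", "Yahoo", "Cloth", 0.2, "08:12:38"),
]

# Column-oriented layout: one list per column.
# "publisher", "category" and "time" are hypothetical column names.
columns = {
    "device": [r[0] for r in rows],
    "publisher": [r[1] for r in rows],
    "category": [r[2] for r in rows],
    "cost": [r[3] for r in rows],
    "time": [r[4] for r in rows],
}


def sum_cost_for_device(cols, device):
    """Select sum(cost) where Device = device, touching only two columns."""
    return sum(
        cost
        for dev, cost in zip(cols["device"], cols["cost"])
        if dev == device
    )


total = sum_cost_for_device(columns, "Iphone")
print(total)
```

The query never looks at the publisher, category or time columns, which is the point made on the slide.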
High compression
Dictionary encoding & LZF
Column values: Wacken, Hellfest, Fall of Summer, Wacken, Hellfest
Encoded values: 1, 2, 3, 1, 2
Metadata:
Wacken => 1
Hellfest => 2
Fall of Summer => 3
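The dictionary encoding shown on this slide can be sketched in a few lines of Python. This is a minimal illustration: ids are assigned in first-seen order, starting at 1, to match the slide, which is an assumption and not necessarily how Druid orders its dictionary. The LZF compression step is not shown.

```python
def dictionary_encode(values):
    """Replace each distinct string by a small integer id."""
    dictionary = {}
    encoded = []
    for value in values:
        if value not in dictionary:
            # Ids start at 1, as on the slide.
            dictionary[value] = len(dictionary) + 1
        encoded.append(dictionary[value])
    return dictionary, encoded


festivals = ["Wacken", "Hellfest", "Fall of Summer", "Wacken", "Hellfest"]
dictionary, encoded = dictionary_encode(festivals)

# Decoding only needs the metadata (the reversed dictionary).
reverse = {code: value for value, code in dictionary.items()}
decoded = [reverse[code] for code in encoded]

print(dictionary)
print(encoded)
```

Each string is stored once in the metadata, and the column itself becomes a list of small integers that compresses well.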
Inverted index
Column values: Wacken, Hellfest, Fall of Summer, Wacken, Hellfest
Wacken 1,0,0,1,0
Hellfest 0,1,0,0,1
Fall of Summer 0,0,1,0,0
Fast binary operations
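As a rough sketch of how an inverted index turns filters into fast binary operations, the Python below builds one bitmap per value from the five-row column listed on the slide and combines them with OR and AND. Using plain Python integers as bitmaps is an assumption made for brevity; production systems typically use compressed bitmap structures. Rows are numbered from 0.

```python
def build_bitmaps(values):
    """Map each distinct value to an integer whose bit r is set if row r holds it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def rows_of(bitmap, n_rows):
    """List the row numbers whose bit is set in the bitmap."""
    return [row for row in range(n_rows) if (bitmap >> row) & 1]


festivals = ["Wacken", "Hellfest", "Fall of Summer", "Wacken", "Hellfest"]
bitmaps = build_bitmaps(festivals)

# festival = Wacken OR festival = Hellfest: a single bitwise OR.
either = bitmaps["Wacken"] | bitmaps["Hellfest"]

# festival = Wacken AND festival = Hellfest: a single bitwise AND (no row matches).
both = bitmaps["Wacken"] & bitmaps["Hellfest"]

print(rows_of(either, len(festivals)))
print(rows_of(both, len(festivals)))
```

The filter is resolved on the bitmaps alone; only the matching rows then need to be read from the metric columns.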
Sketching algorithms
HyperLogLog
Approximate unique count
Extreme storage reduction
Constant time computation
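To make the HyperLogLog idea concrete, here is a simplified Python sketch of the classic algorithm. It is not Druid's implementation: the register count (2^12), the use of SHA-1 as the hash function and the class and method names are all assumptions chosen for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog: fixed memory, approximate distinct count."""

    def __init__(self, p=12):
        self.p = p                     # number of bits used to pick a register
        self.m = 1 << p                # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from SHA-1 (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                  # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias correction, m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")   # duplicates do not change the estimate
estimate = hll.count()
print(round(estimate))
```

Whatever the number of items added, the sketch keeps the same 4096 small registers, which is where the storage reduction comes from. The estimate is approximate; with this register count the expected relative error is on the order of 1 to 2 percent.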
Performance
No downtime in 6 months
Aggregate displays, clicks, sales & revenue generated for our biggest advertiser,
grouped by device,
over 7 months = 197 ms
Performance
According to Metamarkets:
33M rows per second per core
Scaled up to 26B rows per second
10k events per second ingestion per node
What's wrong?
Be careful with your data model
Immutable is... immutable!
No joins, no full SQL capabilities
A couple of bugs... but a very active and friendly team!