acunu analytics @ cassandra london
DESCRIPTION
My talk about Acunu Analytics - for video see http://skillsmatter.com/podcast/nosql/acunu-analytics
TRANSCRIPT
Acunu Analytics
Realtime Big Data Analytics
Tom Wilkie, Acunu
16th July 2012
Analytics
• Motivation / alternatives
• What is it?
• How does it work?
• What's it good for?
Analytics
time          page                           session id    duration
...           ...                            ...           ...
14:58:03.234  /index.html                    248.180.3.40  175
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
14:58:03.409  /csi/csi/council/freedom.html  248.180.3.40  1234
14:58:03.877  /docs/access/chapter8.txt      99.1.10.178   52
Analytics
Live & historical aggregates...
Trends...
Drill downs and roll ups
Combining “big” and “real-time” is hard
Analytics
Solution Con
Scalability, $$$
Not realtime, inefficient recomputation
Spartan query semantics => complex, DIY solutions
Analytics
• Motivation / alternatives
• What is it?
• How does it work?
• What's it good for?
Analytics
• Simple, real-time, incremental analytics
• Push processing into ingest phase
[Diagram: click stream, sensor data, etc. → events → Acunu Analytics → counter updates]
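To make "push processing into ingest phase" concrete, here is a minimal Python sketch. It is not Acunu's implementation: the `IngestCounters` class and the event fields `page` and `hour` are hypothetical, chosen only to show that each incoming event becomes a few counter increments, so a later query is a lookup rather than a scan.

```python
from collections import Counter


class IngestCounters:
    """Hypothetical ingest-time aggregator: the work happens on write, not on read."""

    def __init__(self):
        self.total = 0
        self.by_page = Counter()
        self.by_hour = Counter()

    def ingest(self, event):
        # Every event turns into a handful of counter increments.
        self.total += 1
        self.by_page[event["page"]] += 1
        self.by_hour[event["hour"]] += 1


counters = IngestCounters()
for ev in [
    {"page": "/index.html", "hour": "14:00"},
    {"page": "/docs/access/chapter8.txt", "hour": "14:00"},
    {"page": "/index.html", "hour": "15:00"},
]:
    counters.ingest(ev)

# Queries read the pre-computed counters; no raw events are rescanned.
print(counters.total, counters.by_page["/index.html"], counters.by_hour["14:00"])
```

The cost of a query no longer depends on how many events have been ingested, only on how many counters it reads.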
Analytics
{
  time : TIME(HOUR; MIN; SEC),
  page : PATH(/),
  category : STRING,
  loadTime : LONG
}
{
  select : ["COUNT", "AVG(loadTime)"],
  where : "time, ?path",
  group : "time, ?category"
}
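One plausible reading of the TIME(HOUR; MIN; SEC) and PATH(/) types is that they are hierarchical: a value can be addressed at any level of its hierarchy, which is what would make roll ups and drill downs cheap. The Python sketch below illustrates that reading only. The helpers `time_levels` and `path_levels` are hypothetical and are not part of Acunu Analytics.

```python
def time_levels(hhmmss):
    """Split 'HH:MM:SS' into hour, minute and second buckets."""
    hour, minute, second = hhmmss.split(":")
    return [hour, f"{hour}:{minute}", f"{hour}:{minute}:{second}"]


def path_levels(path, sep="/"):
    """Split '/a/b/c' into ['/a', '/a/b', '/a/b/c']."""
    parts = [p for p in path.split(sep) if p]
    return [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


print(time_levels("14:58:03"))
print(path_levels("/docs/access/chapter8.txt"))
```

Under this assumption, a counter kept at each level lets a query group by hour, then drill down to minutes or seconds, without touching the raw events.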
Analytics
• Motivation / alternatives
• What is it?
• How does it work?
• What's it good for?
Analytics
Introduction
Analytics
count grouped by ... day
count distinct (session)
count ... geography
... browser
avg(duration)
Analytics
Data Definition
time : TIME(HOUR; MIN; SEC),
cust_id : LONG,
session_id : LONG,
geography : STRING,
browser : STRING,
load_time : LONG
Query Patterns
{
  select: "COUNT",
  patterns: [
    { where : "?time", group : "?time" },
    { where : "", group : "geography" },
    { where : "", group : "browser" }
  ]
},
{
  select: ["COUNT_DISTINCT(session_id)", "AVG(load_time)"],
  where: "time",
  group: ""
}
Analytics
21:00       all→1345   :00→45     :01→62     :02→87     ...
22:00       all→3221   :00→22     :01→19     :02→104    ...
...         ...
UK          all→228    user01→1   user14→12  user99→7   ...
US          all→354    user01→4   user04→8   user56→17  ...
...
UK, 22:00   all→1904   ...
∅           all→87314  UK→238     US→354     ...
{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}
Analytics
21:00       all→1345   :00→45     :01→62     :02→87     ...
22:00       all→3222   :00→22     :01→19     :02→105    ...
...         ...
UK          all→229    user01→2   user14→12  user99→7   ...
US          all→354    user01→4   user04→8   user56→17  ...
...
UK, 22:00   all→1905   ...
∅           all→87315  UK→239     US→354     ...
{
  cust_id: user01,
  session_id: 102,
  geography: UK,
  browser: IE,
  time: 22:02,
}
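The difference between the two counter tables can be reproduced with a short Python sketch. The starting values are copied from the slide, restricted to the cells that change. The row and column key layout and the `apply_event` function are assumptions made for illustration, not Acunu's actual storage format.

```python
from collections import defaultdict

# Counter cells before the event, copied from the slide (only cells that change).
counters = defaultdict(lambda: defaultdict(int))
counters["22:00"].update({"all": 3221, ":02": 104})
counters["UK"].update({"all": 228, "user01": 1})
counters["UK, 22:00"].update({"all": 1904})
counters["∅"].update({"all": 87314, "UK": 238})


def apply_event(counters, event):
    """Increment every counter cell the event touches (assumed key layout)."""
    hour, minute = event["time"].split(":")
    hour_row = f"{hour}:00"
    geo = event["geography"]
    cells = [
        (hour_row, "all"),               # events in this hour
        (hour_row, f":{minute}"),        # events in this minute of the hour
        (geo, "all"),                    # events from this geography
        (geo, event["cust_id"]),         # events from this user in this geography
        (f"{geo}, {hour_row}", "all"),   # events from this geography in this hour
        ("∅", "all"),                    # all events
        ("∅", geo),                      # all events, grouped by geography
    ]
    for row, column in cells:
        counters[row][column] += 1


apply_event(counters, {"cust_id": "user01", "session_id": 102,
                       "geography": "UK", "browser": "IE", "time": "22:02"})

for row, cols in counters.items():
    print(row, dict(cols))
```

A single event fans out into seven increments here, and each resulting value matches the updated table on the slide.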
Analytics
21:00       all→1345   :00→45     :01→62     :02→87     ...
22:00       all→3222   :00→22     :01→19     :02→105    ...
...         ...
UK          all→229    user01→2   user14→12  user99→7   ...
US          all→354    user01→4   user04→8   user56→17  ...
...
UK, 22:00   all→1905   ...
∅           all→87315  UK→239     US→354     ...
where time 21:00-22:00, count(*)
where time 22:00-23:00, group by minute
where geography=UK, group all by user
count all
group all by geo
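Each query on this slide is answered by reading one pre-aggregated row. The Python sketch below shows that idea using the values visible on the slide. The rows are truncated exactly where the slide shows "...", and the `count` and `group` helpers are hypothetical names rather than the product's API.

```python
# Counter rows after the update, copied from the slide (truncated where the slide is).
counters = {
    "21:00": {"all": 1345, ":00": 45, ":01": 62, ":02": 87},
    "22:00": {"all": 3222, ":00": 22, ":01": 19, ":02": 105},
    "UK": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "US": {"all": 354, "user01": 4, "user04": 8, "user56": 17},
    "∅": {"all": 87315, "UK": 239, "US": 354},
}


def count(row):
    """count(*) for one row: read the pre-aggregated 'all' cell."""
    return counters[row]["all"]


def group(row):
    """group by: return every cell in the row except the 'all' total."""
    return {k: v for k, v in counters[row].items() if k != "all"}


print(count("21:00"))   # where time 21:00-22:00, count(*)
print(group("22:00"))   # where time 22:00-23:00, group by minute
print(group("UK"))      # where geography=UK, group all by user
print(count("∅"))       # count all
print(group("∅"))       # group all by geo
```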
Analytics
• SUM, COUNT, MIN, MAX, STDDEV, AVG, TOP k, COUNT DISTINCT
• Also: approx top k, approx count distinct
• Also: idempotent update
• RESTful JSON interface, CLI
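Aggregates such as COUNT, SUM, AVG and STDDEV fit the incremental model because they can be derived from a small running state that is updated per event. The Python sketch below is one common way to do this, using a running count, sum and sum of squares. It is an illustration, not a description of how Acunu Analytics computes these values, and the `RunningStats` class is hypothetical. The sample values are the durations from the log table earlier in the deck.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Constant-size state from which COUNT, SUM, AVG and STDDEV can be derived."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, x):
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other):
        # Two partial states combine by simple addition.
        return RunningStats(self.count + other.count,
                            self.total + other.total,
                            self.total_sq + other.total_sq)

    @property
    def avg(self):
        return self.total / self.count

    @property
    def stddev(self):
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        return math.sqrt(max(self.total_sq / self.count - self.avg ** 2, 0.0))


a, b = RunningStats(), RunningStats()
for duration in (175, 1234):
    a.add(duration)
b.add(52)
merged = a.merge(b)
print(merged.count, merged.avg, merged.stddev)
```

The sum-of-squares form is simple and mergeable but can lose precision on large values; Welford's algorithm is a numerically safer alternative.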
Analytics
• Motivation / alternatives
• What is it?
• How does it work?
• What's it good for?
Analytics
Manufacturing
Systems Monitoring
Financial Services
Social Media Ad Analytics
Oil + Gas
Analytics
“Up and running in about 4 hours”
“We found out a competitor was scraping our data”
“We keep discovering use cases we hadn’t thought of”