Bitdeli - a platform for creating custom analytics in your browser (PyData SV 2013)
DESCRIPTION
Video can be found here: https://vimeo.com/63298686
TRANSCRIPT
Create Custom Analytics in Your Browser
PyData 2013
Ville Tuulos
CEO, Co-Founder
Everybody (Click & Play)
Business Analysts (Excel)
IT / DBAs (SQL, Python)
Data Hackers (MapReduce)
People who implement their own infrastructure
Disco
Python is great
MapReduce is hard
Servers are annoying (cloud or not)
Everybody likes real-time
Support healthy workflows
Demo
What makes some users very active?
Customer C
Customer B
How to reduce churn?
Customer A
Why do some users return?
[Three charts, each plotting Users against Daily Activity]
Simple Complex
Discover
Explore
Infographics
Basic Statistics
Reports
Query
Segments
Funnels
Clustering
Slice & Dice
Descriptive Models
Visualizations
Predictive Models
DiscoDB
persistent, immutable, compressed, lightning-fast key-value(s) mapping that supports lazy boolean queries.
Code: https://github.com/discoproject/discodb
Docs: http://discoproject.org/doc/discodb/
from discodb import DiscoDB

FILES = ['a.txt', 'b.txt', 'c.txt']

def extract_words():
    for fname in FILES:
        for word in open(fname).read().split():
            yield word, fname

db = DiscoDB(extract_words())

db['dog']           # files that mention 'dog'
db.keys()           # all distinct words
db.unique_values()  # all distinct filenames
db.items()          # all (word, iter(fname)) pairs
DiscoDB Chunk
Hash Map: hash(Key) → Key ID
Value Map: Key ID → [Value ID, ...]
Keys: Key ID → Key
Values: Value ID → Value
Perfect hashing by CMPH, guaranteed O(1)
The list of Value IDs is delta-encoded
Values are compressed with a global Huffman codebook
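The chunk layout can be illustrated with a toy, pure-Python model. This is only a sketch under stated assumptions: a plain dict stands in for the CMPH perfect hash, Huffman compression is left out entirely, and the names ToyChunk, delta_encode and delta_decode are hypothetical and not part of DiscoDB.

```python
def delta_encode(sorted_ids):
    """Store the first ID, then the gaps between consecutive sorted IDs."""
    deltas, prev = [], 0
    for i in sorted_ids:
        deltas.append(i - prev)
        prev = i
    return deltas


def delta_decode(deltas):
    """Rebuild the sorted ID list by accumulating the gaps."""
    ids, total = [], 0
    for d in deltas:
        total += d
        ids.append(total)
    return ids


class ToyChunk:
    """Hypothetical, simplified stand-in for a DiscoDB chunk."""

    def __init__(self, pairs):
        mapping = {}
        for key, value in pairs:
            mapping.setdefault(key, set()).add(value)
        # Keys: Key ID -> Key
        self.keys = sorted(mapping)
        # Values: Value ID -> Value (each distinct value stored once)
        self.values = sorted({v for vs in mapping.values() for v in vs})
        value_id = {v: i for i, v in enumerate(self.values)}
        # Hash Map: Key -> Key ID (assumption: a dict replaces CMPH)
        self.hash_map = {k: i for i, k in enumerate(self.keys)}
        # Value Map: Key ID -> delta-encoded, sorted list of Value IDs
        self.value_map = [
            delta_encode(sorted(value_id[v] for v in mapping[k]))
            for k in self.keys
        ]

    def __getitem__(self, key):
        key_id = self.hash_map[key]
        return [self.values[i] for i in delta_decode(self.value_map[key_id])]


chunk = ToyChunk([("dog", "a.txt"), ("dog", "c.txt"), ("cat", "b.txt")])
print(chunk["dog"])
```

Sorting the Value IDs before delta-encoding keeps the stored gaps small, which is what makes this kind of encoding compress well.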
[Diagram: Disco Nodes 1 to N, each running a Python Worker and holding DiscoDB Chunks, with DDFS spanning the nodes]
Querying with Conjunctive Normal Form
from discodb.query import Q
DiscoDB:
A → [Apple, Orange, Banana]
B → [Apple, Banana]
C → [Banana, Melon]
Q("A & B") → Apple, Banana
Q("A | B") → Apple, Orange, Banana
Q("(A | B) & C") → Banana
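To make the semantics of a conjunctive-normal-form query concrete, here is a minimal pure-Python sketch that evaluates such queries over the A, B, C mapping from the slide. It does not use DiscoDB and is not how DiscoDB implements queries: parse_cnf and evaluate are hypothetical helpers, the parser handles only flat CNF strings (clauses joined by &, keys joined by | inside a clause), and evaluation is eager rather than lazy.

```python
def parse_cnf(query):
    """Split a CNF string like "(A | B) & C" into clauses (lists of keys)."""
    clauses = []
    for clause in query.split("&"):
        terms = clause.strip().strip("()").split("|")
        clauses.append([t.strip() for t in terms])
    return clauses


def evaluate(index, query):
    """Intersect, across clauses, the union of the value sets in each clause."""
    result = None
    for clause in parse_cnf(query):
        union = set()
        for key in clause:
            union |= index.get(key, set())
        result = union if result is None else result & union
    return result


index = {
    "A": {"Apple", "Orange", "Banana"},
    "B": {"Apple", "Banana"},
    "C": {"Banana", "Melon"},
}

print(sorted(evaluate(index, "A & B")))
print(sorted(evaluate(index, "A | B")))
print(sorted(evaluate(index, "(A | B) & C")))
```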
Funnel
Model: Event → Users
Query (sequence of events): Q("Event A & Event B & ...")
https://github.com/tuulos/bd3-mixpanel-funnel
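The Event → Users model can be sketched with plain Python sets: each funnel step keeps only the users who also appear under the next event. This is an illustration under assumptions, not the code from the linked repository; the funnel helper and the event names and user IDs are made up for the example. Note that a pure intersection does not check the order in which a user triggered the events.

```python
def funnel(event_users, steps):
    """Return (step, user count) pairs after requiring each additional event."""
    remaining = None
    counts = []
    for step in steps:
        users = event_users.get(step, set())
        # Equivalent to the query "step1 & step2 & ..." up to this step.
        remaining = set(users) if remaining is None else remaining & users
        counts.append((step, len(remaining)))
    return counts


# Made-up illustration data: Event -> Users
event_users = {
    "signup": {"u1", "u2", "u3", "u4"},
    "upload": {"u1", "u2", "u4"},
    "share": {"u2", "u4", "u5"},
}

print(funnel(event_users, ["signup", "upload", "share"]))
```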
Cohort Analysis
Model: Day N → Users
Query (weekly cohorts): Q("(dayN | dayN+1) & (dayM | dayM+1...)")
https://github.com/tuulos/bd3-mixpanel-cohort
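A weekly cohort query of this shape is a union of days within each week, followed by an intersection between the two weeks. The sketch below shows that with plain sets; it is an assumption-laden illustration rather than the linked repository's code, and active_in, retention, the day numbers and the user IDs are all hypothetical.

```python
def active_in(day_users, days):
    """Union of users seen on any of the given days: (dayN | dayN+1 | ...)."""
    union = set()
    for day in days:
        union |= day_users.get(day, set())
    return union


def retention(day_users, cohort_days, later_days):
    """Return (returned, cohort size) for users active in both periods."""
    cohort = active_in(day_users, cohort_days)
    returned = cohort & active_in(day_users, later_days)
    return len(returned), len(cohort)


# Made-up illustration data: Day N -> Users
day_users = {
    1: {"u1", "u2"},
    2: {"u3"},
    8: {"u1"},
    9: {"u3", "u4"},
}

# Week 1 is days 1..7, week 2 is days 8..14.
print(retention(day_users, range(1, 8), range(8, 15)))
```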
Time Series
Model: Day N → Users
Query (one time series): [Q(Day K) for K in range(start, end)]
https://github.com/tuulos/bd3-mixpanel-trends
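With the same Day N → Users model, a time series is one lookup per day. A minimal sketch with a plain dict in place of DiscoDB follows; daily_active_users and the sample data are hypothetical and only illustrate the list-comprehension pattern from the slide.

```python
def daily_active_users(day_users, start, end):
    """Count distinct users for each day K in range(start, end)."""
    return [len(day_users.get(day, set())) for day in range(start, end)]


# Made-up illustration data: Day N -> Users
day_users = {
    1: {"u1", "u2"},
    2: {"u3"},
    4: {"u1", "u3", "u4"},
}

print(daily_active_users(day_users, 1, 5))
```

Days with no recorded users simply contribute a zero, so the series stays aligned with the requested range.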
Thank You!
Free analytics for your GitHub repos: https://bitdeli.com/free
Interested? Contact [email protected]