Bitdeli: A Platform for Creating Custom Analytics in Your Browser (PyData SV 2013)

Posted on 10-May-2015


DESCRIPTION

Video can be found here: https://vimeo.com/63298686

TRANSCRIPT

Create Custom Analytics in Your Browser

PyData 2013

Ville Tuulos, CEO, Co-Founder

Everybody (Click & Play)

Business Analysts (Excel)

IT / DBAs (SQL, Python)

Data Hackers (MapReduce)

People who implement their own infrastructure


Disco



Python is great

MapReduce is hard

Servers are annoying (cloud or not)

Everybody likes real-time

Support healthy workflows

Demo

[Figure: daily activity of users for three customers (A, B and C), each asking a different question: what makes some users very active? how to reduce churn? why do some users return?]


[Figure: analytics tasks arranged on a spectrum from Simple to Complex and from Discover to Explore: Infographics, Basic Statistics, Reports, Query, Segments, Funnels, Clustering, Slice & Dice, Descriptive Models, Visualizations, Predictive Models]

DiscoDB: a persistent, immutable, compressed, lightning-fast key-value(s) mapping that supports lazy boolean queries.

Code: https://github.com/discoproject/discodb

Docs: http://discoproject.org/doc/discodb/

from discodb import DiscoDB

FILES = ['a.txt', 'b.txt', 'c.txt']

def extract_words():
    # Yield one (word, filename) pair for every word in every file.
    for fname in FILES:
        with open(fname) as f:
            for word in f.read().split():
                yield word, fname

db = DiscoDB(extract_words())

db['dog']            # files that mention 'dog'
db.keys()            # all distinct words
db.unique_values()   # all distinct filenames
db.items()           # all (word, iter(fname)) pairs
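To see what the calls above return without installing DiscoDB or creating files, here is a plain-Python stand-in. It is only a sketch of the key-to-values semantics: an in-memory dict replaces DiscoDB, the corpus in DOCS is hypothetical, and values are deduplicated and sorted for readability, which is a simplification rather than a statement about DiscoDB's storage.

```python
from collections import defaultdict

# Hypothetical in-memory corpus standing in for a.txt, b.txt and c.txt.
DOCS = {
    "a.txt": "the dog barks",
    "b.txt": "the cat sleeps",
    "c.txt": "a dog and a cat",
}

def extract_words(docs):
    """Yield (word, filename) pairs, like extract_words() above."""
    for fname, text in docs.items():
        for word in text.split():
            yield word, fname

def build_index(pairs):
    """Group pairs into key -> sorted list of distinct values (a dict, not DiscoDB)."""
    index = defaultdict(set)
    for key, value in pairs:
        index[key].add(value)
    return {key: sorted(values) for key, values in index.items()}

index = build_index(extract_words(DOCS))
print(index["dog"])    # ['a.txt', 'c.txt']
print(sorted(index))   # ['a', 'and', 'barks', 'cat', 'dog', 'sleeps', 'the']
print(sorted({v for vs in index.values() for v in vs}))  # ['a.txt', 'b.txt', 'c.txt']
```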

Hash Map: hash(Key) → Key ID

Value Map: Key ID → [Value ID, ...]

Keys: Key ID → Key

Values: Value ID → Value

DiscoDB Chunk
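The four tables of a chunk can be sketched in a few lines of Python. This is a simplified, hypothetical model of the layout, not DiscoDB's implementation: a plain dict stands in for the perfect hash, values are stored once and referenced by ID, and the function names build_chunk and lookup are made up for illustration.

```python
from typing import Dict, List, Tuple

def build_chunk(pairs: List[Tuple[str, str]]):
    """Build the four chunk tables from (key, value) pairs (simplified model)."""
    keys: List[str] = []              # Keys: Key ID -> Key
    values: List[str] = []            # Values: Value ID -> Value
    hash_map: Dict[str, int] = {}     # Hash Map: key -> Key ID (dict instead of a perfect hash)
    value_ids: Dict[str, int] = {}    # build-time helper: Value -> Value ID
    value_map: List[List[int]] = []   # Value Map: Key ID -> [Value ID, ...]

    for key, value in pairs:
        if key not in hash_map:
            hash_map[key] = len(keys)
            keys.append(key)
            value_map.append([])
        if value not in value_ids:
            value_ids[value] = len(values)
            values.append(value)
        value_map[hash_map[key]].append(value_ids[value])
    return hash_map, value_map, keys, values

def lookup(chunk, key: str) -> List[str]:
    """Key -> Key ID -> Value IDs -> Values."""
    hash_map, value_map, keys, values = chunk
    key_id = hash_map.get(key)
    # A real minimal perfect hash returns some ID even for unknown keys,
    # so the stored key is compared before trusting the ID.
    if key_id is None or keys[key_id] != key:
        return []
    return [values[vid] for vid in value_map[key_id]]

chunk = build_chunk([("A", "Apple"), ("A", "Orange"), ("B", "Apple")])
print(lookup(chunk, "A"))  # ['Apple', 'Orange']
```

Storing each distinct value once and keeping only integer IDs in the value map is what makes the later tricks (delta encoding of ID lists, a shared compression codebook) possible.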


Perfect hashing by CMPH, guaranteed O(1)

The list of Value IDs is delta-encoded

Values are compressed with a global Huffman codebook
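Delta encoding works because a sorted list of Value IDs can be stored as the first ID followed by the gaps between neighbours, and small gaps need few bytes. The sketch below pairs delta encoding with an LEB128-style variable-length integer; the varint step is an assumption chosen for illustration, and DiscoDB's actual on-disk integer coding may differ.

```python
from typing import List

def delta_encode(ids: List[int]) -> List[int]:
    """Turn a sorted list of Value IDs into the first ID plus successive gaps."""
    out, prev = [], 0
    for i in ids:
        out.append(i - prev)
        prev = i
    return out

def delta_decode(gaps: List[int]) -> List[int]:
    """Rebuild the original IDs with a running sum over the gaps."""
    out, total = [], 0
    for g in gaps:
        total += g
        out.append(total)
    return out

def varint(n: int) -> bytes:
    """LEB128-style varint: 7 bits per byte, high bit set on all but the last byte."""
    buf = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            buf.append(byte | 0x80)
        else:
            buf.append(byte)
            return bytes(buf)

ids = [1000, 1003, 1004, 1010]
gaps = delta_encode(ids)                       # [1000, 3, 1, 6]
encoded = b"".join(varint(g) for g in gaps)    # 5 bytes instead of 4 x 4-byte ints
```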

[Figure: a Disco cluster of Node 1, Node 2, ..., Node N; each node runs a Disco Node with a Python Worker, and DiscoDB chunks are stored across the nodes in DDFS]
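One way to picture a query over many chunks is a fan-out: look the key up in every chunk in parallel, then merge the partial results. The sketch below is an assumption about how such a lookup could be combined, not a description of Bitdeli's or Disco's scheduler; plain dicts stand in for DiscoDB chunks, threads stand in for the Python workers on separate nodes, and CHUNKS, query_chunk and query_all are hypothetical names. It covers single-key lookups only; intersecting per-chunk results for AND queries is only correct if each user's data sits in a single chunk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

# Hypothetical chunks: each dict stands in for one DiscoDB chunk stored in DDFS.
CHUNKS: List[Dict[str, List[str]]] = [
    {"signup": ["u1", "u2"], "purchase": ["u2"]},
    {"signup": ["u3"], "purchase": ["u3", "u4"]},
    {"signup": ["u5"]},
]

def query_chunk(chunk: Dict[str, List[str]], key: str) -> Set[str]:
    """Per-chunk step: look up one key in one chunk."""
    return set(chunk.get(key, []))

def query_all(chunks: List[Dict[str, List[str]]], key: str) -> Set[str]:
    """Fan the lookup out over all chunks, then union the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda c: query_chunk(c, key), chunks))
    result: Set[str] = set()
    for part in partials:
        result |= part
    return result

print(sorted(query_all(CHUNKS, "purchase")))  # ['u2', 'u3', 'u4']
```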

A → [Apple, Orange, Banana]

B → [Apple, Banana]

C → [Banana, Melon]

Q("A & B") → Apple, Banana

Q("A | B") → Apple, Orange, Banana

Q("(A & B) | C") → Apple, Banana, Melon

DiscoDB

from discodb.query import Q

Querying with Conjunctive Normal Form
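The semantics of these queries can be checked with ordinary Python sets. In conjunctive normal form a query is an AND of clauses, each clause an OR of keys, so evaluation is an intersection of unions; "(A & B) | C" rewrites to "(A | C) & (B | C)". This sketch does not use DiscoDB's Q objects and evaluates eagerly, whereas DiscoDB evaluates lazily; the name eval_cnf is hypothetical.

```python
from functools import reduce
from typing import Dict, List, Set

DB: Dict[str, Set[str]] = {
    "A": {"Apple", "Orange", "Banana"},
    "B": {"Apple", "Banana"},
    "C": {"Banana", "Melon"},
}

def eval_cnf(db: Dict[str, Set[str]], clauses: List[List[str]]) -> Set[str]:
    """Evaluate a CNF query: intersect the clauses, each clause a union of keys."""
    clause_sets = [set().union(*(db.get(k, set()) for k in clause)) for clause in clauses]
    return reduce(lambda acc, s: acc & s, clause_sets)

print(sorted(eval_cnf(DB, [["A"], ["B"]])))            # A & B
print(sorted(eval_cnf(DB, [["A", "B"]])))              # A | B
print(sorted(eval_cnf(DB, [["A", "C"], ["B", "C"]])))  # (A & B) | C in CNF
```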

Model: Event → Users

Query (sequence of events): Q("Event A & Event B & ...")

Funnel: https://github.com/tuulos/bd3-mixpanel-funnel
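With an Event → Users model, each funnel step is the running intersection of the users behind the events so far, which is what Q("Event A & Event B & ...") expresses. The sketch below uses sets and hypothetical event names; like the conjunctive query, it does not check the order in which a user performed the events.

```python
from typing import Dict, List, Optional, Set

# Hypothetical Event -> Users model.
EVENTS: Dict[str, Set[str]] = {
    "visit":    {"u1", "u2", "u3", "u4", "u5"},
    "signup":   {"u1", "u2", "u3"},
    "purchase": {"u2", "u3", "u9"},
}

def funnel(events: Dict[str, Set[str]], steps: List[str]) -> List[int]:
    """Number of users left after each step: Event A, Event A & Event B, ..."""
    counts: List[int] = []
    users: Optional[Set[str]] = None
    for step in steps:
        step_users = events.get(step, set())
        users = set(step_users) if users is None else users & step_users
        counts.append(len(users))
    return counts

print(funnel(EVENTS, ["visit", "signup", "purchase"]))  # [5, 3, 2]
```

User u9 purchased without a recorded visit, so the intersection drops them at the first step.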

Model: Day N → Users

Query (weekly cohorts):

Q("(dayN | dayN+1) & (dayM | dayM+1...)")

Cohort Analysis: https://github.com/tuulos/bd3-mixpanel-cohort
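The cohort query unions the days of one week, unions the days of a later week, and intersects the two. A set-based sketch, assuming 7-day weeks and defining the cohort simply as everyone active in the first week (real cohort analyses often use first-seen date instead); DAYS, active_in and weekly_retention are hypothetical.

```python
from typing import Dict, List, Set

def active_in(days: Dict[int, Set[str]], day_range: range) -> Set[str]:
    """Union over a window of days: (dayN | dayN+1 | ...)."""
    users: Set[str] = set()
    for d in day_range:
        users |= days.get(d, set())
    return users

def weekly_retention(days: Dict[int, Set[str]], first_day: int, weeks: int) -> List[int]:
    """Cohort = users active in the first week; count who is also active in each week."""
    cohort = active_in(days, range(first_day, first_day + 7))
    counts = []
    for w in range(weeks):
        start = first_day + 7 * w
        counts.append(len(cohort & active_in(days, range(start, start + 7))))
    return counts

# Hypothetical Day N -> Users model (day numbers are arbitrary).
DAYS = {0: {"u1", "u2"}, 3: {"u3"}, 8: {"u1"}, 12: {"u3", "u4"}, 15: {"u1"}}
print(weekly_retention(DAYS, 0, 3))  # [3, 2, 1]
```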

Model: Day N → Users

Query (one time series):

[Q(Day K) for K in range(start, end)]

Time Series: https://github.com/tuulos/bd3-mixpanel-trends
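A time series falls out of the same Day N → Users model by running one lookup per day and counting the users it returns. A minimal sketch with a hypothetical in-memory model; days with no entry count as zero.

```python
from typing import Dict, List, Set

def daily_counts(days: Dict[int, Set[str]], start: int, end: int) -> List[int]:
    """One point per day K in [start, end): number of users mapped to day K."""
    return [len(days.get(k, set())) for k in range(start, end)]

# Hypothetical Day N -> Users model.
DAYS = {0: {"u1", "u2"}, 1: {"u2"}, 3: {"u1", "u3", "u4"}}
print(daily_counts(DAYS, 0, 4))  # [2, 1, 0, 3]
```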

Thank You!


Free analytics for your GitHub repos: https://bitdeli.com/free

Interested? Contact ville@bitdeli.com
