
Post on 07-Aug-2015


Agile Data Visualisation

Volodymyr (Vlad) Kazantsev
Head of Data Science at Product Madness

volodymyrk

About myself

● MS Math, Probability Theory (Kiev, 1999-2004)
● Graphics Programming, Video Games (Kiev, 2002-2005)
● Visual Effect Programming (Berlin, Sydney, London, 2005-2010)
● MBA, London Business School (2010-2012)
● Product Manager at King, Splash Damage (2012-2013)
● Head of Data Science (2013-present)

Product Madness

● Social Casino Games - not gambling
● 60 people in London, 30 in San Francisco, 25 in Minsk

Product Madness in Rankings

● iPad rankings, US
● iPad rankings, Australia

Data Science at Product Madness

● Team of 6
● Analyse product releases, A/B tests, etc.
● Audit marketing activities
● Develop and support the DWH (AWS Redshift)
● Analysis: ipynb, pandas, matplotlib, scipy
● Products: Flask, AWS, D3.js
● ... and SQL

Data Visualisation at Product Madness

1. Research and ad-hoc analysis

2. Self-Service Dashboards

3. Self-service Big Data BI

What is Advanced Visualisation?

- Effective

- Not limited by immediately available tools

- Impressive

People still make those .. in 2015

100% Real charts taken from company’s Strategy meeting

My rules for Effective Data Visualisation

1. Keep it simple
2. Keep a high data-ink ratio
3. Consistency is important
4. Mind the context

Effective Data Visualisation in IPython

This does not look great by default.

(but defaults are much improved, especially with seaborn)

publish()

1. formats the chart
2. creates the chart label (large font)
3. saves “Random Data.png” into the “Images” folder with high DPI
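The slides do not show the body of publish(), so the following is only a minimal sketch of the three steps above, assuming Matplotlib. The signature, the default folder name, the 300 DPI value and the specific formatting applied in step 1 are all assumptions made for illustration.

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line inside IPython
import matplotlib.pyplot as plt


def publish(fig, title, folder="Images", dpi=300):
    """Hypothetical helper: format the chart, label it, save '<title>.png'."""
    # Step 1: formatting (assumed here to be a light-grey grid on every axes)
    for ax in fig.axes:
        ax.grid(True, color="lightgrey")
    # Step 2: chart label in a large font
    fig.suptitle(title, fontsize=20)
    # Step 3: save a high-DPI PNG named after the title
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, title + ".png")
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])
saved_path = publish(fig, "Random Data")
```

Naming the file after the chart title keeps the image and its label in sync, which is one plausible reason for combining the three steps in a single call.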

Python Visualisations for reports

Compared to Matplotlib defaults:

1. no borders
2. double-width lines
3. markers
4. Cynthia Brewer colors
5. borderless legend
6. light-grey grid lines
7. slightly darker grey on x-axis
8. ticks outside, x-axis only
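One way the eight tweaks above might be applied to a Matplotlib axes is sketched here. The helper name apply_report_style, the exact grey values and the choice of the "Set1" palette (one of the ColorBrewer palettes that ships with Matplotlib) are assumptions, not the settings used in the talk.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; drop this line inside IPython
import matplotlib.pyplot as plt


def apply_report_style(ax):
    """Hypothetical helper applying the report tweaks to a single Axes."""
    palette = plt.get_cmap("Set1").colors            # 4. ColorBrewer palette
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)           # 1. no borders
    ax.spines["bottom"].set_color("#999999")         # 7. darker grey x-axis
    for i, line in enumerate(ax.get_lines()):
        line.set_linewidth(2 * plt.rcParams["lines.linewidth"])  # 2. double width
        line.set_marker("o")                         # 3. markers
        line.set_color(palette[i % len(palette)])
    ax.grid(True, color="#dddddd")                   # 6. light-grey grid lines
    ax.set_axisbelow(True)                           # keep the grid behind the data
    ax.tick_params(axis="x", direction="out", top=False)  # 8. ticks outside...
    ax.tick_params(axis="y", length=0)               # ...and on the x-axis only
    if ax.get_legend_handles_labels()[0]:
        ax.legend(frameon=False)                     # 5. borderless legend
    return ax


fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4], label="series a")
ax.plot([2, 2, 3, 5], label="series b")
apply_report_style(ax)
```

Keeping the style in one function means every chart in a report picks up the same look with a single call.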

Python Visualisations for reports

● White background for presentations
● Avoid vector formats (.svg, .swf). Use high-DPI .png
● Consistent style, colors and fonts make reports look professional

Web-based Dashboards

Dashboards, V1

Dashboards, V2 - Tableau

Dashboards, V2 - The Style Guide

❑ Charts should be 800px wide, the dashboard no wider than 1000px. Chart height: 200-300px

❑ Chart background RGB: 238 243 250

❑ Dates should be formatted “d mmm”, e.g. “7 Jan”. Only include the year if absolutely necessary

❑ Don’t show unnecessary precision: 0.50% is the same as 0.5%

❑ Bar charts always start their axis at 0

❑ A line graph’s axis should start at whatever value makes the average slope 45º

❑ Add a title to each chart (centered, bold), and to the axes too (if not obvious)

❑ Add “Updated at … UTC” at the bottom of the first chart in the dashboard

❑ Still looking for a perfect date selector. Use the default Tableau one, not the minimalistic one.

❑ Filters should apply to all charts in a dashboard

❑ No scrolling anywhere on the dashboard. The browser already has a scroll bar. Huge legends/filters are useless.
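The date and precision rules in the style guide above are easy to encode once and reuse. The sketch below assumes Python label formatting rather than Tableau's own format strings, and both helper names are hypothetical.

```python
from datetime import date

# Fixed English abbreviations so the output does not depend on the locale
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_day(d, include_year=False):
    """Format a date as 'd mmm' (e.g. '7 Jan'); add the year only on request."""
    text = f"{d.day} {MONTHS[d.month - 1]}"
    return f"{text} {d.year}" if include_year else text


def format_percent(fraction, max_decimals=2):
    """Render a fraction as a percentage with trailing zeros removed."""
    text = f"{fraction * 100:.{max_decimals}f}".rstrip("0").rstrip(".")
    return text + "%"


label = format_day(date(2015, 1, 7))   # "7 Jan"
rate = format_percent(0.005)           # "0.5%", not "0.50%"
```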

Dashboards, V2 - The Style Guide

No Version Control

Maintenance takes time

..and still no good Date Selector

Self-service Big Data BI

BI Tools Triangle

● Easy to set up, for IT & Data teams
● Easy to use, for end users
● Powerful, for end users

Scale

● Code naturally promotes reusability
● Code has version control
● You never really “develop from scratch”

Dashboards, V3 - Flask+JS

Front End:
- dc.js
- bootstrap.js
- colorbrewer.js

Back End:
- Flask
- pandas
- Redshift (data cubes)
- S3: csv cache
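The slides do not show how the CSV cache sits in front of the Redshift cubes. The sketch below illustrates one common arrangement (check the cache first, query only on a miss) using the standard library alone: a local directory stands in for S3 and a plain callable stands in for the Redshift query. The function name, arguments and sample row are made-up placeholders.

```python
import csv
import tempfile
from pathlib import Path


def load_cube(name, run_query, cache_dir):
    """Return the rows of a data cube, querying only when no cached CSV exists.

    `run_query` stands in for a Redshift query and `cache_dir` for an S3
    bucket; both are assumptions. The query must return at least one row.
    """
    cache_file = Path(cache_dir) / f"{name}.csv"
    if not cache_file.exists():
        rows = run_query(name)  # cache miss: hit the warehouse once
        with cache_file.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    # Always read from the CSV so hits and misses return the same shape
    with cache_file.open(newline="") as handle:
        return list(csv.DictReader(handle))


calls = []


def fake_query(name):
    """Placeholder query that records how often it is called."""
    calls.append(name)
    return [{"day": "2015-08-07", "installs": 120}]  # made-up sample row


with tempfile.TemporaryDirectory() as tmp:
    first = load_cube("daily_installs", fake_query, tmp)
    second = load_cube("daily_installs", fake_query, tmp)
```

The second call is served from the CSV, so the placeholder query runs only once. In a stack like the one listed, a Flask route could return the same CSV to the dc.js front end, though that wiring is not shown in the slides.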

Tech Stack

● Redshift back end (ELT + cubes)
● Python, Flask, pandas
● dc.js, crossfilter.js, D3.js

Self-Serve Big Data BI

● Tableau client
● Looker
● ElasticSearch + Kibana
● Bokeh

Summary

● A good-looking visualisation is better than an ugly one
● Interactivity leads to more insights
● Consistency matters; code lets you style once
● You never really “develop from scratch” or “just use an off-the-shelf tool”
● Mind your team’s capabilities and aspirations
● Don’t be limited by your existing tool(s)

Questions?

We are hiring