Visualizing a Billion Points w/ Bokeh Datashader

Upload: continuum-analytics

Post on 21-Apr-2017

TRANSCRIPT

Page 1: Visualizing a Billion Points w/ Bokeh Datashader

© 2016 Continuum Analytics- Confidential & Proprietary

Visualizing a Billion Points with Bokeh Datashader

Peter Wang Continuum Analytics CTO & Co-Founder

@pwang

Page 2: Visualizing a Billion Points w/ Bokeh Datashader

Double Feature!

Bokeh: Interactive web visualization library for Python (“d3 for Python”, “Shiny for Python”) http://bokeh.pydata.org

Datashader: Library for statistically driven visualization of extremely large datasets http://github.com/bokeh/datashader

Page 3: Visualizing a Billion Points w/ Bokeh Datashader

Bokeh

• Interactive visualization
• Novel graphics
• Streaming, dynamic, large data
• For the browser, with or without a server
• No need to write JavaScript

http://bokeh.pydata.org

Page 4: Visualizing a Billion Points w/ Bokeh Datashader

Versatile Plotting Capabilities

Page 5: Visualizing a Billion Points w/ Bokeh Datashader

Linked plots, tools

• Easy to show multiple plots and link them
• Easy to link data selections between plots
• Can easily customize the kind of linkage straight from Python, without needing to fiddle around with JS

Page 6: Visualizing a Billion Points w/ Bokeh Datashader

Large data

• With easy WebGL support, can scale to 500k points or so

• Bottlenecks are browser performance, JSON encoding, network transport
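To get a feel for the JSON encoding and network transport bottleneck, here is a minimal sketch that measures how the serialized size of a point set grows with the number of points. The column layout (`{"x": [...], "y": [...]}`) and the helper name `json_payload_bytes` are assumptions made for illustration; this is not Bokeh's actual wire format.

```python
import json
import random

def json_payload_bytes(n_points, seed=0):
    """Size in bytes of n_points (x, y) pairs serialized as two JSON columns."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_points)]
    ys = [rng.random() for _ in range(n_points)]
    # Hypothetical payload layout, chosen only for this illustration.
    return len(json.dumps({"x": xs, "y": ys}).encode("utf-8"))

small = json_payload_bytes(1_000)
large = json_payload_bytes(100_000)
print(f"1,000 points:   {small:>10,} bytes")
print(f"100,000 points: {large:>10,} bytes")
print(f"growth factor:  {large / small:.1f}x")
```

The payload grows roughly linearly with the point count, so every point sent to the browser costs encoding time and bandwidth before the browser draws anything.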

Page 7: Visualizing a Billion Points w/ Bokeh Datashader

rBokeh

Plays well with the R ecosystem: HTMLwidget, RMarkdown…

http://hafen.github.io/rbokeh

Page 8: Visualizing a Billion Points w/ Bokeh Datashader

rBokeh with RStudio & Shiny

Plays well with the R ecosystem: HTMLwidget, RMarkdown…

Page 9: Visualizing a Billion Points w/ Bokeh Datashader

Bokeh Apps: Shiny for Python

• Fully interactive data web apps
• Streaming data, dynamic data
• Easy-to-write pure Python charts, widgets, event handlers
• Open source (BSD licensed), including server
• Enterprise on-prem version in Anaconda Enterprise, with Active Directory/LDAP auth

Page 10: Visualizing a Billion Points w/ Bokeh Datashader

Example Apps

Page 11: Visualizing a Billion Points w/ Bokeh Datashader

Easy Streaming Apps

In this demo, we will demonstrate how the Bokeh server makes it easy to visualize streaming and dynamic data.

• A minimal example with < 50 LOC
• Demonstrates ease of pushing data from Python code into the browser

Page 12: Visualizing a Billion Points w/ Bokeh Datashader

Page 13: Visualizing a Billion Points w/ Bokeh Datashader

Embeds Well

http://cecp.mit.edu

Page 14: Visualizing a Billion Points w/ Bokeh Datashader

For more information on Bokeh Apps

• Webinar: http://www.slideshare.net/continuumio/hassle-free-data-science-apps-with-bokeh-webinar

• PyData Videos, Tutorials

Page 15: Visualizing a Billion Points w/ Bokeh Datashader

Community & Adoption

GitHub
• 4100+ stars
• 860+ forks

Mailing list
• 400+ members
• 150+ posts in November

Downloads
• 45,000 / month (conda)
• 4,000 / month (pip)

Page 16: Visualizing a Billion Points w/ Bokeh Datashader

Billions and billions…

Page 17: Visualizing a Billion Points w/ Bokeh Datashader

Data Shading Main Points

• When trying to visualize millions of points, browser vs. rich client doesn’t really matter

• Raft of common problems that are ignored: Overdraw, over- & under-saturation, clipping, coarse binning

• Statistical transformations of data are a first-class aspect of the visualization

• Rapid iteration of visual styles & configs, interactive selections and filtering are key concerns in data exploration

When data is large, you don’t know when the viz is lying.
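As a rough illustration of the under-saturation problem named above, the sketch below maps the same per-pixel counts to gray levels in two ways. The counts and the function names `linear_shade` and `log_shade` are hypothetical, and a log scale is only one possible statistical transformation; this is not presented as Datashader's own implementation.

```python
import math

def linear_shade(count, max_count):
    """Map a count to 0..255 in direct proportion to the largest count."""
    return round(255 * count / max_count)

def log_shade(count, max_count):
    """Map a count to 0..255 on a log scale, so small counts stay visible."""
    return round(255 * math.log1p(count) / math.log1p(max_count))

# Hypothetical per-pixel counts: a few sparse pixels and one dense hot spot.
counts = [0, 1, 2, 10, 10_000]
peak = max(counts)

linear = [linear_shade(c, peak) for c in counts]
logged = [log_shade(c, peak) for c in counts]
print("linear:", linear)
print("log:   ", logged)
```

With the linear mapping, every pixel except the hot spot rounds to zero and disappears from the image, which is the sense in which a plot of large data can mislead without any visible warning. The log mapping keeps the sparse pixels visible while still ranking them below the hot spot.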

Page 18: Visualizing a Billion Points w/ Bokeh Datashader

Data Shading Pipeline

[Diagram: Data → Project / Synthesize → Scene → Sample / Raster → Aggregates → Transfer → Image]

Other diagram labels: Visual Abstraction; Data Transforms, Visual Mappings, View Transforms; Data Tables, Source Data, Views; Selection, Aggregation, Transfer; Significant Set, Aggregates
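The stages of this pipeline can be sketched in plain Python. The version below is a minimal illustration under stated assumptions, not Datashader's code: the function names `aggregate_counts` and `transfer`, the grid size, and the synthetic Gaussian data are all made up for the example.

```python
import math
import random

def aggregate_counts(points, x_range, y_range, width, height):
    """Project points into a width x height grid and count points per cell."""
    x0, x1 = x_range
    y0, y1 = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # point lies outside the current view
        # Scale to grid coordinates; clamp so the upper edge lands in the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid

def transfer(grid, scale=math.log1p):
    """Turn aggregate counts into 0..255 gray levels using a scaling function."""
    peak = max(max(row) for row in grid)
    if peak == 0:
        return [[0] * len(row) for row in grid]
    factor = 255 / scale(peak)
    return [[round(scale(c) * factor) for c in row] for row in grid]

# Synthetic data: 50,000 points from a 2D Gaussian (an assumption for the demo).
rng = random.Random(0)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50_000)]

width, height = 40, 20
view_x, view_y = (-3.0, 3.0), (-3.0, 3.0)
agg = aggregate_counts(points, view_x, view_y, width, height)
image = transfer(agg)
print(f"image is {len(image)} rows x {len(image[0])} columns")
```

The cost of the final image depends on the grid size rather than on the number of points, and changing the view ranges or the `scale` function re-runs only the later stages on the same data.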

Page 19: Visualizing a Billion Points w/ Bokeh Datashader

Dataset 1: Overview

This demo shows how traditional plotting tools break down for large datasets, and how to use datashading to make even large datasets practical to explore interactively.

• Data for 10 million New York City taxi trips

• Even 100,000 points get slow in a scatterplot

• Parameters usually need adjusting for every zoom

• True relationships within the data are not visible in a standard plot

Datashading automatically reveals the entire dataset, including outliers, hot spots, and missing data

Page 20: Visualizing a Billion Points w/ Bokeh Datashader

Categorical data: 2010 US Census

• One point per person
• 300 million total
• Categorized by race
• Datashading shows faithful distribution per pixel
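One way to read "faithful distribution per pixel" is that each pixel keeps a separate count per category and its color reflects the category proportions. The sketch below illustrates that idea only: the category labels `"a"` and `"b"`, the palette, and the helpers `aggregate_by_category` and `mix_color` are hypothetical, and Datashader's actual color mixing may differ.

```python
from collections import Counter

def aggregate_by_category(points, width, height):
    """Count points per category in each pixel.

    points: iterable of (col, row, category), already in pixel coordinates.
    """
    grid = [[Counter() for _ in range(width)] for _ in range(height)]
    for col, row, category in points:
        grid[row][col][category] += 1
    return grid

def mix_color(counts, palette):
    """Blend category colors, weighted by each category's share of the pixel."""
    total = sum(counts.values())
    if total == 0:
        return (0, 0, 0)  # empty pixel
    return tuple(
        round(sum(palette[cat][ch] * n for cat, n in counts.items()) / total)
        for ch in range(3)
    )

# Hypothetical palette and points for a 2 x 1 pixel image.
palette = {"a": (255, 0, 0), "b": (0, 0, 255)}
points = [(0, 0, "a"), (0, 0, "a"), (0, 0, "a"), (0, 0, "b"), (1, 0, "b")]

grid = aggregate_by_category(points, width=2, height=1)
pixels = [mix_color(cell, palette) for cell in grid[0]]
print(pixels)
```

The first pixel holds three "a" points and one "b" point, so its color is three parts red to one part blue; the second pixel holds only "b" and comes out pure blue.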

Page 21: Visualizing a Billion Points w/ Bokeh Datashader

OSM Dataset: 3 Billion Points

Because Datashader decouples the data processing from the visualization, it can handle arbitrarily large data

• About 3 billion GPS coordinates

• Source: https://blog.openstreetmap.org/2012/04/01/bulk-gps-point-data/

• This image was rendered in one minute on a standard MacBook with 16 GB RAM

• Renders in 7 seconds on a 128GB Amazon EC2 instance
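One reason this decoupling scales is that count aggregates are additive: a dataset too large for memory can be processed chunk by chunk and the per-chunk grids summed. The sketch below shows that property on synthetic data. The helper names, the 8 x 8 grid, and the chunk sizes are assumptions for illustration, and this is not a description of how Datashader schedules its own work.

```python
import random

WIDTH, HEIGHT = 8, 8

def count_chunk(chunk, width=WIDTH, height=HEIGHT):
    """Count the points of one chunk per grid cell; coordinates assumed in [0, 1)."""
    grid = [[0] * width for _ in range(height)]
    for x, y in chunk:
        grid[int(y * height)][int(x * width)] += 1
    return grid

def add_grids(a, b):
    """Element-wise sum of two equally sized count grids."""
    return [[ca + cb for ca, cb in zip(ra, rb)] for ra, rb in zip(a, b)]

def aggregate_stream(chunks, width=WIDTH, height=HEIGHT):
    """Fold chunks into one grid, holding only one chunk in memory at a time."""
    total = [[0] * width for _ in range(height)]
    for chunk in chunks:
        total = add_grids(total, count_chunk(chunk, width, height))
    return total

def make_chunks(n_chunks, chunk_size, seed=0):
    """Yield synthetic chunks of uniformly random points (stand-in for real data)."""
    rng = random.Random(seed)
    for _ in range(n_chunks):
        yield [(rng.random(), rng.random()) for _ in range(chunk_size)]

streamed = aggregate_stream(make_chunks(10, 1_000))
all_points = [p for chunk in make_chunks(10, 1_000) for p in chunk]
at_once = count_chunk(all_points)
print("same result:", streamed == at_once)
```

Memory use is bounded by the chunk size plus the fixed-size grid, so the total number of points affects running time but not the size of what gets drawn.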

Page 22: Visualizing a Billion Points w/ Bokeh Datashader

Contact Information and Additional Details

• Contact [email protected] for more information about Anaconda subscriptions and about becoming an early adopter for Data Explorer. Help make sure our product fits your needs!

• View documentation and examples at github.com/bokeh/datashader and bokeh.pydata.org

• View demo notebooks on Anaconda Cloud: notebooks.anaconda.org/jbednar/

Page 23: Visualizing a Billion Points w/ Bokeh Datashader

Thank you

Email: [email protected]

Twitter: @ContinuumIO

Peter Wang

Twitter: @pwang

Bokeh

Twitter: @bokehplots