

Provenance for Interactive Visualizations

Fotis Psallidas, Columbia University, fotis@cs.columbia.edu

Background

- Data scientists need to explore large volumes of information to gain data insights as fast as possible
- Interactive visualizations bring powerful ad-hoc and focused analysis to both technical and non-technical data scientists
- Interactive visualizations are increasingly adopted across domains
  - Machine learning toolkits (e.g., TensorFlow and RStudio)
  - Decision support systems (e.g., Endeca, Tableau, and PowerPivot)
  - Knowledge exploration (e.g., TimeMachine and Yago Explorer)
  - News (e.g., Interactive Stories at the New York Times)

Problem Statement

Interactive Visualizations using Provenance

Evaluation and Results

E. Wu, F. Psallidas, Z. Miao, H. Zhang, L. Rettig, Y. Wu, and T. Sellam; Combining Design and Performance in a Data Visualization Management System; In Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2017.

The DVMS Project

Eugene Wu, Columbia University

[Figure: scatterplot (axes: Revenue, Profit) and histogram (axes: Price, Product), linked by backward_trace and view_refresh]

B = BACKWARD TRACE FROM SPLOT@vnow-1 AS SP, C
    WHERE in_rectangle(bbox(C), SP)
    TO Sales;

SPLOT = SELECT ..., 'red' AS fill FROM B
        UNION
        SELECT ..., 'gray' AS fill FROM (Sales EXCEPT B);

HIST = SELECT ..., 'red' AS fill FROM B
       UNION
       SELECT ..., 'blue' AS fill FROM (Sales EXCEPT B);

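[Figure: DVMS architecture. Labeled components: Event Recognizer, Interaction Manager, Executor, Storage Manager, Events, Views, Base Data, Static Analysis, Provenance, Online Optimizer, Offline Optimizer, Interaction Log, State Machine, DeVIL Program. Example event sequence: Down, Move, Move]

[Figure: scatterplot (axes: Revenue, Profit) and histogram (axes: Price, Product)]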

DVMS: Research Prototype

- Specification
  - Completeness on several interaction taxonomies
  - Support for many novel techniques due to provenance availability
    - Interactive debugging of applications
    - Deconstruction and restyling of visualizations
    - Data explanation
    - Provenance visualization
- Optimization
  - Preliminary results: 20-150% improvements on provenance materialization latency against state-of-the-art provenance subsystems
  - Expecting improvements on interaction latency with novel materialization options (i.e., data skipping and delta propagation)

Optimizing Interactions through Provenance

- Instead of tracing all the way back to the base data and forward to the views on every interaction, keep the mappings between them (provenance materialization)

Main problems:

- Materializing and indexing provenance can be expensive
- There are many possible options for materialization. Which one to pick, and when?
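The idea of keeping mappings instead of recomputing them can be illustrated with a minimal Python sketch. It is not the DVMS implementation: the toy `sales` rows, the sum-of-price aggregation per product, and the names `build_hist_with_provenance` and `trace_without_index` are all assumptions made for illustration. The sketch builds a histogram once and records, as a side effect, which base rows fed each bar (backward mapping) and which bar each row feeds (forward mapping).

```python
from collections import defaultdict

# Hypothetical base data: (product, price) rows standing in for a Sales table.
sales = [("A", 5), ("B", 7), ("A", 3), ("C", 4), ("B", 1)]


def build_hist_with_provenance(rows):
    """Build one bar per product and materialize provenance while doing so."""
    totals = defaultdict(float)
    backward = defaultdict(list)  # bar (product) -> ids of contributing rows
    forward = {}                  # row id -> bar (product) it contributes to
    for rid, (product, price) in enumerate(rows):
        totals[product] += price
        backward[product].append(rid)
        forward[rid] = product
    return dict(totals), dict(backward), forward


def trace_without_index(rows, product):
    """Baseline: rescan the base data on every interaction."""
    return [rid for rid, (p, _) in enumerate(rows) if p == product]


hist, backward, forward = build_hist_with_provenance(sales)

# With the materialized mapping, a backward trace is a dictionary lookup.
print(backward["A"])   # ids of the rows behind bar "A"
print(forward[4])      # the bar that row 4 contributes to
```

The trade-off named in the list above shows up even here: the two dictionaries cost memory and construction time proportional to the base data, and whether to keep the backward mapping, the forward mapping, both, or neither depends on which interactions actually occur.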

Static Visualizations

SPLOT = SELECT 8 AS radius,
               'gray' AS stroke,
               'gray' AS fill,
               lscale(revenue, sx) AS center_x,
               lscale(profit, sy) AS center_y
        FROM Sales, sx, sy;

HIST = SELECT 4 AS width,
              'blue' AS fill,
              hscale(price, hx) AS height
       FROM Sales, hx;

render(SELECT * FROM SPLOT);
render(SELECT * FROM HIST);

- Selection of marks: join the user event stream with the visual marks (user event stream ⋈ visual marks)
- Linked brushing: backward trace the selection to the base data, then refresh the coordinated view based on the traced subset
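The two steps above can be sketched in Python as a toy model. This is an assumption-laden illustration, not DeVIL semantics: the `Mark` class, the `source_row` field used as provenance, the rectangle format `(x0, y0, x1, y1)`, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    mark_id: int
    x: float
    y: float
    source_row: int  # provenance: index of the base row that produced this mark


# Hypothetical base data standing in for the Sales table.
sales = [
    {"product": "A", "revenue": 10, "profit": 2, "price": 5},
    {"product": "B", "revenue": 25, "profit": 8, "price": 7},
    {"product": "C", "revenue": 60, "profit": 20, "price": 4},
    {"product": "D", "revenue": 90, "profit": 35, "price": 9},
]


def scatter_marks(rows):
    """One scatterplot mark per base row (identity scales for simplicity)."""
    return [Mark(i, r["revenue"], r["profit"], i) for i, r in enumerate(rows)]


def select_marks(marks, rect):
    """Selection as a join: keep marks that fall inside the brush rectangle."""
    x0, y0, x1, y1 = rect
    return [m for m in marks if x0 <= m.x <= x1 and y0 <= m.y <= y1]


def backward_trace(selected):
    """Map selected marks back to the base rows they were derived from."""
    return {m.source_row for m in selected}


def refresh_histogram(rows, traced):
    """Recompute the coordinated view, highlighting the traced subset."""
    return [
        {"product": r["product"], "height": r["price"],
         "fill": "red" if i in traced else "blue"}
        for i, r in enumerate(rows)
    ]


brush = (0, 0, 30, 10)  # hypothetical brush rectangle from mouse events
traced = backward_trace(select_marks(scatter_marks(sales), brush))
hist = refresh_histogram(sales, traced)
print([bar["fill"] for bar in hist])
```

Because each mark carries a pointer to its source row, the histogram never needs to know how the scatterplot was laid out; it only needs the traced subset of base rows.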

[Figure: scatterplot (axes: Revenue, Profit) and histogram (axes: Price, Product)]

- Data from a database is converted to visual marks using SQL queries (i.e., as database views)
- Visual marks (encoded as views) are rendered using mark-specific render functions
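As a rough illustration of marks as database views, the sketch below uses Python's built-in `sqlite3` module rather than a DVMS. Several things are assumptions: the scale tables `sx` and `sy` are modeled as one-row tables of domain and range bounds, the `lscale` call from the poster is inlined as linear-interpolation arithmetic, and `render_circles` is a hypothetical mark-specific render function that emits SVG strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (product TEXT, revenue REAL, profit REAL, price REAL);
INSERT INTO Sales VALUES
  ('A', 0, 0, 10), ('B', 50, 20, 20), ('C', 100, 40, 30);

-- Assumed scale encoding: data domain [dmin, dmax] maps to pixels [rmin, rmax].
CREATE TABLE sx (dmin REAL, dmax REAL, rmin REAL, rmax REAL);
CREATE TABLE sy (dmin REAL, dmax REAL, rmin REAL, rmax REAL);
INSERT INTO sx VALUES (0, 100, 0, 400);
INSERT INTO sy VALUES (0, 40, 300, 0);

-- Scatterplot marks as a view over the base data.
CREATE VIEW SPLOT AS
SELECT 8 AS radius, 'gray' AS stroke, 'gray' AS fill,
       sx.rmin + (revenue - sx.dmin) * (sx.rmax - sx.rmin)
                 / (sx.dmax - sx.dmin) AS center_x,
       sy.rmin + (profit - sy.dmin) * (sy.rmax - sy.rmin)
                 / (sy.dmax - sy.dmin) AS center_y
FROM Sales, sx, sy;
""")


def render_circles(rows):
    """Hypothetical render function for circle marks: one SVG element per row."""
    return [
        f'<circle cx="{x:g}" cy="{y:g}" r="{r}" stroke="{s}" fill="{f}"/>'
        for r, s, f, x, y in rows
    ]


svg = render_circles(conn.execute("SELECT * FROM SPLOT ORDER BY center_x"))
for element in svg:
    print(element)
```

Since `SPLOT` is an ordinary view, any change to `Sales` or to the scale tables is reflected the next time the view is queried and rendered.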

Specify and optimize interactive visualizations

- Previous approaches: using data visualization systems or manual implementation using popular toolkits that scale with the Web
  - Data visualization systems: limited interactions yet easy to use
    - Tableau: interactions for OLAP queries
    - Crossfilter: linked selection interactions for correlated dimensions
  - Manual implementations: hard to optimize, maintain, extend, and reason about
- A declarative, relational approach to interactive visualizations
  - Main win: push down optimization and design choices into the database system with readily available guarantees and optimizations

- DVMS ecosystem: several extensions to the database box for the optimization and design of interactive visualizations
  - Provenance subsystem for the specification and optimization of interactive visualizations
  - Other extensions:
    - Asynchronous interactions using concurrency control ideas
    - Streaming for the optimization of near-interactive visualizations
    - Precision interfaces for recommendation of new designs
    - and many more…

[Figure: scatterplot (axes: Revenue, Profit) and histogram (axes: Price, Product)]