Visualization-Driven Data Aggregation

Visualization-Driven Data Aggregation: Presentation of the "M4" Research Paper [1]. Uwe Jugel, SAP SE. VLDB 2014, Hangzhou, China, Sep 4, 2014.

[1] U. Jugel, Z. Jerzak, G. Hackenbroich, V. Markl. M4: A Visualization-Oriented Time Series Data Aggregation. Proceedings of the VLDB Endowment 7 (10), 797-808.


Page 1: Visualization-Driven Data Aggregation

Visualization-Driven Data Aggregation

Presentation of the "M4" Research Paper [1]

Uwe Jugel, SAP SE

VLDB 2014, Hangzhou, China, Sep 4, 2014

[1] U. Jugel, Z. Jerzak, G. Hackenbroich, V. Markl. M4: A Visualization-Oriented Time Series Data Aggregation. Proceedings of the VLDB Endowment 7 (10), 797-808.

Page 2: Visualization-Driven Data Aggregation

Motivation and Overview

Page 3: Visualization-Driven Data Aggregation

Big Data causes Slow Visual Analytics

[Figure: big data, data engine, visualization-related range query, and raw data rendered as a chart (axes: time, value; pixel width: 500 px)]

Existing BI tools suffer from long transfer times caused by high-volume query results.

Consumption of raw data may cause slow rendering.

Page 4: Visualization-Driven Data Aggregation

Visualization Scenario

time    value
1007    1234.12
1018    223.73
1026    197.01
1039    154.85
...     ...

High-Volume Sensor Data
- millions of records per day
- values present a continuous signal
- voltage, velocity, stock prices
- potentially large query results (100k+)

Line charts are most common and useful for continuous signals.

[Figure: Potential Time Series Visualizations: line chart (focus), scatter plot, bar chart; axes: time, value]

Page 5: Visualization-Driven Data Aggregation

Goals
1. Model data reduction as query (SQL)
2. Preserve visual information: vis(Q_reduce(data)) == vis(data)

Observation
Visualizations conduct an implicit data reduction by rendering data to pixels.

lineChart = (data) ->
  transform data to data_wh
  draw discrete line pixels for each two points in data_wh
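The lineChart pseudocode can be made concrete with a small rasterizer. The following is only a minimal sketch, not the renderer used in the paper: the function names (`to_pixel_space`, `line_pixels`, `line_chart`), the min/max scaling, and the choice of Bresenham's algorithm for the discrete lines are all assumptions made for illustration.

```python
def to_pixel_space(data, w, h):
    """Scale (t, v) tuples to integer pixel coordinates on a w x h canvas.

    Assumption: the canvas spans exactly the min/max of t and v in `data`.
    """
    ts = [t for t, _ in data]
    vs = [v for _, v in data]
    t_lo, t_hi = min(ts), max(ts)
    v_lo, v_hi = min(vs), max(vs)

    def scale(x, lo, hi, n):
        if hi == lo:
            return 0
        # Clamp so that the maximum lands in the last pixel, n - 1.
        return min(n - 1, int((x - lo) / (hi - lo) * n))

    return [(scale(t, t_lo, t_hi, w), scale(v, v_lo, v_hi, h)) for t, v in data]


def line_pixels(x0, y0, x1, y1):
    """Yield the discrete pixels of a line (Bresenham's algorithm)."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield (x0, y0)
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def line_chart(data, w, h):
    """Return the set of pixels a line chart of `data` would switch on."""
    if not data:
        return set()
    data_wh = to_pixel_space(data, w, h)
    pixels = {data_wh[0]}
    for (x0, y0), (x1, y1) in zip(data_wh, data_wh[1:]):
        pixels.update(line_pixels(x0, y0, x1, y1))
    return pixels
```

However many tuples go in, the result can never contain more than w * h pixels, which is the implicit data reduction the slide refers to.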

Solution: Visualization-Driven Data Aggregation

[Figure: Solution Architecture. Components: visualization client, query rewriter, RDBMS. Labels: selected time range, query + visualization parameters, reduction query, data reduction, data-reduced query result, data flow]

Page 6: Visualization-Driven Data Aggregation

Research Task: Find a data aggregation model that simulates the rasterization process!

Existing approaches (averaging, sampling, line simplification, etc.) cannot reproduce the original visualization of the raw data.

[Figure: vis(data) (original), vis(minmax(data)) (lossy), vis(avg(data)) (very lossy)]

Page 7: Visualization-Driven Data Aggregation

M4 Principle

Page 8: Visualization-Driven Data Aggregation

Analysis + Elimination of the Remaining Errors

[Figure: pixel columns 1 to 4 rendered three ways: a) vis(T); b) vis(MinMax(T)); c) vis(MinMaxFirstLast(T)), i.e. "M4". Error markers: E1, E2, E3]

Page 9: Visualization-Driven Data Aggregation

M4: Data Aggregation for Perfect Line Charts

vis(M4(data)) == vis(data)
(lossless)       (original)

Input: big time series data
Parameters: width, original query
Output: "perfect" data subset

Theorem 1. "vis(T) == vis(M4(T))"
Theorem 2. "error-free line chart from 4*w tuples"
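The selection of min, max, first, and last tuples per pixel column can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's implementation: the name `m4_aggregate` is hypothetical, the input is assumed to be a list of (t, v) tuples, t2 is assumed to be greater than t1, the grouping key rounds half up for non-negative values, and when several tuples tie for the minimum or maximum of a group only one of them is kept.

```python
import math


def m4_aggregate(points, t1, t2, w):
    """Keep, per group, the tuples holding min(v), max(v), min(t), max(t).

    points: iterable of (t, v) tuples; t1 < t2 bound the time range;
    w is the chart width in pixels. Returns the selected tuples sorted by t.
    """
    groups = {}
    for t, v in points:
        if not (t1 <= t <= t2):
            continue
        # Group key; floor(x + 0.5) rounds half up for non-negative x.
        k = math.floor(w * (t - t1) / (t2 - t1) + 0.5)
        g = groups.get(k)
        if g is None:
            groups[k] = {"first": (t, v), "last": (t, v),
                         "min": (t, v), "max": (t, v)}
            continue
        if t < g["first"][0]:
            g["first"] = (t, v)
        if t > g["last"][0]:
            g["last"] = (t, v)
        if v < g["min"][1]:
            g["min"] = (t, v)
        if v > g["max"][1]:
            g["max"] = (t, v)

    selected = set()
    for g in groups.values():
        selected.update(g.values())  # at most 4 distinct tuples per group
    return sorted(selected)
```

Because the rounded key takes the values 0 to w, this sketch produces up to w + 1 groups and therefore at most 4 * (w + 1) tuples, independent of the input size.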

Page 10: Visualization-Driven Data Aggregation

Query Rewriting Template

1) original query Q
2) cardinality query QC
3a) use Q if low cardinality
3b) use QD if high cardinality

reduction query QD: compute aggregates for each pixel-column

WITH Q AS (SELECT t,v FROM sensors                -- 1) original query Q
           WHERE id = 1 AND t >= $t1 AND t <= $t2),
     QC AS (SELECT count(*) c FROM Q)             -- 2) cardinality query QC
SELECT * FROM Q WHERE (SELECT c FROM QC) <= 800   -- 3a) use Q if low card.
UNION
SELECT * FROM (                                   -- reduction query QD
  SELECT t,v FROM Q JOIN
    (SELECT round($w*(t-$t1)/($t2-$t1)) as k,     --define key
            min(v) as v_min, max(v) as v_max,     --get min,max
            min(t) as t_min, max(t) as t_max      --get 1st,last
     FROM Q GROUP BY k) as QA                     --group by k
  ON k = round($w*(t-$t1)/($t2-$t1))              --join on k
  AND (v = v_min OR v = v_max OR                  --&(min|max|
       t = t_min OR t = t_max)                    -- 1st|last)
) AS QD WHERE (SELECT c FROM QC) > 800            -- 3b) use QD if high card.
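To try the rewriting template end to end, it can be run against an in-memory SQLite database. This is a rough sketch under several assumptions: SQLite stands in for the RDBMS, the slide's `$w`, `$t1`, `$t2` placeholders are written as `:w`, `:t1`, `:t2` (the named style documented for Python's sqlite3 module), the fixed limit 800 becomes a parameter `:max_rows`, and the helper names `reduced_rows` and `make_demo_db` as well as the demo data are made up for illustration.

```python
import sqlite3

# The slide's template, with the original query left as a format placeholder.
M4_TEMPLATE = """
WITH Q AS ({original_query}),
     QC AS (SELECT count(*) c FROM Q)
SELECT * FROM Q WHERE (SELECT c FROM QC) <= :max_rows
UNION
SELECT * FROM (
  SELECT t, v FROM Q JOIN
    (SELECT round(:w*(t-:t1)/(:t2-:t1)) AS k,
            min(v) AS v_min, max(v) AS v_max,
            min(t) AS t_min, max(t) AS t_max
     FROM Q GROUP BY k) AS QA
  ON k = round(:w*(t-:t1)/(:t2-:t1))
  AND (v = v_min OR v = v_max OR t = t_min OR t = t_max)
) AS QD WHERE (SELECT c FROM QC) > :max_rows
"""


def reduced_rows(conn, original_query, w, t1, t2, max_rows=800):
    """Wrap `original_query` in the template and return the rows sorted by t."""
    sql = M4_TEMPLATE.format(original_query=original_query)
    # Bind floats so that SQLite does not use integer division for the key.
    params = {"w": float(w), "t1": float(t1), "t2": float(t2),
              "max_rows": max_rows}
    return sorted(conn.execute(sql, params).fetchall())


def make_demo_db(n):
    """Create a hypothetical `sensors` table holding n synthetic tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensors (id INTEGER, t INTEGER, v REAL)")
    conn.executemany(
        "INSERT INTO sensors VALUES (1, ?, ?)",
        [(t, float((t * 919) % 1000)) for t in range(n)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db(5000)
    q = "SELECT t, v FROM sensors WHERE id = 1 AND t >= :t1 AND t <= :t2"
    rows = reduced_rows(conn, q, w=50, t1=0, t2=4999)
    print(len(rows), "rows returned for 5000 stored tuples")
```

When the time range selects no more than `max_rows` tuples, the first branch of the UNION returns them unchanged and the reduction query contributes nothing.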

Page 11: Visualization-Driven Data Aggregation

Evaluation

Page 12: Visualization-Driven Data Aggregation

Performance Measurements

[Figure: two panels comparing the baseline query with the data reduction queries; x-axis labels as extracted: base, pa, around, random, first, minmax, M4. Annotations: "no query cost, high transfer cost" and "low query cost, no additional transfer cost"]

Main Cost Factors
baseline query: DB-out network bandwidth
data reduction queries: query execution time and in-DB memory bandwidth

Page 13: Visualization-Driven Data Aggregation

Performance with Increasing Data Volume

[Figure: total time (s) over number of rows, up to 3,000,000 rows; annotation: t < 5s]

near-interactive response times

Page 14: Visualization-Driven Data Aggregation

Conclusion

Respect the application!
1. Take query + presentation parameters
2. Rewrite query (SQL)
3. Fast, in-DB processing

Faster Visual Analytics
No information loss
+ 10x speed
+ 100x bandwidth savings

Page 15: Visualization-Driven Data Aggregation

Image source by cybaea, https://www.flickr.com/photos/cybaea/54679441/

Thank you for your attention!

Uwe Jugel, SAP SE
@ubunatic

VLDB 2014, Hangzhou, China, Sep 1-5, 2014

© 2014 Uwe Jugel, SAP SE. This publication was created in the course of an educational project and does not present an actual SAP product or service. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of their authors. The information contained herein may be changed without prior notice and comes without any warranty. SAP and the SAP logo are trademarks or registered trademarks of SAP SE in Germany and in several other countries all over the world.