Visualization-Driven Data Aggregation
Presentation of the "M4" Research Paper [1]
Uwe Jugel, SAP
VLDB 2014, Hangzhou, China, Sep 4, 2014
[1] U. Jugel, Z. Jerzak, G. Hackenbroich, V. Markl. M4: A Visualization-Oriented Time Series Data Aggregation. Proceedings of the VLDB Endowment 7 (10), 797 - 808
Motivation and Overview
Big Data causes Slow Visual Analytics
[Figure: a visualization-related range query against the data engine returns raw big data to a line chart (value over time) with a pixel width of 500 px.]
Existing BI tools suffer from long transfer times caused by high-volume query results.
Consumption of raw data may cause slow rendering.
Visualization Scenario
time   value
1007   1234.12
1018   223.73
1026   197.01
1039   154.85
...    ...
High-Volume Sensor Data
- millions of records per day
- values present a continuous signal
- voltage, velocity, stock prices
- potentially large query results (100k+)
Line charts are the most common and useful visualization for continuous signals.
[Figure: Potential Time Series Visualizations. Panels: line chart (focus), scatter plot, bar chart; axes: value over time.]
Goals
1. Model data reduction as a query (SQL)
2. Preserve visual information: vis(Q_reduce(data)) == vis(data)
Observation: Visualizations conduct an implicit data reduction by rendering data to pixels.
lineChart = (data) ->
  transform data to data_wh
  draw discrete line pixels for each two points in data_wh
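The lineChart pseudocode can be made concrete with a small Python sketch. It is only an illustration under assumptions: the function names, the rounding-based projection, and the use of Bresenham's line algorithm as the rasterizer are choices made here, not details taken from the slides.

```python
def to_pixel(t, v, w, h, t1, t2, v_lo, v_hi):
    """Project one (t, v) tuple onto integer pixel coordinates (the slide's data_wh)."""
    x = round(w * (t - t1) / (t2 - t1))
    y = round(h * (v - v_lo) / (v_hi - v_lo))
    return x, y


def line_pixels(x0, y0, x1, y1):
    """Bresenham's line algorithm: the discrete pixels between two pixel positions."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels


def line_chart(data, w, h, t1, t2, v_lo, v_hi):
    """Return the set of pixels that a line chart of `data` switches on."""
    pts = [to_pixel(t, v, w, h, t1, t2, v_lo, v_hi) for t, v in data]
    pixels = set(pts)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        pixels.update(line_pixels(x0, y0, x1, y1))
    return pixels


# Made-up demo data: 5,000 tuples collapse to at most (w + 1) * (h + 1) pixels.
data = [(t, (t * 2357) % 5000) for t in range(5000)]
pixels = line_chart(data, w=50, h=20, t1=0, t2=4999, v_lo=0, v_hi=4999)
print(len(data), "tuples ->", len(pixels), "pixels")
```

However many tuples go in, the output can never exceed the number of pixels on the canvas, which is the implicit data reduction the observation refers to.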
Solution: Visualization-Driven Data Aggregation
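Solution Architecture
[Figure: data flow between Visualization Client, Query Rewriter, and RDBMS. The client sends the query for the selected time range plus the visualization parameters; the Query Rewriter turns them into a reduction query; the RDBMS performs the data reduction and returns a data-reduced query result.]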
Research Task: Find a data aggregation model that simulates the rasterization process!
Existing approaches (averaging, sampling, line simplification, etc.) cannot reproduce the original visualization of the raw data.
[Figure: vis(data) (original), vis(minmax(data)) (lossy), vis(avg(data)) (very lossy)]
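A tiny Python sketch can show what averaging and MinMax lose inside a single pixel column. The data is made up, and both reductions are minimal formulations chosen for this illustration, not the exact competitors evaluated in the paper.

```python
from collections import defaultdict
from statistics import mean


def group_by_column(data, w, t1, t2):
    """Group (t, v) tuples by pixel column, using a rounding key."""
    columns = defaultdict(list)
    for t, v in data:
        columns[round(w * (t - t1) / (t2 - t1))].append((t, v))
    return columns


def avg_reduce(rows):
    """Averaging: one synthetic tuple per column (mean time, mean value)."""
    return (mean(t for t, _ in rows), mean(v for _, v in rows))


def minmax_reduce(rows):
    """MinMax: the tuples holding the smallest and largest value of the column."""
    by_value = lambda row: row[1]
    return sorted({min(rows, key=by_value), max(rows, key=by_value)})


# Made-up series; with w=2 the middle column holds the tuples at t = 2..5.
data = [(0, 5.0), (1, 9.0), (2, 1.0), (3, 4.0), (4, 6.0), (5, 2.0), (6, 8.0), (7, 3.0)]
middle = group_by_column(data, w=2, t1=0, t2=7)[1]

print(avg_reduce(middle))     # a single point; the extremes 1.0 and 6.0 are gone
print(minmax_reduce(middle))  # keeps the extremes, but not the last tuple (5, 2.0)
```

Averaging flattens the vertical extent of the column. MinMax preserves it but can drop the first or last tuple, so the line connecting to the neighboring column starts from the wrong point.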
M4 Principle
[Figure: a) vis(T); b) vis(MinMax(T)); c) vis(MinMaxFirstLast(T)), i.e. "M4". Labels: 1 to 4; errors E1, E2, E3.]
Analysis + Elimination of the Remaining Errors
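The MinMaxFirstLast selection can be written down in a few lines of Python. This is a sketch under assumptions: the function name m4, the rounding key, and the demo data are chosen here, and ties are resolved by keeping a single tuple, whereas a value-based join in SQL would return every tied tuple.

```python
def m4(data, w, t1, t2):
    """Keep the min, max, first, and last tuple of every pixel column."""
    groups = {}
    for t, v in data:
        k = round(w * (t - t1) / (t2 - t1))  # pixel-column key
        groups.setdefault(k, []).append((t, v))

    keep = set()
    for rows in groups.values():
        keep.add(min(rows, key=lambda r: r[1]))  # tuple with the minimum value
        keep.add(max(rows, key=lambda r: r[1]))  # tuple with the maximum value
        keep.add(min(rows))                      # first tuple (smallest t)
        keep.add(max(rows))                      # last tuple (largest t)
    return sorted(keep)


# Made-up series; with w=2 only (3, 4.0) is neither min, max, first, nor last.
data = [(0, 5.0), (1, 9.0), (2, 1.0), (3, 4.0), (4, 6.0), (5, 2.0), (6, 8.0), (7, 3.0)]
reduced = m4(data, w=2, t1=0, t2=7)
print(reduced)
```

Because one tuple can play several roles at once (for example first and min), a column often contributes fewer than four tuples.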
M4: Data Aggregation for Perfect Line Charts
[Figure: vis(data) (original) == vis(M4(data)) (lossless)]
Theorem 1. "vis(T) == vis(M4(T))"
Theorem 2. "error-free line chart from 4*w tuples"
Input: big time series data
Parameters: width, original query
Output: "perfect" data subset
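Both theorems can be checked empirically on a toy rasterizer. The sketch below is not a proof and makes several assumptions: Bresenham's algorithm stands in for the renderer, the same rounding is used for the column key and the x pixel, and the data is made up. With a rounding key the column index runs from 0 to w, so the sketch bounds the result by 4*(w+1) tuples.

```python
def column(t, w, t1, t2):
    """Pixel column of a timestamp; also used as the x pixel coordinate."""
    return round(w * (t - t1) / (t2 - t1))


def m4(data, w, t1, t2):
    """Min, max, first, and last tuple of every pixel column."""
    groups = {}
    for t, v in data:
        groups.setdefault(column(t, w, t1, t2), []).append((t, v))
    keep = set()
    for rows in groups.values():
        keep.update((min(rows, key=lambda r: r[1]), max(rows, key=lambda r: r[1]),
                     min(rows), max(rows)))
    return sorted(keep)


def line_pixels(x0, y0, x1, y1):
    """Bresenham's line algorithm between two pixel positions."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels


def vis(data, w, h, t1, t2, v_lo, v_hi):
    """Set of pixels switched on by a line chart of `data` (sorted by t)."""
    pts = [(column(t, w, t1, t2), round(h * (v - v_lo) / (v_hi - v_lo)))
           for t, v in data]
    pixels = set(pts)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        pixels.update(line_pixels(x0, y0, x1, y1))
    return pixels


T = [(t, (t * 2357) % 5000) for t in range(5000)]  # made-up zigzag signal
chart = dict(w=50, h=20, t1=0, t2=4999, v_lo=0, v_hi=4999)
reduced = m4(T, w=50, t1=0, t2=4999)
same = vis(T, **chart) == vis(reduced, **chart)
print(len(T), "->", len(reduced), "tuples; identical pixels:", same)
```

The pixel sets match because the reduced data still spans the full vertical range inside each column and keeps the exact endpoints of every line that crosses from one column into the next.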
Query Rewriting Template
WITH Q AS (SELECT t,v FROM sensors                -- 1) original query Q
           WHERE id = 1 AND t >= $t1 AND t <= $t2),
     QC AS (SELECT count(*) c FROM Q)             -- 2) cardinality query QC
SELECT * FROM Q WHERE (SELECT c FROM QC) <= 800   -- 3a) use Q if low card.
UNION
SELECT * FROM (
  -- reduction query QD: compute aggregates for each pixel-column
) AS QD WHERE (SELECT c FROM QC) > 800            -- 3b) use QD if high card.
SELECT t,v FROM Q JOIN
  (SELECT round($w*(t-$t1)/($t2-$t1)) as k,   --define key
          min(v) as v_min, max(v) as v_max,   --get min,max
          min(t) as t_min, max(t) as t_max    --get 1st,last
   FROM Q GROUP BY k) as QA                   --group by k
ON k = round($w*(t-$t1)/($t2-$t1))            --join on k
   AND (v = v_min OR v = v_max OR             --&(min|max|
        t = t_min OR t = t_max)               -- 1st|last)
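To try the rewritten query end to end, the template and the M4 reduction query can be combined and run against an in-memory SQLite database from Python. This is a sketch under assumptions: the table contents are made up, the named parameters :w, :t1, :t2 stand in for $w, $t1, $t2, :max_rows replaces the hard-coded 800, and the factor 1.0 is an adaptation that keeps SQLite from using integer division. The slides do not state which database the authors used.

```python
import sqlite3

# Template (Q, QC, UNION) with the M4 query filled in as QD.
REWRITTEN_QUERY = """
WITH Q  AS (SELECT t, v FROM sensors
            WHERE id = 1 AND t >= :t1 AND t <= :t2),
     QC AS (SELECT count(*) AS c FROM Q)
SELECT * FROM Q WHERE (SELECT c FROM QC) <= :max_rows
UNION
SELECT * FROM (
  SELECT t, v FROM Q JOIN
    (SELECT round(1.0 * :w * (t - :t1) / (:t2 - :t1)) AS k,
            min(v) AS v_min, max(v) AS v_max,
            min(t) AS t_min, max(t) AS t_max
     FROM Q GROUP BY k) AS QA
  ON k = round(1.0 * :w * (t - :t1) / (:t2 - :t1))
     AND (v = v_min OR v = v_max OR t = t_min OR t = t_max)
) AS QD WHERE (SELECT c FROM QC) > :max_rows
ORDER BY t
"""


def make_demo_db(n_rows):
    """Create an in-memory 'sensors' table with a made-up zigzag signal."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensors (id INTEGER, t INTEGER, v INTEGER)")
    conn.executemany(
        "INSERT INTO sensors VALUES (1, ?, ?)",
        [(t, (t * 2357) % n_rows) for t in range(n_rows)],
    )
    return conn


def query_chart_data(conn, w, t1, t2, max_rows=800):
    """Run the rewritten query for a chart that is w pixels wide."""
    params = {"w": w, "t1": t1, "t2": t2, "max_rows": max_rows}
    return conn.execute(REWRITTEN_QUERY, params).fetchall()


conn = make_demo_db(5000)
big = query_chart_data(conn, w=100, t1=0, t2=4999)  # high cardinality: QD is used
small = query_chart_data(conn, w=100, t1=0, t2=99)  # low cardinality: Q is returned as-is
print(len(big), "rows for the full range,", len(small), "rows for the short range")
```

Keeping the cardinality check inside the query means the client always sends one statement and never has to decide whether a reduction is worthwhile.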
Evaluation
Performance Measurements
[Figure: two bar charts comparing the queries base, paa, round, random, first, minmax, and M4.]
Main Cost Factors
- baseline query: DB-out network bandwidth (no query cost, high transfer cost)
- data reduction queries: query execution time and in-DB memory bandwidth (low query cost, no additional transfer cost)
Performance with Increasing Data Volume
[Figure: total time (s), 0 to 80, over number of rows, 1,000,000 to 3,000,000. Annotation: t < 5s, near-interactive response times.]
Conclusion
Respect the application!
1. Take query + presentation parameters
2. Rewrite query (SQL)
3. Fast, in-DB processing
Faster Visual Analytics
  No information loss
+ 10x speed
+ 100x bandwidth savings
Image source by cybaea, https://www.flickr.com/photos/cybaea/54679441/
VLDB 2014, Hangzhou, China, Sep 1-5, 2014
Thank you for your attention!
Uwe Jugel, @ubunatic
© 2014 Uwe Jugel, SAP SE. This publication was created in the course of an educational project and does not present an actual SAP product or service. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of their authors. The information contained herein may be changed without prior notice and comes without any warranty. SAP and the SAP logo are trademarks or registered trademarks of SAP SE in Germany and in several other countries all over the world.