efficient computation of parsimonious temporal aggregationdignoes/res/adbis2015-slides.pdf ·...
TRANSCRIPT
Efficient Computation of Parsimonious TemporalAggregation
Giovanni Mahlknecht, Anton Dignos, Johann Gamper
Free University of Bozen-Bolzano, Italy
ADBIS 2015September 8-11, 2015 - Futuroscope, Poitiers, France
ADBIS 2015 1/23 G. Mahlknecht et al.
Outline
Introduction
Diagonal Pruning
Split Point Graph
Experimental Evaluation
ADBIS 2015 2/23 G. Mahlknecht et al.
Instant Temporal Aggregation (ITA)
Patient treatment periods with daily cost
P C Tr1 Bob 600 [1,4]r2 Mary 400 [1,2]r3 Eve 40 [3,3]r4 Eric 310 [3,4]r5 Joe 30 [6,6]r6 John 300 [6,9]r7 Alex 100 [9,9]
.
days1 2 3 4 5 6 7 8 9
Eve 40 Joe 30 Alex 100
Eric 310 John 300
Mary 400
Bob 600
ITA: at each timepoint SUM(C)Val T
s1 1000 [1,2]s2 950 [3,3]s3 910 [4,4]s4 330 [6,6]s5 300 [7,8]s6 400 [9,9]
.
days1 2 3 4 5 6 7 8 9
1000 950 910 300330 400
ADBIS 2015 3/23 G. Mahlknecht et al.
Instant Temporal Aggregation (ITA)
Patient treatment periods with daily cost
P C Tr1 Bob 600 [1,4]r2 Mary 400 [1,2]r3 Eve 40 [3,3]r4 Eric 310 [3,4]r5 Joe 30 [6,6]r6 John 300 [6,9]r7 Alex 100 [9,9]
.
days1 2 3 4 5 6 7 8 9
Eve 40 Joe 30 Alex 100
Eric 310 John 300
Mary 400
Bob 600
ITA: at each timepoint SUM(C)Val T
s1 1000 [1,2]s2 950 [3,3]s3 910 [4,4]s4 330 [6,6]s5 300 [7,8]s6 400 [9,9]
.
days1 2 3 4 5 6 7 8 9
1000 950 910 300330 400
ADBIS 2015 3/23 G. Mahlknecht et al.
Parsimonious Temporal Aggregation (PTA)
Input: ITA resultOutput: merged tuples to size c with minimum error
Rules:
I Merge only adjacent tuples
I Merged values are weighted mean
I Error is Squared Sum Error
I Result is of size c
.
days1 2 3 4 5 6 7 8 9
1000
950
910 300
330 400
value = 983.33, error = 1667 value = 333.33, error = 6667
ADBIS 2015 4/23 G. Mahlknecht et al.
Parsimonious Temporal Aggregation (PTA)
Input: ITA resultOutput: merged tuples to size c with minimum error
Rules:
I Merge only adjacent tuples
I Merged values are weighted mean
I Error is Squared Sum Error
I Result is of size c
.
days1 2 3 4 5 6 7 8 9
1000
950
910 300
330 400
value = 332.5, error = 6675
ADBIS 2015 4/23 G. Mahlknecht et al.
PTA Optimal Solution
Result of ITA for SUM(C) (size n = 6)
.
days1 2 3 4 5 6 7 8 9
1000
950
910 300
330 400
error = 800 error = 600
Result PTA (c = 4) total error 1,400 (optimal solution)
.
days1 2 3 4 5 6 7 8 9
1, 000
930 310
400
ADBIS 2015 5/23 G. Mahlknecht et al.
Split Points / Split PathI PTA computes a split path (sequence of split points)I Tuples between split points are merged
.
days1 2 3 4 5 6 7 8 9
1000
950
910 300
330 400
error = 800 error = 600
split 5split 3split 1
Split Path: [1, 3, 5]
.
days1 2 3 4 5 6 7 8 9
1, 000
930 310
400
ADBIS 2015 6/23 G. Mahlknecht et al.
PTA Existing Algorithm
Dynamic Programming Algorithm
Error Matrix Ei=1 2 3 4 5 6
k=1 0 1667 5700 ∞ ∞ ∞2 - 0 800 5700 6300 12375
3 - - 0 800 1400 6300
4 - - - 0 600 1400
Ei,k minimum error in reducing the firsti tuples to size k
Split Point Matrix Ji=1 2 3 4 5 6
k=1 0 0 0 0 0 0
2 0 1 1 3 3 3
3 0 0 2 3 3 5
4 0 0 0 3 3 5
Ji,k optimum split pointwhen reducing the first ituples to size k
only the last two rows are used whole matrix
ADBIS 2015 7/23 G. Mahlknecht et al.
Problem and Contribution
Problem
I Runtime and space requirements of existing algorithm notscalable
Contribution
I Diagonal Pruning: Reduces the computational complexity byavoiding unnecessary computations
I Split Point Graph: Reduces the space complexity
I Result remains optimal
ADBIS 2015 8/23 G. Mahlknecht et al.
Introduction
Diagonal Pruning
Split Point Graph
Experimental Evaluation
ADBIS 2015 9/23 G. Mahlknecht et al.
Diagonal Pruning
Lemma (Diagonal Pruning)
For the computation of the error matrix E and split point matrix Jthere exists an upper bound for variable i.
i=1 2 3 4 5 6
k=1 0 0 0 0 0 02 - 1 1 3 3 33 - - 2 3 3 54 - - - 3 3 5
I Red cells can be avoided, reduces runtime
I allows to eliminate parts of the matrices, reduces memory
ADBIS 2015 10/23 G. Mahlknecht et al.
Introduction
Diagonal Pruning
Split Point Graph
Experimental Evaluation
ADBIS 2015 11/23 G. Mahlknecht et al.
Split Point Graph
Challenge
Substitution of Split Point Matrix J by alternative structure toreduce memory consumption
Ideaunnecessary nodes are not stored
Split Point Graph
I Only necessary nodes are inserted
I Nodes are removed when they become obsolete (PathPruning)
ADBIS 2015 12/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
ADBIS 2015 13/23 G. Mahlknecht et al.
Graph Evolution
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
2 diagonal pruned node
1 active path pruned nodes
1 path pruned nodes
Total number of nodes: 24
Not computed nodes: 12
Path pruned nodes: 4
ADBIS 2015 13/23 G. Mahlknecht et al.
Introduction
Diagonal Pruning
Split Point Graph
Experimental Evaluation
ADBIS 2015 14/23 G. Mahlknecht et al.
Experimental Configuration
Synthetic Datasets
I SYNTH: random distributed values
I ETDS: evolution of employees in a company
Algorithm Comparisons
I PTA: original Algorithm
I DP: PTA with diagonal pruning
I SGP: Split point graph with diagonal and path pruning
ADBIS 2015 15/23 G. Mahlknecht et al.
Runtime: PTA vs Diagonal Pruning
0 1 2 3 4 50
5
10
Reduction size [k]
Ru
nti
me
[sec
]
ETDS
PTA
DP
0 1 2 3 4 50
50
100
150
Reduction size [k]
Ru
nti
me
[sec
]
SYNTH
PTA
DP
Diagonal pruning substantially reduces runtime
ADBIS 2015 16/23 G. Mahlknecht et al.
Runtime: Split Point Graph vs PTA with Diagonal Pruning
0 1 2 3 4 50
2
4
6
Reduction size [k]
Ru
nti
me
[sec
]
ETDS
SPG
DP
0 1 2 3 4 50
50
100
150
Reduction size [k]
Ru
nti
me
[sec
]
SYNTH
SPG
DP
The overhead of the dynamic graph structure and pathpruning is very small
ADBIS 2015 17/23 G. Mahlknecht et al.
Space Efficiency: PTA vs SPG (compression to 10%)
0 1 2 3 4 5 6 7 8 9100
20
40
60
80
Input cardinality n [k]
Mem
ory
[MB
]
ETDS
PTA, c=10%
SPG, c=10%
0 1 2 3 4 5 6 7 8 9100
20
40
60
80
Input cardinality n [k]
Mem
ory
[MB
]
SYNTH
PTA, c=10%
SPG, c=10%
Graph Implementation with Diagonal Pruning and PathPruning substantially reduces space consumption
ADBIS 2015 18/23 G. Mahlknecht et al.
Space Efficiency: PTA vs SPG (compression to 1%)
0 1 2 3 4 5 6 7 8 9100
2
4
6
8
Input cardinality n [k]
Mem
ory
[MB
]
ETDS
PTA, c=1%
SPG, c=1%
0 1 2 3 4 5 6 7 8 9100
2
4
6
8
Input cardinality n [k]
Mem
ory
[MB
]
SYNTH
PTA, c=1%
SPG, c=1%
Graph Implementation with Diagonal Pruning and PathPruning substantially reduces space consumption
ADBIS 2015 19/23 G. Mahlknecht et al.
Space Efficiency: Effect of Path Pruning
0 1 2 3 4 50
50
100
150
200
Reduction size c [103]
Mem
ory
[MB
]ETDS
SPG without Path Pruning
PTA
SPG
0 1 2 3 4 50
50
100
150
200
Reduction size c [103]
Mem
ory
[MB
]
SYNTH
SPG without Path Pruning
PTA
SPG
Path Pruning has a huge pruning effect. It prunes about 2/3of the graph
ADBIS 2015 20/23 G. Mahlknecht et al.
Related Work
I Tuma, P.: Implementing Historical Aggregates in TempIS.Ph.D. thesis, Wayne State University, Detroit, Michigan(1992)
I Kline, N., Snodgrass, R.T.: Computing temporal aggregates.In: ICDE. pp. 222-231 (1995)
I Moon, B., Vega Lopez, I.F., Immanuel, V.: Efficientalgorithms for large-scale temporal aggregation. IEEE Trans.Knowl. Data Eng. 15(3), 744-759 (2003)
I Tao, Y., Papadias, D., Faloutsos, C.: Approximate temporalaggregation. In: ICDE. pp. 190-201 (2004)
I Gordevicius, J., Gamper, J., Bohlen, M.H.: Parsimonioustemporal aggregation. VLDB J. 21(3), 309-332 (2012)
ADBIS 2015 21/23 G. Mahlknecht et al.
Conclusion
I Diagonal Pruning reduces the runtime of the computationby reducing the search space of the DP scheme adopted byPTA
I Split Point Graph in combination with Path Pruning reducesmemory consumption
I Experiments showed that the two optimizations reducememory requirements to one third of the original PTAimplementation
ADBIS 2015 22/23 G. Mahlknecht et al.
Future Work
I Generalization of split point graph for some classes of DPproblems
I Computation of PTA queries while the ITA result is computedavoiding two step computation
ADBIS 2015 23/23 G. Mahlknecht et al.