DATABASE SYSTEMS GROUP
Big Data Management and Analytics, Assignment 7
Assignment 7-1
(a) Explain each of the terms by providing a short definition
Aggregation: Assign similar objects to groups and aggregate each group as a whole.
Compression: Encode the received data so that it occupies less space.
Data Reduction: Reduce the size of the received data.
Histograms: A method for approximating the frequency distributions of elements in streams.
Load Shedding: If the data in the stream arrives at such a high velocity that it could overburden the system, some part of the data is discarded.
Microclusters: A microcluster describes a group of similar objects.
Sampling: Select a subset of the data.
Wavelets: Decompose a signal into several coefficients.
(b) Illustrate how the terms are related to each other.
Data Reduction
  Sampling
  Load Shedding
  Aggregation
    Histograms
    Microclusters
  Compression
    Wavelets
Assignment 7-2
Given the following input sequence S = (4,1,2,3,6,1,7,6)
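(a) Perform a Haar Wavelet Transformation on S and determine the wavelet coefficients
Transform S into a sequence of two-component vectors $(s_1, d_1), \dots, (s_n, d_n)$ where
\[
\begin{pmatrix} s_i \\ d_i \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} x_i \\ x_{i+1} \end{pmatrix}
\]
Here $(x_i, x_{i+1})^T$ is the vector with the current element of the sequence and its successor element, and $\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ is the Haar matrix.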
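Step 1:
\[ s_1 = \left( \frac{4+1}{2}, \frac{2+3}{2}, \frac{6+1}{2}, \frac{7+6}{2} \right) = (2.5, 2.5, 3.5, 6.5) \]
\[ d_1 = \left( \frac{4-1}{2}, \frac{2-3}{2}, \frac{6-1}{2}, \frac{7-6}{2} \right) = (1.5, -0.5, 2.5, 0.5) \]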
Step 2:
\[ s_2 = \left( \frac{2.5+2.5}{2}, \frac{3.5+6.5}{2} \right) = (2.5, 5) \]
\[ d_2 = \left( \frac{2.5-2.5}{2}, \frac{3.5-6.5}{2} \right) = (0, -1.5) \]
Step 3:
\[ s_3 = \frac{2.5+5}{2} = 3.75 \]
\[ d_3 = \frac{2.5-5}{2} = -1.25 \]
Mean                    Coefficients
(4,1,2,3,6,1,7,6)       (-)
(2.5, 2.5, 3.5, 6.5)    (1.5, -0.5, 2.5, 0.5)
(2.5, 5)                (0, -1.5)
(3.75)                  (-1.25)
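The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, assuming the input length is a power of two. The function names `haar_step` and `haar_dwt` are made up for this example and are not part of the assignment.

```python
def haar_step(values):
    """One Haar step: pairwise means and pairwise half-differences."""
    means = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return means, details


def haar_dwt(sequence):
    """Full Haar transform of a sequence whose length is a power of two.

    Returns [overall mean, coarsest detail, ..., finest details].
    """
    n = len(sequence)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes a power-of-two length")
    means = list(sequence)
    coefficients = []
    while len(means) > 1:
        means, details = haar_step(means)
        # Coarser levels are placed in front of the finer ones collected so far.
        coefficients = details + coefficients
    return means + coefficients


S = [4, 1, 2, 3, 6, 1, 7, 6]
print(haar_dwt(S))  # [3.75, -1.25, 0.0, -1.5, 1.5, -0.5, 2.5, 0.5]
```

The output lists the overall mean first, followed by the coefficients from the coarsest to the finest level.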
(b) Reconstruct the original sequence S using the Wavelet coefficients
For the reconstruction of a sequence S we use:
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} s_i \\ d_i \end{pmatrix}
=
\begin{pmatrix} x'_i \\ x'_{i+1} \end{pmatrix}
\]
The wavelet coefficients obtained from (a):
DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)
Loss-free reconstruction:
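\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 3.75 \\ -1.25 \end{pmatrix}
=
\begin{pmatrix} 2.5 \\ 5 \end{pmatrix}
\]
[Figure: plot over positions 1 to 8; the base line at 3.75 is split into two plateaus at 2.5 and 5, Δ = 1.25]
Intuition:
negative sign: start below the base line
positive sign: start above the base line
Take the middle of each (sub)-plateau!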
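\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 2.5 \\ 2.5 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 5 \\ -1.5 \end{pmatrix}
=
\begin{pmatrix} 3.5 \\ 6.5 \end{pmatrix}
\]
[Figure: plot over positions 1 to 8; the plateau at 2.5 stays at 2.5 (Δ = 0), the plateau at 5 is split into 3.5 and 6.5 (Δ = 1.5)]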
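\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ 1.5 \end{pmatrix}
=
\begin{pmatrix} 4 \\ 1 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ -0.5 \end{pmatrix}
=
\begin{pmatrix} 2 \\ 3 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 3.5 \\ 2.5 \end{pmatrix}
=
\begin{pmatrix} 6 \\ 1 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 6.5 \\ 0.5 \end{pmatrix}
=
\begin{pmatrix} 7 \\ 6 \end{pmatrix}
\]
[Figure: the reconstructed values 4, 1, 2, 3, 6, 1, 7, 6 plotted over positions 1 to 8]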
The original sequence is
S = (4,1,2,3,6,1,7,6)
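The level-by-level reconstruction can be sketched as follows. This is a minimal illustration that assumes the coefficient layout used in DWT(S) above (overall mean first, then the coefficients from coarsest to finest). The name `haar_idwt` is hypothetical.

```python
def haar_idwt(coefficients):
    """Invert the Haar transform from [mean, coarsest detail, ..., finest details]."""
    values = [coefficients[0]]
    position = 1
    while position < len(coefficients):
        # Each level has exactly as many details as there are current means.
        details = coefficients[position:position + len(values)]
        expanded = []
        for mean, detail in zip(values, details):
            # The matrix (1 1; 1 -1) applied to (mean, detail) gives both children.
            expanded.extend([mean + detail, mean - detail])
        position += len(values)
        values = expanded
    return values


dwt = [3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5]
print(haar_idwt(dwt))  # [4.0, 1.0, 2.0, 3.0, 6.0, 1.0, 7.0, 6.0]
```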
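(c) We assume that all coefficients with a value in [-0.5, 0.5] are close to zero and set them to zero
DWT(S) = (3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5)
Changes to:
DWT'(S) = (3.75, -1.25, 0, -1.5, 1.5, 0, 2.5, 0)
Reconstructing S with DWT'(S):
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 3.75 \\ -1.25 \end{pmatrix}
=
\begin{pmatrix} 2.5 \\ 5 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 2.5 \\ 2.5 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 5 \\ -1.5 \end{pmatrix}
=
\begin{pmatrix} 3.5 \\ 6.5 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ 1.5 \end{pmatrix}
=
\begin{pmatrix} 4 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 2.5 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 2.5 \\ 2.5 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 3.5 \\ 2.5 \end{pmatrix}
=
\begin{pmatrix} 6 \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\cdot
\begin{pmatrix} 6.5 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 6.5 \\ 6.5 \end{pmatrix}
\]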
Using DWT'(S) leads to a lossy reconstruction:
S = (4,1,2,3,6,1,7,6)
S' = (4,1,2.5,2.5,6,1,6.5,6.5)
Now take each pair of corresponding elements of S and S', compute their absolute difference, and sum up the differences to obtain the total approximation error:
\[ \mathit{diff}(S, S') = \sum_{i=0}^{n-1} \left| S(i) - S'(i) \right| \]
\[ \mathit{diff}(S, S') = |4 - 4| + |1 - 1| + |2 - 2.5| + |3 - 2.5| + |6 - 6| + |1 - 1| + |7 - 6.5| + |6 - 6.5| = 2 \]
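The thresholding, the lossy reconstruction, and the error sum can be checked with a short script. This is a sketch under the assumption that every coefficient with an absolute value of at most 0.5 is set to zero. The helper names `haar_idwt`, `threshold`, and `approximation_error` are invented for this example.

```python
def haar_idwt(coefficients):
    """Invert the Haar transform from [mean, coarsest detail, ..., finest details]."""
    values = [coefficients[0]]
    position = 1
    while position < len(coefficients):
        details = coefficients[position:position + len(values)]
        expanded = []
        for mean, detail in zip(values, details):
            expanded.extend([mean + detail, mean - detail])
        position += len(values)
        values = expanded
    return values


def threshold(coefficients, eps):
    """Set every coefficient c with |c| <= eps to zero."""
    return [0 if abs(c) <= eps else c for c in coefficients]


def approximation_error(original, approximation):
    """Sum of the absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(original, approximation))


S = [4, 1, 2, 3, 6, 1, 7, 6]
dwt = [3.75, -1.25, 0, -1.5, 1.5, -0.5, 2.5, 0.5]

dwt_reduced = threshold(dwt, 0.5)
S_approx = haar_idwt(dwt_reduced)

print(dwt_reduced)                       # [3.75, -1.25, 0, -1.5, 1.5, 0, 2.5, 0]
print(S_approx)                          # [4.0, 1.0, 2.5, 2.5, 6.0, 1.0, 6.5, 6.5]
print(approximation_error(S, S_approx))  # 2.0
```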
Assignment 7-3
(a) Compute the reduced representation of S using PAA (box size M=4)
Hint: A PAA approximates a time series $X$ of length $n$ with a vector $\bar{X} = (\bar{x}_1, \dots, \bar{x}_M)$ of arbitrary length $M \le n$, where for each $\bar{x}_i$ holds:
\[ \bar{x}_i = \frac{M}{n} \sum_{j = \frac{n}{M}(i-1)+1}^{\frac{n}{M} i} x_j \]
position p = (1, 2, 3, 4, 5, 6, 7, 8)
Initial sequence S = (4, 1, 2, 3, 6, 1, 7, 6)
\[ \bar{x}_1 = \frac{4}{8} \sum_{j = \frac{8}{4}(1-1)+1 = 1}^{\frac{8}{4} \cdot 1 = 2} x_j = \frac{4 + 1}{2} = 2.5 \]
\[ \bar{x}_2 = \frac{4}{8} \sum_{j = \frac{8}{4}(2-1)+1 = 3}^{\frac{8}{4} \cdot 2 = 4} x_j = \frac{2 + 3}{2} = 2.5 \]
\[ \bar{x}_3 = \frac{4}{8} \sum_{j = \frac{8}{4}(3-1)+1 = 5}^{\frac{8}{4} \cdot 3 = 6} x_j = \frac{6 + 1}{2} = 3.5 \]
\[ \bar{x}_4 = \frac{4}{8} \sum_{j = \frac{8}{4}(4-1)+1 = 7}^{\frac{8}{4} \cdot 4 = 8} x_j = \frac{7 + 6}{2} = 6.5 \]
PAA(S) = (2.5, 2.5, 3.5, 6.5)
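The PAA formula from the hint can be sketched as a small function. This is a minimal version that assumes M divides n evenly, so every segment has the same width n/M. The name `paa` and its parameter names are chosen for this example only.

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` equal-width boxes."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("this sketch assumes that M divides n")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


S = [4, 1, 2, 3, 6, 1, 7, 6]
print(paa(S, 4))  # [2.5, 2.5, 3.5, 6.5]
```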
[Figure: S = (4, 1, 2, 3, 6, 1, 7, 6) plotted over positions 1 to 8, with the PAA approximation drawn as a red line]
The red line looks somehow familiar...
(b) Convince yourself that PAA and DWT (using Haar wavelets as basis function!) are equivalent!
[Figure: two plots over positions 1 to 8, titled PAA and DWT]
If the number of coefficients of the DWT is a power of two, it is always possible to convert between the two representations (PAA and Haar).
Source: Keogh, E. et al.: Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases; KAIS Long paper (2000)
Mean                    Coefficients
(4,1,2,3,6,1,7,6)       (-)
(2.5, 2.5, 3.5, 6.5)    (1.5, -0.5, 2.5, 0.5)
(2.5, 5)                (0, -1.5)
(3.75)                  (-1.25)
The means after the first Haar step, (2.5, 2.5, 3.5, 6.5), are exactly PAA(S) for M = 4.
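The correspondence between the mean column of the Haar decomposition and PAA can be checked numerically. The sketch below assumes a power-of-two length and compares the means of every Haar level with the PAA of the same length. Both function names are hypothetical helpers for this example.

```python
def paa(series, segments):
    """Mean of each of `segments` equal-width boxes (assumes segments divides len(series))."""
    width = len(series) // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def haar_means(series):
    """Mean vectors produced by repeated pairwise Haar averaging, finest level first."""
    levels = []
    means = list(series)
    while len(means) > 1:
        means = [(means[i] + means[i + 1]) / 2 for i in range(0, len(means), 2)]
        levels.append(means)
    return levels


S = [4, 1, 2, 3, 6, 1, 7, 6]
for means in haar_means(S):
    # The Haar means with M entries coincide with the PAA of length M.
    print(len(means), means, means == paa(S, len(means)))
```

For this sequence the comparison holds for M = 4, 2 and 1.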
Assignment 7-4
Given a data stream of size n. Randomly select k ≤ n elements from the stream. Here k represents the size of the reservoir.
(a) Setting k = 1, n = 2. The first element is in the reservoir, the second is not. What is the probability for each of the two elements to be in the reservoir?
Stream: A, B with k = 1, n = 2. Keep the first item (A) in memory; B arrives next.
To keep the new item and remove the old one:
\[ p_{new} = \frac{1}{n} = \frac{1}{2} \]
To keep the old item A and ignore the new one:
\[ p_{old} = 1 - \frac{1}{n} = \frac{1}{2} \]
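(b) Setting k = 1, n = 3. What is now the probability for each of the elements to be in the reservoir?
Stream: A, B, C with k = 1, n = 3. Keep the first item (A) in memory; B and C arrive next.
An item that is in the reservoir after the second step (probability 1/2) survives the third step with probability 1 - 1/3:
\[ p_{old} = \frac{1}{2} \left( 1 - \frac{1}{3} \right) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3} \]
\[ p_{new} = \frac{1}{n} = \frac{1}{3} \]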
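(c) Setting k = 1. What is the probability for any given n?
Stream: A, B, C, ... with k = 1, n = n. Keep the first item (A) in memory; B, C, ... arrive next.
\[ p_{old} = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} \cdots \frac{n-2}{n-1} \cdot \frac{n-1}{n} = \frac{1}{n} \]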
(d) What is the probability for an arbitrary reservoir size k and an arbitrary stream size n?
Stream: A, B, C, ... with k = ?, n = n. Keep the first k items in memory.
\[ p_{old} = \frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdot \frac{k+2}{k+3} \cdots \frac{k+n-k-1}{n} = \frac{k}{n} \]
\[ p_{new} = \frac{k}{n} \]
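The result k/n can be checked empirically with a small simulation. The sketch below is one common way to implement reservoir sampling: the i-th item is kept with probability k/i and replaces a uniformly chosen slot. The assignment does not prescribe an implementation, so the function names, the parameters, and the seed are assumptions made for this example.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng=random):
    """Return k items from the stream, each kept with probability k/n."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Keep the first k items in memory.
            reservoir.append(item)
        else:
            # Draw a slot in 0..i-1; the new item is kept with probability k/i.
            j = rng.randrange(i)
            if j < k:
                reservoir[j] = item
    return reservoir


def inclusion_frequencies(n, k, trials, seed=0):
    """Empirical probability of each stream position ending up in the reservoir."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n), k, rng))
    return {item: counts[item] / trials for item in range(n)}


# Each of the 5 positions should appear with a frequency close to k/n = 0.4.
print(inclusion_frequencies(n=5, k=2, trials=20000))
```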