A Dip in the Reservoir: Maintaining Sample Synopses
of Evolving Datasets
Rainer Gemulla (University of Technology Dresden)
Wolfgang Lehner (University of Technology Dresden)
Peter J. Haas (IBM Almaden Research Center)
Faculty of Computer Science, Institute System Architecture, Database Technology Group
Rainer Gemulla, Wolfgang Lehner, Peter J. Haas
A Dip in the Reservoir: Maintaining Sample Synopses of Evolving Datasets
Slide 2 (VLDB 2006)
Outline
1. Introduction
2. Deletions
3. Resizing
4. Experiments
5. Summary
Random Sampling
• Database applications
  – huge data sets
  – complex algorithms (space & time)
• Requirements
  – performance, performance, performance
• Random sampling
  – approximate query answering
  – data mining
  – data stream processing
  – query optimization
  – data integration
Turnover in Europe (TPC-H)

| Sample size | Estimate | Error | Response time |
|---|---|---|---|
| 1% | 8.46 Mil. | 0.15 Mil. | 4s |
| 10% | 8.51 Mil. | 0.05 Mil. | 52s |
| 100% | 8.54 Mil. | | 200s |
The Problem Space
• Setting
  – arbitrary data sets
  – samples of the data
  – evolving data
• Scope of this talk
  – maintenance of random samples
Can we minimize or even avoid access to base data?
[Figure: diagram of "Data" and "Sample" with arrows labeled "Apply" and "Compute"]
Types of Data Sets
• Data sets
  – variation of data set size
  – influence on sampling
[Figure: three panels]
  – Stable. Goal: stable sample
  – Growing. Goal: controlled growing sample
  – Shrinking: uninteresting
Uniform Sampling
• Uniform sampling
  – all samples of the same size are equally likely
  – many statistical procedures assume uniformity
  – flexibility
• Example
  – a data set (also called population): {1, 2, 3, 4}
  – possible samples of size 2: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}, each with probability 16% (1/6)
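The uniformity example can be checked by enumeration. The following is a minimal Python sketch; the names `population`, `samples` and `probability` are chosen here for illustration. It lists all samples of size 2 and the probability each one must have under uniform sampling (1/6, which the slide shows rounded as 16%).

```python
from fractions import Fraction
from itertools import combinations

# The population from the slide.
population = [1, 2, 3, 4]

# All possible samples of size 2 (order does not matter).
samples = list(combinations(population, 2))

# Under uniform sampling, every sample of the same size is equally likely.
probability = Fraction(1, len(samples))

for s in samples:
    print(s, probability)
```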
Reservoir Sampling
• Reservoir sampling
  – computes a uniform sample of M elements
  – building block for many sophisticated sampling schemes
  – single-scan algorithm
    • add the first M elements
    • afterwards, flip a coin
      a) ignore the element (reject)
      b) replace a random element in the sample (accept)
  – accept probability of the ith element

$$P(t_i \text{ is accepted}) = \frac{\text{sample size}}{\text{population size}} = \frac{M}{i}$$
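The single-scan procedure can be sketched in a few lines of Python. This is an illustrative sketch of the classic reservoir scheme as described on the slide, not code from the paper; the function name `reservoir_sample` and the optional `rng` parameter are assumptions made here.

```python
import random

def reservoir_sample(stream, M, rng=random):
    """Return a uniform random sample of at most M elements of `stream`."""
    sample = []
    for i, item in enumerate(stream, start=1):
        if i <= M:
            # The first M elements are always added.
            sample.append(item)
        elif rng.random() < M / i:
            # Accept the i-th element with probability M / i and let it
            # replace a randomly chosen element of the sample.
            sample[rng.randrange(M)] = item
        # Otherwise the element is rejected and the sample is unchanged.
    return sample

print(reservoir_sample(range(1, 101), 5))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which is convenient for experiments.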
Reservoir Sampling (Example)
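• Example
  – sample size M = 2
  – after +t1, +t2: the sample is {1,2} with probability 100%
  – after +t3: the samples {1,2}, {3,2}, {1,3} have probability 1/3 (33%) each
  – after +t4: each sample is kept with probability 2/4, and t4 replaces each of its two elements with probability 1/4; each of the six possible samples has probability 16%
[Figure: probability tree for +t1, +t2, +t3, +t4]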
Problems with Reservoir Sampling
• Problems with reservoir sampling
  – lacks support for deletions (stable data sets)
  – cannot efficiently enlarge sample (growing data sets)
Naïve/Prior Approaches
| Algorithm | Technique | Comments |
|---|---|---|
| Naïve | use insertions to immediately refill the sample | not uniform |
| Backing sample | let sample size decrease, but occasionally recompute | expensive, unstable |
| CAR(WOR) | immediately sample from base data to refill the sample | stable but expensive |
| Bernoulli s. with purging | "coin flip" sampling with deletions, purge if too large | inexpensive but unstable |
| Passive sampling | developed for data streams (sliding windows only) | special case of our RP algorithm |
| Distinct-value sampling | tailored for multiset populations | expensive, low space efficiency in our setting |
| RS with deletions | conduct deletions, continue with smaller sample | unstable |
Random Pairing
• Random pairing
  – compensates deletions with arriving insertions
  – corrects inclusion probabilities
• General idea (insertion)
  – no uncompensated deletions → reservoir sampling
  – otherwise,
    • randomly select an uncompensated deletion (partner)
    • compensate it: Was it in the sample?
      – yes → add arriving element to sample
      – no → ignore arriving element
Random Pairing
• Example
  – sample size M = 2, operations +t1, +t2, +t3, -t2, -t3, +t4, +t5
  – after +t3: the samples {1,2}, {3,2}, {1,3} have probability 1/3 each
  – after -t2, -t3: these become {1}, {} and {1}, with two uncompensated deletions
  – after +t4: {1} becomes {1,4} or stays {1} with probability 1/2 each; {} becomes {4}
  – after +t5: the six leaves {1,5}, {1,4}, {1,5}, {1,4}, {4,5}, {4,5} have probability 16% each, so every sample of size 2 is equally likely
[Figure: probability tree for the operation sequence]
Random Pairing
• Details of the algorithm
  – keeping history of deleted items is expensive, but:

$$P(t_i \text{ is added}) = P(\text{random partner was in sample}) = \frac{\#\text{uncompensated deletions in sample}}{\#\text{uncompensated deletions}} = \frac{c_1}{d}$$

  – maintenance of two counters suffices
  – correctness proof is in the paper
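A counter-based version of random pairing might look like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: the class name `RandomPairing` is hypothetical, the two counters are taken to be `c1` (uncompensated deletions that were in the sample) and `c2` (uncompensated deletions that were not), and `n` tracks the current data set size for the reservoir step. A plain list holds the sample, which keeps the code short at the price of linear-time membership tests.

```python
import random

class RandomPairing:
    """Bounded-size uniform sample under insertions and deletions (sketch)."""

    def __init__(self, M, rng=None):
        self.M = M                      # upper bound on the sample size
        self.rng = rng or random.Random()
        self.sample = []                # current sample
        self.n = 0                      # current data set size
        self.c1 = 0                     # uncompensated deletions that were in the sample
        self.c2 = 0                     # uncompensated deletions that were not

    def insert(self, item):
        self.n += 1
        d = self.c1 + self.c2
        if d == 0:
            # No uncompensated deletions: ordinary reservoir sampling step.
            if len(self.sample) < self.M:
                self.sample.append(item)
            elif self.rng.random() < self.M / self.n:
                self.sample[self.rng.randrange(self.M)] = item
        elif self.rng.random() < self.c1 / d:
            # The random partner was in the sample: add the new item.
            self.sample.append(item)
            self.c1 -= 1
        else:
            # The random partner was not in the sample: ignore the new item.
            self.c2 -= 1

    def delete(self, item):
        self.n -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.c1 += 1
        else:
            self.c2 += 1

# Replay the operation sequence from the example slide.
rp = RandomPairing(M=2)
for op, t in [("+", 1), ("+", 2), ("+", 3), ("-", 2), ("-", 3), ("+", 4), ("+", 5)]:
    if op == "+":
        rp.insert(t)
    else:
        rp.delete(t)
print(sorted(rp.sample))  # one of [1, 4], [1, 5], [4, 5]
```

Because every insertion that finds `d > 0` decrements exactly one of the two counters, both counters are back at zero once all deletions are compensated, and the sample has its full size again.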
Growing Data Sets
• The problem
  – growing data set
[Figure: two plots]
  – Data set: growing data set
  – Random pairing: stable sample, sampling fraction decreases
A Negative Result
• Negative result
  – There is no resizing algorithm which can enlarge a bounded-size sample without ever accessing base data.
• Example
  – data set: {1, 2, 3, 4}
  – samples of size 2: {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4} with probability 16% each
  – new data set: {1, 2, 3, 4, 5, 6, ...}
  – samples of size 3: {1,2,3} has probability 0%, {1,2,5} has probability >0%
Not uniform!
Resizing
• Goal
  – efficiently increase sample size
  – stay within an upper bound at all times
• General idea
  1. convert sample to Bernoulli sample
  2. continue Bernoulli sampling until new sample size is reached
  3. convert back to reservoir sample
• Optimally balance cost
  – cost of base data accesses (in step 1)
  – time to reach new sample size (in step 2)
Resizing
• Bernoulli sampling
  – uniform sampling scheme
  – each tuple is added to the sample with probability q
  – sample size follows binomial distribution → no effective upper bound
• Phase 1: Conversion to a Bernoulli sample
  – given q, randomly determine sample size
  – reuse reservoir sample to create Bernoulli sample
    • subsample
    • sample additional tuples (base data access)
  – choice of q
    • small → fewer base data accesses
    • large → more base data accesses
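Phase 1 can be sketched as follows. This is a simplified illustration, not the procedure from the paper: the function name `to_bernoulli_sample` is hypothetical, the base data is assumed to fit in a Python list `base`, and the sketch does not enforce the upper bound on the sample size that the talk requires. It draws the target size from a binomial distribution and then either subsamples the existing sample or tops it up from the base data.

```python
import random

def to_bernoulli_sample(sample, base, q, rng=random):
    """Turn a uniform sample of `base` into a Bernoulli(q) sample (sketch).

    `sample` must be a list of distinct elements of `base`.
    """
    # The size of a Bernoulli(q) sample of len(base) tuples is binomial.
    target = sum(rng.random() < q for _ in range(len(base)))

    if target <= len(sample):
        # Small q: a uniform subsample of the existing sample suffices.
        return rng.sample(sample, target)

    # Large q: additional tuples are needed, which requires base data access.
    in_sample = set(sample)
    rest = [t for t in base if t not in in_sample]
    return list(sample) + rng.sample(rest, target - len(sample))

base = list(range(1000))
reservoir = random.sample(base, 50)
print(len(to_bernoulli_sample(reservoir, base, q=0.02)))  # usually below 50
print(len(to_bernoulli_sample(reservoir, base, q=0.20)))  # usually above 50
```

The two calls illustrate the trade-off from the slide: a small q tends to need no base data access, while a large q forces reads from the base data.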
Resizing
• Phase 2: Run Bernoulli sampling
  – accept new tuples with probability q
  – conduct deletions
  – stop as soon as new sample size is reached
• Phase 3: Revert to reservoir sampling
  – switchover is trivial
• Choosing q
  – determines cost of Phase 1 and Phase 2
  – goal: minimize total cost
    • base data access expensive → small q
    • base data access cheap → large q
  – details in paper
Resizing
• Example
  – resize by 30% if sampling fraction drops below 9%
  – dependent on costs of accessing base data
[Figure: three plots]
  – Low costs: immediate resizing
  – Moderate costs: combined solution
  – High costs: degenerates to Bernoulli sampling
Total Cost
• Total cost
  – stable dataset, 10M operations
  – sample size 100k, data access 10 times more expensive than sample access
[Figure: total cost per algorithm, grouped into "Base data access" and "No base data access"]
Sample size
• Sample size
  – stable dataset, size 1M
  – sample size 100k
[Figure: sample size per algorithm, grouped into "Base data access" and "No base data access"]
Summary
• Reservoir Sampling
  – lacks support for deletions
  – complete recomputation to enlarge the sample
• Random Pairing
  – uses arriving insertions to compensate for deletions
• Resizing
  – base data access cannot be avoided
  – minimizes total cost
• Future work
  – better q for resizing
  – combine with existing techniques [4,8,17] to enhance flexibility, scalability
Thank you!
Questions?
Backup: Bounded-Size Sampling
• Why sampling?
  – performance, performance, performance
• How much to sample?
  – influencing factors
    1. storage consumption
    2. response time
    3. accuracy
  – choosing the sample size / sampling fraction
    1. largest sample that meets storage requirements
    2. largest sample that meets response time requirements
    3. smallest sample that meets accuracy requirements
Backup: Bounded-Size Sampling
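• Example
  – random pairing vs. Bernoulli sampling
  – average estimation
[Figure: three plots]
  – Data set
  – Sample size: BS violates 1, 2
  – Standard error: BS violates 3

$$\text{standard error} = \sqrt{\frac{\mathrm{Var}}{n}\left(1 - \frac{n}{N}\right)}$$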
Backup: Distinct-Value Sampling
• Distinct-value sampling (optimistic setting for DV)
  – DV-scheme knows avg. dataset size in advance
  – assume no storage for counters & hash functions
[Figure: two plots]
  – Sample size: RP has better memory utilization
  – Execution time: RP is significantly faster
Backup: RS With Deletions
• Reservoir sampling with deletions
  – conduct deletions, continue with smaller sample size
[Figure: probability tree for +t1, +t2, +t3, -t2, -t3, +t4, +t5 with M = 2; leaf probabilities 11%, 5.5%, 11%, 33%, 5.5%, 11%, 5.5%, 11%, 5.5%]
Backup: Backing Sample
• Evaluation
  – data set consists of 1 million elements (on average)
  – 100k sample, clustered insertions/deletions
[Figure: three plots]
  – Data set: stable
  – Reservoir sampling: sample is empty eventually
  – Backing sample: expensive, unstable
Backup: An Incorrect Approach
• Idea
  – use arriving insertions to refill the sample
[Figure: probability tree for +t1, +t2, +t3, -t2, -t3, +t4, +t5 with M = 2; leaf probabilities 11%, 11%, 11%, 33%, 11%, 11%, 11%]
Not uniform!
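The non-uniformity of the refill idea can be verified exactly rather than by simulation. The sketch below is an illustration written for this transcript, with the hypothetical function name `naive_refill_distribution`: it tracks the full probability distribution over samples using exact fractions, assuming that an insertion refills any sample that is not full and otherwise performs a normal reservoir step.

```python
from collections import defaultdict
from fractions import Fraction

def naive_refill_distribution(ops, M):
    """Exact distribution over samples for the naive refill scheme (sketch)."""
    dist = {frozenset(): Fraction(1)}   # sample -> probability
    n = 0                               # current data set size
    for op, t in ops:
        new = defaultdict(Fraction)
        if op == "+":
            n += 1
            for s, p in dist.items():
                if len(s) < M:
                    # Sample not full: the arriving item refills it.
                    new[s | {t}] += p
                else:
                    # Sample full: reservoir step with accept probability M / n.
                    accept = Fraction(M, n)
                    new[s] += p * (1 - accept)
                    for victim in s:
                        new[(s - {victim}) | {t}] += p * accept / M
        else:
            n -= 1
            for s, p in dist.items():
                # Deleted items simply leave the sample.
                new[s - {t}] += p
        dist = dict(new)
    return dist

ops = [("+", 1), ("+", 2), ("+", 3), ("-", 2), ("-", 3), ("+", 4), ("+", 5)]
for s, p in sorted(naive_refill_distribution(ops, M=2).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(s), p)
# [1, 4] 2/9
# [1, 5] 2/9
# [4, 5] 5/9
```

A uniform scheme would give each of the three samples probability 1/3; here {4,5} is favored because both of its items arrived while the sample was being refilled.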
Backup: Random Pairing
• Evaluation
  – data set consists of 1 million elements (on average)
  – 100k sample, clustered insertions/deletions
[Figure: three plots]
  – Data set: stable
  – Reservoir sampling: sample gets empty eventually
  – Random pairing: no base data access!
Backup: Average Sample Size
• Average sample size
  – stable dataset, 10M operations
  – sample size 100k
Backup: Average Sample Size With Clustered Insertions/Deletions
• Average sample size with clustered insertions/deletions
  – stable dataset, size 10M, ~8M operations
  – sample size 100k
Backup: Cost
• Cost
  – stable dataset, 10M operations
  – sample size 100k
Backup: Cost With Clustered Insertions/Deletions
• Cost with clustered insertions/deletions
  – stable dataset, size 10M, ~8M operations
  – sample size 100k
Backup: Resizing (Value of q)
• Resizing
  – enlarge sample from 100k to 200k
  – base data access 10ms, arrival rate 1ms