The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS
Vuk Ercegovac
David J. DeWitt
Raghu Ramakrishnan
Applications Combining Text and Relational Data
Query:

```sql
SELECT SCORE, P.id
FROM Products P
WHERE P.type = 'PDA'
  and CONTAINS(P.complaint, 'short battery life', SCORE)
ORDER BY SCORE DESC
```

Result (product complaints):

| SCORE | P.id |
|-------|------|
| 0.9   | 123  |
| 0.87  | 987  |
| 0.82  | 246  |
| …     | …    |

How should such an application be expected to perform?
Possibilities for Benchmarking
| Workload | Quality | Response Time / Throughput |
|----------|---------|----------------------------|
| Relational | N/A | TPC [3], AS3AP [10], Set Query [8] |
| Text | TREC [2], VLC2 [1] | FTDR [4], VLC2 [1] |
| Relational + Text | ?? | TEXTURE |
1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov
3. http://www.tpc.org
4. S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufman, 2nd edition, 1993.
8. P. O'Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
10. C. Turbyfill, C. Orji, and D. Bitton. AS3AP: a Comparative Relational Database Benchmark. IEEE Compcon, 1989.
Contributions of TEXTURE
Design a micro-benchmark to compare response time using a mixed relational + text query workload
Develop TextGen to synthetically grow a text collection given a real text collection
Evaluate TEXTURE on 3 commercial systems
Why a Micro-benchmark Design?
A fine level of control over experiments is needed to differentiate effects due to:
- How text data is stored
- How documents are assigned a score
- Optimizer decisions
Why use Synthetic Text?
- Allows for systematic scale-up: a user's current data set may be too small
- Users may be more willing to share synthetic data
- Our measurements show empirically that results on synthetic data are close to the same measurements on real data
A Note on Quality
Measuring quality is important! It is easy to quickly return poor results.

We assume that the three commercial systems strive for high-quality results:
- Some participated at TREC
- There is large overlap between their result sets
Outline
- TEXTURE Components
- Evaluation
- Synthetic Text Generation
TEXTURE Components

[Architecture diagram: DBGen populates the relational attributes (num_id, num_u, num_05, num_5, num_50; pkey plus un-clustered indexes) and TextGen populates the text attributes (txt_short "display", txt_long "body"); QueryGen instantiates Query Templates into Query 1 … Query n, which are run against System A and System B to produce per-system response times.]
Overview of Data
- Schema based on the Wisconsin Benchmark [5], used to control relational predicate selectivity
- Relational attributes populated by DBGen [6]; text attributes populated by TextGen (new)
  - Input: D, a document collection; m, a scale-up factor
  - Output: D', a document collection with |D| × m documents
  - Goal: same response times for workloads on D' and a corresponding real collection
5. D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.
6. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-record Synthetic Databases. ACM SIGMOD, 1994.
Overview of Queries
Query workloads are derived from query templates with the following parameters:

- Text expressions: vary the number of keywords, keyword selectivity, and type of expression (e.g., phrase, Boolean); keywords are chosen from the text collection
- Relational expressions: vary predicate selectivity and join-condition selectivity
- Sort order: choose between a relational attribute or the score
- Retrieve ALL or TOP-K results
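The template instantiation above can be sketched in a few lines. This is a hypothetical illustration of how a QueryGen-style tool might fill a template with keywords drawn from the collection; the template text and function names are illustrative, not the benchmark's actual code.

```python
import random

# Illustrative query template (shape borrowed from the example queries in
# this deck; the placeholder names are assumptions).
TEMPLATE = (
    "SELECT SCORE, num_id, txt_short "
    "FROM R "
    "WHERE NUM_5 = {rel_value} "
    "and CONTAINS(R.txt_long, '{keywords}', SCORE) "
    "ORDER BY SCORE DESC"
)

def instantiate(template, vocabulary, num_keywords, rel_value, rng):
    """Fill a query template: pick keywords from the text collection's
    vocabulary and a value for the relational predicate."""
    keywords = " ".join(rng.sample(vocabulary, num_keywords))
    return template.format(rel_value=rel_value, keywords=keywords)

rng = random.Random(42)
query = instantiate(TEMPLATE, ["foo", "bar", "baz", "qux"], 2, 3, rng)
```

A workload of 100 queries would simply call `instantiate` repeatedly, varying the keyword count and selectivity per the parameters above.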
Example Queries
A single-relation, mixed relational and text query that sorts by relevance score:

```sql
SELECT SCORE, num_id, txt_short
FROM R
WHERE NUM_5 = 3
  and CONTAINS(R.txt_long, 'foo bar', SCORE)
ORDER BY SCORE DESC
```

A join query, sorting by a relevance score computed on S.txt_long:

```sql
SELECT S.SCORE, S.num_id, S.txt_short
FROM R, S
WHERE R.num_id = S.num_id
  and S.NUM_05 = 2
  and CONTAINS(S.txt_long, 'foo bar', S.SCORE)
ORDER BY S.SCORE DESC
```
Outline
- TEXTURE Components
- Evaluation
- Synthetic Text Generation
Overview of Experiments
How is response time affected as the database grows in size?
How is response time affected by sort order and top-k optimizations?
How do the results change when input collection to TextGen differs?
Data and Query Workloads
- TextGen input: TREC AP Vol. 1 [2] and VLC2 [1]
- Output: relations with {1, 2.5, 5, 7.5, 10} × 84,678 tuples, corresponding to ~250 MB to 2.5 GB of text data
- Text-only queries: low (< 0.03%) vs. high (< 3%) selectivity; phrases, OR, AND
- Mixed, single-relation queries: low (< 0.01%) vs. high (5%) relational selectivity; paired with all text-only queries
- Mixed, multi-relation queries: 2 or 3 relations; vary the text attribute used and the selectivity
- Each query workload consists of 100 queries
1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov
Methodology for Evaluation
- Set up the database and query workloads
- Run each workload per system multiple times to obtain warm numbers; discard the first run and report the average of the remaining runs
- Repeat for all systems (A, B, C)
- Platform: Microsoft Windows 2003 Server, dual-processor 1.8 GHz AMD, 2 GB of memory, 8 × 120 GB IDE drives
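The warm-timing rule above (discard the cold first run, average the rest) is simple enough to pin down in code. A minimal sketch, with a hypothetical helper name:

```python
def warm_average(run_times):
    """Average the warm runs of a workload: the first run is treated as a
    cold (cache-warming) run and discarded; the rest are averaged."""
    if len(run_times) < 2:
        raise ValueError("need at least two runs to discard the cold one")
    warm = run_times[1:]
    return sum(warm) / len(warm)

# e.g. a cold first run of 10 s followed by three warm runs
reported = warm_average([10.0, 2.0, 2.2, 1.8])
```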
Scaling: Text-Only Workloads
How does response time vary per system as the data set scales up?
- Query workload: low text selectivity (0.03%)
- Text data: synthetic, based on TREC AP Vol. 1

[Chart: response time in seconds (0 to 60) vs. scale factor (1, 2.5, 5, 7.5, 10) for Systems A, B, and C]
Mixed Text/Relational Workloads
- Drill down on scale factor 5 (~450K tuples)
- Query workload Low: text selectivity 0.03%
- Query workload High: text selectivity 3%

Do the systems take advantage of the relational predicate for mixed workload queries?
- Query workload Mix: high text selectivity, low relational selectivity (0.01%)

Seconds per system and workload (synthetic TREC):

| System | Low | High | Mix |
|--------|-----|------|-----|
| A | 2.8 | 71 | 69 (97%) |
| B | 30 | 140 | 97 (69%) |
| C | 2.6 | 28 | 21 (75%) |
Top-k vs. All Results
Compare retrieving all vs. top-k results
- Query workload is Mix from before: high-selectivity text expression (3%), low-selectivity relational predicate (0.01%)

Seconds per system and workload (450K tuples, synthetic TREC):

| System | All | Top-k |
|--------|-----|-------|
| A | 69 | 2.6 |
| B | 97 | 96 |
| C | 28 | 2.2 |
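The large All-vs-Top-k gap on two of the systems is consistent with a top-k optimization: keeping only a bounded set of the highest-scoring rows while scanning matches, instead of materializing and fully sorting every result. A vendor-neutral sketch of that idea (the function name is illustrative, not any system's internals):

```python
import heapq

def top_k(scored_rows, k):
    """Keep only the k highest-scoring (score, id) rows while streaming
    over matches, rather than sorting the full result set."""
    return heapq.nlargest(k, scored_rows, key=lambda row: row[0])

rows = [(0.9, 123), (0.87, 987), (0.82, 246), (0.1, 555)]
best = top_k(rows, 2)
```

`heapq.nlargest` runs in O(n log k), which is why top-k retrieval can be far cheaper than producing ALL results when k is small relative to the match count.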
Varying Sort Order
Compare sorting by score vs. sorting by a relational attribute
- When retrieving all results, the numbers are similar to the previous experiment
- Results for retrieving top-k are shown below

Seconds per system and workload (450K tuples, synthetic TREC):

| System | Score | Relational |
|--------|-------|------------|
| A | 2.6 | 2.7 |
| B | 96 | 715 |
| C | 2.2 | 2.2 |
Varying the Input Collection
What is the effect of different input text collections on response time?
- Query workload: low text selectivity (0.03%)
- All results retrieved
- Text data: synthetic TREC and synthetic VLC2

Seconds per system and collection (450K tuples):

| System | Synthetic TREC | Synthetic VLC2 |
|--------|----------------|----------------|
| A | 2.9 | 1.2 |
| B | 30 | 3.6 |
| C | 2.5 | 1.6 |
Outline
- Benchmark Components
- Evaluation
- Synthetic Text Generation
Synthetic Text Generation
TextGen:
- Input: document collection D, scale-up factor m
- Output: document collection D' with |D| × m documents
- Problem: given documents D, how do we add documents to obtain D'?
- Goal: same response times for workloads on D' and a corresponding real collection C, |C| = |D'|
- Approach: extract "features" from D and draw |D'| samples according to those features
Document Collection Features
Features considered:
- W(w, c): word distribution
- G(n, v): vocabulary growth
- U, L: number of unique and total words per document
- C(w1, w2, …, wn, c): co-occurrence of word groups

Each feature is estimated by a model:
- e.g., Zipf [11] or an empirical distribution for W
- e.g., Heaps' law for G [7]
7. H. S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
11. G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.
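The two named models are standard and easy to state concretely. A minimal sketch of a Zipf rank-frequency distribution for W and Heaps' law for G (the parameter values K = 40, β = 0.5 are typical textbook defaults, not values from this work):

```python
def zipf_probabilities(vocab_size, s=1.0):
    """Zipf word distribution: the word of rank r has probability
    proportional to 1 / r**s, normalized over the vocabulary."""
    weights = [1.0 / (rank ** s) for rank in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def heaps_vocabulary(total_words, k=40.0, beta=0.5):
    """Heaps' law for vocabulary growth: V(n) = K * n**beta, the number of
    distinct words expected after seeing n total words."""
    return int(k * total_words ** beta)
```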
Process to Generate D’
- Pre-process: estimate features (depends on the model used for each feature)
- Generate |D'| documents: generate each document by sampling W according to U and L; grow the vocabulary according to G
- Post-process: swap words between documents in order to satisfy the co-occurrence of word groups C
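The generation step above can be sketched as weighted sampling from the word distribution. This is a simplified illustration, assuming a hypothetical `generate_document` helper; it draws L words from W but omits the U (unique-word) enforcement and the co-occurrence post-processing described above.

```python
import random

def generate_document(vocabulary, weights, total_len, rng):
    """Sample one synthetic document: draw total_len words from the word
    distribution W (given as per-word weights over the vocabulary)."""
    # A fuller implementation would also resample to hit the unique-word
    # count U and later swap words to satisfy co-occurrence groups C.
    return rng.choices(vocabulary, weights=weights, k=total_len)

rng = random.Random(7)
doc = generate_document(["alpha", "beta", "gamma"], [0.6, 0.3, 0.1], 20, rng)
```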
Feature-Model Combinations
Considered 3 instances of TextGen, each a combination of features and models:

| TextGen | W (word distr.) | G (vocab) | L (length) | U (unique) | C (co-occur.) |
|---------|-----------------|-----------|------------|------------|---------------|
| Synthetic1 | Zipf | Heaps | Average | Average | N/A |
| Synthetic2 | Empirical | Heaps | Average | Average | N/A |
| Synthetic3 | Empirical | Empirical | Empirical | Empirical | Empirical |
Which TextGen is a Good Generator?
- Goal: response times measured on synthetic (S) and real (D) collections should be similar across systems
- Does the use of randomized words in D' affect response-time accuracy?
- How does the choice of features and models affect response-time accuracy as the data set scales?
Use of Random Words
Words are strings composed of a random permutation of letters

Random words are useful for:
- Vocabulary growth
- Sharing text collections

Do randomized words affect measured response times? What is the effect on stemming, compression, and other text-processing components?
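One simple way to realize the randomization described above is to replace each word with a random permutation of its own letters, preserving length and letter counts. A minimal sketch, with a hypothetical helper name:

```python
import random

def randomize_word(word, rng):
    """Replace a word with a random permutation of its letters. Length and
    letter frequencies are preserved, but the word loses its meaning, so
    collections built this way are safe to share."""
    letters = list(word)
    rng.shuffle(letters)
    return "".join(letters)

rng = random.Random(0)
scrambled = randomize_word("battery", rng)
```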
Effect of Randomized Words
Experiment: create two TEXTURE databases and compare across systems
- Database AP: based on TREC AP Vol. 1
- Database R-AP: randomize each word in AP
- Query workload: low- and high-selectivity keywords

Result: response times differ on average by < 1%, not exceeding 4.4%

Conclusion: using random words is reasonable for measuring response time
Effect of Features and Models
Experiment: compare response times over same-sized synthetic (S) and real (D) collections
- Sample s documents of D
- Use TextGen to produce S at several scale factors: |S| = 10, 25, 50, 75, and 100% of |D|
- Compare response time across systems
- Must repeat for each type of text-only query workload
- Used as a framework for picking features and models
TextGen Evaluation Results
How does response time measured on real data compare to the synthetic TextGen collections?
- Query workload: low-selectivity, text-only query (0.03%)
- Graph is for System A; similar results were obtained for the other systems

[Chart: elapsed time in seconds (0 to 1.6) vs. scale factor (10, 25, 50, 75, 100%) for the Real Collection, Synthetic-1, Synthetic-2, and Synthetic-3]
Future Work
How should quality measurements be incorporated?
Extend the workload to include updates
Allow correlations between attributes when generating database
Conclusion
We propose TEXTURE to fill the benchmarking gap for applications that issue mixed relational and text queries

We can scale up a text collection through synthetic text generation in a way that accurately preserves measured response times
Results of evaluation illustrate significant differences between current commercial relational systems
References
1. http://es.csiro.au/TRECWeb/vlc2info.html
2. http://trec.nist.gov
3. http://www.tpc.org
4. S. DeFazio. Full-text Document Retrieval Benchmark, chapter 8. Morgan Kaufman, 2nd edition, 1993.
5. D. DeWitt. The Wisconsin Benchmark: Past, Present, and Future. The Benchmark Handbook, 1991.
6. J. Gray, P. Sundaresan, S. Englert, K. Baclawski, and P. J. Weinberger. Quickly Generating Billion-record Synthetic Databases. ACM SIGMOD, 1994.
7. H. S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
8. P. O'Neil. The Set Query Benchmark. The Benchmark Handbook, 1991.
9. K. A. Shoens, A. Tomasic, H. Garcia-Molina. Synthetic Workload Performance Analysis of Incremental Updates. In Research and Development in Information Retrieval, 1994.
10. C. Turbyfill, C. Orji, and D. Bitton. AS3AP: a Comparative Relational Database Benchmark. IEEE Compcon, 1989.
11. G. Zipf. Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. Hafner Publications, 1949.
Questions?