Tractor Pulling on Data Warehouses
DESCRIPTION
This topic was presented by Martin Kersten (CWI) at the 4th International Workshop on Testing Database Systems (DBTest 2011) on June 13th, 2011 in Athens, Greece.
Publication: http://bit.ly/yK5JZk
Abstract: Robustness of database systems under stress is hard to quantify, because there are many factors involved, most notably the user expectation to perform a job within certain bounds of the user requirements. Nevertheless, robustness of database systems is very important to end users. In this paper we develop a database benchmark suite, inspired by tractor pulling, where robustness is measured as a system's ability to process data despite a continuous increase in system load, as defined in terms of data volume, query volume and complexity. A functional evaluation is performed against several systems to highlight the benchmark capabilities.
TRANSCRIPT
Tractor Pulling on Datawarehouses
Martin Kersten, Volker Markl, Meikel Poess, Kai-Uwe Sattler, Alfons Kemper, Ani Nica
DBTest 2011
The good old days
• The early eighties, when
– Oracle appeared on the scene
– Ingres was a respected innovator in RDBMS
– System R fought the Codasyl battle
– IMS was still dominating the market
• There was a need for a metric to evaluate the solutions
The good old days
• Turned into an organised battle
– TPC-C, TPC-H, TPC-D, TPC-W…
– hundreds of benchmarks to prove one’s muscles
• We need tools to assess a solution space
• We don’t need weapons to win a ‘war’
Dagstuhl 2010 Robust Query Processing
• With each step in the pull the tension of the Tractor increases (exponentially)
• The Tractor driver is throttling and changing gears to keep it going
Ingredients of the DBMS Tractor Pull
• A tractor pull is a series of workload steps for which we measure the performance
• Each step is defined by
– Catalog changes
– Database load, delete+load+create index
– Query processing, BI grouped statistics
– Concurrency
– Act of God operations
A database soil
Generate a small database < RAM
Use a single data type
COPY the smaller relation into the larger one
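The soil preparation above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the relation names R0, R1, ... and the column B0 are borrowed from the query template, the doubling of each relation and the base size of 100 rows are invented for the example, and `INSERT INTO ... SELECT` stands in for the COPY step of the system under test.

```python
import random
import sqlite3


def build_soil(conn, relations=3, base_rows=100, seed=42):
    """Create R0..R(relations-1); each relation is filled by copying
    the next smaller one into it twice (assumed doubling)."""
    rng = random.Random(seed)
    cur = conn.cursor()
    # A single data type: every relation has one INTEGER column B0.
    cur.execute("CREATE TABLE R0 (B0 INTEGER)")
    cur.executemany(
        "INSERT INTO R0 VALUES (?)",
        [(rng.randrange(1000),) for _ in range(base_rows)],
    )
    for k in range(1, relations):
        cur.execute(f"CREATE TABLE R{k} (B0 INTEGER)")
        for _ in range(2):
            # Copy the smaller relation into the larger one.
            cur.execute(f"INSERT INTO R{k} SELECT B0 FROM R{k - 1}")
    conn.commit()
    return [
        cur.execute(f"SELECT count(*) FROM R{k}").fetchone()[0]
        for k in range(relations)
    ]


conn = sqlite3.connect(":memory:")
sizes = build_soil(conn)
print(sizes)
```

An in-memory database keeps the sketch self-contained; a real run would target the DBMS being tested and size the base relation relative to RAM.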
A database soil
Query template
SELECT R0.B0, ..., Ri.Bi, count(*), avg(R0.B0), avg(R1.B0), avg(R1.B1), ..., avg(Ri.B0), ...
FROM R0, ..., Ri
WHERE selectpattern(R0, ..., Ri) AND joinpattern(R0, ..., Ri)
GROUP BY R0.B0, ..., Ri.Bi
ORDER BY R0.B0, ..., Ri.Bi
Linear, Cyclic, Star-based, Clique query patterns
The n-th query load includes the (n-1)-th query load
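As a rough illustration of how the template might be instantiated, the sketch below builds the SQL text for the linear pattern only. The concrete predicates are assumptions, not taken from the slides: the select pattern is a range filter on R0.B0, the join pattern chains neighbouring relations on B0, and relation Rk is assumed to have columns B0..Bk.

```python
def linear_query(i, bound=100):
    """Build the template query over R0..Ri with a linear join chain."""
    rels = [f"R{k}" for k in range(i + 1)]
    # Grouping keys follow the template: R0.B0, ..., Ri.Bi.
    keys = ", ".join(f"R{k}.B{k}" for k in range(i + 1))
    # avg(R0.B0), avg(R1.B0), avg(R1.B1), ..., as in the template.
    avgs = ", ".join(
        f"avg(R{k}.B{j})" for k in range(i + 1) for j in range(k + 1)
    )
    select_pred = f"R0.B0 < {bound}"  # hypothetical select pattern
    join_preds = [f"R{k}.B0 = R{k + 1}.B0" for k in range(i)]  # linear chain
    where = " AND ".join([select_pred] + join_preds)
    return (
        f"SELECT {keys}, count(*), {avgs} "
        f"FROM {', '.join(rels)} "
        f"WHERE {where} "
        f"GROUP BY {keys} ORDER BY {keys}"
    )


def query_load(n):
    """The n-th load contains every query of the (n-1)-th load plus one more."""
    return [linear_query(i) for i in range(n + 1)]


print(linear_query(1))
```

Cyclic, star and clique patterns would differ only in how `join_preds` is generated.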
Scenarios
• Tractor pull workload
• W(N) = <S, L, Pre, Qry, Post, qry, db>
– Schema adjustments
– Loading the database
– Pre-optimization
– Query execution
– Post-optimization
– query characteristics
– db growth function
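One way to picture the workload tuple is as a record of callables, one per component, with a small driver that times the five phases of a track. The field names, the `run_track` helper and the no-op phases below are hypothetical and serve only to show the shape of W(N); they are not the toolkit's actual interface.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Workload:
    schema: Callable[[int], None]  # S: schema adjustments
    load: Callable[[int], None]    # L: loading the database
    pre: Callable[[int], None]     # Pre: pre-optimization
    query: Callable[[int], None]   # Qry: query execution
    post: Callable[[int], None]    # Post: post-optimization
    qry: Callable[[int], int]      # query characteristics at track i
    db: Callable[[int], float]     # db growth function at track i


def run_track(w: Workload, i: int) -> Dict[str, float]:
    """Run the five phases of track i in order and time each one."""
    timings = {}
    for name in ("schema", "load", "pre", "query", "post"):
        phase = getattr(w, name)
        start = time.perf_counter()
        phase(i)
        timings[name] = time.perf_counter() - start
    return timings


def noop(i: int) -> None:
    """Placeholder phase; a real run would talk to the DBMS here."""


w = Workload(noop, noop, noop, noop, noop,
             qry=lambda i: 4, db=lambda i: 0.2 * i)
timings = run_track(w, 1)
print(timings)
```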
Hill scenario
• The Hill scenario models a data warehouse that grows with a modest growth rate of g ∈ (0, 1) (e.g., g = 0.2).
• It starts out from a main-memory focus until it overflows into a few disks.
• It will highlight a system’s robustness to deal with the memory-disk performance chasm.
Hill scenario
A modestly growing warehouse with a single user.
The database fits in memory and spills over to disk.
d ∈ (0%, 100%), g ∈ (0, 1)
Number of connections at track i: 1
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = g × i × db(0)
qry(0) = 1, qry(i) = 4
|Q(i)| = 1 + 4 × i
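The Hill parameters can be evaluated with a few lines of Python. The sketch reads the slide formulas literally; the slides do not define dom, so it is simply passed in as a number, and whether db(i) denotes the total size or the increment at track i is left open here. The function names and the sample values are illustrative assumptions.

```python
def hill_db_size(i, d, ram, dom, g):
    """db(0) = (d * RAM) / (2 * dom); db(i) = g * i * db(0) for i >= 1."""
    base = (d * ram) / (2 * dom)
    if i == 0:
        return base
    return g * i * base


def hill_total_queries(i):
    """|Q(i)| = 1 + 4 * i: one initial query, then four more per track."""
    return 1 + 4 * i


# Hypothetical values: half of a 1024-unit RAM, dom = 4, g = 0.2.
for track in range(4):
    print(track, hill_db_size(track, 0.5, 1024, 4, 0.2),
          hill_total_queries(track))
```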
Meadow scenario
A stable warehouse with multiple users.
Query templates stress complexity.
d ∈ (0%, 100%), g = 0, C > 1
Number of connections at track i: C
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = 0 (no growth)
qry(0) = 0, qry(i) = C
|Q(i)| = 1 + C × i
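For the Meadow scenario, the per-track schedule can be tabulated as follows. This is a small sketch that copies the slide formulas as given, including the leading 1 in the cumulative count; the function name and the choice of C = 8 are assumptions for illustration.

```python
def meadow_schedule(tracks, c):
    """Return (track, connections, new_queries, cumulative_queries) rows."""
    rows = []
    for i in range(tracks + 1):
        new_queries = 0 if i == 0 else c  # qry(0) = 0, qry(i) = C
        cumulative = 1 + c * i            # |Q(i)| = 1 + C * i, as on the slide
        rows.append((i, c, new_queries, cumulative))
    return rows


schedule = meadow_schedule(tracks=3, c=8)
for row in schedule:
    print(row)
```

The database size does not appear in the table because it stays fixed: the load grows only through concurrent connections and queries.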
Rockies scenario
A growing warehouse with multiple users.
Query templates stress complexity.
d ∈ (0%, 100%), g ∈ (0, 10)
Number of connections at track i: i
db(0) = (d × RAM) × (1 / (2 × dom))
db(i) = g × i × db(0)
qry(0) = 0, qry(i) = i × 4
|Q(i)| = 1 + 4 × i × (i + 1) / 2
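The closed form for the cumulative query count in the Rockies scenario follows from summing the per-track counts qry(j) = 4j. The derivation below assumes the leading 1 stands for a single initial query, as in the Hill scenario; the slide itself lists qry(0) = 0, so this reading is an assumption.

```latex
\begin{align*}
|Q(i)| &= 1 + \sum_{j=1}^{i} \mathrm{qry}(j)
        = 1 + \sum_{j=1}^{i} 4j \\
       &= 1 + 4 \cdot \frac{i(i+1)}{2}
        = 1 + 2i(i+1)
\end{align*}
```

For example, at track i = 3 this gives 1 + 2 × 3 × 4 = 25 queries, so the query volume grows quadratically with the track number.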
Robustness metrics
• It is a multi-dimensional metric aimed at measuring the deviation from the expected norm
• Robust(N) = <L, S, QO, QOk, QE, QEk, H>
– Standard deviation of the loading time (L)
– Standard deviation of the storage requirements (S)
– Standard deviation of query optimization (QO; per track: QOk)
– Standard deviation of query execution (QE; per track: QEk)
– Standard deviation, holistic (H)
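A minimal sketch of how such a metric vector might be computed from raw measurements is shown below. Several points are assumptions: the slides do not say whether a population or sample standard deviation is meant (population is used here), QO and QE are read as deviations over all tracks while QOk and QEk are per track, and the holistic component H is omitted because the slides do not define it.

```python
from statistics import pstdev


def robustness(load, storage, opt_per_track, exec_per_track):
    """Standard deviations for the components of Robust(N), except H."""
    flat_opt = [t for track in opt_per_track for t in track]
    flat_exec = [t for track in exec_per_track for t in track]
    return {
        "L": pstdev(load),                                # loading time
        "S": pstdev(storage),                             # storage requirements
        "QO": pstdev(flat_opt),                           # optimization, all tracks
        "QOk": [pstdev(t) for t in opt_per_track],        # optimization, per track
        "QE": pstdev(flat_exec),                          # execution, all tracks
        "QEk": [pstdev(t) for t in exec_per_track],       # execution, per track
    }


# Made-up measurements for two tracks.
metrics = robustness(
    load=[2, 4, 4, 4, 5, 5, 7, 9],
    storage=[1, 1, 1],
    opt_per_track=[[1, 1], [3, 3]],
    exec_per_track=[[1, 3], [2, 2]],
)
print(metrics)
```

Lower values indicate behaviour closer to the expected norm; comparing the vectors of two systems, or of two versions of one system, is where the metric becomes informative.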
A Hill scenario
A Meadow scenario
A Rockies scenario
Takeaways
• Robustness is all about comparisons. We need methods to quickly determine differences in behavior.
• If the system reaches the end of the field we are happy. If it blows up or if the queries are behaving worse along the way it is not robust.
Conclusions
• Tractorpulling is an effective new toolkit for robustness testing a DBMS in various dimensions
• Refinements for ease of analysis are needed (GUIs)
• http://sourceforge.net/projects/tractorpulling