26SEP2007
FOSS4G2007
Database Shootout: Benchmarking spatial DBMSs
Wim de Haas, Wilko Quak
Delft University of Technology
You’re in the eye of the storm!
Early morning: Brock Anderson about WMS/PostGIS/Shapefile performance
This afternoon: Kevin Neufeld about tips for the PostGIS power user
Now: Reflect on the factor 10 and the framework for testing
Overview
• Introduction
• What are the problems?
• A classification of Spatial DBMS users
• How can we help them
• Benchmark proposal
• First test results
• Next steps
Introducing the Ministry of Transport, Public Works and Water Management
Our core tasks are:
• to offer protection against floods
• to guarantee safe and reliable connections over land, water and through the air
• to ensure clean and sufficient water
• Rijkswaterstaat (RWS) is the executive branch of the Ministry of Transport
Business drivers
• How to keep track of all these assets?
• How to ensure consistency & coherence in operations and change of Rijkswaterstaat?
• How to facilitate decision-making and communication?
• Enter the Digitaal Topografisch Bestand (DTB)
– 3D
– 1:1000
– EUR 60M
DTB waterway and highway
DTB bird's-eye view
DTB Amsterdam Airport
Enter IVRI
• The new system for data acquisition and maintenance for the DTB
• Oracle 10g
• ArcGIS 9.2
• Summit Evolution
• Very complex project
First comment on Murphy’s Law
• Murphy was an optimist
• Oracle and ESRI were pushed to the limits
• Took extra time in the project
• Triggered us to be less dependent on Oracle
• Oracle Spatial is not cheap, so can we use PostGIS as the main datastore for Spatial data?
Why bother …
• Where to find dramatic differences in Spatial DBMSs?
Stonebraker (2007): “We define ‘dramatically outperform’ to mean at least a factor 10 advantage […then] customers will be inclined to try the new architecture”
Where to expect Dramatic differences?
• Operating System (No)
• MySQL Spatial Extension vs PostGIS (Yes)
• Choice of FileSystem (Maybe)
• Functionality Difference (Yes)
• Choice of Parameters (Maybe)
Problems with testing
• DBMS vendors do not want published results
– Oracle explicitly forbids publishing benchmark results
• Hardware
– Moore’s Law
– I/O
• Release Frequency of Software
• Spatial testing cannot be done on synthetic data
• Too many parameters
Benchmark results are outdated before they are published
Benchmark consideration: Weird Cases department
[Figure: two cases, labelled "diagonal" and "flat", each showing a query and a geometry]
Benchmark consideration: Hot vs Cold
Solution
• Do not publish the result of the benchmark
• Publish a framework that lets people do their own benchmarking
• No “One size fits all”: Buyer’s guide
• Help different users to find best DBMS
Classification of spatial DBMS users
Four classes:
1. Server Builders: publish spatial data via web services
2. GIS User: Load various datasets and perform complex analyses
3. Data Maintainer: Maintain one core dataset
4. Power Users: All of the above and more
Class 1: Web Server Builders
• You do not really need a DBMS for this (You use a fraction of DBMS functionality)
– This may be oversimplified, but is used here for the purpose of clarity
• Only one query counts: Find everything within BBOX
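The one query that matters for this class can be illustrated without any DBMS at all. The sketch below is a minimal in-memory version: each feature is reduced to its bounding box, and `find_in_bbox` returns the ids of the features whose box overlaps the query box. The names `BBox`, `overlaps` and `find_in_bbox` and the sample features are hypothetical and not taken from the presentation; a spatial DBMS would normally answer the same question through a spatial index rather than a linear scan.

```python
from typing import Dict, List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a: BBox, b: BBox) -> bool:
    """True if the two boxes share at least one point (edges count)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def find_in_bbox(features: Dict[str, BBox], query: BBox) -> List[str]:
    """Linear scan: sorted ids of all features whose box overlaps the query."""
    return sorted(fid for fid, box in features.items() if overlaps(box, query))


# Made-up sample data, for illustration only.
features = {
    "road_a": BBox(0, 0, 10, 1),
    "road_b": BBox(20, 20, 30, 25),
    "bridge": BBox(9, 0, 12, 3),
}
print(find_in_bbox(features, BBox(8, 0, 11, 2)))  # ['bridge', 'road_a']
```

The linear scan is only there to make the semantics of "find everything within BBOX" explicit; it says nothing about how either DBMS executes the query.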
Class 2: GIS users
• Main interest is functionality
• Spend more time on loading data
• Need a good query optimiser
• Analysis
Class 3: Dataset Maintainers
• Limited number of queries
• Transactions are an issue
• Clustering of data after updates is interesting
• More time to tweak
– And after all, there are a lot of buttons to push
Class 4: Power users
• Do their own testing
• Need a platform to discuss their findings
Benchmark components
1. Functionality test
• Literature review
• Factual testing
2. Very simple performance test script with few parameters
• BBOX Query
• Fixed Dataset (Proposal: OpenStreetMap dataset)
3. Configurable test suite
• Full Suite that tests every corner of DBMS
• For specialists only
Configuration
• HW
– Compaq DL380
• OS
– Linux RH
• SW
– MySQL 5.0
– PostgreSQL 8.2.4
– PostGIS 1.3.1
• Dataset is National Road Map
Test 1 – Functionality: MySQL vs PostGIS
MySQL
• Almost all operations in MySQL return the same result as the corresponding MBR-based functions
– However, MySQL is making an effort to comply with full OGC support
PostGIS
• Full OGC support
Functionality of MySQL is only suited for simple WMS support; no spatial operations are done on the actual geometry
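The gap between an MBR-based test and a test on the actual geometry can be shown with a small example. This is a sketch under stated assumptions: the function names `mbr_intersects` and `segment_intersects_box` are hypothetical, the geometry is a single line segment, and the exact test uses standard Liang-Barsky clipping rather than anything taken from MySQL or PostGIS.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr_intersects(p: Point, q: Point, box: Box) -> bool:
    """MBR-style test: compare the segment's bounding rectangle with the box."""
    xmin, ymin, xmax, ymax = box
    return (min(p[0], q[0]) <= xmax and xmin <= max(p[0], q[0])
            and min(p[1], q[1]) <= ymax and ymin <= max(p[1], q[1]))


def segment_intersects_box(p: Point, q: Point, box: Box) -> bool:
    """Exact test: Liang-Barsky clipping of segment p-q against the box."""
    xmin, ymin, xmax, ymax = box
    dx, dy = q[0] - p[0], q[1] - p[1]
    t0, t1 = 0.0, 1.0  # parameter range of the segment still inside the box
    # One (direction, distance) pair per box edge: left, right, bottom, top.
    for d, dist in ((-dx, p[0] - xmin), (dx, xmax - p[0]),
                    (-dy, p[1] - ymin), (dy, ymax - p[1])):
        if d == 0:
            if dist < 0:
                return False  # parallel to this edge and outside it
            continue
        t = dist / d
        if d < 0:
            t0 = max(t0, t)  # segment enters the half-plane at t
        else:
            t1 = min(t1, t)  # segment leaves the half-plane at t
        if t0 > t1:
            return False
    return True


query = (7, 1, 9, 3)
# Diagonal segment: its MBR covers the query box, the segment itself misses it.
print(mbr_intersects((0, 0), (10, 10), query),
      segment_intersects_box((0, 0), (10, 10), query))  # True False
# Horizontal segment: MBR and exact test agree.
print(mbr_intersects((0, 2), (10, 2), query),
      segment_intersects_box((0, 2), (10, 2), query))   # True True
```

For a diagonal geometry the MBR test reports a hit that the exact test rejects, which is the kind of difference a functionality test has to expose.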
Test 2: simple BBOX select
Write a simple script that generates a lot of rectangle queries.
Parameters:
• DBMS size
• query box size
• experiment length
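A script of this kind could look roughly like the sketch below. It is not the authors' script: the function names, the fixed random seed and the stand-in `fake_query` are assumptions made for illustration. Query box size and experiment length (here, a query count) are parameters of the script; DBMS size is a property of the database the queries are sent to. Each box is formatted as an OGC WKT polygon, so `run_query` can be replaced by a function that embeds the WKT in whatever SQL the DBMS under test expects.

```python
import random
import time
from typing import Callable, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def random_boxes(extent: Box, box_size: float, count: int,
                 seed: int = 0) -> Iterator[Box]:
    """Yield `count` square query boxes of side `box_size` inside `extent`."""
    rng = random.Random(seed)  # fixed seed keeps runs repeatable
    xmin, ymin, xmax, ymax = extent
    for _ in range(count):
        x = rng.uniform(xmin, xmax - box_size)
        y = rng.uniform(ymin, ymax - box_size)
        yield (x, y, x + box_size, y + box_size)


def box_to_wkt(box: Box) -> str:
    """Format a box as an OGC WKT polygon with a closed ring."""
    x0, y0, x1, y1 = box
    return (f"POLYGON(({x0} {y0}, {x1} {y0}, {x1} {y1}, "
            f"{x0} {y1}, {x0} {y0}))")


def run_experiment(run_query: Callable[[str], int], extent: Box,
                   box_size: float, count: int) -> List[float]:
    """Time each query; `run_query` takes WKT and returns a row count."""
    timings = []
    for box in random_boxes(extent, box_size, count):
        start = time.perf_counter()
        run_query(box_to_wkt(box))
        timings.append(time.perf_counter() - start)
    return timings


def fake_query(wkt: str) -> int:
    """Stand-in for a real database call, so the sketch runs on its own."""
    return len(wkt)


timings = run_experiment(fake_query, (0.0, 0.0, 1000.0, 1000.0), 50.0, 100)
print(f"{len(timings)} queries, mean {sum(timings) / len(timings):.6f} s")
```

Keeping the database call behind a single callable means the same generated boxes can be replayed against each DBMS, so that differences in timing come from the DBMS and not from the workload.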
Test 2: grow DBMS size
• Question: Does query response time depend on DBMS size or on core memory?
• Experiment: Run the same test on more than one copy of the same database
Test 2 – result: PostGIS vs MySQL
[Chart: two series, PostGIS and MySQL; x-axis from 0 to 3,500,000, y-axis from 0 to 0.06]
Test 2 – result: Conclusions
• As long as the dataset fits in core memory, differences are small
• MySQL can do more with less memory
• MySQL degrades faster if you run out of memory
• Out-of-the-box installation is bad PR for PostGIS
– Maybe because MySQL leaves caching of disk blocks to the OS, while PostgreSQL handles it differently
Test 3: Comprehensive Test Suite
• Create set of killer polygons so that every line of source code will be touched by running operations
• Test Query optimizer
• Test Join Operator
– Must be done with Skewed Data
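To make "skewed data" concrete, the sketch below generates one uniform and one clustered point set and measures how unevenly each fills a grid. Everything in it is an assumption made for illustration: the helper names, the number of clusters, the Gaussian spread and the grid size are hypothetical and not part of the proposed test suite.

```python
import random
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def uniform_points(n: int, size: float, rng: random.Random) -> List[Point]:
    """n points spread evenly over a size x size square."""
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]


def skewed_points(n: int, size: float, rng: random.Random,
                  clusters: int = 3, spread: float = 0.02) -> List[Point]:
    """n points concentrated around a few random cluster centres."""
    centres = uniform_points(clusters, size, rng)
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centres)
        # Gaussian scatter around the centre, clamped to the square.
        x = min(max(rng.gauss(cx, spread * size), 0.0), size)
        y = min(max(rng.gauss(cy, spread * size), 0.0), size)
        points.append((x, y))
    return points


def max_cell_share(points: List[Point], size: float, cells: int = 10) -> float:
    """Fraction of all points that fall in the busiest grid cell."""
    def cell(v: float) -> int:
        return min(int(v / size * cells), cells - 1)
    counts = Counter((cell(x), cell(y)) for x, y in points)
    return max(counts.values()) / len(points)


rng = random.Random(42)
uniform_share = max_cell_share(uniform_points(10000, 1000.0, rng), 1000.0)
skewed_share = max_cell_share(skewed_points(10000, 1000.0, rng), 1000.0)
print(f"busiest cell: uniform {uniform_share:.3f}, skewed {skewed_share:.3f}")
```

With uniform data every grid cell holds roughly the same share of points, so a join test on such data would not show how a DBMS copes when most of the work lands in a few dense areas.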
Conclusions (overall)
• This is a work in progress
– Polygons and spatial queries are still missing
• The factor 10 is not within reach, yet
– No dramatic differences
How to proceed
• Finish the work and publish this
– Timeline OCT-NOV2007
• TU Delft wiki or osgeo.org wiki?
• Start a Special Interest Group a.k.a. Committee?