24. April 1998
Dutch Cadastre 1
Efficient Storage And Retrieval for Large Spatial Data Set in a Relational DBMS
Andrew U. Frank
Dept. of Geoinformation
Technical University
[email protected]
24. April 1998
Dutch Cadastre 2
Overview
Why a Database
Two Database Issues: Modeling and Implementation
Base assumptions about spatio-temporal databases
Implementation of spatial access
Modeling
Interoperability
Interaction: Multi-Agency Databases
Open GIS
24. April 1998
Dutch Cadastre 3
Why a Database
A database achieves in an agency:
Integration
Consistency
Sharing
(reduction of redundancy, but not of storage)
24. April 1998
Dutch Cadastre 4
Two Issues:
Modeling: what are the things to represent, and how do we logically structure them?
Implementation: how is this solved with a computer?
Usually only modeling is important.
24. April 1998
Dutch Cadastre 5
Implementation is crucial for DBMS
because performance is critical:
GIS are too large to be stored completely in main memory.
Access to disk takes 10 millisec; access in main memory takes 100 nsec. That is a ratio of 1 : 10^5, or like 3 sec to about 3.5 days!
We therefore must start with the performance of the most often used GIS operation.
24. April 1998
Dutch Cadastre 6
Base Assumptions about Spatio-Temporal Database
Objects: exist independently, have some properties, and enter into relations with other objects.
Spatial objects: have a location and a spatial extent (expressed in a global coordinate system).
24. April 1998
Dutch Cadastre 7
Base Assumptions about Spatio-Temporal Database
Temporal objects: change their properties in time; questions about past (or future) states can be asked (valid time).
Administrative database: questions about when a change became known can be asked (transaction time).
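The distinction between valid time and transaction time can be illustrated with a minimal sketch. The record layout, the parcel and owner names, and the dates below are all hypothetical and only serve to show the two time axes; they are not taken from the talk.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class OwnerFact:
    parcel_id: str
    owner: str
    valid_from: date   # valid time: when the fact became true in the world
    recorded_on: date  # transaction time: when the database learned of it

def owner_as_of(facts, parcel_id, valid_on, known_on):
    """Owner of a parcel on `valid_on`, using only what was recorded by `known_on`."""
    candidates = [
        f for f in facts
        if f.parcel_id == parcel_id
        and f.valid_from <= valid_on
        and f.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # The latest applicable fact wins.
    return max(candidates, key=lambda f: (f.valid_from, f.recorded_on)).owner

# Hypothetical data: the sale of June 1997 was registered three months late.
facts = [
    OwnerFact("P1", "Jansen", date(1990, 1, 1), date(1990, 1, 15)),
    OwnerFact("P1", "de Vries", date(1997, 6, 1), date(1997, 9, 1)),
]
```

Asking for the owner on 1 July 1997 gives "Jansen" when restricted to what was known on 15 July 1997, and "de Vries" when asked with the knowledge of 1 January 1998.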
24. April 1998
Dutch Cadastre 8
Most often used operation: Access to spatial data
Spatial data must be retrieved quickly based on location:
This is missing in a commercial DBMS.
SQL can be used, but performance is insufficient.
Spatial clustering is absolutely required.
24. April 1998
Dutch Cadastre 9
Databases for Spatial Data:
Classical architecture: a DBMS for the administrative data, a specialized file system for the other data.
Research goal: an integrated database for both attribute and geometric data, with a spatial access method.
The field tree is a method for spatial access (designed for cadastral applications).
24. April 1998
Dutch Cadastre 10
Field tree explanations:
Regular grid - could cluster point objects with equal density
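As a minimal sketch of clustering by a regular grid, each point can be assigned to the cell it falls in, and points sharing a cell are stored together. The function names and the cell size are assumptions for illustration.

```python
from collections import defaultdict

def grid_cell(x, y, cell_size):
    """Column and row of the regular grid cell that contains the point."""
    return (int(x // cell_size), int(y // cell_size))

def cluster_points(points, cell_size):
    """Group points by grid cell; each group would go on one storage page."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[grid_cell(x, y, cell_size)].append((x, y))
    return dict(buckets)
```

With evenly distributed points every cell holds roughly the same number of objects, which is why a fixed cell size is adequate in that case only.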
24. April 1998
Dutch Cadastre 11
Field tree explanations 2:
Quadtree grid - could cluster point objects with irregular distribution
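A quadtree adapts to irregular density by splitting a cell into four only where too many points accumulate. The following is a minimal bucket quadtree sketch; the class name, the capacity of two points per cell, and the minimum cell size are assumptions chosen to keep the example small.

```python
class QuadNode:
    """Square cell that splits into four quadrants when it holds too many points."""

    def __init__(self, x, y, size, capacity=2):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.points = []
        self.children = None  # None while this node is a leaf

    def insert(self, px, py):
        if self.children is not None:
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        # Split only dense cells, and stop at a minimum cell size of 1.
        if len(self.points) > self.capacity and self.size > 1:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadNode(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self._child_for(px, py).insert(px, py)

    def _child_for(self, px, py):
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(child.depth() for child in self.children)
```

Inserting three points close together near one corner and one isolated point elsewhere produces deep subdivision only around the dense corner, while the isolated point stays in a large cell.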
24. April 1998
Dutch Cadastre 12
Field tree explanations 3:
But it cannot cluster extended objects
24. April 1998
Dutch Cadastre 13
Field tree explanations 4:
The field tree can do it:
24. April 1998
Dutch Cadastre 14
Field tree explanations 5:
Add a next level, half as large and shifted:
24. April 1998
Dutch Cadastre 15
Field tree explanations 6:
Add another level (blue):
24. April 1998
Dutch Cadastre 16
Field tree explanations 7:
Add another level (green):
24. April 1998
Dutch Cadastre 17
Why Field Trees?
Extended objects (represented by their minimal bounding box) are stored with a field.
Fields cover the area multiple times.
Guarantee: every object is on a page which is at most 4 times the size of the object.
(A quadtree-based method cannot achieve this.)
Access times depend on the amount of data retrieved, not on the amount of data stored.
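The effect of shifting the levels can be sketched by asking for the finest grid level at which a single cell fully contains an object's bounding box. The layout below is a simplifying assumption, not the published field tree definition: each level halves the cell size, and in the shifted variant each level's grid is moved by half of its own cell size relative to the level above. The function names and the root size of 1024 are hypothetical.

```python
import math

def grid_origin(level, root_size, shifted):
    """Origin of the grid at `level`; shifted grids move half a cell per level."""
    if not shifted:
        return 0.0
    return -sum(root_size / 2**j / 2 for j in range(1, level + 1))

def deepest_enclosing_level(box, root_size=1024.0, max_level=8, shifted=True):
    """Finest level at which one grid cell fully contains the box, or None."""
    xmin, ymin, xmax, ymax = box
    best = None
    for level in range(max_level + 1):
        size = root_size / 2**level
        origin = grid_origin(level, root_size, shifted)
        same_col = math.floor((xmin - origin) / size) == math.floor((xmax - origin) / size)
        same_row = math.floor((ymin - origin) / size) == math.floor((ymax - origin) / size)
        if same_col and same_row:
            best = level  # keep searching: a finer level may still fit
    return best

# A 4 x 4 object that straddles the central split line of an unshifted quadtree.
small_box = (510.0, 510.0, 514.0, 514.0)
```

For `small_box`, the unshifted (quadtree-style) grids can only hold the object in the root cell of size 1024, whereas the shifted grids in this sketch hold it at level 7, in a cell of size 8.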
24. April 1998
Dutch Cadastre 18
Query in field tree
Determine which fields overlap the query window;
search all (but only these) for objects of interest.
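Determining the fields that overlap a query window can be sketched as a loop over levels that enumerates the overlapping cells of each grid. As an assumption for illustration, level k uses cells of size root_size / 2^k and its grid is shifted by half a cell relative to the level above; the names `level_grid` and `fields_overlapping` are hypothetical.

```python
import math

def level_grid(level, root_size=1024.0):
    """Cell size and grid origin for one level (assumed half-cell shift per level)."""
    size = root_size / 2**level
    origin = -sum(root_size / 2**j / 2 for j in range(1, level + 1))
    return size, origin

def fields_overlapping(window, max_level=3, root_size=1024.0):
    """All (level, column, row) fields that overlap the query window."""
    xmin, ymin, xmax, ymax = window
    hits = []
    for level in range(max_level + 1):
        size, origin = level_grid(level, root_size)
        c0 = math.floor((xmin - origin) / size)
        c1 = math.floor((xmax - origin) / size)
        r0 = math.floor((ymin - origin) / size)
        r1 = math.floor((ymax - origin) / size)
        for row in range(r0, r1 + 1):
            for col in range(c0, c1 + 1):
                hits.append((level, col, row))
    return hits
```

Only the objects stored with the returned fields need to be examined, so the work grows with the size of the window rather than with the size of the database.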
24. April 1998
Dutch Cadastre 19
Spatial access research
Research concentrated on the implementation of spatial access.
Results: mostly complex, difficult-to-implement methods.
24. April 1998
Dutch Cadastre 20
Conclusion from spatial access research:
Samet: "use a spatial clustering, use any"
Problem was: it was unclear what criteria to optimize for (nearest neighbor, range query) and what the properties of the data are.
Identify the problem before you optimize.
Identification of the details of the problem was not possible; spatial statistics methods were lacking.
The field tree has previously been used for cadastral and similar problems in commercial environments and performed well.
24. April 1998
Dutch Cadastre 21
The Issue is Integration
The integration of spatial access with the other DBMS services (especially transaction management) is extremely difficult.
The commercial DBMS vendors are neither willing nor able to provide spatial access built into the core of the DBMS engine;
the difficulty is the integration with the transaction management system.
24. April 1998
Dutch Cadastre 22
Practical solution:
Spatial access/spatial clustering must be built on top of standard DB functionality (e.g. a commercial relational DB).
How to cluster if one cannot access the low-level storage subsystem?
One must exploit the B-Tree data structure, which uses physical clustering in most commercial DBMS.
24. April 1998
Dutch Cadastre 23
Concept
1. Assign to each object a single number based on a spatial encoding.
2. When storing the object, use this number to achieve physical clustering by spatial location in the DBMS.
3. When searching: determine all spatial codes which fall into the query window, then search these codes using the DBMS's built-in B-Tree.
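The three steps can be sketched with SQLite, where a `WITHOUT ROWID` table is stored as a B-tree ordered by its primary key. Everything specific here is an assumption for illustration: the row-major grid code stands in for the spatial encoding, objects are reduced to points, and the table name, cell size and sample parcels are hypothetical.

```python
import sqlite3

CELL = 64   # hypothetical cell size
COLS = 16   # hypothetical grid width in cells

def code_for(x, y):
    """Step 1: a single number per object (here: row-major grid cell number)."""
    return int(y // CELL) * COLS + int(x // CELL)

def code_intervals(xmin, ymin, xmax, ymax):
    """Step 3a: intervals of codes that fall into the query window."""
    c0, c1 = int(xmin // CELL), int(xmax // CELL)
    rows = range(int(ymin // CELL), int(ymax // CELL) + 1)
    return [(r * COLS + c0, r * COLS + c1) for r in rows]

def window_query(db, xmin, ymin, xmax, ymax):
    """Step 3b: search each code interval through the primary-key B-tree."""
    found = []
    for lo, hi in code_intervals(xmin, ymin, xmax, ymax):
        rows = db.execute(
            "SELECT id, x, y FROM parcel WHERE code BETWEEN ? AND ?", (lo, hi))
        # The codes only pre-select candidates; filter exactly afterwards.
        found += [pid for pid, x, y in rows
                  if xmin <= x <= xmax and ymin <= y <= ymax]
    return sorted(found)

db = sqlite3.connect(":memory:")
# Step 2: the code leads the primary key, so rows are stored in code order.
db.execute("""CREATE TABLE parcel (
    code INTEGER, id TEXT, x REAL, y REAL,
    PRIMARY KEY (code, id)) WITHOUT ROWID""")
for pid, x, y in [("A", 10, 10), ("B", 70, 20), ("C", 500, 500), ("D", 100, 100)]:
    db.execute("INSERT INTO parcel VALUES (?, ?, ?, ?)", (code_for(x, y), pid, x, y))
```

A call such as `window_query(db, 0, 0, 127, 63)` then touches only the code interval (0, 1) and returns the parcels A and B. Other DBMS products expose clustering differently, so the table definition would need adapting.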
24. April 1998
Dutch Cadastre 24
How to encode spatial location: Cluster by Morton numbers
Proposal by Abel (CSIRO), based on the quadtree.
Disadvantage - small objects may end in a very large cell
(or multiple keys are necessary; multiple keys cannot be used for clustering in most B-Trees).
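A Morton number interleaves the bits of the column and row index of a grid cell, so that cells close together in space tend to receive nearby numbers. This is a minimal sketch of the standard bit interleaving; the function name and the 16-bit default are assumptions.

```python
def morton(col, row, bits=16):
    """Interleave the bits of col (even positions) and row (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((col >> i) & 1) << (2 * i)
        code |= ((row >> i) & 1) << (2 * i + 1)
    return code
```

The four cells of a 2 x 2 block, (0,0), (1,0), (0,1), (1,1), receive the codes 0, 1, 2, 3, which is the quadtree ordering the slide refers to.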
24. April 1998
Dutch Cadastre 25
Field Code
Field-tree numbers encode the spatial location and extent of an object (v. Oosterom's idea)
--> a single number: the Spatial Location Code
Store spatial objects with this code (exploiting physical clustering).
Search: determine the fields which may contain the objects; search these.
24. April 1998
Dutch Cadastre 26
Practical results
Search based on intervals of field codes;
heuristics to reduce the number of intervals submitted for range queries.
Tests with real data: demonstrate speed-up of search by factors from 10 to 100 times.
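The talk does not spell out the heuristics used to reduce the number of intervals. One plausible sketch, offered only as an assumption, is to merge code intervals that touch or lie within a small gap of each other, accepting a few extra candidate rows in exchange for fewer range queries. The function name and the `max_gap` parameter are hypothetical.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge code intervals whose gap is at most `max_gap` unused codes."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo - merged[-1][1] <= max_gap + 1:
            # Close enough to the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

With `max_gap=0` only adjacent intervals are joined; a larger gap trades some unnecessary reads for fewer B-Tree searches.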
24. April 1998
Dutch Cadastre 27
Modeling Issues
Assumption: Relational DBMS - today's standard for implementation.
Data model: relations, consisting of tuples of attribute values; relational calculus; SQL as 'universal data speak' (not really useful as a user query language).
This is a data (value) oriented concept.
24. April 1998
Dutch Cadastre 28
Object-orientation necessary
OO concepts are necessary for spatial and temporal databases, especially the cadastre:
Objects have identity in time (the parcel id is the classical example).
Objects have attribute values.
Objects enter into relations.
24. April 1998
Dutch Cadastre 29
Object ID centered
Object-Oriented data models create a data model clash, similar to the clash between Relational DBMS and sequential processing in conventional languages.
24. April 1998
Dutch Cadastre 30
Object ID centered
The relations between objects and attribute values are functions from ID to attribute value.
Relations between objects are functions from ID to ID (of the related object).
A representation of an object as a contiguous data space is not necessary, but may be useful for clustering using Spatial Location Codes.
This approach seems to solve most of the OO model problems discussed in the literature.
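The ID-centered view can be sketched with plain mappings: one function per attribute from ID to value, and one function per relation from ID to ID. The parcel and owner identifiers and values are hypothetical.

```python
# Attributes: functions from object ID to attribute value.
area_m2 = {"P1": 420.0, "P2": 615.5}
owner_name = {"O7": "A. Jansen"}

# Relation: a function from object ID to the ID of the related object.
owned_by = {"P1": "O7", "P2": "O7"}

def name_of_owner(parcel_id):
    """Compose the relation with an attribute function."""
    return owner_name[owned_by[parcel_id]]
```

No record gathers all values of one object in one place; an object exists only as its ID appearing in these functions, which in a relational DBMS would map to tables keyed by ID.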
24. April 1998
Dutch Cadastre 31
Future:
What we can realize now:
a spatio-temporal multi-user database for a single agency.
How to deal with cooperating agencies?
(Your achievements demonstrate the need for this.)
What are the next questions?
24. April 1998
Dutch Cadastre 32
The multi-agency database:
Data is shared.
Responsibility for the data is clearly identified.
Data is not centralized.
24. April 1998
Dutch Cadastre 33
The multi-agency database:
This is more than a distributed DB, because it requires a new transaction concept.
This is the classical discussion of the 'long transaction', including distributed responsibility for data change within a transaction.
Concept: agencies send update proposals for data they cannot change themselves to the agency which is responsible.
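The update-proposal concept can be sketched as follows: each agency changes only its own data and queues proposals with the responsible agency for everything else. The class, method names and sample data are hypothetical, and the sketch ignores the transaction and network aspects entirely.

```python
class Agency:
    """Holds the data it is responsible for and an inbox of update proposals."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = []

    def request_change(self, responsible, key, value):
        if responsible is self:
            self.data[key] = value  # own data: change directly
        else:
            # Foreign data: only propose, never write.
            responsible.inbox.append((self.name, key, value))

    def review(self, accept):
        """Apply the proposals that the responsible agency accepts."""
        for sender, key, value in self.inbox:
            if accept(sender, key, value):
                self.data[key] = value
        self.inbox.clear()

cadastre = Agency("cadastre")
utility = Agency("utility")
cadastre.request_change(cadastre, "P1/land_use", "residential")
utility.request_change(cadastre, "P1/land_use", "commercial")
```

After the utility's request the cadastre's value is still "residential"; it changes only once the cadastre reviews and accepts the proposal.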
24. April 1998
Dutch Cadastre 34
Interoperability
Agencies must cooperate.
So far, we exchange data.
Updates are not propagated!
Future: interoperability, independent of the vendor of the software
(the so-called Open GIS)
24. April 1998
Dutch Cadastre 35
Interoperability as a technical problem
Computer network: agreement on base cooperation (network standard)
GIS cooperation: data model and related concepts
24. April 1998
Dutch Cadastre 36
Interoperability as a semantics problem
What does the data mean?
How to describe the data?
How to describe the meaning of data - in a formal language to be used in a computer?
24. April 1998
Dutch Cadastre 37
Formal Language
Describing natural language with formal tools:
not likely to be achieved soon.
Sufficient for GIS: definitions for restricted user communities,
e.g., agencies within a town
24. April 1998
Dutch Cadastre 38
Open GIS Standards
Development of industry-accepted standards in step with the rapid development of base technology
Cooperation of all GIS vendors
Goal: Open Systems
24. April 1998
Dutch Cadastre 39
Open GIS
Interoperability independent of vendor:
storage of data under one system
analysis tools from another system
24. April 1998
Dutch Cadastre 40
Open² GIS
Interoperability independent of agency
Needs cooperation of user communities.
Major users are already working in the Open GIS Consortium to ensure that their application concepts are standardized.
24. April 1998
Dutch Cadastre 41
GIS User Organizations Gain from Open GIS
standardized environments to solve application problems
accumulation of knowledge of the application domain
cooperation of agencies in Europe (and export of knowledge)
A cadastral special interest group is being discussed in OGC.
24. April 1998
Dutch Cadastre 42
GIPSIE Project
EU project (DG III: Information Technology)
• to promote Open GIS within the GI industry and user community in Europe
• to bring European issues into the OGC process
• to contribute with research to the Open GIS standards
Participation by European companies and agencies required.
Contacts
Andrew Frank - TU Vienna
Werner Kuhn - U Muenster