
24. April 1998

Dutch Cadastre

Efficient Storage and Retrieval for Large Spatial Data Sets in a Relational DBMS

Andrew U. Frank, Dept. of Geoinformation, Technical University Vienna

Overview

• Why a database
• Two database issues: modeling and implementation
• Base assumptions about spatio-temporal databases
• Implementation of spatial access
• Modeling
• Interoperability
• Interaction: multi-agency databases
• Open GIS

Why a Database

A database achieves in an agency:
• integration
• consistency
• sharing (reduction of redundancy, but not of storage)

Two Issues:

• Modeling: what are the things to represent, and how do we structure them logically?
• Implementation: how is this solved with a computer?

Usually only modeling is important.

Implementation is crucial for DBMS

Performance is critical:
• GIS data sets are too large to be stored completely in main memory.
• Access to disk takes about 10 ms; access to main memory takes about 100 ns. That is a ratio of 1 : 10^5, or like 3 seconds compared to about 3.5 days!
• We therefore must start with the performance of the most frequently used GIS operation.
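Worked out from the two access times quoted on this slide (10 ms for a disk access, 100 ns for a main-memory access), the ratio and the everyday comparison are:

```latex
\frac{t_{\text{disk}}}{t_{\text{mem}}}
  = \frac{10\ \text{ms}}{100\ \text{ns}}
  = \frac{10^{-2}\ \text{s}}{10^{-7}\ \text{s}}
  = 10^{5},
\qquad
3\ \text{s} \times 10^{5} = 3 \times 10^{5}\ \text{s} \approx 3.5\ \text{days}.
```

One disk access thus costs about as much time as 100,000 main-memory accesses, so the number of disk accesses dominates the cost of a query.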

Base Assumptions about Spatio-Temporal Databases

• Objects: exist independently, have some properties, and enter into relations with other objects.
• Spatial objects: have a location and a spatial extent (expressed in a global coordinate system).

Base Assumptions about Spatio-Temporal Databases

• Temporal objects: change their properties over time; questions about past (or future) states can be asked (valid time).
• Administrative database: questions about when a change became known can be asked (transaction time).
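A minimal sketch of the two time dimensions, assuming each change of an attribute is stored as one record carrying both its valid time and its transaction time. The `Change` record, `value_at`, and the parcel example are hypothetical illustrations, not a cadastral schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Change:
    """One recorded change of an object's attribute (hypothetical record layout)."""
    object_id: str
    value: str
    valid_from: int   # valid time: when the change took effect in reality
    recorded_at: int  # transaction time: when the database learned about it


def value_at(history: List[Change], object_id: str,
             valid_t: int, known_at: int) -> Optional[str]:
    """State of object_id at valid time valid_t, as the database knew it at known_at."""
    candidates = [c for c in history
                  if c.object_id == object_id
                  and c.recorded_at <= known_at  # drop what was not yet known
                  and c.valid_from <= valid_t]   # drop changes not yet in effect
    if not candidates:
        return None
    # The latest change in effect wins; for equal valid times, the later
    # recording (a correction) wins.
    return max(candidates, key=lambda c: (c.valid_from, c.recorded_at)).value


# Hypothetical parcel: owned by A since 1990, sold to B in 1995, registered in 1996.
history = [
    Change("parcel-17", "owner A", valid_from=1990, recorded_at=1990),
    Change("parcel-17", "owner B", valid_from=1995, recorded_at=1996),
]
print(value_at(history, "parcel-17", valid_t=1995, known_at=1995))  # owner A
print(value_at(history, "parcel-17", valid_t=1995, known_at=1996))  # owner B
```

The first query asks about 1995 as the database knew it in 1995 (valid time only); the second asks the same question one year later, after the sale was registered (transaction time).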

Most often used operation: access to spatial data

Spatial data must be retrieved quickly based on location:
• This is missing in commercial DBMSs. SQL can be used, but performance is insufficient.
• Spatial clustering is absolutely required.

Databases for Spatial Data:

• Classical architecture: a DBMS for the administrative data, a specialized file system for the other data.
• Research goal: an integrated database for both attribute and geometric data, with a spatial access method.
• The field tree is a method for spatial access (designed for cadastral applications).

Field tree explanations:

Regular grid: could cluster point objects with uniform density.
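A minimal sketch of grid clustering for points, assuming a fixed cell size and one storage bucket (disk page) per cell; `grid_cell` and `cluster_points` are hypothetical names.

```python
import math
from collections import defaultdict


def grid_cell(x, y, x0, y0, cell_size):
    """(column, row) of the regular-grid cell containing point (x, y)."""
    return (math.floor((x - x0) / cell_size), math.floor((y - y0) / cell_size))


def cluster_points(points, x0=0.0, y0=0.0, cell_size=1.0):
    """Group points by grid cell; each group would be stored on one page."""
    buckets = defaultdict(list)
    for p in points:
        buckets[grid_cell(p[0], p[1], x0, y0, cell_size)].append(p)
    return dict(buckets)


pts = [(0.2, 0.3), (0.7, 0.1), (1.5, 0.5), (2.2, 2.9)]
print(cluster_points(pts))
# {(0, 0): [(0.2, 0.3), (0.7, 0.1)], (1, 0): [(1.5, 0.5)], (2, 2): [(2.2, 2.9)]}
```

With uniformly spread points each bucket holds about the same number of points; with clustered points some buckets overflow while others stay empty.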

Field tree explanations 2:

Quadtree grid: could cluster point objects with irregular distribution.
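A minimal sketch of how a quadtree adapts to irregular density, assuming a bucket quadtree over the unit square that splits a cell into four when it holds more than `capacity` points (`QuadNode` and its parameters are hypothetical):

```python
class QuadNode:
    """Bucket quadtree over a square; a leaf holds at most `capacity` points."""

    def __init__(self, x0, y0, size, capacity=2, depth=0, max_depth=16):
        self.x0, self.y0, self.size = x0, y0, size
        self.capacity, self.depth, self.max_depth = capacity, depth, max_depth
        self.points = []
        self.children = None  # four sub-squares once the node is split

    def _child_for(self, p):
        half = self.size / 2
        dx = 1 if p[0] >= self.x0 + half else 0
        dy = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * dy + dx]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Only an overfull cell is split, so dense regions end up with small cells.
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            half = self.size / 2
            self.children = [QuadNode(self.x0 + dx * half, self.y0 + dy * half, half,
                                      self.capacity, self.depth + 1, self.max_depth)
                             for dy in (0, 1) for dx in (0, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child_for(q).insert(q)

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


root = QuadNode(0.0, 0.0, 1.0, capacity=2)
for p in [(0.1, 0.1), (0.12, 0.1), (0.1, 0.13), (0.11, 0.11), (0.9, 0.9)]:
    root.insert(p)
print(sorted(leaf.depth for leaf in root.leaves() if leaf.points))  # [1, 3, 6, 6, 6]
```

The points crowded near (0.1, 0.1) end up in small cells several levels deep, while the single point at (0.9, 0.9) stays in a large level-1 cell.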

Field tree explanations 3:

But it cannot cluster extended objects.
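A minimal sketch of the problem, assuming an extended object is placed in the smallest aligned quadtree cell that contains its whole bounding box (`quadtree_level` is a hypothetical helper):

```python
import math


def quadtree_level(xmin, ymin, xmax, ymax, max_level=16):
    """Deepest quadtree level whose cell contains the whole box.

    Level k splits the unit square into 2**k x 2**k cells, all aligned at 0.
    """
    for level in range(max_level, 0, -1):
        n = 2 ** level
        if (math.floor(xmin * n) == math.floor(xmax * n)
                and math.floor(ymin * n) == math.floor(ymax * n)):
            return level
    return 0  # only the root cell (the whole area) is left


# Two boxes of the same small size (0.02 x 0.02):
print(quadtree_level(0.40, 0.40, 0.42, 0.42))  # 4: a cell of 1/16 x 1/16
print(quadtree_level(0.49, 0.49, 0.51, 0.51))  # 0: crosses x = 0.5, ends in the root
```

Any box that crosses the line x = 0.5 or y = 0.5, however small, can only be placed in the root, which covers the whole area.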

Field tree explanations 4:

The field tree can do it:

Field tree explanations 5:

Add the next level, with fields half as large and shifted:

Field tree explanations 6:

Add another level (blue):

Field tree explanations 7:

Add another level (green):

Why Field Trees?

Extended objects (represented by their minimal bounding box) are stored in a field.

Fields cover the area multiple times. Guarantee: every object is on a page which is at most 4 times the size of the object. (A quadtree-based method cannot achieve this.)

Access times depend on the amount of data retrieved, not on the amount of data stored.
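A minimal sketch of placing an extended object (its minimal bounding box) in a field, assuming a simplified layout over the unit square: level 0 is one field, and each level k ≥ 1 has fields half as large as level k - 1, shifted by half of its own field size. This shift rule and the helper names are assumptions for illustration; the sketch is not claimed to reproduce the exact field tree layout or its factor-4 guarantee.

```python
import math


def field_geometry(level):
    """Field size and grid offset of one level (assumed layout, unit square).

    Level 0 is one unshifted field; each level k >= 1 has fields half as large
    as level k - 1, and its grid is shifted by half of its own field size.
    """
    size = 0.5 ** level
    offset = 0.0 if level == 0 else size / 2
    return size, offset


def field_for_box(xmin, ymin, xmax, ymax, max_level=16):
    """(level, i, j) of the smallest field that contains the whole box."""
    for level in range(max_level, -1, -1):
        size, off = field_geometry(level)
        i0, i1 = math.floor((xmin - off) / size), math.floor((xmax - off) / size)
        j0, j1 = math.floor((ymin - off) / size), math.floor((ymax - off) / size)
        if i0 == i1 and j0 == j1:
            return level, i0, j0
    raise ValueError("box must lie inside the unit square")


# A small box lying across the centre line x = 0.5, y = 0.5:
level, i, j = field_for_box(0.49, 0.49, 0.51, 0.51)
size, off = field_geometry(level)
print(level, i, j)                           # 5 15 15
print(off + i * size, off + (i + 1) * size)  # 0.484375 0.515625
```

The 0.02 x 0.02 box lands in a level-5 field of size 1/32 = 0.03125, whose boundaries lie at 0.484375 and 0.515625, because the shifted grids place no field boundary of that level at 0.5.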

Query in field tree

Determine which fields overlap the query window; search all of these (but only these) for objects of interest.
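A minimal sketch of the first query step, assuming a simplified field layout for illustration (level 0 is one field; each level k ≥ 1 halves the field size and is shifted by half a field); `fields_overlapping` is a hypothetical helper. It lists every field at every level that meets the query window; the objects stored in these fields are then checked against the window.

```python
import math


def field_geometry(level):
    """Assumed layout: fields halve per level; levels >= 1 shift by half a field."""
    size = 0.5 ** level
    return size, (0.0 if level == 0 else size / 2)


def fields_overlapping(qxmin, qymin, qxmax, qymax, max_level=6):
    """All fields (level, i, j) whose area intersects the query window.

    An object is stored in a field that contains its whole box, so an object
    meeting the window can only sit in one of these fields.
    """
    found = []
    for level in range(max_level + 1):
        size, off = field_geometry(level)
        i_lo = math.floor((qxmin - off) / size)
        i_hi = math.floor((qxmax - off) / size)
        j_lo = math.floor((qymin - off) / size)
        j_hi = math.floor((qymax - off) / size)
        found.extend((level, i, j)
                     for i in range(i_lo, i_hi + 1)
                     for j in range(j_lo, j_hi + 1))
    return found


print(fields_overlapping(0.2, 0.2, 0.3, 0.3, max_level=2))
# [(0, 0, 0), (1, -1, -1), (1, -1, 0), (1, 0, -1), (1, 0, 0), (2, 0, 0)]
```

A real index would visit only fields that actually hold objects; enumerating all fields, as here, is only practical for a few levels.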

Spatial access research

Research concentrated on the implementation of spatial access.

Results: mostly complex methods that are difficult to implement.

Conclusion from spatial access research:

Samet: "use a spatial clustering, use any". The problem was that it was unclear what the criteria for optimization were (nearest neighbor, range query) and what the properties of the data were.

Identify the problem before you optimize. Identifying the details of the problem was not possible, due to a lack of spatial statistics methods.

The field tree has previously been used for cadastral and similar problems in commercial environments and performed well.

The Issue is Integration

The integration of spatial access with the other DBMS services (especially transaction management) is extremely difficult.

The commercial DBMS vendors are neither willing nor able to provide spatial access built into the core of the DBMS engine; the difficulty is the integration with the transaction management system.

Practical solution:

Spatial access/spatial clustering must be built on top of standard DB functionality (e.g., a commercial relational DB).

How to cluster if one cannot access the low-level storage subsystem?

One must exploit the B-tree data structure, which uses physical clustering in most commercial DBMSs.

Concept

1. Assign to each object a single number based on a spatial encoding.

2. When storing the object, use this number to achieve physical clustering by spatial location in the DBMS.

3. When searching: determine all spatial codes which fall into the query window, and search for these codes using the DBMS's built-in B-tree.
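A minimal sketch of the three steps with SQLite, assuming point objects, a fixed 16 x 16 grid over the unit square, and the Morton number of the grid cell as the spatial code. The table, column and function names are hypothetical; `WITHOUT ROWID` is SQLite's way of storing a table directly in its primary-key B-tree.

```python
import sqlite3

BITS = 4  # hypothetical: a 16 x 16 grid of cells over the unit square


def morton(i, j):
    """Spatial code of cell (i, j): the bits of i and j interleaved."""
    code = 0
    for b in range(BITS):
        code |= ((i >> b) & 1) << (2 * b)
        code |= ((j >> b) & 1) << (2 * b + 1)
    return code


def cell(v):
    """Grid column/row of a coordinate in [0, 1]."""
    return min(int(v * 2 ** BITS), 2 ** BITS - 1)


db = sqlite3.connect(":memory:")
# Step 2: in a WITHOUT ROWID table SQLite stores the rows in the primary-key
# B-tree, so rows are kept in order of their spatial code.
db.execute("""CREATE TABLE parcel (code INTEGER, id INTEGER, x REAL, y REAL,
              PRIMARY KEY (code, id)) WITHOUT ROWID""")


def store(obj_id, x, y):
    # Step 1: one number per object, derived from its location.
    db.execute("INSERT INTO parcel VALUES (?, ?, ?, ?)",
               (morton(cell(x), cell(y)), obj_id, x, y))


def window_query(x0, y0, x1, y1):
    # Step 3: all codes of cells meeting the window, looked up via the B-tree.
    codes = [morton(i, j) for i in range(cell(x0), cell(x1) + 1)
             for j in range(cell(y0), cell(y1) + 1)]
    marks = ",".join("?" * len(codes))
    rows = db.execute(f"SELECT id, x, y FROM parcel WHERE code IN ({marks})", codes)
    # A code selects a whole cell; keep only objects inside the window.
    return sorted(r[0] for r in rows if x0 <= r[1] <= x1 and y0 <= r[2] <= y1)


for oid, (x, y) in enumerate([(0.1, 0.1), (0.12, 0.11), (0.9, 0.9)], start=1):
    store(oid, x, y)
print(window_query(0.05, 0.05, 0.2, 0.2))  # [1, 2]
```

Other relational DBMSs offer comparable clustered or index-organized tables under different names.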

How to encode spatial location: cluster by Morton numbers

Proposal by Abel (CSIRO), based on the quadtree. Disadvantage: small objects may end up in a very large cell (or multiple keys are necessary, but multiple keys cannot be used for clustering in most B-trees).
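A minimal sketch of Morton numbers (bit interleaving of a cell's column and row); the function name and bit width are assumptions. Because each quadtree cell becomes one contiguous run of numbers, sorting by Morton number stores each quadrant together:

```python
def morton(i: int, j: int, bits: int = 16) -> int:
    """Morton (Z-order) number of grid cell (i, j): interleave their bits.

    Bit b of the column i becomes bit 2b of the result, bit b of the row j
    becomes bit 2b + 1.
    """
    code = 0
    for b in range(bits):
        code |= ((i >> b) & 1) << (2 * b)
        code |= ((j >> b) & 1) << (2 * b + 1)
    return code


# On a 4 x 4 grid, each 2 x 2 quadrant is one contiguous run of codes:
lower_left = sorted(morton(i, j) for i in (0, 1) for j in (0, 1))
lower_right = sorted(morton(i, j) for i in (2, 3) for j in (0, 1))
print(lower_left, lower_right)  # [0, 1, 2, 3] [4, 5, 6, 7]
```

The same property causes the disadvantage: an object needs the number of a cell that contains it entirely, and a small object lying across a quadrant boundary only fits into a large cell.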

Field Code

Field-tree numbers encode the spatial location and extent of an object (v. Oosterom's idea) --> a single number, the Spatial Location Code.

Store spatial objects with this code (exploiting physical clustering).

Search: determine the fields which may contain the objects; search these.
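A minimal sketch of turning a field (level, column, row) into one integer, under assumptions of our own: the level goes into the high bits, the column and row are interleaved Morton-style in the low bits. This is a hypothetical encoding, not necessarily the one proposed by v. Oosterom.

```python
LEVEL_SHIFT = 40  # hypothetical: 40 bits for the position part (levels up to 19)


def interleave(i: int, j: int) -> int:
    """Morton-style bit interleaving of two non-negative integers."""
    code, b = 0, 0
    while i or j:
        code |= ((i & 1) << (2 * b)) | ((j & 1) << (2 * b + 1))
        i, j, b = i >> 1, j >> 1, b + 1
    return code


def field_code(level: int, i: int, j: int) -> int:
    """One integer for field (level, i, j).

    The level goes into the high bits, so each level's codes form one interval;
    i and j are shifted by one because a shifted grid can have index -1.
    """
    return (level << LEVEL_SHIFT) | interleave(i + 1, j + 1)


def level_interval(level: int):
    """Smallest and largest possible code of one level."""
    return level << LEVEL_SHIFT, ((level + 1) << LEVEL_SHIFT) - 1


print(field_code(0, 0, 0))                         # 3
print(field_code(5, 15, 15) - (5 << LEVEL_SHIFT))  # 768
```

Each object gets exactly one such code, so unlike multiple Morton keys it can serve directly as the clustering key.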

Practical results

Search based on intervals of field codes; heuristics reduce the number of intervals submitted for range queries.

Tests with real data demonstrate a search speed-up by factors of 10 to 100.
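One possible heuristic, sketched under our own assumptions (the function name and the `max_gap` parameter are hypothetical, not the heuristics used in the tests): runs of consecutive codes become one interval, and neighbouring intervals are merged when only a few unwanted codes lie between them.

```python
def code_intervals(codes, max_gap=0):
    """Group field codes into (lo, hi) intervals, one range query each.

    Adjacent codes always share an interval. Intervals separated by at most
    max_gap unwanted codes are merged as well: fewer queries are submitted,
    at the price of reading some rows that are then filtered out.
    """
    intervals = []
    for c in sorted(set(codes)):
        if intervals and c - intervals[-1][1] - 1 <= max_gap:
            intervals[-1][1] = c
        else:
            intervals.append([c, c])
    return [(lo, hi) for lo, hi in intervals]


codes = [1, 2, 3, 7, 8, 20]
print(code_intervals(codes))             # [(1, 3), (7, 8), (20, 20)]
print(code_intervals(codes, max_gap=3))  # [(1, 8), (20, 20)]
```

Each interval is then submitted as one range query, e.g. `WHERE code BETWEEN lo AND hi`.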

Modeling Issues

Assumption: a relational DBMS, today's standard for implementation.

Data model: relations, consisting of tuples of attribute values; relational calculus; SQL as 'universal data speak' (not really useful as a user query language).

This is a data (value) oriented concept

Object-orientation necessary

OO concepts are necessary for spatial and temporal databases, especially the cadastre:
• Objects have identity in time (the parcel ID is the classical example).
• Objects have attribute values.
• Objects enter into relations.

Object ID centered

Object-oriented data models create a data model clash, similar to the clash between relational DBMSs and sequential processing in conventional programming languages.

Object ID centered

The relations between objects and their attribute values are functions from ID to attribute value; relations between objects are functions from ID to ID (of the related object).

A representation of an object as a contiguous data space is not necessary, but may be useful for clustering using Spatial Location Codes.

This approach seems to solve most of the OO model problems discussed in the literature.
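A minimal sketch of the ID-centered view, assuming attributes and relations are held as plain mappings keyed by object ID; all names and values are hypothetical:

```python
from typing import Dict, List

ObjectId = int

# Attributes: functions from object ID to attribute value.
area_m2: Dict[ObjectId, float] = {101: 512.0, 102: 730.5}
land_use: Dict[ObjectId, str] = {101: "residential", 102: "agricultural"}
name: Dict[ObjectId, str] = {9001: "J. Jansen"}

# Relations: functions from object ID to the ID of the related object.
owner_of: Dict[ObjectId, ObjectId] = {101: 9001, 102: 9001}


def owner_name(parcel: ObjectId) -> str:
    """Follow a relation (ID -> ID), then apply an attribute (ID -> value)."""
    return name[owner_of[parcel]]


def parcels_of(person: ObjectId) -> List[ObjectId]:
    """The inverse direction of a relation is a search over the same function."""
    return sorted(p for p, o in owner_of.items() if o == person)


print(owner_name(101), parcels_of(9001))  # J. Jansen [101, 102]
```

No record groups all properties of a parcel here; the object is only its ID, and everything else is reached through functions of that ID.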

Future:

What we can realize now: a spatio-temporal multi-user database for a single agency.

How to deal with cooperating agencies? (Your achievements demonstrate the need for this.)

What are the next questions?

The multi-agency database:

• Data is shared.
• Responsibility for the data is clearly identified.
• Data is not centralized.

The multi-agency database:

This is more than a distributed DB, because it requires a new transaction concept: the classical discussion of the 'long transaction', including distributed responsibility for data changes within a transaction.

Concept: agencies send update proposals for data they cannot change themselves to the agency which is responsible.
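A minimal sketch of this concept, assuming every data item has exactly one responsible agency and that proposals are simply queued for it; the class names, keys and the accept-all review are hypothetical simplifications:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Agency:
    name: str
    data: Dict[str, str] = field(default_factory=dict)  # items it is responsible for
    inbox: List[Tuple[str, str, str]] = field(default_factory=list)  # (sender, key, value)


class MultiAgencyDB:
    """Shared data; every item has exactly one responsible agency."""

    def __init__(self, agencies: List[Agency]):
        self.agencies = {a.name: a for a in agencies}
        self.responsible: Dict[str, str] = {}  # item key -> agency name

    def register(self, agency: str, key: str, value: str) -> None:
        self.responsible[key] = agency
        self.agencies[agency].data[key] = value

    def read(self, key: str) -> str:
        # Any agency may read any item (the data is shared).
        return self.agencies[self.responsible[key]].data[key]

    def update(self, agency: str, key: str, value: str) -> str:
        owner = self.responsible[key]
        if owner == agency:
            self.agencies[owner].data[key] = value
            return "applied"
        # Not responsible: send an update proposal to the responsible agency.
        self.agencies[owner].inbox.append((agency, key, value))
        return "proposed"

    def accept_proposals(self, agency: str) -> None:
        """The responsible agency reviews its proposals (here: accepts all)."""
        a = self.agencies[agency]
        for _sender, key, value in a.inbox:
            a.data[key] = value
        a.inbox.clear()


db = MultiAgencyDB([Agency("cadastre"), Agency("municipality")])
db.register("cadastre", "parcel-17/boundary", "v1")
print(db.update("municipality", "parcel-17/boundary", "v2"))  # proposed
print(db.read("parcel-17/boundary"))                          # v1
db.accept_proposals("cadastre")
print(db.read("parcel-17/boundary"))                          # v2
```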

Interoperability

Agencies must cooperate. So far, we exchange data; updates are not propagated!

Future: interoperability, independent of the software vendor (the so-called Open GIS).

Interoperability as a technical problem

• Computer network: agreement on base cooperation (network standard)
• GIS cooperation: data model and related concepts

Interoperability as a semantics problem

What does the data mean? How to describe the data? How to describe the meaning of data in a formal language that can be used by a computer?

Formal Language

Describing natural language with formal tools is not likely to be achieved soon.

Sufficient for GIS: definitions for restricted user communities, e.g., agencies within a town.

Open GIS Standards

Development of industry-accepted standards in step with the rapid development of base technology.

Cooperation of all GIS vendors. Goal: open systems.

Open GIS

Interoperability independent of the vendor: storage of data under one system, analysis tools from another system.

Open² GIS

Interoperability independent of the agency: needs cooperation of user communities.

Major users are already working in the Open GIS Consortium to ensure that their application concepts are standardized.

GIS user organizations gain from Open GIS:
• standardized environments to solve application problems
• accumulation of knowledge of the application domain
• cooperation of agencies in Europe (and export of knowledge)

A cadastral special interest group is being discussed in the OGC.

GIPSIE Project

EU project (DG III: Information Technology)
• to promote Open GIS within the GI industry and user community in Europe
• to bring European issues into the OGC process
• to contribute research to the Open GIS standards

Participation by European companies and agencies required.

Contacts:
Andrew Frank, TU Vienna
Werner Kuhn, U Muenster