gis v3 datamodel

Upload: leonardo-olarte

Post on 03-Apr-2018


  • 7/28/2019 Gis v3 Datamodel


    Geoscience Information Systems, Module Code GeoData BA Nr. 041

    © 2012 Helmut Schaeben

    Geomathematics and Geoinformatics, TU Bergakademie Freiberg, Germany

    TUBAF Winter Term 2012/13

    Geoscience Information Systems

    Data modeling


    Data Modeling

    Contents

    1 Spatial Objects, Geoobjects

    2 Raster and Vector Spatial Data Model

    3 Attribute Data

    4 The Relational Data Model

    5 Raster Data Structures

    6 Vector Data Structures

    Data Modeling

    Data modeling: the process of defining and organizing data into a consistent digital dataset that is useful and reveals information.

    Data model: the logical organization of data according to a scheme.

    Data structure: the representation of the data model in a computer.

    Spatial Objects (1)

    The abstraction process of representing real-world phenomena in a computer-accessible form involves the use of symbolic models, i.e. simplified representations, e.g.

    digital elevation model (DEM): the cells of the grid are the spatial objects, whose values are symbolized by numbers in the data file. Unless the organizational scheme is known, the data are worthless.

    digital data set containing well data from drill holes in a coalfield: the spatial objects are the sample locations in the wells at which formation tops and other attributes are recorded.

    Digital Elevation Model (DEM)

    A digital elevation model based on sufficiently many measurements of the height of the surface allows one to determine numerically the height at any given point of a well-defined region, and to represent it digitally as a geoobject.

    data (x, y, h)

    metadata like physical units of measurements, coordinate system, date of the recording, reliability of data, weather (visibility), name of the author ...

    experiences, knowledge, modeling assumptions, ...

    methods of spatial interpolation, approximation, prediction, ... (maths, numerics)

    digital representation of the DEM (informatics, data model, data structure)

    visualization (computer graphics)
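The slides list "methods of spatial interpolation" without naming one. As a minimal sketch of how a height at an arbitrary point can be computed from scattered data (x, y, h), here is inverse distance weighting, one common choice among many; the function name `idw_height`, the power parameter and the sample values are illustrative assumptions.

```python
import math

def idw_height(samples, x, y, power=2.0):
    """Estimate the height at (x, y) from scattered samples (xi, yi, hi)
    by inverse distance weighting (one possible interpolation method)."""
    num = 0.0
    den = 0.0
    for xi, yi, hi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return hi  # the query point coincides with a sample
        w = 1.0 / d ** power  # nearer samples get larger weights
        num += w * hi
        den += w
    return num / den

# Hypothetical height measurements at the corners of a 10 m x 10 m square.
samples = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 110.0), (10.0, 10.0, 130.0)]
print(idw_height(samples, 5.0, 5.0))  # ~115.0: all four samples are equidistant from (5, 5)
```

Which method is appropriate depends on the modeling assumptions listed above; kriging or splines would fit the same interface.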


    Spatial Objects (2)

    Digital elevation model (DEM)

    data organized according to a raster (grid) model,

    the raster is organized in a run-length encoded data structure, and

    the data is written on a digital storage device in a selected file format.

    Spatial Objects (3)

    Digital elevation model (DEM)

    data organized according to a vector model, expressed as polygons bounded by contour lines,

    arranged in some topological structure, and

    written on a storage device in a digital line graph format;

    Spatial Objects (4)

    Digital elevation model (DEM)

    data locations are organized according to a graph, e.g.

    triangulated according to some criterion (e.g. Delaunay); the triangulation is represented as a table containing the vertices of each triangle and its neighboring triangles sharing a common edge, and

    written on a digital storage device in a selected file format.
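The triangle table described above can be derived from a plain list of triangles. The sketch below assumes the triangulation (Delaunay or otherwise) is already given as vertex-id triples; it only builds, for each triangle, the neighbors across each of its three edges. The function name and the example data are hypothetical.

```python
from collections import defaultdict

def triangle_table(triangles):
    """For each triangle (three vertex ids), list the triangle sharing each
    of its edges (a, b), (b, c), (c, a); None marks an edge on the hull."""
    edge_owners = defaultdict(list)  # undirected edge -> indices of triangles using it
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_owners[frozenset((u, v))].append(t)
    table = []
    for t, tri in enumerate(triangles):
        a, b, c = tri
        neighbors = []
        for u, v in ((a, b), (b, c), (c, a)):
            others = [s for s in edge_owners[frozenset((u, v))] if s != t]
            neighbors.append(others[0] if others else None)
        table.append((t, tri, neighbors))
    return table

# Hypothetical triangulation of a square (vertices 0..3) split along the diagonal 0-2.
for row in triangle_table([(0, 1, 2), (0, 2, 3)]):
    print(row)
# (0, (0, 1, 2), [None, None, 1])
# (1, (0, 2, 3), [0, None, None])
```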

    Spatial Objects (5)

    Spatial objects can be classified according to

    being continuous or discontinuous, e.g. temperature, gravity (continuous); state of matter, rock bodies (discontinuous);

    being natural or imposed spatial objects: natural, discrete spatial entities, e.g. a river, an ore body; imposed, artificial entities, e.g. pixels (picture elements);

    their dimension: Euclidean, fractal, etc.;

    being regularly or irregularly shaped;

    being sampling-limited or definition-limited: for sampling-limited objects, information about shape and extent is limited only by the amount of available sampling, e.g. an oil pool bounded by water from below and by caprock from above; definition-limited objects are delimited by a definition, e.g. a metallic ore body is defined by a cut-off grade, seismic epicentres are defined by the sensitivity of the seismometer, an elevation contour line is defined by a given elevation, and a geochemical anomaly is defined by a threshold.


    Raster and Vector Model (1)

    The vector model uses irregular spatial objects that can be either natural or imposed, and it employs a boundary representation of these area objects.

    The raster model uses regular imposed spatial objects, i.e. pixels, voxels, etc., which do not require individual boundary definitions.

    In either model, a spatial object is assumed to have homogeneous properties.

    Raster and Vector Model (2)

    Raster and Vector Model (3)

    Raster and Vector Model (4)

    What are the required operations to determine the area and the perimeter of a geoobject with respect to the raster or the vector model?

    area: raster: counting ...; vector: adding ...

    perimeter: raster: counting ...; vector: adding ...

    What is more expensive in terms of CPU time?
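For the raster side of this question, a minimal sketch, assuming the geoobject is given as a binary mask (1 inside, 0 outside) with square cells of known size: the area counts object cells, and the perimeter counts cell edges that separate an object cell from a non-object cell or from the raster border (4-neighbourhood). The function name and this edge-counting rule are illustrative choices, not taken from the slides.

```python
def raster_area_perimeter(mask, cell_size=1.0):
    """Area and perimeter of the geoobject marked by 1s in a binary raster."""
    rows, cols = len(mask), len(mask[0])
    cells = 0  # number of object cells
    edges = 0  # number of boundary cell edges
    for i in range(rows):
        for j in range(cols):
            if not mask[i][j]:
                continue
            cells += 1
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                outside = not (0 <= ni < rows and 0 <= nj < cols)
                if outside or not mask[ni][nj]:
                    edges += 1
    return cells * cell_size ** 2, edges * cell_size

mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(raster_area_perimeter(mask, cell_size=10.0))  # (500.0, 100.0)
```

Because the boundary follows cell edges, the raster perimeter is a staircase length; for an oblique boundary it overestimates the true length.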


    Raster and Vector Model (5)

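    Area of a polygon given by the ordered set of points $P_i = (x_i, y_i)$, $i = 1, \dots, n+1$, with $P_1 = P_{n+1}$. Introducing an arbitrary additional point $P = (x, y)$, the area is the absolute value of the sum of the signed areas $A_i$ of the triangles $(P, P_i, P_{i+1})$:

    $$
    \begin{aligned}
    A &= \Bigl|\sum_{i=1}^{n} A_i\Bigr|
       = \frac{1}{2}\,\Bigl|\sum_{i=1}^{n} \det\begin{pmatrix} x & x_i & x_{i+1} \\ y & y_i & y_{i+1} \\ 1 & 1 & 1 \end{pmatrix}\Bigr| \\
      &= \frac{1}{2}\,\Bigl|\sum_{i=1}^{n} \bigl[(x - x_i)(y_i - y_{i+1}) + (x_i - x_{i+1})(y_i - y)\bigr]\Bigr| \\
      &= \frac{1}{2}\,\Bigl|\sum_{i=1}^{n} \bigl[x\,(y_i - y_{i+1}) + x_i y_{i+1} - x_{i+1} y_i + (x_{i+1} - x_i)\,y\bigr]\Bigr| \\
      &= \frac{1}{2}\,\Bigl|\sum_{i=1}^{n} (x_i y_{i+1} - x_{i+1} y_i)\Bigr|
    \end{aligned}
    $$

    The terms in $x$ and $y$ vanish in the last step because the polygon is closed: $\sum_{i=1}^{n} (y_i - y_{i+1}) = y_1 - y_{n+1} = 0$, and likewise $\sum_{i=1}^{n} (x_{i+1} - x_i) = 0$. Since the absolute value is taken of the whole sum of signed triangle areas, the result holds for any simple polygon and does not depend on the choice of $P$.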
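The last line of the derivation (the "shoelace" formula) translates directly into code. A minimal sketch, assuming the vertices are passed without the repeated closing point; the perimeter is added alongside since it is computed over the same vertex pairs. The function name is hypothetical.

```python
import math

def polygon_area_perimeter(vertices):
    """Area (shoelace formula) and perimeter of a simple polygon.

    `vertices` lists P_1, ..., P_n in order; the closing vertex
    P_{n+1} = P_1 is handled by wrapping the index around."""
    n = len(vertices)
    twice_signed_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]
        twice_signed_area += x_i * y_next - x_next * y_i  # x_i y_{i+1} - x_{i+1} y_i
        perimeter += math.hypot(x_next - x_i, y_next - y_i)
    return abs(twice_signed_area) / 2.0, perimeter

# An L-shaped polygon, listed clockwise (the signed sum is negative).
print(polygon_area_perimeter([(0, 0), (0, 2), (1, 2), (1, 1), (3, 1), (3, 0)]))  # (4.0, 10.0)
```

Both quantities need a single pass over the $n$ vertices.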

    Raster Model (1)

    Each raster cell (pixel) is associated with a number quantifying the observed attribute; each layer of grid cells records a separate attribute.

    A raster can be represented as a matrix $A = (a_{ij})_{i=1,\dots,m;\ j=1,\dots,n}$; its cells are addressed by row and column number $(i, j)$, and can be stored in the file with addressing by sequence $(s_\ell)_{\ell=1,\dots,mn}$:

    $$
    a_{ij} \mapsto s_{(i-1)n+j}, \qquad s_\ell \mapsto a_{[\ell/n]+1,\ \ell-[\ell/n]\,n},
    $$

    where $[q]$ denotes the largest natural number (including 0) strictly smaller than $q$.
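The two address maps can be checked with a short sketch using 1-based indices as in the formula. For positive integers, the integer division `(ell - 1) // n` equals $[\ell/n]$, the largest integer strictly smaller than $\ell/n$. The function names are hypothetical.

```python
def cell_to_seq(i, j, n):
    """Sequence number of cell (i, j) in a raster with n columns, stored row by row (1-based)."""
    return (i - 1) * n + j

def seq_to_cell(ell, n):
    """Row and column (i, j) of sequence number ell (1-based)."""
    k = (ell - 1) // n  # equals [ell/n], the largest integer strictly smaller than ell/n
    return k + 1, ell - k * n

m, n = 3, 4
print(cell_to_seq(2, 4, n))  # 8
print(seq_to_cell(8, n))     # (2, 4)
```

Because only $m$, $n$ and the scan order are needed to recover $(i, j)$, no coordinates have to be stored per cell.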

    Spatial coordinates are not explicitly stored for each cell, because the storage order determines them implicitly.

    The raster model represents a spatial object by enumeration.

    Processing raster data is efficient for e.g. overlaying images, neighborhood queries, spatial filtering, morphological operations, gradients, etc.

    Raster devices producing/displaying digital raster images are scanners, video digitizers, video display monitors, line printers, and inkjet plotters.

    Raster Model (2)

    The spatial resolution of a raster image is the size of the area in the real world represented by an individual pixel.

    At 100 m resolution, a square area of 100 km on a side requires a raster with 1000 rows and 1000 columns, or 1 000 000 (1 million) pixels;

    at 10 m resolution, it requires 10 000 by 10 000, or 100 000 000 (100 million) pixels.

    If 1 byte (8 bits of computer storage, holding integers 0 to 255) is used per pixel, the storage needed for the latter raster image is 100 MB.
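The arithmetic above can be reproduced with a small sketch, assuming a square area whose side is a multiple of the resolution, no compression, and MB meaning $10^6$ bytes. The function name is hypothetical.

```python
def raster_storage(side_m, resolution_m, bytes_per_pixel=1):
    """Cells per side, pixel count and uncompressed storage (bytes) for a square area."""
    cells_per_side = side_m // resolution_m
    pixels = cells_per_side ** 2
    return cells_per_side, pixels, pixels * bytes_per_pixel

print(raster_storage(100_000, 100))  # (1000, 1000000, 1000000)
print(raster_storage(100_000, 10))   # (10000, 100000000, 100000000), i.e. 100 MB
```

Refining the resolution by a factor of 10 multiplies the storage by 100, since both rows and columns grow.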

    Vector Model (1)

    In the vector model, vertices are ordered pairs of spatial coordinates, lines surrounding polygonal areas are made by linking sequences of vertices, and areas are defined by lines that form closed loops or polygons.

    The vector model represents a spatial object by its boundaries, and uses a labelling scheme to keep track of their attributes.

    The straightforward storing of strings of coordinate pairs is referred to as the spaghetti model. The spatial objects can be regarded as graphical elements. The boundary between two adjacent polygons is stored twice, once for each polygon.

    Vector devices producing/displaying digital vector images are digitizers that use line-following principles, manual digitizing, and digital pen plotters.


    Vector Model (2)

    Vector Model (3)

    Reasonable data structures to store vector data are considerably more complex.

    The structuring of vector data according to topological criteria is referred to as the topological model. The boundaries of polygons are broken down into a series of arcs and nodes, and the spatial relationships between arcs, nodes and polygons are explicitly defined in attribute tables.

    Planar enforcement results in a set of polygon objects that fill the plane of the map.

    The vector model requires topological attributes to facilitate operations related to adjacency, containment, etc.

    With these attributes, processing vector data is efficient, e.g. finding all arcs which have granite on one side.
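A minimal sketch of such a query, assuming a hypothetical arc table that stores for each arc its from-node, to-node, left polygon and right polygon, plus a polygon attribute table with the rock type. The query reads only these tables, never the coordinates.

```python
# Hypothetical arc topology table: arc id -> (from node, to node, left polygon, right polygon).
# Polygon 0 stands for the area outside the map.
ARCS = {
    "a1": ("n1", "n2", 1, 0),
    "a2": ("n2", "n3", 1, 2),
    "a3": ("n3", "n1", 1, 3),
    "a4": ("n2", "n4", 2, 0),
    "a5": ("n4", "n3", 3, 2),
}
# Hypothetical polygon attribute table: polygon id -> rock type.
ROCK = {0: None, 1: "granite", 2: "limestone", 3: "shale"}

def arcs_bordering(rock_type):
    """All arcs with the given rock type on their left or right side."""
    return sorted(
        arc for arc, (_, _, left, right) in ARCS.items()
        if ROCK[left] == rock_type or ROCK[right] == rock_type
    )

print(arcs_bordering("granite"))  # ['a1', 'a2', 'a3']
```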

    Vector Model (4)

    Planar enforcement

    Graph Models (1)

    Thiessen (Voronoi, Dirichlet) tessellation

    Delaunay triangulation


    Graph Models (2)

    Attribute Data (1)

    Attributes of objects to be recorded in a database can be spatial, temporal, and thematic.

    Spatial attributes: data about location, topology, and geometry of spatial objects

    Temporal attributes: age of objects (geological age), time of data collection or measurement

    Thematic attributes: rock type, annual rainfall, presence of minerals or fossil taxa.

    Attributes of spatial objects are usually organized into lists or tables.

    Attribute Data (2)

    Attribute tables form a unifying link between raster and vector models, e.g.

    a soil map may be given in both the vector and the raster model, and both models use the same polygon attribute table, as the attribute value in the raster is the pointer to the polygon label.
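A minimal sketch of this link, assuming a hypothetical polygon attribute table keyed by polygon label and a raster whose cells store those labels: any thematic layer is obtained by looking each cell's label up in the shared table. All names and values are illustrative.

```python
# Hypothetical polygon attribute table shared by the vector and raster versions of a soil map.
SOIL_ATTRIBUTES = {
    1: {"soil": "podzol", "ph": 4.5},
    2: {"soil": "chernozem", "ph": 7.2},
}

# Raster version: each cell stores a polygon label, i.e. a key into the attribute table.
soil_raster = [
    [1, 1, 2],
    [1, 2, 2],
]

def attribute_layer(raster, table, name):
    """Derive a thematic raster layer by looking up each cell's polygon label."""
    return [[table[label][name] for label in row] for row in raster]

print(attribute_layer(soil_raster, SOIL_ATTRIBUTES, "ph"))  # [[4.5, 4.5, 7.2], [4.5, 7.2, 7.2]]
```

An attribute change is made once in the table and is seen by both the vector and the raster version.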

    Attribute Data (3)


    The Relational Model (1)

    A relation is a two-dimensional structure that contains data. It is an abstract concept that corresponds in practice to a table. It is a major aspect of data modeling that pertains to database management systems (DBMS) in general. The relational model is an informatician's favourite model!

    The Relational Model (2)

    Definition of technical terms

    tuple: a row of a relation (a data record in a flat file; in statistics: a sample)

    field: a column of a relation (referring to an attribute or property in a flat file)

    key, key field: an attribute uniquely identifying tuples, providing links between one relation and another

    The Relational Model (3)

    Definition

    1 All data must be represented in tabular form (as opposed to hierarchies or graphs).

    2 All data must be atomic, i.e. any cell in the table can contain only a single value.

    3 No duplicate tuples are allowed.

    4 Tuples can be rearranged without changing the meaning of the table.

    The Relational Model (4)

    Normalization of a relation is the process of converting a complex relation into a larger number of simpler relations that refer to each other, satisfying relational rules.

    First, second, third, fourth, fifth normal form ...

    ... aiming at the reduction/removal of redundancy in a relation.


    The Relational Model (5)

    First normal form: relations without repeating groups of attributes.

    Second normal form: each non-identifying attribute is functionally dependent on the whole key.

    Third normal form: non-identifying attributes are mutually independent.

    Fourth normal form: ...

    Fifth normal form: ...

    The Relational Model (6)

    The Relational Model (7)

    Notice

    formation numbers are repeated,

    formation uniquely determines lithology and age.

    First normal form: relations without repeating groups of attributes

    New FORMATION relation and simplified POLYGON relation, linked by formation number to eliminate repeating attributes (# marks a number or code):
    POLYGON(Poly#, Fm#)
    FORMATION(Fm#, Fm name, Lith#, Lithology, Age#, Age)

    Rectifying for repeating groups by simplifying the FORMATION relation and creating an AGE relation:
    POLYGON(Poly#, Fm#)
    FORMATION(Fm#, Fm name, Lith#, Lithology, Age#)
    AGE(Age#, Age)

    The Relational Model (8)

    Third normal form: non-identifying attributes are mutually independent

    Rectifying for dependencies:
    POLYGON(Poly#, Fm#)
    FORMATION(Fm#, Fm name, Lith#, Age#)
    LITHOLOGY(Lith#, Lithology)
    AGE(Age#, Age)
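The normalized relations can be tried out with Python's built-in `sqlite3` module. This is a sketch: the column names (`poly_no`, `fm_no`, `lith_no`, `age_no`) and all rows are hypothetical. Joining on the keys reconstructs the original, redundant view, which shows that normalization loses no information.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE age       (age_no INTEGER PRIMARY KEY, age TEXT);
    CREATE TABLE lithology (lith_no INTEGER PRIMARY KEY, lithology TEXT);
    CREATE TABLE formation (fm_no INTEGER PRIMARY KEY, fm_name TEXT,
                            lith_no INTEGER REFERENCES lithology,
                            age_no INTEGER REFERENCES age);
    CREATE TABLE polygon   (poly_no INTEGER PRIMARY KEY,
                            fm_no INTEGER REFERENCES formation);

    -- Hypothetical example rows.
    INSERT INTO age       VALUES (1, 'Devonian'), (2, 'Carboniferous');
    INSERT INTO lithology VALUES (1, 'granite'), (2, 'limestone');
    INSERT INTO formation VALUES (10, 'Fm A', 1, 1), (20, 'Fm B', 2, 2);
    INSERT INTO polygon   VALUES (101, 10), (102, 20), (103, 10);
""")

# Joining on the keys reconstructs the unnormalized POLYGON view.
rows = con.execute("""
    SELECT p.poly_no, f.fm_name, l.lithology, a.age
    FROM polygon p
    JOIN formation f ON f.fm_no = p.fm_no
    JOIN lithology l ON l.lith_no = f.lith_no
    JOIN age a       ON a.age_no = f.age_no
    ORDER BY p.poly_no
""").fetchall()
for row in rows:
    print(row)
# (101, 'Fm A', 'granite', 'Devonian')
# (102, 'Fm B', 'limestone', 'Carboniferous')
# (103, 'Fm A', 'granite', 'Devonian')
```

Lithology and age of Fm A are stored once, although two polygons belong to it.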


    The Relational Model (9)

    The Relational Model (10)

    Spatial Data Structures

    Spatial data structures refer to the organization of spatial data in a form suitable for digital computers.

    According to the raster and vector model, there are raster structures and vector structures. While the model is unique, the structure is not: the same model can be implemented by several structures.

    There are also several structures to represent graph models.

    Spatial Raster Structures (1)

    Full raster structure restricts each layer to a single attribute, and limits the values to integers in the range 0 to 255. Given information about the size of the array and the ordering convention (scan order), arrays can be stored as one-dimensional lists.


    Spatial Raster Structures (2)

    Spatial Raster Structures (3)

    Attributes are often called bands in digital imagery, referring to the bandwidths of the electromagnetic spectrum registered by satellite imagers like LANDSAT.

    Band sequential (BSQ): pixel by pixel, row by row, layer by layer

    Band interleaved by line (BIL): the pixels of a row of one layer, followed by the pixels of the same row of the next layer

    Band interleaved by pixel (BIP): a pixel of one layer, followed by the same pixel of the next layer

    With BIP, the band values for each pixel are stored physically close together on the medium.
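The three interleavings differ only in the position of a value in the flat file. A minimal sketch with 0-based band, row and column indices; the function signature and the image size are illustrative assumptions.

```python
def offset(band, row, col, bands, rows, cols, layout):
    """0-based position of one value in a flat file holding a
    bands x rows x cols image in the given interleaving."""
    if layout == "BSQ":  # all of band 0, then all of band 1, ...
        return (band * rows + row) * cols + col
    if layout == "BIL":  # row 0 of every band, then row 1 of every band, ...
        return (row * bands + band) * cols + col
    if layout == "BIP":  # all band values of pixel (0, 0), then of pixel (0, 1), ...
        return (row * cols + col) * bands + band
    raise ValueError(f"unknown layout: {layout}")

# Positions of the 3 band values of pixel (row 1, column 2) in a hypothetical 3-band, 2 x 4 image.
for layout in ("BSQ", "BIL", "BIP"):
    print(layout, [offset(b, 1, 2, 3, 2, 4, layout) for b in range(3)])
# BSQ [6, 14, 22]
# BIL [14, 18, 22]
# BIP [18, 19, 20]
```

Only BIP places the band values of one pixel at consecutive positions.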

    Runlength Encoding (1)

    Adjacent pixels having the same value are combined together as a run, represented by a pair of numbers (run length, pixel value).

    Thus, each run pair consists of a number for the length of the run in pixels, and a second number for the attribute or class value of the run.

    Each row starts with a new run.
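A minimal sketch of row-wise run-length encoding with `itertools.groupby`, producing (run length, pixel value) pairs and starting a new run at every row as described above. The function names and the example raster are illustrative.

```python
from itertools import groupby

def rle_encode(raster):
    """Run-length encode a raster row by row; each row starts with a new run."""
    return [[(len(list(run)), value) for value, run in groupby(row)] for row in raster]

def rle_decode(encoded):
    """Expand (run length, value) pairs back into rows of pixels."""
    return [[value for length, value in row for _ in range(length)] for row in encoded]

raster = [
    [7, 7, 7, 2, 2],
    [2, 2, 2, 2, 2],
]
print(rle_encode(raster))  # [[(3, 7), (2, 2)], [(5, 2)]]
```

The second row starts a new run although its value continues the last run of the first row.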


    Runlength Encoding (2)

    The number of bits required for the run length depends on the number of columns in the image.

    An image with 1024 columns requires 10 bits to encode the run length ($2^{10} = 1024$ distinct values, e.g. run lengths 1 to 1024 stored as length minus 1); an image with 4096 columns requires 12 bits to encode the run length.
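The bit count can be computed with `int.bit_length`, under the assumption that a run length between 1 and the number of columns is stored as length minus 1. The function name is hypothetical.

```python
def run_length_bits(columns):
    """Bits needed to store run lengths 1..columns, coded as length - 1."""
    return max(1, (columns - 1).bit_length())

for columns in (1024, 4096):
    print(columns, run_length_bits(columns))
# 1024 10
# 4096 12
```

One more column than a power of two costs an extra bit, e.g. 1025 columns need 11 bits.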

    Scan Orders for Rasters (1)

    row order

    prime row order

    Morton order

    Peano-Hilbert order
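The first two scan orders are easy to generate. The sketch below assumes, as is common, that prime (row-prime) order reverses the scan direction on every other row, so that consecutive cells in the sequence always remain neighbors. Indices are 0-based (row, column); the function names are illustrative.

```python
def row_order(rows, cols):
    """Row order: every row scanned left to right."""
    return [(i, j) for i in range(rows) for j in range(cols)]

def row_prime_order(rows, cols):
    """Prime (row-prime) order, assumed here to reverse direction on every other row."""
    return [(i, j if i % 2 == 0 else cols - 1 - j) for i in range(rows) for j in range(cols)]

print(row_order(2, 3))        # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(row_prime_order(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```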

    Scan Orders for Rasters (2)


    Quadtrees, Octrees (1)

    Quadtrees and octrees are hierarchical data structures based on the successive subdivision of blocks into 4 quadrants or 8 octants, for pixels or voxels, respectively.

    A quadrant (block) is not further subdivided

    if it is homogeneous, i.e. if all its pixels have the same value, or

    if it is the size of a pixel.

    It is usually represented by a tree (graph) structure.
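The subdivision rule can be sketched recursively for a $2^k \times 2^k$ raster: a homogeneous block (which includes every single pixel) becomes a leaf holding its value, any other block becomes an inner node with four subtrees. The nested-list representation and the quadrant order NW, NE, SW, SE are illustrative assumptions.

```python
def build_quadtree(raster, top=0, left=0, size=None):
    """Quadtree of a 2^k x 2^k raster: a leaf is the block value,
    an inner node is a list of four subtrees (NW, NE, SW, SE)."""
    if size is None:
        size = len(raster)
    values = {raster[i][j] for i in range(top, top + size) for j in range(left, left + size)}
    if len(values) == 1:  # homogeneous block, including a single pixel: stop subdividing
        return values.pop()
    half = size // 2
    return [
        build_quadtree(raster, top, left, half),                # NW
        build_quadtree(raster, top, left + half, half),         # NE
        build_quadtree(raster, top + half, left, half),         # SW
        build_quadtree(raster, top + half, left + half, half),  # SE
    ]

raster = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(build_quadtree(raster))  # [1, [0, 0, 0, 1], 0, 0]
```

Large homogeneous areas collapse into single leaves, which is where the storage saving comes from.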

    Quadtrees, Octrees (2)

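    $$
    7_{10} \leftrightarrow
    \begin{cases} \text{row } 1_{10} = 0\,1_2 \\ \text{column } 3_{10} = 1\,1_2 \end{cases}
    \;\Rightarrow\; 0111_2 = (01_2)(11_2) = 13_4
    $$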
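The worked example interleaves the bits of row and column, row bit first, into a Morton key; read in base 4, each digit is the quadrant number at one level of the quadtree. A minimal sketch of this bit interleaving; the function names are illustrative.

```python
def morton_key(row, col, bits):
    """Interleave the bits of row and column (row bit first) into a Morton key."""
    key = 0
    for b in reversed(range(bits)):
        key = (key << 2) | (((row >> b) & 1) << 1) | ((col >> b) & 1)
    return key

def to_base4(key, digits):
    """Base-4 digits of the key: one quadrant number (0-3) per quadtree level."""
    return "".join(str((key >> (2 * d)) & 3) for d in reversed(range(digits)))

k = morton_key(1, 3, bits=2)
print(k, format(k, "04b"), to_base4(k, digits=2))  # 7 0111 13
```

Sorting cells by this key yields the Morton (Z) scan order.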

    Quadtrees, Octrees (3)

    Quadtrees, Octrees (4)


    Quadtrees, Octrees (5)

    Quadtrees, Octrees (6)

    Quadtrees, Octrees (7)

    Quadtrees, Octrees (8)

    Take a look at: Tiles à la Google Maps: Coordinates, Tile Bounds and Projection

    http://www.maptiler.org/google-maps-coordinates-tile-bounds-projection/


    Vector Structures (1)

    Spaghetti Structure

    In the spaghetti structure, tables of locational coordinates are associated with each of the basic objects: points, lines, polygons.

    No topological attributes are used.

    Relationships between spatial objects are not considered; they have to be computed from the spatial coordinates.

    Vector Structure (2)

    Vector Structure (3)

    Topological Vector Structure (1)

    Topological Structure

    Definition of technical terms

    points: isolated points, or vertices linked to form a line

    lines: sequences of ordered vertices with a start node and an end node

    chain (arc, edge): a line which is part of one or more polygons

    node: a point where lines or chains meet or terminate

    ring: consists of one or more chains

    polygon: consists of one outer ring and zero or more inner rings

    simple polygon: no inner ring

    complex polygon: one or more inner rings
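The terms above can be mirrored by a few record types. A minimal sketch with Python dataclasses; the class and field names are hypothetical, and geometry is kept as plain coordinate lists on the chains.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class Chain:
    """A line with a start node and an end node (node ids are hypothetical)."""
    start_node: int
    end_node: int
    vertices: List[Point]

@dataclass
class Ring:
    chains: List[Chain]  # one or more chains forming a closed loop

@dataclass
class Polygon:
    outer: Ring
    inner: List[Ring] = field(default_factory=list)  # zero or more inner rings

    @property
    def is_simple(self) -> bool:
        return not self.inner  # simple polygon: no inner ring

square = Ring([Chain(1, 1, [(0, 0), (0, 4), (4, 4), (4, 0), (0, 0)])])
hole = Ring([Chain(2, 2, [(1, 1), (1, 2), (2, 2), (2, 1), (1, 1)])])
print(Polygon(square).is_simple, Polygon(square, [hole]).is_simple)  # True False
```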


    Topological Vector Structure (2)

    Basic topological structure in terms of a normalized relation by van Roessel (1987).

    Topological Vector Structure (3)

    Topological Vector Structure (4)

    Spaghetti vs. Topological Structure by Example (1)

    Find all granite contacts that are also limestone contacts

    Remove all boundary lines between adjacent polygons that have the same classification

    Find points on a structure map where fault traces intersect


    Spaghetti vs. Topological Structure by Example (2)

    Find all granite contacts that are also limestone contacts


    Spaghetti structure: start with a list of granite polygons and another list of limestone polygons. Then match the vertices of each granite polygon with the vertices of every limestone polygon.


    Topological structure: search the chain topology table for (left, right) polygon pairs that are either (granite, limestone) or (limestone, granite).
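The two strategies can be contrasted in a short sketch with hypothetical data for one granite and one limestone polygon. The spaghetti version assumes both polygons were digitized with identical vertices along the contact and compares boundary segments pairwise; the topological version scans a chain table once.

```python
# Spaghetti structure: each polygon is a closed list of coordinate pairs (hypothetical data).
GRANITE = [[(0, 0), (0, 2), (2, 2), (2, 0)]]
LIMESTONE = [[(2, 0), (2, 2), (4, 2), (4, 0)]]

def segments(ring):
    """Boundary segments of a ring as direction-independent vertex pairs."""
    return {frozenset((ring[k], ring[(k + 1) % len(ring)])) for k in range(len(ring))}

def shared_boundaries_spaghetti(polys_a, polys_b):
    """Match every polygon of class A with every polygon of class B by comparing vertices."""
    shared = set()
    for pa in polys_a:
        for pb in polys_b:
            shared |= segments(pa) & segments(pb)
    return shared

# Topological structure for the same two polygons: chain id -> (left class, right class).
CHAINS = {"c1": ("granite", None), "c2": ("granite", "limestone"), "c3": ("limestone", None)}

def contacts_topological(a, b):
    """Scan the chain table for (a, b) or (b, a) left/right pairs."""
    return sorted(c for c, lr in CHAINS.items() if lr in ((a, b), (b, a)))

print(shared_boundaries_spaghetti(GRANITE, LIMESTONE))
print(contacts_topological("granite", "limestone"))  # ['c2']
```

The spaghetti search grows with the product of the numbers of polygons and vertices, while the table scan touches each chain once.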

    Spaghetti vs. Topological Structure by Example (5)

    Remove all boundary lines between adjacent polygons that have the same classification


    Spaghetti structure: polygons belonging to the same class need to be matched with one another to find common boundaries.


    Topological structure: look in the chain topology table for (left, right) polygon pairs where left and right have the same class.
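A minimal sketch of this lookup, assuming a hypothetical chain table of (left polygon, right polygon) pairs and a polygon classification; the outside of the map is polygon 0 with class `None`, so it never matches a real class. The selected chains are the ones whose removal merges same-class neighbors.

```python
# Hypothetical chain topology table: chain id -> (left polygon, right polygon); 0 is the map outside.
CHAINS = {"c1": (1, 0), "c2": (1, 2), "c3": (2, 3), "c4": (1, 3), "c5": (3, 0)}
# Hypothetical polygon classification.
CLASS = {0: None, 1: "till", 2: "till", 3: "clay"}

def dissolvable_chains(chains, polygon_class):
    """Chains whose left and right polygons have the same class."""
    return sorted(
        c for c, (left, right) in chains.items()
        if polygon_class[left] is not None and polygon_class[left] == polygon_class[right]
    )

print(dissolvable_chains(CHAINS, CLASS))  # ['c2']
```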

    Spaghetti vs. Topological Structure by Example (8)

    Find points on a structure map where fault traces intersect


    Spaghetti structure: each fault must be matched with every other fault, but pairwise comparison of vertices is not enough, because faults could intersect anywhere, not just at vertices. Each pair of adjacent vertices (a segment) of one fault must be compared with every adjacent pair of another fault to check whether the segments intersect.
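The brute-force segment comparison can be sketched as follows. Fault traces are hypothetical polylines; each segment $p_1 + t\,(p_2 - p_1)$ is intersected with each segment $q_1 + u\,(q_2 - q_1)$ of every other fault by solving for $t$ and $u$ with 2D cross products. Parallel or collinear segments are skipped in this sketch, and a crossing exactly at a vertex may be reported once per adjacent segment.

```python
from itertools import combinations

def intersection_point(p1, p2, q1, q2):
    """Intersection of segments p1-p2 and q1-q2, or None (parallel segments are skipped)."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    d = rx * sy - ry * sx  # cross product r x s
    if d == 0:
        return None
    wx, wy = q1[0] - p1[0], q1[1] - p1[1]
    t = (wx * sy - wy * sx) / d
    u = (wx * ry - wy * rx) / d
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

def fault_intersections(faults):
    """Compare every segment of each fault with every segment of every other fault."""
    points = []
    for (name_a, a), (name_b, b) in combinations(faults.items(), 2):
        for p1, p2 in zip(a, a[1:]):
            for q1, q2 in zip(b, b[1:]):
                x = intersection_point(p1, p2, q1, q2)
                if x is not None:
                    points.append((name_a, name_b, x))
    return points

# Hypothetical fault traces; F1 and F2 cross between vertices, F3 crosses neither.
faults = {"F1": [(0, 0), (4, 4)], "F2": [(0, 4), (4, 0)], "F3": [(5, 0), (5, 4)]}
print(fault_intersections(faults))  # [('F1', 'F2', (2.0, 2.0))]
```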


    Topological structure: the node list in the node topology table is searched for nodes with at least two lines where the lines are classified as faults.
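With topology stored, the same question becomes a table scan. A minimal sketch, assuming a hypothetical node table that lists the lines meeting at each node and a line classification; it selects nodes where at least two fault lines meet, as stated above.

```python
# Hypothetical node topology table: node id -> ids of the lines meeting at that node.
NODES = {"n1": ["l1", "l2", "l3", "l4"], "n2": ["l1", "l5"], "n3": ["l3", "l6"]}
# Hypothetical line classification.
LINE_CLASS = {"l1": "fault", "l2": "fault", "l3": "fault", "l4": "fault", "l5": "contact", "l6": "contact"}

def fault_nodes(nodes, line_class):
    """Nodes where at least two lines classified as faults meet."""
    return sorted(
        node for node, lines in nodes.items()
        if sum(line_class[line] == "fault" for line in lines) >= 2
    )

print(fault_nodes(NODES, LINE_CLASS))  # ['n1']
```

No coordinates are read: the intersections were resolved once, when the topology was built.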

    Raster vs. Vector Structure

    Different structures are used for different tasks, depending on which are the most efficient and most suitable.

    The raster structure is particularly efficient for the overlay of multiple data layers (image processing); raster images occupy large amounts of storage space.

    The spaghetti structure is efficient for displaying objects by their boundaries (cartography).

    The topological structure facilitates searches that require adjacency, containment, and connectivity information, because this information is explicitly stored and separated from the spatial coordinates (geometry).

    Raster and Vector Model, Structure, for larger dimensions

    Raster model: the generalization from 2d pixels to 3d voxels preserves representation by enumeration, and a rudimentary topology of pixels and voxels, respectively, but not of geoobjects.

    Vector model: the boundary representation generalizes from polygons, given by ordered 0d vertices connecting to 1d edges, to polyhedra, given by 0d vertices connecting to 1d edges and 2d faces. Topology is a major challenge (G-Maps).