OUTLINE:
Overview of models
Data and levels of measurement
Raster and vector models
Conversion between models
Databases
DATA MODELS IN GIS
DIGITAL INFORMATION
GIS requires that both data and maps be represented as numbers
GIS places data into the computer’s memory in a physical data structure (i.e. files and directories).
files can be written in binary or as ASCII text.
binary is faster to read and smaller; ASCII can be read and edited by humans but uses more space.
digital data are sent through a “pipe” consisting of 0s and 1s
stored on devices that can store only 0s and 1s
processed as 0s and 1s
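The size trade-off between binary and ASCII storage can be illustrated with a minimal sketch. The four elevation values are invented for illustration, and the 4-byte little-endian float layout is an assumption, not a format named in the slides.

```python
import struct

# Hypothetical sample: four elevation values in metres.
elevations = [1523.25, 1530.5, 1498.75, 1502.0]

# ASCII text: human-readable and editable, one value per line.
ascii_bytes = "\n".join(f"{v:.2f}" for v in elevations).encode("ascii")

# Binary: each value packed as a 4-byte little-endian float (assumed layout).
binary_bytes = struct.pack(f"<{len(elevations)}f", *elevations)

# Reading the binary form back recovers the numbers.
decoded = list(struct.unpack(f"<{len(elevations)}f", binary_bytes))

print(len(ascii_bytes), "bytes as ASCII;", len(binary_bytes), "bytes as binary")
```

The binary form is smaller here, but it cannot be inspected or corrected in a text editor.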
locational and attribute data in a GIS
attribute type: discrete vs continuous
discrete: presumed to occur at distinct locations with empty locations having a value of zero for the attribute in question
continuous: feature occurs throughout geographical region; no locations are empty
DATA
Levels of Measurement:
four levels are commonly recognized – nominal, ordinal, interval and ratio
each subsequent level includes all characteristics of preceding levels
data available at higher levels can be reduced to lower levels; the opposite is not true
LEVEL OF MEASUREMENTS
Nominal Scale
objects are classed into groups; groups possess arbitrary labels (numbers/names)
e.g. religion, land use/cover
discrete variable
Ordinal Scale
categorization plus an ordering/ranking of data
e.g. country road, street, highway
can identify larger/smaller but cannot comment on the degree of difference between values
K=5, L=3, M=1 equivalent to K=500, L=300, M=10
discrete variables
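The K, L, M example can be checked with a small sketch: on an ordinal scale only the ordering carries meaning, so two codings that rank the objects identically are equivalent. The helper name `rank_order` is hypothetical.

```python
def rank_order(scores):
    """Return the labels sorted from the largest to the smallest score."""
    return sorted(scores, key=scores.get, reverse=True)

coding_a = {"K": 5, "L": 3, "M": 1}
coding_b = {"K": 500, "L": 300, "M": 10}

# Both codings give the same ranking, so they are the same ordinal data.
same_order = rank_order(coding_a) == rank_order(coding_b)

# The ratios differ (5 vs 50), which is why ratios are meaningless here.
ratio_a = coding_a["K"] / coding_a["M"]
ratio_b = coding_b["K"] / coding_b["M"]
```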
Interval Scale
measurements arranged in rank and distance between measurements is known
no “true” zero point
e.g. elevation/topographic lines, temperature in °C
discrete or continuous
Ratio Scale
like interval scaling: both rank and separation are known, but there is also a known, fixed starting point
e.g. temperature on the Kelvin scale; speed
continuous and discrete
1. Reality – total phenomena as they actually exist
2. Conceptual Data Model – describes and defines included entities (how they will be represented)
3. Logical Data Model – logical organization of the database elements
4. Physical Data Model or File Structure – how information will be structured for access
DATA MODELS – REPRESENTING DATA
DATA MODELS
logical data model is how data are organized for use by the GIS.
GISs have traditionally used either raster or vector for maps.
raster – based on pixels
vector – based on points, lines and polygons
while most GISs can handle both raster and vector data, only one is used for the internal organization of spatial data.
rasters and vectors can be flat files … if they are simple
Vector-based line
4753456 623412
4753436 623424
4753462 623478
4753432 623482
4753405 623429
4753401 623508
4753462 623555
4753398 623634
Flat File
Raster-based line
00000000000000000001100000100000101010000101000011001000010100000000100010001000000010001000010000010001000000100010000100000001011100100000000100001110000000000000000000000000
Flat File
RASTER DATA MODELS
basic unit is cells or pixels, which are uniformly spaced
each cell/pixel has spatial and spectral information.
e.g. digital elevation data and digital images
spatially exhaustive sampling of the area of interest
every cell has a value, even if it is “missing.”
cell has a resolution, given as the cell size in ground units.
the higher the resolution, the smaller the cell dimensions
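How the grid extent and cell size tie cells to ground coordinates can be sketched as follows. The class name `Grid`, its field names and the sample origin and 30-unit cell size are all assumptions for illustration; rows are assumed to be counted from the top edge of the extent.

```python
from dataclasses import dataclass

@dataclass
class Grid:
    x_min: float      # ground x of the left edge of the grid extent
    y_max: float      # ground y of the top edge of the grid extent
    cell_size: float  # resolution: cell side length in ground units
    n_rows: int
    n_cols: int

    def cell_of(self, x, y):
        """Return (row, col) of the cell containing ground point (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("point lies outside the grid extent")
        return row, col

    def centre_of(self, row, col):
        """Return the ground coordinates of the centre of cell (row, col)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y

# Hypothetical 100 x 100 grid with 30-unit cells.
grid = Grid(x_min=500000.0, y_max=4650000.0, cell_size=30.0, n_rows=100, n_cols=100)
print(grid.cell_of(500045.0, 4649980.0))
```

Halving `cell_size` over the same extent doubles the number of rows and columns, which is the storage cost of a finer resolution.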
[Figure: Generic structure for a grid, labelling rows, columns, a grid cell, the grid extent and the resolution.]
Fining of Resolution
CREATING RASTER DATA MODELS
creating raster is like laying a grid over a map
code each cell with a value representing attribute
every cell has a value, even if null or zero (integers, ratios, etc.)
values for each cell are written into a file
e.g. using a spreadsheet, database or word processor
the file is imported into the GIS, where it can be reformatted
each pixel is presumed to have a single value, but in reality a cell may cover more than one class (the mixed pixel issue)
RASTER AND MISSING DATA
GIS data layer as a grid with a large section of “missing data,” in this case, the zeros in the ocean off of New York and New Jersey.
MIXED PIXEL ISSUE
[Figure: Coding rules for mixed pixels. Three 3 x 3 grids of cells coded W, G and E illustrate the “water dominates”, “winner takes all” and “edges separate” rules. Further panels showing land and water illustrate “largest share”, “central point”, “presence/absence” and “percent occurrence” coding, with example cell values of 35%, 70%, 80% and 100%.]
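The coding rules named in the figure can be compared on a single mixed cell. This is a minimal sketch under stated assumptions: the cell is sampled on a hypothetical 3 x 3 sub-grid of land (L) and water (W) observations, and the function names are invented for illustration.

```python
from collections import Counter

def largest_share(samples):
    """Code the cell with the class that covers most of its area."""
    return Counter(samples).most_common(1)[0][0]

def central_point(sample_grid):
    """Code the cell with the class found at its centre."""
    mid = len(sample_grid) // 2
    return sample_grid[mid][mid]

def presence_absence(samples, target):
    """Code the cell True if the target class occurs anywhere in it."""
    return target in samples

def percent_occurrence(samples, target):
    """Code the cell with the percentage of its area in the target class."""
    return 100.0 * samples.count(target) / len(samples)

# Hypothetical mixed cell: water at the centre, land over most of the area.
cell = [["L", "L", "L"],
        ["L", "W", "W"],
        ["L", "W", "L"]]
flat = [c for row in cell for c in row]

print(largest_share(flat), central_point(cell),
      presence_absence(flat, "W"), round(percent_occurrence(flat, "W"), 1))
```

For this cell the largest-share rule codes land while the central-point rule codes water, so the choice of rule changes the resulting map.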
raster data visualized as map layers
map layer: data describing a single characteristic for a location
multiple items of information require multiple layers
creates problems – raster databases can become enormous
each map layer has thousands of cells
Advantages of raster data models
simple data structures
each cell can be owned by only one feature.
overlay and combination of maps and remote sensed images easy
simulation easy, because cells have the same size and shape
technology is cheap
Advantages of raster data models (continued)
some spatial analysis methods simple to perform
local: cell by cell calculations
focal: models cell value based on neighbours
zonal: models cell value based on geographical areas
global: models cell value based on all cells
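The four classes of raster operation can be sketched on a small grid. The 3 x 3 value grid, the zone labels and the choice of the mean as the summary statistic are assumptions for illustration; a GIS would offer many other functions in each class.

```python
from statistics import mean

elev = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "c"]]

def local_op(grid, fn):
    """Local: each output cell depends only on the same input cell."""
    return [[fn(v) for v in row] for row in grid]

def focal_mean(grid):
    """Focal: each output cell is the mean of its 3 x 3 neighbourhood."""
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(n_rows, r + 2))
                      for j in range(max(0, c - 1), min(n_cols, c + 2))]
            out_row.append(mean(window))
        out.append(out_row)
    return out

def zonal_mean(grid, zone_grid):
    """Zonal: one summary value per zone (geographical area)."""
    groups = {}
    for g_row, z_row in zip(grid, zone_grid):
        for v, z in zip(g_row, z_row):
            groups.setdefault(z, []).append(v)
    return {z: mean(vals) for z, vals in groups.items()}

def global_mean(grid):
    """Global: one summary value computed from all cells."""
    return mean(v for row in grid for v in row)

doubled = local_op(elev, lambda v: v * 2)
```

Because every cell has the same size and shape, each operation is a plain loop over rows and columns, which is the simplicity the slide refers to.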
Disadvantages of raster data models
large volumes of graphic data
use of large cells to reduce data volumes loses detail
poor at representing points, lines and areas; good at surfaces
must often include redundant or missing data
network linkages are difficult to establish
projection transformations are time consuming
COMPRESSION TECHNIQUES
raster compression techniques used in GIS are run-length encoding and quad trees
Run-length Encoding – more efficient
values often occur in runs across several cells
form of spatial autocorrelation
e.g. array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1
Row-by-row coding:
CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA
Run-length coding:
5C 2B 1D 4C 2B 1D 3C 3B 2D 1C 2B 2A 4D 1B 2A 2D 3B 2A 3D 4A 3D 4A
56 entries for 7x8 array, or
22 pairs (44 entries) for 7x8 array
A. Mixed Conifer
B. Douglas Fir
C. Oak Savannah
D. Grassland
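Run-length encoding as described above can be sketched in a few lines. The function names are hypothetical; the two inputs are the binary array and the land-cover string from the slides, encoded as (count, value) pairs.

```python
from itertools import groupby

def rle_encode(cells):
    """Collapse consecutive equal values into (run length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(cells)]

def rle_decode(pairs):
    """Expand (run length, value) pairs back into the full cell sequence."""
    return [value for count, value in pairs for _ in range(count)]

# Binary array from the slide.
row = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
encoded = rle_encode(row)

# Land-cover array from the slide, read row by row as one sequence.
land_cover = "CCCCCBBDCCCCBBDCCCBBBDDCBBAADDDDBAADDBBBAADDDAAAADDDAAAA"
pairs = rle_encode(land_cover)

print(encoded)
print(len(land_cover), "cells ->", len(pairs), "pairs")
```

The saving depends on how long the runs are: a layer in which neighbouring cells rarely share a value can grow rather than shrink under this scheme.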
RUN-LENGTH CODING
Quadtree Compression
hierarchical data model using a variable-sized grid cell
finer subdivisions are used in areas requiring finer detail (higher resolution)
pixel in each higher layer is derived from average or majority of 4 pixels from the lower layer
not as efficient for more variable or complex data
used primarily as a way to store data for rapid retrieval on display devices
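The variable-sized subdivision can be sketched with a region quadtree, which splits a block into four quadrants only where its cells differ. This is an illustrative variant, not necessarily the structure the slides have in mind: it stores uniform blocks exactly and does not build the averaged or majority layers mentioned above. The grid is assumed square with a side that is a power of two, and the function names are hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a cell value for a uniform block, else four subtrees.

    Subtrees are listed in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first
    half = size // 2
    return [build_quadtree(grid, row, col, half),
            build_quadtree(grid, row, col + half, half),
            build_quadtree(grid, row + half, col, half),
            build_quadtree(grid, row + half, col + half, half)]

def count_leaves(node):
    """Count the blocks that actually have to be stored."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Only the south-east quadrant is subdivided here, so 7 blocks replace 16 cells; a highly variable grid would subdivide almost everywhere and save little.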
QUAD TREE STRUCTURE
RASTER DATA FORMAT
most raster formats are digital image formats.
most GISs accept TIF, GIF, JPEG or encapsulated PostScript, which are not georeferenced.
DEMs are true raster data formats.
think of world as a space populated by discrete features of various shapes and kinds – points, lines, areas.
any location in space may be empty or occupied by one or more point, line or area.
VECTOR DATA MODELS
point: zero-dimensional abstraction of an object, represented by a single X,Y co-ordinate.
normally represents a geographic feature too small to be displayed as a line or area
stored by their real (earth) coordinates
line: set of ordered co-ordinates that represent the shape of geographic features too narrow to be displayed as an area at the given scale, or linear features with no area
lines and areas are built from sequences of points in order.
lines have a direction to the ordering of the points.
polygon: feature used to represent areas.
defined by the lines that make up its boundary and a point inside its boundary for identification.
have attributes that describe the geographic feature they represent.
vector data evolved the arc/node model in the 1960s.
an area consists of lines and a line consists of points.
points, lines, and areas can each be stored in their own files, with links between them.
endpoint of a line (arc) is called a node; arc junctions are only at nodes.
stored with the arc is the topology (i.e. the connecting arcs and left and right polygons).
TOPOLOGY
topological data structures dominate GIS software.
stored explicitly
allows automated error detection and elimination.
rarely are maps topologically clean when digitized or imported.
GIS has to be able to build topology from unconnected arcs.
Arc/Node Map Data Structure with Files.
[Figure: polygon “A” drawn from points 1 to 13 and bounded by arcs 1 and 2.]
Points File
1 x y
2 x y
3 x y
4 x y
5 x y
6 x y
7 x y
8 x y
9 x y
10 x y
11 x y
12 x y
13 x y
Arcs File
1: 1,2,3,4,5,6,7
2: 1,8,9,10,11,12,13,7
File of Arcs by Polygon
A: 1,2, Area, Attributes
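The three linked files of the arc/node structure can be mirrored with plain dictionaries. The arc and polygon entries follow the figure; the point coordinates are invented for illustration (the figure gives only "x y"), and the helper names are hypothetical.

```python
# Points file: point id -> (x, y). Coordinates are invented for illustration.
points = {1: (0, 0), 2: (0, 1), 3: (1, 2), 4: (3, 2), 5: (5, 2),
          6: (6, 1), 7: (6, 0), 8: (0, -1), 9: (1, -2), 10: (2, -2),
          11: (4, -2), 12: (5, -2), 13: (6, -1)}

# Arcs file: arc id -> ordered point ids; the first and last ids are nodes.
arcs = {1: [1, 2, 3, 4, 5, 6, 7],
        2: [1, 8, 9, 10, 11, 12, 13, 7]}

# File of arcs by polygon: polygon id -> arc ids on its boundary.
polygons = {"A": [1, 2]}

def polygon_ring(poly_id):
    """Chain a polygon's arcs end to end, reversing arcs where needed."""
    remaining = [list(arcs[a]) for a in polygons[poly_id]]
    ring = remaining.pop(0)
    while remaining:
        for i, arc in enumerate(remaining):
            if arc[0] == ring[-1]:
                ring += arc[1:]
            elif arc[-1] == ring[-1]:
                ring += arc[-2::-1]
            else:
                continue
            remaining.pop(i)
            break
        else:
            raise ValueError("arcs do not close into a ring")
    return ring[:-1]  # drop the repeated start node

def ring_area(ring):
    """Shoelace formula over the ring's coordinates."""
    total = 0.0
    for a, b in zip(ring, ring[1:] + ring[:1]):
        (x1, y1), (x2, y2) = points[a], points[b]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

ring_a = polygon_ring("A")
area_a = ring_area(ring_a)
```

Coordinates are looked up only in the final area step; assembling the ring uses point ids alone, which is why many operations can run on the arc and polygon files without touching the points file.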
relationship between nodes, arcs and polygons.
topologically structured database for ease of retrieval and implementation of spatial-relational operations.
advantages:
simple, elegant and efficient
relational database construction and analysis
complete topology makes map overlay feasible.
topology allows many GIS operations to be done without accessing the point files.
VECTOR DATABASE CREATION
database creation involves several stages:
input of the spatial data
input of the attribute data
linking spatial and attribute data
spatial data is entered via digitized points and lines, scanned and vectorized lines or directly from other digital sources
once the spatial data has been entered, much work is still needed before it can be used
Building Topology
once points are entered and geometric lines are created, topology must be "built"
this involves calculating and encoding relationships between the points, lines and areas
this information may be automatically coded into tables of information in the database
Editing
during topology generation process, problems such as overshoots, undershoots and spikes are either flagged for editing by the user or corrected automatically
automatic editing involves the use of a tolerance value which defines the width of a buffer zone around objects within which adjacent objects should be joined
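Tolerance-based editing can be sketched for the simplest case, where two line endpoints that should coincide were digitized slightly apart. This is an assumption-laden illustration: the function name `snap_endpoints` is hypothetical, only endpoints are considered, and true overshoots (which need line intersection tests) are not handled.

```python
import math

def snap_endpoints(lines, tolerance):
    """Move endpoints lying within `tolerance` of an earlier endpoint onto it."""
    nodes = []    # endpoints accepted so far
    snapped = []
    for line in lines:
        new_line = list(line)
        for idx in (0, -1):
            pt = new_line[idx]
            for node in nodes:
                if math.dist(pt, node) <= tolerance:
                    new_line[idx] = node   # join to the existing node
                    break
            else:
                nodes.append(pt)           # no node nearby: keep as a new node
        snapped.append(new_line)
    return snapped

# Hypothetical digitized lines: the second starts just off the end of the first.
lines = [[(0.0, 0.0), (10.0, 0.0)],
         [(10.05, 0.02), (10.0, 10.0)]]
cleaned = snap_endpoints(lines, tolerance=0.1)
```

Too large a tolerance joins features that are genuinely separate, which is why some problems are flagged for the user instead of being corrected automatically.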
Advantages of vector data models
good representation of structures (points, lines, polygons)
compact and more efficient
topology can be completely described
accurate graphics
retrieval, updating and generalization of graphics and attributes possible
work well with pen and light-plotting devices and tablet digitizers.
Disadvantages of vector data models
complex data structures
combination of several vector polygon maps or polygon and raster maps through overlay creates difficulties
simulation is difficult
display and plotting can be expensive
technology is expensive
not good at continuous coverage or plotters that fill areas.
TIN must be used to represent volumes.
vector formats are either page definition languages or preserve ground coordinates.
page languages are HPGL, PostScript, and AutoCAD DXF.
true vector GIS data formats include ArcView shapefiles and ARC/INFO interchange files (E00); the E00 format stores topology.
VECTOR DATA FORMATS
List of coordinates “spaghetti”
simple
easy to manage
no topology
lots of duplication, hence need for large storage space
very often used in CAC (computer assisted cartography)
Vertex Dictionary
no duplication, but still this model does not use topology
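The difference between the spaghetti list and the vertex dictionary can be shown on two adjacent squares. The polygon names, coordinates and the function name `to_vertex_dictionary` are assumptions for illustration.

```python
# Spaghetti: each polygon carries its own coordinates, so the two vertices
# on the shared edge are stored twice.
spaghetti = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

def to_vertex_dictionary(polys):
    """Store each coordinate once and let polygons refer to vertex ids."""
    ids_by_xy = {}
    index = {}
    for name, coords in polys.items():
        ids = []
        for xy in coords:
            if xy not in ids_by_xy:
                ids_by_xy[xy] = len(ids_by_xy) + 1
            ids.append(ids_by_xy[xy])
        index[name] = ids
    vertex_table = {vid: xy for xy, vid in ids_by_xy.items()}
    return vertex_table, index

vertex_table, polygon_index = to_vertex_dictionary(spaghetti)
stored_in_spaghetti = sum(len(c) for c in spaghetti.values())
```

The dictionary removes the duplicated coordinates, but nothing in it records that A and B are neighbours: adjacency would still have to be worked out by comparing id lists, which is the missing topology.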
Dual Independent Map Encoding (DIME)
developed by US Bureau of the Census
nodes (intersections of lines) are identified with codes
assigns a directional code in the form of a "from node" and a "to node"
both street addresses and UTM coordinates are explicitly defined for each link
VECTOR TO RASTER EXCHANGE
data exchange by translation (export and import) can lead to significant errors in attributes and in geometry.
efficient data exchange is important for the future of GIS.
a triangulated irregular network (TIN) is a set of elevation points which have been connected to form a network of triangles.
developed in early 1970s as a simple way to build a surface
the sample points are connected by lines to form triangles; within each triangle the surface is usually represented by a plane
triangles fit together in a manner which simulates the face of the land.
ADVANCED DATA MODELS - TIN
irregularly spaced sample points can be adapted to the terrain
rough terrain - more points
smooth terrain - fewer points
an irregularly spaced sample is more efficient
TIN triangles can be seen as polygons having attributes of slope, aspect and area, with three vertices having elevation attributes
TIN models work best in areas with sharp breaks in slope
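Representing the surface inside a triangle by a plane makes its attributes simple to compute. The sketch below fits the plane through three (x, y, z) vertices and derives plan area, slope and an interpolated elevation; the function names and the sample vertices are assumptions, and aspect is omitted because its angular convention varies between systems.

```python
import math

def triangle_plane(p1, p2, p3):
    """Return (a, b, area, slope_deg) for the plane through three vertices.

    The plane is z = z1 + a*(x - x1) + b*(y - y1); area is the plan
    (map) area and slope is in degrees from horizontal.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    ux, uy, uz = x2 - x1, y2 - y1, z2 - z1
    vx, vy, vz = x3 - x1, y3 - y1, z3 - z1
    # Normal vector = cross product of two triangle edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("vertices are collinear in plan view")
    a, b = -nx / nz, -ny / nz
    area = abs(nz) / 2
    slope_deg = math.degrees(math.atan(math.hypot(a, b)))
    return a, b, area, slope_deg

def interpolate(p1, p2, p3, x, y):
    """Elevation of the triangle's plane at plan position (x, y)."""
    a, b, _, _ = triangle_plane(p1, p2, p3)
    return p1[2] + a * (x - p1[0]) + b * (y - p1[1])

# Hypothetical triangle: elevation rises 10 units over 10 units in x.
p1, p2, p3 = (0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 10.0)
a, b, area, slope_deg = triangle_plane(p1, p2, p3)
z_mid = interpolate(p1, p2, p3, 5.0, 5.0)
```

Because each facet is planar, slope is constant within a triangle and changes only at edges, which suits terrain with sharp breaks in slope.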
Advantages
ability to describe the surface at different levels of resolution
efficiency in storing data
allows simple calculation of basin areas, slopes, channels, and many other geometric parameters
Disadvantages
in many cases requires visual inspection and manual control of the network
a spatial database is a collection of spatially referenced data that acts as a model of reality
these selected phenomena are deemed important enough to represent in digital form
the digital representation might be for some past, present or future time period
DATABASES
scaleless: data can be stored at the level of detail found in the environment
cartographer is responsible for choosing the content and resolution
scale is a critical factor:
level of resolution set by field instruments
digitizing - resolution of instrument and abstraction and production factors
DIGITAL DATABASES
problems when using data sets of different resolutions
e.g. roads may not line up
resolved using ancillary source materials
additional problems when using data sets of different themes
e.g. combining elevation and drainage data: water running uphill or non-level lakes
Value of databases:
Cost of creation – cheaper to get data from an existing database
Appropriateness of use
Lack of alternative data sources
Graphic output
“data about the data”
could include data elements that: identify the data, identify the custodians and access conditions to the data, describe projection, content, quality of data
describes the action taken when handling databases of varying scale
METADATA
Dataset information
Title: Ortofotos'95
Abstract: Ortofotos'95 is a collection of ortho-rectified aerial photographs. These aerial photographs cover Portugal and were obtained in August 1995 on false color infrared film at a scale of 1:40 000. CNIG, the Directorate General of Forests and the paper mill industry are the owners of the aerial photographs (in paper format).
Type of dataset: Airborne data > Aerial photos
Locations: Portugal
Temporal Range: 1995-
Dataset scales: 1:25 000-1:50 000
Dataset resolution: 1 - 3 meters
Dataset quality remarks:
Acquisition of data: aerial photographs; the film is scanned at very high resolution and ortho-rectified using a DTM derived from topographic cartography at scale 1:25 000
Information creation date: 1999-10-29
pre-1970s, command line based with read and write to hard disk, tapes, diskettes
database approach – all reading and writing through simple interface (no need to care about tapes, etc.)
for small GIS projects it is sufficient to store geographic information as simple files.
with large data volumes and many data users it is best to use a database management system (DBMS)
relational design has been the most useful (since 1980s)
DATABASE MANAGEMENT SYSTEMS
contain tables or feature classes in which:
rows: entities, records, observations, features
all information about one occurrence of a feature
columns: attributes, fields, data elements, variables
one type of information for all features
key field is an attribute whose values uniquely identify each row
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
[Figure labels: entity (a row), attribute (a column), key field]
tables are related or joined using a common record identifier (column variable) present in both tables
Example:
goal: produce map of values by district/neighbourhood
problem: no district code available in parcel table
DATABASES - RDBM
Parcel Table
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
solution: join parcel table containing values with geography table containing location codings, using Block as key field
Geography Table
Block   District   Tract   City
1       A          101     Dallas
2       B          101     Dallas
4       B          105     Dallas
12      E          202     Garland
Block in the Parcel Table acts as the secondary or foreign key
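The join described above can be tried with Python's built-in sqlite3 module. The table contents are those of the slides; the SQL table and column names (`parcel`, `geography`, `dollar_value` and so on) are assumptions chosen for this sketch, as is the decision to total values per district.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (
    parcel_no INTEGER PRIMARY KEY,   -- key field
    address TEXT,
    block INTEGER,                   -- foreign key into geography
    dollar_value INTEGER
);
CREATE TABLE geography (
    block INTEGER PRIMARY KEY,
    district TEXT,
    tract INTEGER,
    city TEXT
);
""")
conn.executemany("INSERT INTO parcel VALUES (?, ?, ?, ?)", [
    (8, "501 N Hi", 1, 105450),
    (9, "590 N Hi", 2, 89780),
    (36, "1001 W. Main", 4, 101500),
    (75, "1175 W. 1st", 12, 98000),
])
conn.executemany("INSERT INTO geography VALUES (?, ?, ?, ?)", [
    (1, "A", 101, "Dallas"),
    (2, "B", 101, "Dallas"),
    (4, "B", 105, "Dallas"),
    (12, "E", 202, "Garland"),
])

# Join on Block, then total the parcel values within each district.
value_by_district = conn.execute("""
    SELECT g.district, SUM(p.dollar_value)
    FROM parcel AS p
    JOIN geography AS g ON p.block = g.block
    GROUP BY g.district
    ORDER BY g.district
""").fetchall()
print(value_by_district)
```

The parcel table never stores a district; the district comes from the geography table through the shared Block column, so a change to district boundaries is made in one place only.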
[Figure: relational linkages between spatial attributes (water right locations) and descriptive attributes.]
Advantages
very flexible
export data to another system easily
enables simple operations
e.g. search for records satisfying some condition
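A search for records satisfying a condition can be sketched on the parcel records from the earlier table. The dictionary field names and the 100,000 threshold are assumptions for illustration.

```python
parcels = [
    {"parcel": 8, "address": "501 N Hi", "block": 1, "value": 105450},
    {"parcel": 9, "address": "590 N Hi", "block": 2, "value": 89780},
    {"parcel": 36, "address": "1001 W. Main", "block": 4, "value": 101500},
    {"parcel": 75, "address": "1175 W. 1st", "block": 12, "value": 98000},
]

# Select the parcel numbers of all records valued above the threshold.
threshold = 100000
selected = [p["parcel"] for p in parcels if p["value"] > threshold]
print(selected)
```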
Description                          Thickness    Code
New Ice                              <10 cm       1
Nilas, Ice Rind                      0-10 cm      2
Young Ice                            10-30 cm     3
Grey Ice                             10-15 cm     4
Grey-White Ice                       15-30 cm     5
First-Year Ice                       30-200 cm    6
Thin First-Year Ice                  30-70 cm     7
Thin First-Year Ice, first stage     30-50 cm     8
Thin First-Year Ice, second stage    50-70 cm     9
Medium First-Year Ice                70-120 cm    1.
Thick First-Year Ice                 120-200 cm   4.
Old Ice                                           7.
Second-Year Ice                                   8.
Multi-Year Ice                                    9.