introduction to geographic information systems spring 2013 (inf 385t-28437) data modeling,...
Post on 26-Feb-2016
29 Views
Preview:
DESCRIPTION
TRANSCRIPT
Introduction to Geographic Information Systems Spring 2013 (INF 385T-28437)
Data Modeling,Database Design
Dr. David ArcturLecturer, Research Fellow
University of Texas at Austin
Lecture 11March 21, 2013
You are here
Food is there
Well, now it’s there
3INF385T(28437) – Spring 2013 – Lecture 11
4
Outline The design process Describing a system of viewpoints Developing use cases GIS design stages Conceptual design Normalization example Database diagrams Building the database Prototype to production
INF385T(28437) – Spring 2013 – Lecture 11
5
Best practices: iterative, incremental development
Ideas, Use Cases, Requirements
Planning, Cost-Benefit,
Risk Management
Analysis, Design
Development, Quality Control
Deployment
Maintenance, Evaluation
and Evaluation
and EvaluationNote: This is only an illustration – your
experience may differ!INF385T(28437) – Spring 2013 – Lecture 11
6
Best practices: iterative approach Focus on current issues
Short iterations better reflect near-term internal and external environment
Resolve misunderstandings early Resolve analysis, design and
implementation disparities early More accurate overall project status Workload better distributed across life
cycle Medium- to long-term changes in
environment factored in more easily through spiral approach
INF385T(28437) – Spring 2013 – Lecture 11
7
Starting out Define the project Get to know the client
Stakeholders and decision makers
Build the design team Analyst Subject matter experts Users
Inventory tasks, products, data Identify model Develop Use Cases Inventory data
GIS Design Seminar Teach client GIS concepts
and design methods Introduce yourself and the
design team Set realistic expectations
for the GIS system Alleviate fears and
concerns
INF385T(28437) – Spring 2013 – Lecture 11
Building a System… of Viewpoints
Viewpoints in “Reference Model - Open Distributed Processing (RM-ODP)” ISO/IEC 10746
InformationViewpoint
ComputationalViewpoint
EngineeringViewpoint
Implementation/Development
TechnologyViewpoint
EnterpriseViewpoint
Community Objectives Business aspects: purpose, scope and policiesWhat for? Why? Who? When?
Information sources and models What is it about?
Types of services and protocolsHow does each bit work?
Solution types: distribution infrastructureHow do the components work together?
Implementation system: hardware, software, distributionWith what?
Abstract/Best Practices
9
Use cases A description of a task you want the system to
perform Add new water service, record parcel sale
Basis of all analysis and design Start simple; expand with detail later
Analysis of use cases yields data, interfaces, applications
Use cases can: Capture existing work flows Define new applications Help understand alternative
and pathological work flows
Notify Owners
Data
GIS Database
Use case
INF385T(28437) – Spring 2013 – Lecture 11
10
Use case diagrams Show the actor/use case
relationships System architecture Data flows, coordination
Develop with users During meetings,
interviews Clean up and refine later
Graphical notation is helpful, but the use case document is the most important artifact
Operations & Maintenance
Staff
GIS Analyst
Produce flood maps
Manage flood control
structures
Document locations of
flooding
Emergency Call Center
System boundary
Actorand name
Use caseand name
Geocode & map call list
INF385T(28437) – Spring 2013 – Lecture 11
11
Example use case Some context: For emergency response application,
assume a set of use cases focused on exchanging information leading to creating, updating, and posting flood maps to interested agencies.
Each use case is documented according to a template, such as:
Use Case NameDescription:Actors:Pre-conditions:Post-conditions:Flow of events:- business rules- user actions, responsesExceptions:Alternates:
INF385T(28437) – Spring 2013 – Lecture 11
12
Example use case Use case: Flooding Information & Response Description: The call center receives and documents citizen
calls related to flooding during storm events. The information is geocoded, mapped, and provided to the Water District’s Operation & Maintenance staff, which makes decisions to manage control structures to mitigate flooding.
Actors: Emergency Response Call Center, GIS Analyst, Operations & Maintenance Staff.
Pre-conditions: The database of critical water facilities has been created. The Emergency Call Center has been activated for a storm event. Water district staff are operating under emergency operating procedures.
Post-conditions: Status change notices are sent to the relevant agencies registered to receive updates.
INF385T(28437) – Spring 2013 – Lecture 11
13
Use case primary scenario 1 Citizen places a call to the Emergency Call
Center hotline.2 Call center staff document location and
description of flooding problem.3 GIS analyst receives and geocodes locations
from the call center, producing a map of call locations.
4 Call reports are symbolized based on flooding issue. Maps are produced and handed off to the operations staff.
5 Operations & Maintenance staff review maps of flooding incidents and make decisions for operating control structures, gates, and pumps.
INF385T(28437) – Spring 2013 – Lecture 11
14
Are we finished yet? What layers will you need based on the
use case? Where will you get these layers? Are there any changes that can be
made to the business process?
INF385T(28437) – Spring 2013 – Lecture 11
15
Using the use cases From the set of use cases developed, the functional
requirements and interfaces can be fleshed out. From an understanding of the collaborators and
stakeholders involved in the use cases, appropriate data sources and maintenance authorities can be determined.
From a comparison of all the use cases, redundant information and tasks can be discovered and minimized.
From an examination of potential alternate scenarios, pathological situations can be anticipated and mitigated.
BUT… beware the use case time sink You cannot completely and correctly document all the use
cases for a reasonably complex system in your lifetime Keep it simple, and start prototyping as soon as you can –
this will further inform the use cases and keep your project moving INF385T(28437) – Spring 2013 – Lecture 11
GIS DESIGN
INF385T(28437) – Spring 2013 – Lecture 11
17
Designing the database
Physical model
table
table
table
Logical model
ProjectFeature collection
Rail
Boundaries
Roads Network
Topology
Relationship
Conceptual model
Key
Business practices
Collect information, identify desired themes and sources
Map themes to GIS database elements: define database entities and organization
Complete data organization, build full schema, test and refine
INF385T(28437) – Spring 2013 – Lecture 11
18
GIS design practice Think about the GIS features represented
by thematic layers, and about the integrity and behavior of those features-- Parcels are represented as polygons. Parcels share geometry with boundaries. Parcels do not overlap. … etc.
INF385T(28437) – Spring 2013 – Lecture 11
19
Conceptual design Entities, general relationships, important
attributes Sketches ER/UML conceptual diagrams Spreadsheets Often reconstruct from existing
systems/datasets Very important for complex projects Very useful to communicate with domain
experts/business peopleINF385T(28437) – Spring 2013 – Lecture 11
20
Conceptual design Purpose and usage of GIS Data sources
coverages, shapefiles, CAD, etc.
compilation scale and accuracy
Spatial representation raster, vector, surface,
address Attributes
required fields, types of measurement
Relationships network, topological, general
INF385T(28437) – Spring 2013 – Lecture 11
21
Conceptual designIt is important to understand what you want to achieve from the outset Symbology and labels
what symbols at which scales text presentation on the map
Spatial reference projection and datum the largest area mapped required detail and resolution
Special design cases, for example: condominium parcels parcel annotation
Key
INF385T(28437) – Spring 2013 – Lecture 11
22
Diagramming themes
Classic layer diagrams Organize data into logical
units Focus on common data
elements to help determine:
Attributes Associations Spatial relationships Roads
Boundaries
Utilities
Hydrology
Water Use Application
INF385T(28437) – Spring 2013 – Lecture 11
LayerMap use
Data sourceRepresentation
Spatial relationshipsMap scale, accuracy, currency
Symbology and annotation
StreetsDefine the street centerline networkPublic or commercial data products or various government agenciesPolylinesStreets intersect only at endpoints and generally do not overlap1:12000, +/- 10 ft., semiannual updateSymbolized according to road classification, labeled with street name
LayerMap use
Data sourceRepresentation
Spatial relationshipsMap scale, accuracy, currency
Symbology and annotation
ParcelsParcels define land ownership and are used for taxationCompiled from land ownership transactions and cadastral recordsPolygonsParcel polygons do not overlap1:2400, +/- 5 ft, quarterly updateLabeled or annotated with house number and street name
Documenting themes
2-23
24
Inventory existing dataLegacy data Target data layers
PLSS Monument
PLSS Quarter
PLSS Section
PLSS Township
Annotation
Boundaries
Lots
Parcels
INF385T(28437) – Spring 2013 – Lecture 11
25
Legacy data GIS Database
• Boundaries hold survey attributes• Coverage parcel polygons only exist for regions
Target Data Layers
Model database schema from existing data Bridge existing data with current technology, for
example:
PLSS Monument
PLSS Quarter
PLSS Section
PLSS Township
Annotation
Boundaries
Lots
Parcels
INF385T(28437) – Spring 2013 – Lecture 11
Inventory existing data
DATABASE DESIGNLecture 11
INF385T(28437) – Spring 2013 – Lecture 11
27
Files, databases, and GIS Data files contain text or other data in arbitrary
formats Data tables contain records with fields (attributes,
data items) identified by a primary key Relational Database Management System
(RDBMS or just DBMS): creates and maintains relationships between data tables allows one or more users to create or edit data in the
tables allows users to sort, select, and retrieve information using
QUERIES and REPORTS GIS adds a spatial dimension to databases, by
integrating location and geometric shape information with the tables
INF385T(28437) – Spring 2013 – Lecture 11
28
Relational databaseA formal information model called “relational”
Tables can have formal & ad hoc relationships, based on: Rows and columns Known column types Relationships SQL language and operators
Relational is based on a simple, generic model with many implementations (MS Access, IBM DB2, Oracle, MS SQLServer, and many others)
AbCdEfxz
32121753
3491146
RAXG
11122412
edvgklop
4922
wfcv
INF385T(28437) – Spring 2013 – Lecture 11
29
Data tables
Record or Row
Attributeor Column
Cell or Value
Organized into columns, rows, and cells (like a spreadsheet)
Columns = attributes = fields = data itemsRows = recordsCells = values
INF385T(28437) – Spring 2013 – Lecture 11
30
Defining columns
All DBMS’s support basic types:• Number (integer, float, decimal)• String (text)• Boolean (Yes/No)• DateMany DBMS (SQL-99) and GIS systems
support additional types (BLOB, XML, time series, …)
To define a column or attribute, you must specify the column name and type
INF385T(28437) – Spring 2013 – Lecture 11
Primary key
The field or combination of fields that identifies each and every record uniquely within a table
Note: Primary key is more often arbitrary, meaningless to users; main purpose is to be unique
3-31
Water use permit example
• Paper-based application form required for permit to withdraw surface or ground water
3-32
33
Sample data for water use permits
Withdrawal Code
Application ID
OwnerAvg Gallons/D
ay
Max Gallons/Day
Source Code
Source Name
Use Codes
INF385T(28437) – Spring 2013 – Lecture 11
34
use codes
INF385T(28437) – Spring 2013 – Lecture 11
35
Relational organization Tables should be organized according to basic rules
of relational design for most efficient use. Normalization is a series of steps followed to
obtain a database design that allows for efficient access and storage of data in a relational database. These steps reduce data redundancy and the chances of data becoming inconsistent.
3NF or BCNF are the usual standards for relational database design, however performance and convenience may drive toward de-normalization.
INF385T(28437) – Spring 2013 – Lecture 11
36
Database normalization steps First Normal Form (1NF) eliminates repeating groups by putting
each into a separate table and connecting them with a one-to-many relationship.
Second Normal Form (2NF) eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key.
Third Normal Form (3NF) eliminates functional dependencies on non-key fields by putting them in a separate table. At this stage, all non-key fields are dependent on the key, the whole key and nothing but the key.
Boyce-Codd Normal Form (BCNF) is sometimes applied as a stronger form of 3NF in which every determinant of a functional dependency within a relation must be a candidate key for the schema.
Fourth Normal Form (4NF) separates independent multi-valued facts stored in one table into separate tables.
Fifth Normal Form (5NF) breaks out data redundancy that is not covered by any of the previous normal forms.
source - http://www.hyperdictionary.com/dictionary/database+normalisation
INF385T(28437) – Spring 2013 – Lecture 11
37
First normal form - NOT Do you see any groups of repeating columns?
What’s wrong with that? Can you think of a case where this is okay?
How would you reorganize to fix this?
INF385T(28437) – Spring 2013 – Lecture 11
38
First normal form
The Use Code columns can be removed from the main table (A), and made into rows of a separate table (B), keyed by ActID. (compare with previous slide)
(C) is a lookup table for use code descriptions.
Primary key
Foreign keys
Primary key
B C
AINF385T(28437) – Spring 2013 – Lecture 11
39
Relationship cardinality With this design, one ActID can have any number
of use codes, and any one use code can be associated with many ActID’s This is called a Many-to-Many (M:M) relationship This is much more space-efficient for data storage
One record in the Use Code Descriptions table (C) can be associated with many records in the ActID-Use Codes table (B) This is called a One-to-Many (1:M) relationship
You may also have relationships with fixed cardinalities, such as 1:1, 1:2, 1:0..5, etc.
Cardinality of 1:0 generally means “nulls are allowed”
INF385T(28437) – Spring 2013 – Lecture 11
40
Second normal form - NOT Do you see any dependencies between non-key
columns and a partial key? If the primary key were compound and included an
OwnerID, there could be such a dependency between Owner and OwnerID
What’s wrongwith that?
What wouldyou do to fix this?
Compound key
INF385T(28437) – Spring 2013 – Lecture 11
41
Remove the non-key data to a separate table and link to it
… and clean up the data while you’re at it! Spelling, abbreviations,
punctuation Firstname Lastname vs.
Lastname, Firstname
Second normal form
INF385T(28437) – Spring 2013 – Lecture 11
Third normal form - NOT• Do you see any functional dependencies
among non-key fields in the table below?» Need we ask again: what’s wrong with this?
• How would you reorganize to fix this?
3-42
43
Third normal form
Remove the source description to a separate table, and join using the source code field
This will reduce duplication of data (and errors)INF385T(28437) – Spring 2013 – Lecture 11
Is that all there is to it?Name City ST PostalCodeJohn Denver CO 80031Patty Seattle WA 98107Smith Vancouv
erBC V6C 1T2
This table is NOT in Third Normal Form:○ The PostalCode field is dependent on the City and ST fields
To place this table in 3NF, a separate table would be created for the City and ST fields, and joined using the PostalCode field○ But this is generally not done with address & postal codes…
WHY?
45
Normalization tradeoffs When would you expect to normalize tables?
For primary data entry and updates; easier to set up and manage data integrity validation
Such as name and address subfields To support more kinds of ad hoc queries
When would you expect to denormalize? For presentation of data to users To reduce the number of table-joins for faster
performance Queries are known and fixed Better performance for web publishing
Database views are often used to flatten relationship structure for read-only access
INF385T(28437) – Spring 2013 – Lecture 11
DATABASE DIAGRAMMINGLecture 11
INF385T(28437) – Spring 2013 – Lecture 11
47
Database diagramming: conceptual / logical overview
Database relationships and cardinality can be diagrammed for prototyping and documentation
The “*” on an associationlink means “many”
Owners Use Codes
Use Code Descrips
Applications*1
1
* **
INF385T(28437) – Spring 2013 – Lecture 11
48
Database diagramming:Entity-Relationship (E-R) or Unified Modeling Language (UML)
Owners*OwnerIDFirstNameLastNamePhoneStreetAddrCityStatePostalCode
Use Codes*RowIDUseCodeApplicationID
Use Code Descrips*UseCodeDescription
Applications*ApplicationIDOwnerIDApplicationTypeProjectLocationBusinessNameBusinessType…
*1
1
* *
*
INF385T(28437) – Spring 2013 – Lecture 11
49
Normalization tradeoff: referential integrity Suppose you removed a record from the
Owner table What should be done with the related records from
the Applications table? Would this be easier or harder to manage than
with the de-normalized design on slide 31, “Sample data for water use permits”?
The more tables are interconnected by relationships, the greater the need to support referential integrity within your applications A DBMS’ default support for referential integrity
may be very basic, such as to place Nulls in associated foreign key fields, but only when a relationship is declared
INF385T(28437) – Spring 2013 – Lecture 11
BUILDING THE DATABASE
Lecture 11
INF385T(28437) – Spring 2013 – Lecture 11
51
Prototype prototype prototype… Critical for validating your data model
and applications An easy way to discover project
requirements Don’t plan a lot of time for this, just do
it! Prototype in the simplest
environment to learn the most in the least time Validate that thematic choices, schema &
integrity rules support your requirements Reduce data management overhead with
personal, single-user systemINF385T(28437) – Spring 2013 – Lecture 11
52
Database environments Production/Publishing
Read-only copies of databases Used by majority of users Contains custom views of databases
Development/Maintenance Where compilation and editing occur Normalized for greater integrity enforcement May have multiple environments by data model
(cadastral/land use, transportation, utilities, hydro…)
Design/Test Prototype validation, load testing Isolate testing changes to the development
environment, so as not to corrupt the development system
INF385T(28437) – Spring 2013 – Lecture 11
1-53
Large projects can seem like this…
1. Burning toast…
smoke alarm…
…fill glass!
etc, etc…
1-54
But they can be simplified with common data models
Should have: Simple structure with most common elements across a
set of applications in a user community Minimal rules, custom behavior, or cross-dependencies May include collections or sets or systems of feature
classes, e.g., networks, topologies, terrains
Should lend to: Web distribution Incremental, multi-user data updates User-side fusion, densification, value-adding
http://support.esri.com/en/knowledgebase/techarticles/detail/40585
55
Summary The design process Describing a system of viewpoints Developing use cases GIS design stages Conceptual design Normalization example Database diagrams Building the database Prototype to production
INF385T(28437) – Spring 2013 – Lecture 11
top related