Fundamentals of GIS
Lecture materials by Austin Troy (c) 2010, except where noted
Lecture 3
Part 1. Important Database Concepts
Part 2. Queries
Lecture slides by Austin Troy, University of Vermont, © 2010, except where noted
How is Data Stored?
• People use a number system with base 10
• Each digit corresponds to 10 to some power
• Hence a number with 3 digits has $10^3$, or 1000, possibilities
• Why are computer values so often in multiples of eight?
• Because computers store numbers and values in base 2 (binary), grouped into bytes of eight bits
• A byte is 8 “on-off switches,” or bits
• Each switch/bit represents one binary digit; one byte has $2^8$, or 256, possibilities
How do binary numbers translate to real numbers?
• Switch combinations determine the base-ten number based on the formula:
$N_{10} = d_{b-1} \cdot 2^{b-1} + d_{b-2} \cdot 2^{b-2} + \dots + d_{0} \cdot 2^{b-b}$
• where b = the number of bits storing the number and each $d_i$ is the value (0 or 1) of the switch in position i
• Hence the binary number
$11111111_2 = 2^7 \cdot 1 + 2^6 \cdot 1 + 2^5 \cdot 1 + 2^4 \cdot 1 + 2^3 \cdot 1 + 2^2 \cdot 1 + 2^1 \cdot 1 + 2^0 \cdot 1 = 255_{10}$
• And the binary number
$11111110_2 = 2^7 \cdot 1 + 2^6 \cdot 1 + 2^5 \cdot 1 + 2^4 \cdot 1 + 2^3 \cdot 1 + 2^2 \cdot 1 + 2^1 \cdot 1 + 2^0 \cdot 0 = 254_{10}$
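The binary-to-decimal formula on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name bits_to_decimal is made up for illustration.

```python
def bits_to_decimal(bits: str) -> int:
    """Convert a string of 0/1 characters to a base-ten integer."""
    b = len(bits)
    total = 0
    for position, digit in enumerate(bits):
        # The leftmost bit has weight 2^(b-1); the rightmost has weight 2^0.
        weight = 2 ** (b - 1 - position)
        total += int(digit) * weight
    return total


print(bits_to_decimal("11111111"))  # 255
print(bits_to_decimal("11111110"))  # 254
print(2 ** 8)                       # 256 possible values in one byte
```

Python's built-in int("11111110", 2) gives the same result; the loop is written out only to mirror the formula.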
Another approach to coding numbers: ASCII (American Standard Code for Information Interchange), commonly written in the hexadecimal numbering system
• Hexadecimal is a base-sixteen system; each digit stands for 4 bits ($2^4 = 16$ values)
• 0-9 = 0-9, but 10-15 = A, B, C, D, E, F
• Each digit takes one of 16 values instead of 10
• So a two-digit number xy = (16*x) + y
• Hence
• 21h = (16*2) + 1 = $33_{10}$ = $00100001_2$
• B2h = (16*11) + 2 = $178_{10}$ = $10110010_2$
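The two hexadecimal examples can be reproduced with a short Python sketch. The helper name hex_pair_to_decimal is an assumption for illustration only.

```python
def hex_pair_to_decimal(pair: str) -> int:
    """Convert a two-digit hexadecimal string xy to (16 * x) + y."""
    digits = "0123456789ABCDEF"
    x = digits.index(pair[0].upper())
    y = digits.index(pair[1].upper())
    return 16 * x + y


for pair in ("21", "B2"):
    value = hex_pair_to_decimal(pair)
    # format(value, "08b") writes the same value as 8 binary digits.
    print(pair, value, format(value, "08b"))
```

This prints 21 33 00100001 and B2 178 10110010, matching the worked values above.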
The ASCII system provides a standardized method for coding alphanumeric characters and stores each symbol in a byte of 8 bits. Those characters include everything you see on your keyboard and then some
An 8-bit character set is coded as hexadecimal numbers going from 00 to FF ($2^8$ = 256 codes); standard ASCII itself defines the first 128 of these (00h to 7Fh).
Example: Letter ‘A’ is 41h = $65_{10}$ = $01000001_2$
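The letter ‘A’ example can be verified with Python's built-in ord() and chr(). A small sketch; the function name describe is hypothetical.

```python
def describe(ch: str) -> tuple:
    """Return (hexadecimal, decimal, 8-bit binary) codes for one character."""
    code = ord(ch)  # ord() gives the character's numeric code
    return format(code, "02X"), code, format(code, "08b")


print(describe("A"))  # ('41', 65, '01000001')
print(chr(0x41))      # A
```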
Number of Possible Values is a function of the number of bits
• The number of possible values for a unit of data is an exponential function of the number of switches
• $2^8$ = 256 (eight-bit data)
• $2^{16}$ = 65,536 (sixteen-bit data)
• $2^{32}$ = 4,294,967,296 (thirty-two-bit data)
Number of bits determines data types
Examples of integer data types
• Byte: $2^8$ values (0 to 255)
• Short Integer: $2^{16}$ values (typically –32,768 to +32,767, without decimals; one of the sixteen bits determines the sign)
• Long Integer: $2^{32}$ values (about +/-2.147483e+09)
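The integer ranges follow directly from the bit counts. The sketch below assumes two's-complement storage for signed integers (the most common scheme, which gives one more negative value than positive); the function names are made up for illustration.

```python
def unsigned_range(bits: int) -> tuple:
    """Smallest and largest value when all bits store magnitude."""
    return 0, 2 ** bits - 1


def signed_range(bits: int) -> tuple:
    """Range when one bit is used for the sign (two's complement assumed)."""
    half = 2 ** (bits - 1)
    return -half, half - 1


for bits in (8, 16, 32):
    print(bits, 2 ** bits, unsigned_range(bits), signed_range(bits))
```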
Floating point data types
• In this case the number can have a decimal, but the number of places is variable
• With this type of number, the number of bits determines not just the number of possible magnitudes but also the level of precision of the decimal, represented as the number of decimal places.
• Fewer bits in FP numbers can lead to rounding errors
• Two types of FP number
• Single Precision: often 32 bits
• Double Precision: usually double the bits of single precision (i.e. 64 bits)
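The rounding effect of fewer bits can be seen by forcing a value through 32-bit storage. This is a sketch using Python's standard struct module, whose "f" format code packs a number as a 32-bit float; Python's own float is double precision. The helper name round_trip_single is an assumption.

```python
import struct


def round_trip_single(x: float) -> float:
    """Store x in 32 bits (single precision) and read it back."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1                       # held in double precision by Python
single = round_trip_single(value)

print(repr(value))                # the double-precision value
print(repr(single))               # the value after 32-bit storage
print(abs(single - value))        # size of the rounding error
print(round_trip_single(0.5))     # 0.5 is exactly representable, so no error
```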
Other data types
• Currency (type of number with specific behaviors)
• Date (recognizes order in dates)
• String (text)
– When numbers are represented as text they have no numerical properties (e.g. zip codes)
• Boolean (yes, no)
• Object (e.g. pictures, bits of code, behaviors, multi-media, programs)
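The zip code point can be illustrated in Python. The values below are made-up examples; the sketch only shows that text and numbers behave differently.

```python
zip_as_text = "05401"             # a zip code with a leading zero, stored as a string
zip_as_number = int(zip_as_text)  # the same digits stored as an integer

print(zip_as_text)                # 05401
print(zip_as_number)              # 5401 (the leading zero is lost)

# "Adding" strings joins them; adding numbers does arithmetic.
print(zip_as_text + "05405")      # 0540105405
print(zip_as_number + 5405)       # 10806
```

Storing codes like these as strings keeps the leading zero and prevents meaningless arithmetic on them.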
Three database models
• Hierarchical
• Network
• Relational
Hierarchical Database Model
A one-to-many method for storing data in a database that looks like a family tree with one root and a number of branches or subdivisions. Problem: linkages in the tables must be known beforehand
Groovy 70s TV
    Action shows
        Dukes of Hazzard: Tom Wopat
        CHIPs: Eric Estrada, Larry Wilcox
    Drama
        Dallas: Larry Hagman
        Fantasy Island: Ricardo Montalban
    Sitcoms
        WKRP: Loni Anderson
        Welcome back Kotter: Gabe Kaplan, John Travolta
Hierarchical Database Model
• Examples where this model works well:
    • Plant and animal taxonomies
    • Soil classification
• Works when classes are totally mutually exclusive
• Problem with this model:
    • Does not work when entities belong to several classes or are not mutually exclusive
    • Think about the problems with Windows Explorer
• Example: classifying your music collection
    • You may create classes like rock, jazz, classical, Latin, with folders for artists nested within
    • However, an artist may do rock and Latin and jazz on the same album, or one song may be a combination
Networked Database Model
A database design for storing information by linking all related records with a list of “pointers.” Problem: linkages in the tables must be known beforehand. Not adaptable to change.
[Diagram: records at three levels linked by pointers]
Categories: Action shows, Drama, Sitcoms
Shows: Dukes of Hazzard, CHIPs, Dallas, Fantasy Island, Love Boat, Three’s Company
Networks: ABC, CBS, NBC
Relational (Tabular) Database Model
A design used in database systems in which relationships are created between one or more flat files or tables, based on the idea that each pair of tables has a field in common, or “key”. In a relational database, the records are generally different in each table
The advantages:
• Each table can be prepared and maintained separately
• Tables can remain separate until a query requires connecting, or relating, them
• Relationships can be one to one, one to many or many to one
[Diagram: a table with its parts labeled]
Name    Phone    Address    Student ID
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
Headings are the labels for the columns
Records (rows) are the unit that the data are specific to
Fields, or columns, are attribute categories
Cells are where individual values of a record for a field are stored
A key is a field that is common to two or more flat files; it allows a query to be done across multiple tables or allows two tables to be joined
Flat file: course info
Course name    Course number    Enrollment    Faculty ID
***            ***              ***           ***
***            ***              ***           ***
***            ***              ***           ***
***            ***              ***           ***
***            ***              ***           ***
Flat file: professor info
Name    Phone    Address    Faculty ID
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
***     ***      ***        ***
Here the shared field, Faculty ID, is the key
Join Tables
• Based on the values of a field that can be found in both tables
• The name of the field does not have to be the same
• The data type has to be the same
[Diagram: Table 1 has fields Key, A, B with keys 1, 2, 3. Table 2 has fields Key, C with rows (1, 10), (2, 20), (3, 50). JOIN yields one table with fields Key, A, B, C, one row per key.]
In this case we have a one-to-one join; here the key is unique
Join Tables
[Diagram: Table 1 has fields Key, A, B with keys 1, 1, 2. Table 2 has fields Key, C with rows (1, 10), (2, 20). JOIN yields one table with fields Key, A, B, C, with keys 1, 1, 2 and C values 10, 10, 20.]
In this case we have a one-to-many join; here the key is not unique
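Both kinds of join can be sketched in plain Python. The key and C values follow the slides; the A values ("a1", "a2", ...) are placeholders invented for illustration, and the function name join is hypothetical.

```python
def join(left, right, key="Key"):
    """Attach the matching fields from `right` to every row of `left`.

    Assumes the key values in `right` are unique.
    """
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})  # no match leaves the row unchanged
        joined.append({**row, **match})
    return joined


# One-to-one: the key is unique in both tables.
left_unique = [{"Key": 1, "A": "a1"}, {"Key": 2, "A": "a2"}, {"Key": 3, "A": "a3"}]
right_three = [{"Key": 1, "C": 10}, {"Key": 2, "C": 20}, {"Key": 3, "C": 50}]
one_to_one = join(left_unique, right_three)
print([row["C"] for row in one_to_one])   # [10, 20, 50]

# One-to-many: key 1 appears twice in the left table.
left_repeated = [{"Key": 1, "A": "a1"}, {"Key": 1, "A": "a2"}, {"Key": 2, "A": "a3"}]
right_two = [{"Key": 1, "C": 10}, {"Key": 2, "C": 20}]
one_to_many = join(left_repeated, right_two)
print([row["C"] for row in one_to_many])  # [10, 10, 20]
```

In the one-to-many case the single row for key 1 in the second table is repeated for every matching row in the first.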
Relational (Tabular) Database Model: 70s TV example
Now we can have various flat files (tables) with different record types and with various attributes specific to each record
Table 1- specific to actors
Actor            Year born*    Sideburn length    Show
John Travolta    1948          slight             WBK
Eric Estrada     1949          moderate           CHIPS
Larry Wilcox     1953          slight             CHIPS
Tom Wopat        1950          major              Dukes
Table 2- specific to shows
Show                   Lead actor       Co-star           Network*
Welcome back Kotter    John Travolta    Gabe Kaplan       CBS
CHIPs                  Eric Estrada     Larry Wilcox      CBS
Dukes                  Tom Wopat        John Schneider    NBC
*entirely guessed at- I am not responsible for mistaken TV trivia
Relational (Tabular) Database Model
This allows queries that go across tables, like: which CBS lead actors were born before 1951? Answer: John Travolta and Eric Estrada
It does this by combining information from the two tables (Table 1, specific to actors, and Table 2, specific to shows), using common key fields
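The cross-table question about CBS lead actors can be sketched with Python's built-in sqlite3 module. The table and column names are assumptions made for this illustration; the rows are the (admittedly guessed-at) values from the two tables, joined on the actor's name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE actors (actor TEXT, year_born INTEGER, sideburns TEXT, show_name TEXT)"
)
con.execute(
    "CREATE TABLE shows (show_name TEXT, lead_actor TEXT, co_star TEXT, network TEXT)"
)
con.executemany("INSERT INTO actors VALUES (?, ?, ?, ?)", [
    ("John Travolta", 1948, "slight", "WBK"),
    ("Eric Estrada", 1949, "moderate", "CHIPS"),
    ("Larry Wilcox", 1953, "slight", "CHIPS"),
    ("Tom Wopat", 1950, "major", "Dukes"),
])
con.executemany("INSERT INTO shows VALUES (?, ?, ?, ?)", [
    ("Welcome back Kotter", "John Travolta", "Gabe Kaplan", "CBS"),
    ("CHIPs", "Eric Estrada", "Larry Wilcox", "CBS"),
    ("Dukes", "Tom Wopat", "John Schneider", "NBC"),
])

# The lead actor's name is the key field common to both tables.
rows = con.execute("""
    SELECT a.actor, a.year_born
    FROM shows AS s
    JOIN actors AS a ON s.lead_actor = a.actor
    WHERE s.network = 'CBS' AND a.year_born < 1951
    ORDER BY a.year_born
""").fetchall()
print(rows)  # [('John Travolta', 1948), ('Eric Estrada', 1949)]
```

Larry Wilcox does not appear because the table lists him as a co-star, not a lead actor, and gives 1953 as his birth year.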
Relational (Tabular) Database Model
Object-relational databases can contain other objects as well, like images, video clips, executable files, sounds, links
Actor            Year born*    Sideburn length    Picture
John Travolta    1948          slight             [image]
Eric Estrada     1949          moderate           [image]
Larry Wilcox     1953          slight             [image]
Tom Wopat        1950          major              [image]
Relational Database: another example: property lot info
One-to-one relationship
Parcel ID    Street address    Zoning
11           15 Maple St.      Residential-1
12           85 Brooks Ave     Commercial-2
13           74 Windam Ct.     Residential 4

Owner        Parcel ID    Occupation
J. Smith     13           lawyer
R. Jones     11           dentist
T. Flores    12           Real estate developer
One-to-many relationship
Parcel ID    Street address    Zoning
11           15 Maple St.      Residential-1
12           85 Brooks Ave     Commercial-2
13           74 Windam Ct.     Residential 4

Owner        Parcel ID    Occupation
J. Smith     13           lawyer
R. Jones     11           dentist
J. McCann    12           financier
T. Flores    12           Real estate developer
In this case, several people co-own the same lot, so it is no longer one lot, one person
One-to-many relationship
Assuming each owner owned several parcels, we would structure the database differently
Parcel ID    Street address    Zoning
11           15 Maple St.      Residential-1
12           85 Brooks Ave     Commercial-2
13           74 Windam Ct.     Residential 4

Owner        Occupation               # properties owned
J. Smith     lawyer                   2
R. Jones     dentist                  5
J. McCann    financier                2
T. Flores    Real estate developer    3

Properties owned by T. Flores
Owner     Parcel ID    Date of transaction
Flores    13           4-15-00
Flores    15           4-17-01
Flores    19           3-12-99
Note: this table includes data pertinent only to Flores’ ownership of these properties
Example
Here’s an example of a chart showing the relationships between flat files in a sample relational database for food suppliers* in Microsoft Access
* This comes from an MS ACCESS sample database
A real-time RDBMS allows for real-time linking and embedding of tables based on common fields
Here we see all the orders for product ID 3; there is no need to include product ID in that sub-table
Part 2. Queries
Queries
• This is how we ask questions of the data
• To ask queries, we use mathematical (comparison) operators, like =, >, <
• To ask queries on multiple criteria, we use logical operators, like AND and OR
• Queries can simply select records or perform more advanced operations with those selections, such as making new tables or summarizing values by averages
Queries in ArcGIS
• ArcGIS queries only select (highlight) records
• When a record is selected, so is its corresponding feature
• To summarize selected values, use the “statistics” function or “summarize” tool
• To create new values based on a query, use the “calculate” tool
Queries
Here’s an example of a simple query in ArcGIS:
PRICE > 250000
This highlights all records (houses) in the specified layer with a sales price greater than $250,000
Queries
That results in the following selection on the map
Queries
And it also selects the corresponding records in the attribute table
Queries
Here’s an example with a polygon layer; I’m querying for census tracts with a population over 8,000.
Queries: multiple criteria
Now let’s add a criterion; let’s say we’re looking for big population tracts (>8,000) with a high rate of population change (> 3% annual). Note the use of the AND operator. Note also that the result is a subset of the last selection
Queries: Select From Set vs. New Set
We did the previous selection using the “create a new selection” method.
We could have done the same thing by doing the first query (pop>8000), clicking “Apply,” then, without clearing that selection, typing in a new query for the second condition (popchng97 > 3) and choosing the “Select From Selection” method instead
Three query methods in ArcGIS
• New Selection: creates a new query from scratch
• Add to Current Selection: used when a group of records/features is already selected; it is equivalent to the OR operator and widens the selection by adding the records that meet a second criterion
• Select from Current Selection: used when a group of records/features is already selected; selects a subset of the originally selected set; equivalent to the AND operator
Queries: OR operator
Here’s a query where we use the OR operator to select tracts either with a population greater than 8,000 OR with a growth rate greater than 3%. This results in many more records being selected. We can also do the same thing by doing one query using “new selection,” then another using “add to current selection”
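The equivalence between the selection methods and the AND/OR operators can be sketched with Python sets. The tract records below are invented for illustration; only the field names pop and popchng97 and the thresholds come from the slides.

```python
tracts = [
    {"id": 1, "pop": 9500, "popchng97": 4.1},
    {"id": 2, "pop": 8200, "popchng97": 1.0},
    {"id": 3, "pop": 3000, "popchng97": 5.5},
    {"id": 4, "pop": 2500, "popchng97": 0.4},
]


def select(records, predicate):
    """Return the set of ids of the records that satisfy the query."""
    return {r["id"] for r in records if predicate(r)}


big = select(tracts, lambda r: r["pop"] > 8000)       # first query, "New Selection"
fast = select(tracts, lambda r: r["popchng97"] > 3)   # second query on its own

and_query = select(tracts, lambda r: r["pop"] > 8000 and r["popchng97"] > 3)
or_query = select(tracts, lambda r: r["pop"] > 8000 or r["popchng97"] > 3)

select_from_selection = big & fast   # narrows the first selection (AND)
add_to_selection = big | fast        # widens the first selection (OR)

print(and_query == select_from_selection)  # True
print(or_query == add_to_selection)        # True
print(sorted(and_query), sorted(or_query)) # [1] [1, 2, 3]
```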
Queries: Strings
Queries can also be made on text strings, but it is imperative to put the values in quotes. Here we query for both BLM and Parks and Rec land.
Queries: Strings and numbers
String and number queries can be combined. For example, let’s say we’re looking for land for a suburban park and our criteria are that we need areas whose land use is classed as agricultural and that are bigger than 500,000 square feet.
Queries: Strings and numbers
Results in:
Queries: Strings and numbers
Whereas if our query asks for agricultural land use without the area criterion, we get:
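The difference between the two queries (with and without the area criterion) can be sketched in Python. The parcel records and the field names landuse and area_sqft are hypothetical; only the criteria come from the slides.

```python
parcels = [
    {"id": 1, "landuse": "agricultural", "area_sqft": 750000},
    {"id": 2, "landuse": "agricultural", "area_sqft": 120000},
    {"id": 3, "landuse": "residential", "area_sqft": 900000},
]

# String criterion (value in quotes) combined with a number criterion.
park_candidates = [
    p["id"] for p in parcels
    if p["landuse"] == "agricultural" and p["area_sqft"] > 500000
]

# String criterion alone selects more records.
agricultural_only = [p["id"] for p in parcels if p["landuse"] == "agricultural"]

print(park_candidates)    # [1]
print(agricultural_only)  # [1, 2]
```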
So what can ArcGIS do with queries?
A query selects records; once selected you can:
• Look at the selection
• Requery the selection
• Do stats on the selection
• Create new fields that recategorize the selection by an attribute field
• Create new fields by doing calculations across several fields
• Create a shapefile from the selection
Examples
Let’s query high unemployment census tracts in LA
Now let’s do “statistics” to determine the population in those areas. Answer: almost 5 million people live in tracts with 6%+ unemployment (see Sum). We can also see that there are 844 tracts meeting that description (see Count)
Right click on the heading to get this menu
Another thing we can do is convert the selection to either a new shapefile or a geodatabase feature class
Right click and then click Data>>export data
Now, let’s say we wanted to prioritize inner city areas for urban redevelopment projects:
• Let’s query based on unemployment and home value
• Based on these we’ll create a new field that classes all tracts into High, Medium and Low priority areas
• Tracts with median home value < $100,000 and unemployment > 12% are “High”
To reclassify, we create a new field, “priority”, activate the field heading and use the field calculator to set all selected records to “high”
Note: we must use quotes with a text field
Now we would set criteria for “medium” and “low” based on unemployment and home value. These would probably be more complex queries because we’re querying for records, say, between 8 and 12% unemployment and between $100,000 and $150,000 median value.
Note: AND is used three times, with two parenthetical clauses
Now, for the third class our task is easier: we just select everything that has not been selected yet. To do this we query for “priority”= ‘’ where those two marks after the equals sign are single quote marks. By putting empty quote marks, you’re querying for records with no value in that field. Now you’d set that field equal to “low” for all those records.
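The three-step reclassification can be sketched in Python. The tract values and the field names unemp and medval are invented for illustration, and the "medium" bounds are assumed to be inclusive; only the thresholds come from the slides.

```python
tracts = [
    {"id": 1, "unemp": 14.0, "medval": 85000, "priority": ""},
    {"id": 2, "unemp": 10.0, "medval": 120000, "priority": ""},
    {"id": 3, "unemp": 4.0, "medval": 300000, "priority": ""},
]

# Step 1: select low-value, high-unemployment tracts and calculate "high".
for t in tracts:
    if t["medval"] < 100000 and t["unemp"] > 12:
        t["priority"] = "high"

# Step 2: AND is used three times, with two parenthetical clauses.
for t in tracts:
    if (t["unemp"] >= 8 and t["unemp"] <= 12) and (
        t["medval"] >= 100000 and t["medval"] <= 150000
    ):
        t["priority"] = "medium"

# Step 3: query for an empty "priority" value and set the rest to "low".
for t in tracts:
    if t["priority"] == "":
        t["priority"] = "low"

print([(t["id"], t["priority"]) for t in tracts])
# [(1, 'high'), (2, 'medium'), (3, 'low')]
```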
Now we can make a category map showing us that classification, which is based on two attributes: median value and unemployment
Another example:
This time, let’s take a vegetation layer and query for stands with crown fire potential; because there are several classes we have to query for all
Then let’s calculate a fire hazard index for selected polygons equal to 0.5 * (rate of spread * flame length)
We’ll create a new field, “fireindex” (floating point), and set all selected polygons equal to that calculation
Then, for all other polygons without crown fire potential, a different equation can be used, say 0.38 * (rate of spread * flame length). But first we have to take the inverse of the selection by using the “switch selection” function
Then we can do the new calculation on the new selection
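The select, calculate, switch selection, calculate sequence can be sketched in Python. The stand records and the field names other than “fireindex” are hypothetical; the two coefficients come from the slides.

```python
stands = [
    {"id": 1, "crown_fire": True, "rate_of_spread": 10.0, "flame_length": 4.0},
    {"id": 2, "crown_fire": False, "rate_of_spread": 25.0, "flame_length": 4.0},
    {"id": 3, "crown_fire": True, "rate_of_spread": 6.0, "flame_length": 2.0},
]


def calculate(records, selected_ids, coefficient):
    """Fill the fireindex field for the selected records only."""
    for r in records:
        if r["id"] in selected_ids:
            r["fireindex"] = coefficient * (r["rate_of_spread"] * r["flame_length"])


# Query: stands with crown fire potential.
selection = {s["id"] for s in stands if s["crown_fire"]}
calculate(stands, selection, 0.5)

# "Switch selection": everything that was not selected before.
all_ids = {s["id"] for s in stands}
selection = all_ids - selection
calculate(stands, selection, 0.38)

for s in stands:
    print(s["id"], s["fireindex"])
```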
Now we can plot the map of fire index using graduated color (quantity) mapping
Access and ArcGIS queries
You can do all these queries and much, much more in MS Access, which is a relational DBMS.
For the most part, you’ll use Access to manipulate and query your attribute tables from geodatabases
This can be done because a personal geodatabase is an MS Access file (.MDB)
There are six basic queries you can do in Access: select, cross-tab, make table, update, append, delete
We’ll learn more about these in lab
Access Queries
• Select: the most general purpose and versatile query; creates a new temporary table; used for getting summary statistics for a field, or breaking down summary statistics by category
• Cross-tab: for summarizing statistics across two factors (row and column)
• Make table: for creating a new, stand-alone data table from a query
• Update query: this is where we fill a field (could be an empty field) in an existing table with new values, either equal to a constant, to values in another field or to an operation using values from another field; can use Where criteria on this
• Append/delete queries: queries that define rows to append to or delete from a table; append queries usually require another table.
Access Queries
• Queries can be used to:
    • Summarize information stored in one or many tables (e.g. sales by year, sales by category, sales by salesperson, sales by date, orders by date, orders by product type, orders by zip code)
    • Create new fields using simple or complex expressions, with the option of using criteria to specify which records will be filled in for that field
    • Derive averages, maxima, minima, sums, standard deviations, and counts for values in fields, with or without criteria
    • Derive those same things for categories within a field
    • Summarize and ask questions of attribute data stored in different tables
Access Queries
• Example of a query run to get sums of sales values across product categories:
Relational attribute queries
Here’s an Access select query; note how it queries across various linked tables
This one asks for a summary of sales by category and product name for the dates between 1/1/1997 and 12/31/1997
Advanced Single layer query operations
Queries can be used to return statistics: here we get the mean price from a database of housing sales
Advanced Single layer query operations
And here we summarize mean price by zip code
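The two summary queries (overall mean price, then mean price by zip code) can be sketched with Python's built-in sqlite3 module instead of Access. The table name, column names and sale records are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (zip TEXT, price REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)", [
    ("05401", 200000),
    ("05401", 300000),
    ("05403", 400000),
])

# Mean price over the whole table.
overall_mean = con.execute("SELECT AVG(price) FROM sales").fetchone()[0]

# Mean price broken down by category (zip code).
mean_by_zip = con.execute(
    "SELECT zip, AVG(price) FROM sales GROUP BY zip ORDER BY zip"
).fetchall()

print(overall_mean)  # 300000.0
print(mean_by_zip)   # [('05401', 250000.0), ('05403', 400000.0)]
```

Note that the zip codes are stored as text, so the leading zeros survive.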
Remember the food database?
Advanced Single layer query operations
This simple select query yields a summary table of sales by category for a given period, generating a mean value for each category
[Screenshot callouts: criteria, relates]
This select query performs a math operation: it multiplies price by quantity and by a discount factor, and delivers a table of order subtotals
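The subtotal calculation can be sketched in Python. The order lines are invented, and the sketch assumes the discount is stored as a fraction, so that the factor applied is (1 - discount); the slide does not spell out the exact expression.

```python
order_lines = [
    {"order_id": 1, "unit_price": 10.0, "quantity": 3, "discount": 0.0},
    {"order_id": 1, "unit_price": 20.0, "quantity": 1, "discount": 0.25},
    {"order_id": 2, "unit_price": 5.0, "quantity": 4, "discount": 0.5},
]

subtotals = {}
for line in order_lines:
    # Assumed expression: price * quantity * (1 - discount).
    amount = line["unit_price"] * line["quantity"] * (1 - line["discount"])
    subtotals[line["order_id"]] = subtotals.get(line["order_id"], 0.0) + amount

print(subtotals)  # {1: 45.0, 2: 10.0}
```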
Advanced Single layer query operations
Here we sort sales by product and city
[Screenshot callouts: operation, criteria]
Advanced Single layer query operations
Here we sort sales by city only
Advanced Single layer query operations
Queries can also be used to make reports, like this invoice
Advanced Single layer query operations
Queries can be programmed to make custom database interfaces, so users can easily ask questions of the data, like this, where orders are summarized by buyer and the user chooses the country to query on