Putting time into the GeoWeb:
Data persistence in a web services environment
Steve Morris
NCSU Libraries
July 23, 2008
Overview
• Background to the digital preservation problem
• Problems
  – Temporal data access issues
  – Capturing data state in a services or API context
  – Making the business case for older data
• Preservation approaches
• Future directions
Project background: North Carolina Geospatial Data Archiving Project
• Partnership between university library (NCSU) and state agency (NCCGIA)
• Under cooperative agreement with Library of Congress in NDIIPP national preservation program
• Focus on state and local geospatial content in North Carolina (state demonstration)
• Tied to NC OneMap initiative, which provides for seamless access to data, metadata, and inventories
• Goal: Engage spatial data infrastructure (SDI) in data preservation and archiving
Demonstration repository as catalyst for an industry conversation
SDI role in data preservation
• Data inventories support content identification
• Metadata standards support discoverability and use
• Content standards support data interoperability over time and help eliminate semantic confusion
• Data exchange networks:
  – Minimize need to make contact
  – Add technical, administrative, descriptive metadata
  – Establish rights and provenance
Project roots: NCSU Libraries data directories
Tracking data, map servers, and web services since 2000
Ranked 3rd in traffic among entry points to entire library website
Persistent identifiers
  – usage tracking
  – ID links used in other sites
Community help in site maintenance
[Chart: County map and data services in NC. Number of counties (0 to 100) by year, 2000 to 2008, for three service types: Map Server, Data Download, WMS]
100 Counties in North Carolina
Carrboro, NC : Population 17,797 (2005 est.)
24 downloadable GIS data layers
4 WMS data layers
6 web mapping applications
9 downloadable PDF map layers
Note: Percentages based on the actual number of respondents to each question
Downtown Raleigh Near State Capitol
1914 Sanborn Map
Downtown Raleigh Near State Capitol
1993 DOQQ
Downtown Raleigh Near State Capitol
1999 Wake County Ortho
Downtown Raleigh Near State Capitol
2005 Wake County Ortho
Imagery = Durable
• Static
• Simple structure
• Mostly open formats
Vector data = Volatile
• Frequent update
• Complex structure
• Mostly proprietary formats
Data preservation points of failure
• Data is not saved, or …
• can’t be found, or …
• media is obsolete, or …
• media is corrupt, or …
• format is obsolete, or …
• file is corrupt, or …
• meaning is lost
Solutions:
• Migration
• Emulation
• Encapsulation
• XML
Problem: Data state in a web services or API-driven environment
• How to capture records from decision-making processes?
• How to capture data state as well as service state?
Problem: Temporal data unavailability
• Industry focus on “latest and greatest” data
• “Kill and fill” as a common approach to data management (past versions of vector data lost)
Not just data loss, also: loss of memory about data
• Of superseded county orthophoto flights in NC, only 22% are recorded in the state’s GIS inventory
Some older inventories only available through Internet Archive
Availability of older orthoimagery on county map servers in NC
[Chart: Superseded county orthophoto collections (0 to 14) by orthophoto flight year, 1992 to 2004, split into Online and Offline]
Only 30% of superseded digital ortho flights are accessible through county map servers
23 counties in NC publish ortho WMS services
0 counties in NC publish superseded orthos as WMS services
Problem: Making business case for archiving
Use case: Land use and impervious surface change analysis
[Imagery panels by year: 1993, 1998, 1999, 2002, 2005]
Building the preservation business case
• Land use change analysis
• Site location analysis
• Real estate trends analysis
• Disaster response
• Resolution of legal challenges
• Impervious surface change mapping
Planned 2008 NC business case survey
• Case description
• Resources/Scope of effort
• Benefits and results
• Fiscal assessment
Based on previous experience, pending projects, examples of when a project could have been served better if archival data were available
Geospatial data preservation challenges
• Producer focus on current data
• Future support of data formats in question
• Inadequate or nonexistent metadata
• Spatial databases
• Complex data objects (multi-file, multi-format)
• Shift to web services-based access (ephemeral data)
• Difficult to capture data state at point of decision-making
Preservation approaches: Temporal data snapshots
Issue: How frequently should county and municipal vector data layers be captured in archives?
Parcels, centerlines, jurisdictions, zoning, …
Parcel Boundary Changes 2001-2004, North Raleigh, NC
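One way to inform the capture-frequency question is to measure how much a layer actually changes between two existing snapshots. The sketch below is only an illustration under stated assumptions: features are represented as plain dictionaries keyed by a feature ID, and the parcel IDs, owners, and geometries are hypothetical sample data, not project results.

```python
import hashlib


def record_digest(record: dict) -> str:
    """Hash a feature's fields in a stable key order so equal records match."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def change_fraction(old: dict, new: dict) -> float:
    """Fraction of feature IDs added, removed, or modified between snapshots."""
    all_ids = set(old) | set(new)
    if not all_ids:
        return 0.0
    changed = sum(
        1
        for fid in all_ids
        if fid not in old
        or fid not in new
        or record_digest(old[fid]) != record_digest(new[fid])
    )
    return changed / len(all_ids)


# Hypothetical parcel snapshots keyed by parcel ID (illustrative data only).
snapshot_2001 = {
    "P1": {"owner": "A", "wkt": "POLYGON((0 0,1 0,1 1,0 1,0 0))"},
    "P2": {"owner": "B", "wkt": "POLYGON((1 0,2 0,2 1,1 1,1 0))"},
}
snapshot_2004 = {
    "P1": {"owner": "A", "wkt": "POLYGON((0 0,1 0,1 1,0 1,0 0))"},
    "P2": {"owner": "C", "wkt": "POLYGON((1 0,2 0,2 1,1 1,1 0))"},
    "P3": {"owner": "D", "wkt": "POLYGON((2 0,3 0,3 1,2 1,2 0))"},
}

# P2 was modified and P3 was added, so two of three feature IDs changed.
rate = change_fraction(snapshot_2001, snapshot_2004)
print(f"{rate:.0%} of features changed between snapshots")
```

A layer whose change fraction stays low between captures might tolerate less frequent snapshots; a fast-changing parcel layer might not. The threshold is a local policy choice, not something this sketch decides.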
NC frequency of data capture surveys
• How often should continually changing vector datasets be captured?
• Tap into data custodian understanding of production patterns and uses
• Tap into local innovation
• Learn about local business drivers for data archiving
  – 2006 and 2008 surveys of NC cities and counties
  – 2008 survey of archival practice in state agencies in NC
  – Planned survey of data users in NC
http://www.nconemap.com/AboutNCOneMap/tabid/289/Default.aspx#preservation
Preservation approaches: Desiccated data
Complex data representations can be made more preservable (and less useful) through simplification
Preservation approaches: Desiccated data
• Complex documents may be very hard to preserve over time
  – GIS project files
  – Layer definitions
  – Web services or API interactions
• Image outputs capture some sense of the final product but lose underlying data intelligence
Cartographic outputs – analogous to the old paper maps
Combined datasets, with data models, classification, symbolization, annotation
More data intelligence than in images
Desiccated data: PDF and GeoPDF
Desiccated data: Geospatial PDF
• Explosion of geospatial PDF content in past few years
• Standards issues
  – GeoPDF: proprietary TerraGo technology
  – PDF an open ISO standard
  – Open PDF variants created through ISO standards process (PDF/E, PDF/X, PDF/A, …)
• PDF content retained in addition to, NOT instead of data
Preservation approaches: Historical WMS tile caches?
No market for archived tiles without standard way to describe tiles and without commonly used tiling schemes
Preservation approaches: Historical WMS tile caches?
• Tile cache systems developed for more responsive WMS or mapping systems
  – WMS Tile Caching (WMS-C) incubated by OSGeo
  – WMTS (Web Map Tiling) OGC white paper
• No explicit temporal component in WMS-C or WMTS
To what extent do temporal geospatial systems become video-like?
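For contrast with the tile cache proposals, the base WMS specification does define an optional TIME request parameter for layers that advertise a time dimension. The sketch below builds a WMS 1.1.1 GetMap request carrying that parameter. The endpoint URL, layer name, and bounding box are hypothetical, and the sketch assumes a server that actually publishes time-enabled ortho layers, which the counties surveyed here did not.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, time_value, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL that requests a layer at a given time."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "SRS": "EPSG:4326",
        # WMS 1.1.1 bounding box order: minx,miny,maxx,maxy
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        # Optional time dimension value, for example a flight year
        "TIME": time_value,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = getmap_url(
    "https://example.org/wms",
    "county_ortho",
    (-78.65, 35.77, -78.63, 35.79),
    "1999",
)
print(url)
```

A tile cache keyed only by layer, zoom level, row, and column has no slot for this value, which is one way to see the gap noted above.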
Pronounced local agency interest in archiving, digitizing, and geo-referencing older analog products
Old maps coming into the GeoWeb …
New archiving interest: Location-based content
Present-day value in location-based services and mobile applications
Street Views
Oblique Imagery
3D Images
Future value of non-spatial place-based imagery as cultural heritage resource
More descriptive of place and function than spatial imagery
Moving forward
• GICC Archival and Long-Term Access Committee
• Geo Multistate Archive and Preservation Partnership (GeoMAPP)
• OGC Data Preservation Working Group
Community response to data archiving challenge
• Nov. 2007: NC Geographic Information Coordinating Council (GICC): Ten Recommendations in Support of Geospatial Data Sharing released
  – Recommendation: “Establish archive and long term data access strategies”
  – Suggested best practices include: “Establish a policy and procedure for the provision of access to historic data, especially for framework data layers.”
GICC Archival and Long-Term Access Committee
• Initiated in response to agency requests for guidance on temporal data management
• Federal, state, regional, and local agency representation
• Key focus
  – Best practices for data snapshots and retention
  – State Archives processes: appraisal, selection, retention schedules, etc.
  – Who, What, Why, When, Where, How
Geo Multistate Archive and Preservation Partnership (GeoMAPP)
• Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA), State Archives of NC, with Library of Congress
• Partners:
  – State geospatial organizations of Kentucky and Utah
  – State Archives of Kentucky and Utah
  – NCSU Libraries in catalytic/advisory role
• State-to-state and geo-to-Archives collaboration
• 2 year project: Nov. 2007-Dec. 2009
• Archives as part of Spatial Data Infrastructure
OGC Data Preservation Working Group
• Formed Dec. 2006
• Engage archival community
• Find points of intersection with other OGC activities:
  – GML for archiving
  – Content packaging
  – Large scale data transfers
  – Time in decision support
The Content Packaging Problem
XML Database Export
TIFF Images
• Pixel Value and Header file
• World file
• Coordinate System file
• Metadata file
Shapefiles
• Geometry file
• Index file
• Attribute file
• Metadata file
• Coordinate System file
• Spatial Index files
Potential Ingest Objects
Files
• Multi-file dataset
• Georeferencing
• Metadata file
• Symbols file
• Additional documentation
• License
• Disclaimer
• More
Metadata
• ISO/FGDC
• Acquisition metadata
• Transfer metadata
• Ingest metadata
• Archive rights
• Archive processes
• Collection metadata
• Series metadata
Metadata Exchange Format (MEF) in GeoNetwork is a form of content packaging
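To make the packaging problem concrete, the sketch below gathers the component files of a single shapefile and records each one's role, size, and checksum in a simple manifest. It is a minimal illustration, not the MEF format or any project's actual ingest tooling: the `build_manifest` function, the manifest layout, and the `parcels` dataset name are all assumptions made for this example.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Common shapefile component extensions and the role each file plays.
SHAPEFILE_ROLES = {
    ".shp": "geometry",
    ".shx": "index",
    ".dbf": "attributes",
    ".prj": "coordinate system",
    ".sbn": "spatial index",
    ".sbx": "spatial index",
    ".xml": "metadata",
}


def build_manifest(dataset_dir, stem):
    """List every file named <stem>.* with its role, size, and SHA-256."""
    entries = []
    for path in sorted(Path(dataset_dir).glob(stem + ".*")):
        if not path.is_file():
            continue
        entries.append(
            {
                "file": path.name,
                "role": SHAPEFILE_ROLES.get(path.suffix.lower(), "other"),
                "bytes": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
        )
    return {"dataset": stem, "files": entries}


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("parcels.shp", "parcels.shx", "parcels.dbf", "parcels.shp.xml"):
        (Path(tmp) / name).write_bytes(b"placeholder")
    manifest = build_manifest(tmp, "parcels")

print(json.dumps(manifest, indent=2))
```

A manifest like this only addresses the "Files" column above. Acquisition, transfer, ingest, and rights metadata would still need to travel with the package.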
Questions?
Contact:
Steve Morris
Head, Digital Library Initiatives
NCSU [email protected]
NCGDAP site: http://www.lib.ncsu.edu/ncgdap/