Are Geodatabases a Suitable Long-Term Archival Format?
Jeff Essic, Matt Sumner
North Carolina State University Libraries
2009 ESRI International Users Conference
July 14, 2009
NC Geospatial Data Archiving Project (NCGDAP)
Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)
Focus on state and local geospatial content in North Carolina (state demonstration)
Website: http://www.lib.ncsu.edu/ncgdap
Geospatial Data Preservation Challenge: Vector Data Formats
No widely-supported, open vector formats for geospatial data
Spatial Data Transfer Standard (SDTS) not widely supported
Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”
Spatial Databases
The whole is more than the sum of the parts, and the whole is very difficult to preserve
Can export individual data layers for curation, but relationships and other context are lost
Challenge: Other Data Types
Cartographic Representation: Software Project Files, PDFs, GeoPDFs, WMS images
Web 2.0 content: Street views, Mashups
Oblique Imagery
3D Models
Different Ways to Approach Preservation
Technical solutions: How do we preserve acquired content over the long term?
Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?
Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata
Question: Frequency of Capture?
Content Exchange – Getting Data in Motion
Repository Development
Repository of Temporal Data Snapshots
Repository Development
Downloading or acquiring “low hanging fruit”
Tapping into current data flows
Developing our own metadata when necessary
Converting and preserving vector data in shapefile format
Data Preservation Like Fruit Desiccation?
Complex data representations can be made more preservable (yet less useful) through simplification.
Conversion of various formats to shapefile (shp)
Image outputs (web services, PDF maps, map image files)
Open GeoPDF standard
Analogous to paper maps
Combines data, symbology, annotation
More data intelligence than a simple image
PDF content retained in addition to, NOT instead of, data
Archival and Long Term Access Working Group
Initiated by NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data
Federal, state, regional, and local agency representation
Key focus:
Best practices for data snapshots and retention
State Archives processes: appraisal, selection, retention schedules, etc.
Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.
Archival and Long Term Access Working Group
Final Report approved by NC GICC in November, 2008
http://www.ncgicc.org/
Best Practices for:
Archiving Schedule
Inventory
Storage Medium
Formats
Naming
Metadata
Distribution
Periodic Review
Data Integrity
Publicity
Wake County adopted, providing archived data online: http://www.wakegov.com/gis/download_data.htm
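The working group's specific recommendations are not reproduced on this slide. As an illustration only, one common way to support the "Data Integrity" item is a checksum manifest recorded when a snapshot is taken and compared again later. The sketch below assumes SHA-256 and uses hypothetical function names; neither comes from the report.

```python
import hashlib
import os


def build_manifest(snapshot_dir, chunk_size=1024 * 1024):
    """Map each file's path (relative to snapshot_dir) to its SHA-256 digest."""
    manifest = {}
    for root, _dirs, files in os.walk(snapshot_dir):
        for name in files:
            full_path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in chunks so large data files do not have to fit in memory.
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, snapshot_dir)] = digest.hexdigest()
    return manifest


def changed_files(old_manifest, new_manifest):
    """List paths that were added, removed, or whose digest changed."""
    paths = set(old_manifest) | set(new_manifest)
    return sorted(p for p in paths if old_manifest.get(p) != new_manifest.get(p))
```

Running build_manifest at archiving time and again at each periodic review, then passing both results to changed_files, would flag any file that was altered, lost, or added in between.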
NDIIPP Multi-State Geospatial Project
Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC
Partners: Leading state geospatial organizations of Kentucky and Utah
State Archives of Kentucky and Utah
NCSU Libraries in catalytic/advisory role
State-to-state and geo-to-Archives collaboration
Archives as part of statewide Spatial Data Infrastructure
Geodatabase Curation Study: Overview
Three types of Geodatabases: Personal, File, SDE
Curation/Conversion options:
Archive GDB object
Export to: XML, shapefiles, GML Simple Features (open published formats)
Consideration given to objects and export files created in older ArcGIS versions - Will they be compatible with newer versions?
Caveats
Only tested what appeared to be the most reasonable and logical conversion options. Numerous other possibilities not tested.
Some conversions required running overnight. Limited time for testing multiple datasets and scenarios.
Didn’t explore GDBs with rasters.
Very limited geodatabase experience or expertise.
Personal Geodatabase
Not ideal archival object
Very proprietary – ArcGIS / MS Access formats
ESRI now recommends using File GDB instead
http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases
Archive export formats: XML, shapefiles
File Geodatabase
Potential archival object
Kentucky KYGEONET
ESRI working on low-level (non-ArcObjects-based) API (http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/ and http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578)
Folder/File structure
Can see “under the hood”
Requires knowledge of component parts
Archive export formats: XML, shapefiles, GML
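Because a file geodatabase is an ordinary folder of files, basic facts about it (file count, total size, kinds of component files) can be gathered without ArcGIS. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it makes no attempt to interpret what the component files contain.

```python
import os


def inventory_folder(gdb_path):
    """Return (file_count, total_bytes, counts_by_extension) for a folder."""
    file_count = 0
    total_bytes = 0
    counts_by_extension = {}
    for root, _dirs, files in os.walk(gdb_path):
        for name in files:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
            # Group by extension to get a rough picture of the component parts.
            extension = os.path.splitext(name)[1].lower() or "(none)"
            counts_by_extension[extension] = counts_by_extension.get(extension, 0) + 1
    return file_count, total_bytes, counts_by_extension
```

Recording these figures alongside an archived fGDB gives a simple baseline, in the same "size / number of files" form used in the test results, against which a later copy can be checked.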
File Geodatabase
KYGEONET:
“Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.”
http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf
SDE Geodatabase
Stored in RDBMS, so can’t be archived as a stand-alone object unless exported
Supports Historical Archiving
Commonly used among local govts. for enterprise data management
Archive export format: XML, fGDB, shapefiles
Questions for Testing
Will pGDB XML export files round-trip between 9.1 and 9.3.1?
Will fGDB XML export files round-trip between 9.2 and 9.3.1?
Will fGDB GML round-trip within 9.3.1?
Do GDBs have added value that is not represented in shapefile exports?
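One way to make "round-trip" concrete is to record an inventory of the geodatabase before export and again after re-import, then compare the two. The sketch below assumes the inventories have already been gathered by some other tool into dictionaries mapping an item name to a count; the item names and counts in the example are hypothetical and are not results from these tests.

```python
def compare_inventories(before, after):
    """Compare two {item_name: count} inventories taken before and after a round trip."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {
        name: (before[name], after[name])
        for name in sorted(set(before) & set(after))
        if before[name] != after[name]
    }
    return {
        "missing": missing,
        "added": added,
        "changed": changed,
        "round_trip_ok": not (missing or added or changed),
    }


# Hypothetical inventories: feature counts per layer plus a relationship class entry.
before = {"streets": 1200, "parcels": 5400, "relclass:parcel_owner": 1}
after = {"streets": 1200, "parcels": 5400}
report = compare_inventories(before, after)
print(report["missing"])        # ['relclass:parcel_owner']
print(report["round_trip_ok"])  # False
```

Feature counts alone would not reveal lost domains or relationship classes, so the inventory needs entries for those items as well if the comparison is to address the last question above.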
Personal GDB Tests
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
Step | Result | Initial Size | Compressed Size | Ratio | Notes
Original pGDB | - | 728 MB | 309 MB | 1:2.36 | -
Export to XML using 9.1 / Binary | Success | 2.8 GB (4X larger than source) | 269 MB | 1:10.7 | -
XML Import to pGDB using 9.1 | Success | 736 MB | - | - | Attribute text for Sub-Domains and Relationships preserved
XML Import to pGDB using 9.2 | FAILED | size reached 394 MB | - | - | -
XML Import to pGDB using 9.3.1 | FAILED | size reached 788 MB | - | - | -
pGDB Export to Shapefiles using 9.3.1 | Success | 523 MB / 448 Files | - | - | Attribute text for Sub-Domains and Relationship Classes lost; Codes and IDs retained
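The Ratio column appears to be the initial size divided by the compressed size, written as 1:x. The short check below tests that reading against the two rows above that report a compressed size. It assumes 1 GB = 1024 MB, and the function name is hypothetical.

```python
def compression_ratio(initial_mb, compressed_mb):
    """Initial size divided by compressed size, rounded to two decimals."""
    return round(initial_mb / compressed_mb, 2)


original_pgdb = compression_ratio(728, 309)        # reported as 1:2.36
xml_export = compression_ratio(2.8 * 1024, 269)    # reported as 1:10.7
growth = round((2.8 * 1024) / 728, 1)              # XML export size relative to source

print(original_pgdb, xml_export, growth)
```

Under these assumptions the computed values agree with the reported ratios to the precision shown, and the XML export comes out at roughly four times the size of the source pGDB.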
pGDB Upgrade to 9.3.1
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
Step | Result | Initial Size | Compressed Size | Ratio | Notes
Original pGDB | - | 728 MB | 309 MB | 1:2.36 | -
Upgraded to 9.3.1 pGDB | Success | 728 MB | - | - | Upgrade using “Properties/Upgrade Geodatabase”
Export to XML | Success | 1.25 GB | - | - | -
XML Import to pGDB using 9.3.1 | Success | 738 MB | - | - | Functionality and content intact
pGDB conversion to fGDB
Richmond VA pGDB – Version 8.3 – Created October 3, 2003
Step | Result | Initial Size | Compressed Size | Ratio | Notes
Original pGDB | - | 728 MB | 309 MB | 1:2.36 | -
Import to 9.3.1 fGDB | Success | 274 MB / 322 Files | - | - | Sub-domain attributes preserved; relationship classes were lost
File GDB Tests
Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009
Step | Result | Initial Size | Compressed Size | Ratio | Notes
Original fGDB | - | 224 MB / 64 Files | 80.9 MB | 1:2.77 | -
Export to XML using 9.2 / Binary | Success | 1.11 GB (5X larger than source) | 137 MB | 1:8.3 | -
XML Import to fGDB using 9.3.1 | Success | 223 MB / 61 Files | - | - | -
fGDB Export to shapefiles using 9.3.1 | Success | 427 MB / 63 Files | - | - | No sub-domain attributes or relationship classes to test, but it’s documented that significant fGDB functionality and tabular data may be lost.
GML Export
GML “Simple Features Profile” now supported by 9.3
ArcToolbox/Data Interoperability Tools: GML support available out-of-the-box to all users
File GDB/GML Test
Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009
Step | Result | Initial Size | Compressed Size | Ratio
Original fGDB | - | 224 MB / 64 Files | 80.9 MB | 1:2.77
Export to GML using 9.3.1 | - | 456 MB | 60.1 MB | 1:7.59
GML Import to fGDB using 9.3.1 | FAILED | reached 111 MB / 46 Files | - | -
Conclusions
For archival, pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported to a fGDB.
Stand-alone fGDB may be a safe archival format, following KYGEONET’s lead.
Risk: format newness & unknown future
Will feel safer after ESRI release of API.
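The tests found that shapefile export keeps domain codes but drops the descriptive text behind them, which is why the first conclusion calls for exporting relational tables as well. As a rough sketch of that idea, the code below writes a code-to-description lookup to a CSV file that could sit beside the exported shapefiles. The domain name, codes, and descriptions are hypothetical, and how the values are read out of the geodatabase is left outside the sketch.

```python
import csv


def write_domain_lookup(csv_path, domain_name, coded_values):
    """Write one coded-value domain as (domain, code, description) rows."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["domain", "code", "description"])
        for code, description in sorted(coded_values.items()):
            writer.writerow([domain_name, code, description])


def read_domain_lookup(csv_path):
    """Read the lookup back as {code: description}; codes come back as text."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row["code"]: row["description"] for row in csv.DictReader(handle)}


# Hypothetical domain: surface type codes for a street layer.
surface_types = {1: "Paved", 2: "Gravel", 3: "Unimproved"}
```

Calling write_domain_lookup("surface_type.csv", "SurfaceType", surface_types) would produce a plain-text table that lets a future user decode the code values retained in the shapefile attribute table.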
Future Study Needs
Round-trip fGDB via XML – Are complex functions, properties, and relationships preserved?
SDE Export Options – Best practices to preserve as much as possible via XML, fGDB, and/or shapefiles?
What’s the problem with the GML import?
http://www.lib.ncsu.edu/ncgdap/presentations.html
Jeff Essic, Matt Sumner
Data Services Librarians
NCSU