Are Geodatabases a Suitable Long-Term Archival Format?

Jeff Essic, Matt Sumner, North Carolina State University Libraries

2009 ESRI International Users Conference

July 14, 2009


NC Geospatial Data Archiving Project (NCGDAP)

Partnership between university library (NCSU) and state agency (NCCGIA), with Library of Congress under the National Digital Information Infrastructure and Preservation Program (NDIIPP)

Focus on state and local geospatial content in North Carolina (state demonstration)

Website: http://www.lib.ncsu.edu/ncgdap


Geospatial Data Preservation Challenge: Vector Data Formats

No widely-supported, open vector formats for geospatial data

Spatial Data Transfer Standard (SDTS) not widely supported

Geography Markup Language (GML) – diversity of application schemas and profiles a challenge for “permanent access”

Spatial Databases: the whole is more than the sum of the parts, and the whole is very difficult to preserve

Can export individual data layers for curation, but relationships and other context are lost


Challenge: Other Data Types

Cartographic representation: software project files, PDFs, GeoPDFs, WMS images

Web 2.0 content: street views, mashups

Oblique Imagery

3D Models


Different Ways to Approach Preservation

Technical solutions: How do we preserve acquired content over the long term?

Cultural/Organizational solutions: How do we make the data more preservable—and more prone to be preserved—from point of production?

Current use and data sharing requirements – not archiving needs – are most likely to drive improved preservability of content and improvement of metadata


Question: Frequency of Capture?

Content Exchange – Getting Data in Motion

Repository Development

Repository of Temporal Data Snapshots


Repository Development

Downloading or acquiring “low hanging fruit”

Tapping into current data flows

Developing our own metadata when necessary

Converting and preserving vector data in shapefile format


Data Preservation Like Fruit Desiccation?

Complex data representations can be made more preservable (yet less useful) through simplification.

Conversion of various formats to shp

Image outputs (web services, PDF maps, map image files)

Open GeoPDF standard: analogous to paper maps

Combines data, symbology, annotation

More data intelligence than a simple image

PDF content retained in addition to, NOT instead of, data


Archival and Long Term Access Working Group

Initiated by the NC Geographic Information Coordinating Council in 2008 to address growing concerns of state and local agencies about long-term access to data

Federal, state, regional, and local agency representation

Key focus:

Best practices for data snapshots and retention

State Archives processes: appraisal, selection, retention schedules, etc.

Valuable outcome of NCGDAP – multiple parties and levels discussing data archiving on their own.


Archival and Long Term Access Working Group

Final Report approved by NC GICC in November 2008

Best Practices for:

Archiving Schedule

Inventory

Storage Medium

Formats

Naming

Metadata

Distribution

Periodic Review

Data Integrity

Publicity

http://www.ncgicc.org/

Wake County adopted the best practices and provides archived data online: http://www.wakegov.com/gis/download_data.htm
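Data Integrity is listed as a best practice without further detail. One common way archives approach it, offered here as a sketch rather than the working group's prescribed method, is a fixity manifest: record a checksum for every file in a snapshot at ingest, then recompute and compare during Periodic Review. The choice of SHA-256 and the function names `build_manifest` and `verify_manifest` are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Assumption: a SHA-256 fixity manifest is one possible way to check data integrity;
# it is not taken from the working group report.


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archive files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to the snapshot root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing, changed, or new since the manifest was built."""
    current = build_manifest(root)
    problems = [name for name, digest in manifest.items() if current.get(name) != digest]
    problems += [name for name in current if name not in manifest]
    return sorted(problems)
```

A manifest built when a snapshot is archived can be stored beside it; an empty result from `verify_manifest` at review time means every file still matches.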


NDIIPP Multi-State Geospatial Project

Lead organizations: North Carolina Center for Geographic Information & Analysis (NCCGIA) and State Archives of NC

Partners: leading state geospatial organizations of Kentucky and Utah

State Archives of Kentucky and Utah

NCSU Libraries in catalytic/advisory role

State-to-state and geo-to-Archives collaboration

Archives as part of statewide Spatial Data Infrastructure

Geodatabase Curation Study: Overview

Three types of Geodatabases: Personal, File, SDE

Curation/Conversion options:

Archive GDB object

Export to: XML, shapefiles, GML Simple Features (open published formats)

Consideration given to objects and export files created in older ArcGIS versions: will they be compatible with newer versions?

Caveats

Only tested what appeared to be the most reasonable and logical conversion options. Numerous other possibilities not tested.

Some conversions required running overnight. Limited time for testing multiple datasets and scenarios.

Didn’t explore GDBs with rasters.

Very limited geodatabase experience or expertise.

Personal Geodatabase

Not ideal archival object

Very proprietary – ArcGIS / MS Access formats

ESRI now recommends using File GDB instead: http://webhelp.esri.com/arcgisdesktop/9.3/index.cfm?TopicName=Types_of_geodatabases

Archive export formats: XML, shapefiles

File Geodatabase

Potential archival object

Kentucky KYGEONET

ESRI working on low-level (non ArcObjects based) API (http://moreati.org.uk/blog/2009/03/01/shapefile-20-manifesto/ and http://events.esri.com/uc/QandA/index.cfm?fuseaction=answer&conferenceId=2A8E2713-1422-2418-7F20BB7C186B5B83&questionId=2578)

Folder/File structure

Can see “under the hood”

Requires knowledge of component parts

Archive export formats: XML, shapefiles, GML
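Because an fGDB is stored as an ordinary folder of files, its size and file count (figures such as "224 MB / 64 files" in the tests below) can be tallied with standard file-system calls, and grouping the files by extension gives a first look "under the hood". This sketch assumes only that the .gdb is a directory; it does not interpret what any component file contains, the function names are hypothetical, and "MB" here means 2**20 bytes.

```python
from collections import Counter
from pathlib import Path


def _files(gdb_folder: Path) -> list[Path]:
    """All regular files anywhere under the geodatabase folder."""
    return [p for p in gdb_folder.rglob("*") if p.is_file()]


def inventory(gdb_folder: Path) -> tuple[int, int]:
    """Return (file count, total size in bytes) for the folder."""
    files = _files(gdb_folder)
    return len(files), sum(p.stat().st_size for p in files)


def components_by_extension(gdb_folder: Path) -> Counter:
    """Count component files by extension; no meaning is assumed for any extension."""
    return Counter(p.suffix or "(none)" for p in _files(gdb_folder))


def describe(gdb_folder: Path) -> str:
    """One-line summary in the same style as the test tables (binary megabytes)."""
    count, total_bytes = inventory(gdb_folder)
    return f"{gdb_folder.name}: {total_bytes / 2**20:.1f} MB / {count} files"
```

Recording such a summary before and after each conversion makes it easy to see when a copy gains or loses component files, although it says nothing about whether functionality survived.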

File Geodatabase

KYGEONET:

“Snapshot File Format – Kentucky has chosen to archive its data in the form of an ESRI’s file-based geodatabase (fGDB). This file-based relational database format will allow the entire archive set to exist within it’s own container with groupings of data based upon the FGDC Metadata model (same as groupings on KYGEONET and GOS). This file format is appropriate for the storage of both raster and vector data and allows for compression. Additionally, the fGDB allows for vector topology, the inclusions of route data, and other advanced relationships that cannot be supported with the old Shapefile format.”

http://www.geomapp.net/docs/ky_geoarchives_procedures.pdf

SDE Geodatabase

Stored in RDBMS, so can’t be archived as a stand-alone object unless exported

Supports Historical Archiving

Commonly used among local govts. for enterprise data management

Archive export formats: XML, fGDB, shapefiles

Questions for Testing

Will pGDB XML export files round-trip between 9.1 and 9.3.1?

Will fGDB XML export files round-trip between 9.2 and 9.3.1?

Will fGDB GML round-trip within 9.3.1?

Do GDBs have added value that is not represented in shapefile exports?

Personal and File GDB Export

Export to XML

Export to shapefiles

Export to XML interface

Personal GDB Tests

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Original pGDB: 728 MB initial, 309 MB compressed, ratio 1:2.36

Export to XML using 9.1 (binary): success; 2.8 GB initial (4X larger than the source), 269 MB compressed, ratio 1:10.7

XML import to pGDB using 9.1: success; 736 MB; attribute text for sub-domains and relationships preserved

XML import to pGDB using 9.2: FAILED (size reached 394 MB)

XML import to pGDB using 9.3.1: FAILED (size reached 788 MB)

pGDB export to shapefiles using 9.3.1: success; 523 MB / 448 files; attribute text for sub-domains and relationship classes lost; codes and IDs retained
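The ratios in these tables read as 1:N, where N is the initial size divided by the compressed size. A quick worked check against the Richmond figures is sketched below. It assumes 1 GB = 1024 MB, because the slide's 1:10.7 ratio for the XML export only works out with binary units (with 1 GB = 1000 MB it would be about 1:10.4). The helper names are hypothetical.

```python
MB_PER_GB = 1024  # assumption: the slide figures appear to use binary gigabytes


def compression_ratio(initial_mb: float, compressed_mb: float) -> float:
    """Return N for a ratio written as 1:N."""
    return initial_mb / compressed_mb


def growth_factor(export_mb: float, source_mb: float) -> float:
    """How many times larger an export is than its source."""
    return export_mb / source_mb


xml_export_mb = 2.8 * MB_PER_GB  # 9.1 binary XML export of the Richmond pGDB

print(f"Original pGDB:  1:{compression_ratio(728, 309):.2f}")            # 1:2.36
print(f"XML export:     1:{compression_ratio(xml_export_mb, 269):.1f}")  # 1:10.7
print(f"XML vs. source: {growth_factor(xml_export_mb, 728):.1f}x")       # 3.9x, the slide's "4X"
```

The same arithmetic reproduces the Kentucky figures later on, for example 1.11 GB against 137 MB compressed gives 1:8.3.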

pGDB Import of 9.1 XML

9.3.1 Failure Message

9.2 Failure Message

Import in progress

pGDB Export to Shapefiles

Sub-domain attribute text is lost in the conversion to shapefile
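Shapefile exports keep the stored codes but drop the descriptive text, which is why the Conclusions call for exporting relational tables alongside the shapefiles. A minimal sketch of that idea, using only Python's `csv` module: write each coded-value lookup to its own table next to the shapefile, then use it to restore readable values. The domain, its values, and the function names are hypothetical, and extracting the real domain from a geodatabase is not shown.

```python
import csv
from pathlib import Path

# Hypothetical coded-value lookup: the codes survive a shapefile export,
# the descriptions are the attribute text that is lost.
PAVEMENT_DOMAIN = {1: "Asphalt", 2: "Concrete", 3: "Gravel"}


def write_domain_table(domain: dict[int, str], out_csv: Path) -> None:
    """Write a code/description table to archive beside the exported shapefile."""
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["code", "description"])
        for code, description in sorted(domain.items()):
            writer.writerow([code, description])


def decode(codes: list[int], table_csv: Path) -> list[str]:
    """Restore attribute text for exported codes from the archived lookup table."""
    with table_csv.open(newline="", encoding="utf-8") as handle:
        lookup = {int(row["code"]): row["description"] for row in csv.DictReader(handle)}
    return [lookup.get(code, f"<unknown code {code}>") for code in codes]
```

A plain CSV keeps the lookup readable without GIS software, at the cost of the enforcement the geodatabase provided.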

pGDB Upgrade to 9.3.1

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Original pGDB: 728 MB initial, 309 MB compressed, ratio 1:2.36

Upgraded to 9.3.1 pGDB: success; 728 MB (upgrade done using “Properties/Upgrade Geodatabase”)

Export to XML: success; 1.25 GB

XML import to pGDB using 9.3.1: success; 738 MB; functionality and content intact

pGDB conversion to fGDB

Richmond VA pGDB – Version 8.3 – Created October 3, 2003

Original pGDB: 728 MB initial, 309 MB compressed, ratio 1:2.36

Import to 9.3.1 fGDB: success; 274 MB / 322 files; sub-domain attributes preserved; relationship classes were lost

File GDB Tests

Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009

Original fGDB: 224 MB / 64 files initial, 80.9 MB compressed, ratio 1:2.77

Export to XML using 9.2 (binary): success; 1.11 GB initial (5X larger than the source), 137 MB compressed, ratio 1:8.3

XML import to fGDB using 9.3.1: success; 223 MB / 61 files

fGDB export to shapefiles using 9.3.1: success; 427 MB / 63 files. No sub-domain attributes or relationship classes to test, but it is documented that significant fGDB functionality and tabular data may be lost.
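Size and file count are the only measures these tests record, so comparing an original with its converted copy can only flag where to look: the XML round trip came back with three fewer files and slightly less data, which may or may not mean lost content. A small sketch that computes those differences from the Kentucky figures above; the `Snapshot` class and `compare` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    label: str
    size_mb: float
    files: int


def compare(original: Snapshot, result: Snapshot) -> str:
    """Summarize how a converted copy differs from the original in size and file count."""
    size_change = (result.size_mb - original.size_mb) / original.size_mb * 100
    file_change = result.files - original.files
    return f"{result.label}: size {size_change:+.1f}%, files {file_change:+d}"


original = Snapshot("Original fGDB (9.2)", 224, 64)
for result in (
    Snapshot("XML import to fGDB (9.3.1)", 223, 61),
    Snapshot("Export to shapefiles (9.3.1)", 427, 63),
):
    print(compare(original, result))
# XML import to fGDB (9.3.1): size -0.4%, files -3
# Export to shapefiles (9.3.1): size +90.6%, files -1
```

Matching numbers would not prove that complex functions, properties, and relationships survived; that still requires inspecting the content itself.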

GML Export

GML “Simple Features Profile” now supported by 9.3

ArcToolbox/Data Interoperability Tools: GML support available out-of-the-box to all users

File GDB/GML Test

Kentucky Transportation Vectors – Version 9.2 – Acquired 6 June 2009

Original fGDB: 224 MB / 64 files initial, 80.9 MB compressed, ratio 1:2.77

Export to GML using 9.3.1: 456 MB initial, 60.1 MB compressed, ratio 1:7.59

GML import to fGDB using 9.3.1: FAILED (reached 111 MB / 46 files)

Conclusions

For archiving, a pGDB must be regularly upgraded, exported to shapefiles (including relational tables), and/or imported into an fGDB.

A stand-alone fGDB may be a safe archival format, following KYGEONET’s lead.

Risk: format newness & unknown future

Will feel safer after ESRI releases the API.

Future Study Needs

Round-trip fGDB via XML: are complex functions, properties, and relationships preserved?

SDE Export Options – Best practices to preserve as much as possible via XML, fGDB, and/or shapefiles?

What’s the problem with the GML import?


http://www.lib.ncsu.edu/ncgdap/presentations.html

Jeff Essic, Matt Sumner, Data Services Librarians

NCSU
