Catherine Masi, National Geospatial Digital Archive
May 16, 2005
NGDA Format Registry
Why do we need a FR?
We are designing with long-term storage in mind (> 100 years).
We cannot depend on the format spec being available via URL, or even on a format registry that might no longer be up to date or in existence.
Thus the semantic definition of the format must be archived with the object itself.
This semantic definition must be comprehensive so that the format can be accessed even if current access mechanisms no longer exist!
NGDA Format Registry
Two major tasks
Analyze and define spatial data formats (Meredith Williams)
Develop local format registry with programmatic interface to existing authoritative/collaborative FR (Catherine Masi)
Analyze and define spatial data formats
Is there a comprehensive list of geospatial formats? Are they defined? How?
List of Spatial Data Formats - MW
Digital Map Formats
Vector File Formats
Raster File Formats
Other categories - TIN, ASCII, 3D, Tabular
Databases
Unacceptable Formats
Analyze and define spatial data formats
What formats do we have in ADL? How do we define them?
ADL format documentation
ADL website: http://www.alexandria.ucsb.edu/adl/Collection%20Development/BucketDescrip.htm
MIME types: http://www.iana.org/assignments/media-types/
ADL literature/presentations:
Format type: hierarchical vocabulary: ADL Object Format Thesaurus
loosely based on MIME
multiple values: union
compare: DC.Format
ADL Webclient list: http://webclient.alexandria.ucsb.edu/mw/index.jsp
Analyze and define spatial data formats
What are our preferred formats for NGDA, if any?
MW tested three geospatial formats using the Sustainability Test derived from LCDF
GJ - "we can ingest anything if we have the definition representation information"
Decided to limit allowed formats to a few the first year - CASIL test suite (geotiff, shapefile)
What if there is free proprietary software, such as from ESRI, that allows one to look at the files? Should we request and archive that as well? - No (UCSB)
Analyze and define spatial data formats
How will we define our formats?
Using Meredith's list of Spatial Data Formats
Begin defining using LoC Digital Formats as an example
How do we know that we have sufficient semantic information to define each geospatial format?
What information is required to make the format usable? Ask the users.
What information is required to programmatically access the format if current access mechanisms become obsolete?
Prioritize and start with most important/ubiquitous formats for our archive
Coordinate with format definitions in JHOVE
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
What format registries are out there?
Library of Congress Digital Formats (LCDF)
Global Digital Format Registry (GDFR) - Harvard
Global Digital Format Registry Description
Ockerbloom's Format Registry Demonstrator (FRED)
PRONOM - File format registry - UK archives
Practical, in use, not geo-spatial
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
Coordinate our efforts with the LCDF, GDFR, FRED, TOM
NH initiated contact (Stephen Abrams, John Ockerbloom, Steve Morris, etc.) at DLF
Questions for DLF meeting to get discussion started.
Questions that we formulated showed that we have to solve a lot of these problems on our own, especially with regard to the technical aspects of building a FR and the interaction mechanisms between LC, GDFR and our local FR
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
Do the existing format registries contain geospatial formats?
No. In the future we will contribute geospatial formats to an existing registry effort such as LCDF or GDFR
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
Do the existing format registries support access and contribution mechanisms? No.
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
How are Library of Congress Digital Formats stored internally? Database? XML? Directory structure?
In MS Word files
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
Is there a data dictionary or other mechanism for defining fields in LCDF? FDD
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
CM contacted Steve Morris (NCSU - NDIIPP), Stephen Abrams (Harvard - GDFR) and John Mark Ockerbloom (Penn - FRED), to open up a discussion on the technical aspects of developing a geospatial format registry.
S. Abrams responded that GDFR is still only an idea rather than a reality and that a technical discussion of how our GIS formats should be managed in a GDFR-conformant way is a bit premature
Develop local format registry with programmatic interface to existing authoritative/collaborative FR
What are the requirements for the NGDA Format Registry?
independent
contains sufficient semantic information to programmatically access format (UCSB)
contains geospatial reference information
definitions exist in simple documented format in simple directory structure
access/search mechanism not necessary for access
interfaces with collaborative authoritative FR for updates and contributions
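The requirements above (simple documented format, simple directory structure, no search mechanism needed for access) can be illustrated with a minimal sketch. It assumes one directory per format holding a single XML record. The file name `record.xml`, the helper names, and the field names are hypothetical; the slides do not specify the actual record layout.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def write_record(root: Path, name: str, fields: dict[str, str]) -> Path:
    """Store one format definition as <root>/<name>/record.xml.

    The file name and the field names passed in are illustrative
    assumptions, not the NGDA record layout.
    """
    fmt_dir = root / name
    fmt_dir.mkdir(parents=True, exist_ok=True)
    record = ET.Element("format", {"name": name})
    for key, value in fields.items():
        ET.SubElement(record, key).text = value
    path = fmt_dir / "record.xml"
    path.write_bytes(ET.tostring(record, encoding="utf-8"))
    return path


def read_record(root: Path, name: str) -> dict[str, str]:
    """Access needs no index or search service: the format name is the path."""
    record = ET.fromstring((root / name / "record.xml").read_bytes())
    return {child.tag: (child.text or "") for child in record}
```

Because lookup is only a path join plus an XML parse, a record written this way stays readable with nothing more than a file system and a text viewer.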
First steps:
CM began prototyping the physical structure of format registry using 2 CASIL formats, geotiff and shapefile.
Created directory-based registry.
Incorporated info from MW's documents Spatial Data Formats and Sustainability Test.
Created record layout loosely based on Library of Congress Digital Formats but including spatial reference information.
Included format spec as local website (in the case of geotiff) and as local pdf file (in the case of shapefile).
All links on record referred to local copies of format information.
All documentation about the format is located locally in that format's directory.
Entries are not complete. This is just a first pass at what the html-rendered format entries will look like. Focus here is on physical structure rather than content.
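The rule that every link on a record points to a local copy is easy to check mechanically. The following is a rough sketch, not the prototype's actual tooling: it collects `href` and `src` values from an HTML-rendered entry and reports any that carry a URL scheme or host. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("href", "src") and value:
                self.links.append(value)


def external_links(html_text: str) -> list[str]:
    """Return links that leave the local directory tree.

    A link is treated as external if it has a scheme (http:, ftp:, ...)
    or a network location (//host/...). Relative paths count as local.
    """
    collector = LinkCollector()
    collector.feed(html_text)
    external = []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme or parts.netloc:
            external.append(link)
    return external
```

Running such a check over each format's directory would flag any record that still depends on a remote copy of the spec.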
First steps:
Refining content using input from DV, MW and from actual data users as to what is needed to adequately define a format.
Determine sufficient semantic info to define geospatial formats
Review CASIL formats. Began to flesh out sufficient semantic info. Started with geotiff, shapefile.
Review record layout and add, change and delete fields.
Next steps
Make sure format spec is complete and all information is located locally where possible.
Determine where we draw the line between format registry information/policy/higher level descriptive metadata. Format registry will stick to format spec and a few other important fields only.
Develop xml stylesheet of record layout. Decided that html, xml and pdf are acceptable archivable formats for format registry information.
Flatten the directory structure (hierarchy) because tfw, for example, is not a subtype of geotiff but can be attached to a tiff or another format. Work more on trying to find a sensible organization for the files in our FR
Link to other parts of Archive (Descriptive Metadata) from within FR
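One possible way to flatten the hierarchy is to keep every format as a peer entry and record relationships explicitly instead of through nesting. The sketch below is only an illustration of that idea; the `accompanies` relation and the sample entries are assumptions, not the organization NGDA settled on.

```python
from dataclasses import dataclass, field


@dataclass
class FormatEntry:
    """One flat registry entry. 'accompanies' is a hypothetical relation
    naming the formats this one can be attached to."""
    name: str
    accompanies: set[str] = field(default_factory=set)


# Flat registry: no entry is nested under another.
registry = {
    "tiff": FormatEntry("tiff"),
    "geotiff": FormatEntry("geotiff"),
    # tfw is a peer entry, not a subtype of geotiff; here it is
    # recorded as attachable to tiff (more targets could be added).
    "tfw": FormatEntry("tfw", accompanies={"tiff"}),
}


def sidecars_for(entries: dict[str, FormatEntry], name: str) -> list[str]:
    """List the formats recorded as attachable to the given format."""
    return sorted(e.name for e in entries.values() if name in e.accompanies)
```

With this layout, attaching tfw to a further format means adding a name to its `accompanies` set rather than moving or duplicating a directory.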
Later
Develop method of search, retrieval, update
Begin to develop programmatic interface to LoC Digital Formats or other authoritative/collaborative format registry