Catherine Masi, National Geospatial Digital Archive

May 16, 2005

NGDA Format Registry

Why do we need a format registry (FR)?

We are designing with long-term storage in mind (> 100 years)

Cannot depend on the format spec being available via URL, or even on a format registry that might no longer be up to date or in existence

Thus the semantic definition of a format must be archived with the object itself

This semantic definition must be comprehensive so that the format can be accessed even if current access mechanisms no longer exist!
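As an illustration of archiving the definition with the object, here is a minimal sketch in Python. It assumes a package is just a directory holding the data file next to a copied (not linked) format definition. The function name and the `format_definition/` layout are hypothetical and not part of the NGDA design.

```python
import shutil
import tempfile
from pathlib import Path


def package_with_format_definition(data_file, definition_dir, package_dir):
    """Copy a data object and its complete format definition into one package.

    All arguments are pathlib.Path objects. The layout (object at the top
    level, definition under "format_definition/") is a hypothetical choice.
    """
    package_dir.mkdir(parents=True, exist_ok=True)
    # The archived object itself.
    shutil.copy2(data_file, package_dir / data_file.name)
    # The definition is copied, not referenced, so the package does not
    # depend on any URL or external registry staying available.
    shutil.copytree(
        definition_dir, package_dir / "format_definition", dirs_exist_ok=True
    )
    return package_dir


if __name__ == "__main__":
    work = Path(tempfile.mkdtemp())
    data = work / "example.tif"
    data.write_bytes(b"placeholder bytes")
    definition = work / "definition"
    definition.mkdir()
    (definition / "spec.html").write_text("<html>placeholder spec</html>")
    package = package_with_format_definition(data, definition, work / "package")
    for path in sorted(package.rglob("*")):
        print(path.relative_to(package))
```

Copying costs storage for every object, but it is the only option in this sketch that survives the loss of every external reference.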

NGDA Format Registry

Two major tasks

Analyze and define spatial data formats (Meredith Williams)

Develop local format registry with programmatic interface to existing authoritative/collaborative FR (Catherine Masi)

Analyze and define spatial data formats

Is there a comprehensive list of geospatial formats? Are they defined? How?

List of Spatial Data Formats - MW

Digital Map Formats

Vector File Formats

Raster File Formats

Other categories - TIN, ASCII, 3D, Tabular

Databases

Unacceptable Formats

Analyze and define spatial data formats

What formats do we have in ADL? How do we define them?

ADL format documentation

ADL website: http://www.alexandria.ucsb.edu/adl/Collection%20Development/BucketDescrip.htm

MIME types: http://www.iana.org/assignments/media-types/

ADL literature/presentations:

Format type: hierarchical vocabulary: ADL Object Format Thesaurus

loosely based on MIME

multiple values: union

compare: DC.Format

ADL Webclient list: http://webclient.alexandria.ucsb.edu/mw/index.jsp

Analyze and define spatial data formats

What are our preferred formats for NGDA, if any?

MW tested three geospatial formats using Sustainability Test derived from LCDF

GJ - "we can ingest anything if we have the definition representation information"

Decided to limit allowed formats to a few in the first year: CASIL test suite (GeoTIFF, shapefile)

What if there is free proprietary software, such as from ESRI, that allows one to look at the files? Should we request and archive that as well? - No (UCSB)

Analyze and define spatial data formats

How will we define our formats?

Using Meredith's list of Spatial Data Formats

Begin defining using LoC Digital Formats as an example

How do we know that we have sufficient semantic information to define each geospatial format?

What information is required to make the format usable? Ask the users.

What information is required to programmatically access the format if current access mechanisms become obsolete?

Prioritize and start with most important/ubiquitous formats for our archive

Coordinate with format definitions in JHOVE

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

What format registries are out there?

Library of Congress Digital Formats (LCDF)

Global Digital Format Registry (GDFR) - Harvard

Global Digital Format Registry Description

Ockerbloom's Format Registry Demonstrator (FRED)

PRONOM - File format registry - UK archives

Practical, in use, not geo-spatial

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

Coordinate our efforts with the LCDF, GDFR, FRED, TOM

NH initiated contact (Stephen Abrams, John Ockerbloom, Steve Morris, etc.) at DLF

Questions for DLF meeting to get discussion started.

Questions that we formulated showed that we have to solve a lot of these problems on our own, especially with regard to the technical aspects of building a FR and interaction mechanisms between LC, GDFR and our local FR

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

Do the existing format registries contain geospatial formats?

No. In the future we will contribute geospatial formats to an existing registry effort such as LCDF or GDFR

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

Do the existing format registries support access and contribution mechanisms? No.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

How are Library of Congress Digital Formats stored internally? Database? XML? Directory structure?

In MS Word files

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

Is there a data dictionary or other mechanism for defining fields in LCDF?

FDD

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

CM contacted Steve Morris (NCSU - NDIIPP), Stephen Abrams (Harvard - GDFR) and John Mark Ockerbloom (Penn - FRED) to open up a discussion on the technical aspects of developing a geospatial format registry.

S. Abrams responded that GDFR is still only an idea rather than a reality and that a technical discussion of how our GIS formats should be managed in a GDFR-conformant way is a bit premature.

Develop local format registry with programmatic interface to existing authoritative/collaborative FR

What are the requirements for the NGDA Format Registry?

independent

contains sufficient semantic information to programmatically access format (UCSB)

contains geospatial reference information

definitions exist in simple documented format in simple directory structure

access/search mechanism not necessary for access

interfaces with collaborative authoritative FR for updates and contributions
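One way to test the "independent" requirement mechanically is to scan an HTML format record for links that point outside the local registry. The sketch below uses only the Python standard library. The function name and the set of schemes treated as remote are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Schemes treated as "remote" in this sketch (an assumption, not an NGDA rule).
REMOTE_SCHEMES = ("http", "https", "ftp")


class LinkCollector(HTMLParser):
    """Collect every href and src attribute value found in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def external_links(record_html):
    """Return the links in a record that depend on a remote host."""
    collector = LinkCollector()
    collector.feed(record_html)
    found = []
    for link in collector.links:
        parsed = urlparse(link)
        # A remote scheme or an explicit host means the link is not local.
        if parsed.scheme in REMOTE_SCHEMES or parsed.netloc:
            found.append(link)
    return found
```

A record passes this check when `external_links` returns an empty list, meaning that every reference resolves inside the format's own directory.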

First steps:

CM began prototyping the physical structure of format registry using 2 CASIL formats, GeoTIFF and shapefile.

Created directory based registry.

Incorporated info from MW's documents Spatial Data Formats and Sustainability Test

Created record layout loosely based on Library of Congress Digital Formats but including spatial reference information.

Included format spec as local website (in the case of GeoTIFF) and as local PDF file (in the case of shapefile).

All links on record referred to local copies of format information.

All documentation about the format is located locally in that format's directory

Entries are not complete. This is just a first pass at what the HTML-rendered format entries will look like. Focus here is on physical structure rather than content.
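The prototype described above can be sketched roughly as follows: one directory per format, a local copy of each spec file, and an HTML record whose links are all relative. This is not the actual NGDA code. The directory names, the `record.html` file name, and the field names passed in are hypothetical.

```python
import html
import shutil
import tempfile
from pathlib import Path


def create_entry(registry_root, format_name, fields, spec_files):
    """Create one registry entry: <root>/<format>/record.html plus local specs.

    registry_root and the items of spec_files are pathlib.Path objects;
    fields maps field names to text values.
    """
    entry_dir = registry_root / format_name
    spec_dir = entry_dir / "spec"
    spec_dir.mkdir(parents=True, exist_ok=True)

    # Copy every spec document into the entry so all links stay local.
    links = []
    for source in spec_files:
        target = spec_dir / source.name
        shutil.copy2(source, target)
        links.append(target.relative_to(entry_dir).as_posix())

    rows = "".join(
        "<tr><th>{}</th><td>{}</td></tr>\n".format(html.escape(k), html.escape(v))
        for k, v in fields.items()
    )
    items = "".join(
        '<li><a href="{0}">{0}</a></li>\n'.format(html.escape(link))
        for link in links
    )
    record = entry_dir / "record.html"
    record.write_text(
        "<html><body>\n<h1>{}</h1>\n<table>\n{}</table>\n<ul>\n{}</ul>\n"
        "</body></html>\n".format(html.escape(format_name), rows, items),
        encoding="utf-8",
    )
    return record
```

Because the record refers to its spec files by relative path, the whole format directory can be moved or archived as a unit without breaking any link.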

First steps:

Refining content using input from DV, MW and from actual data users as to what is needed to adequately define a format.

Determine sufficient semantic info to define geospatial formats

Review CASIL formats. Began to flesh out sufficient semantic info. Started with GeoTIFF, shapefile.

Review record layout and add, change and delete fields.

Next steps

Make sure format spec is complete and all information is located locally where possible.

Determine where we draw the line between format registry information/policy/higher level descriptive metadata. Format registry will stick to format spec and a few other important fields only.

Develop XML stylesheet of record layout. Decided that HTML, XML and PDF are acceptable archivable formats for format registry information.

Flatten the directory structure (hierarchy) because tfw, for example, is not a subtype of GeoTIFF but can be attached to a TIFF or another format. Work more on trying to find a sensible organization for the files in our FR.

Link to other parts of Archive (Descriptive Metadata) from within FR
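The flattening step in the list above could be modeled as below: every format gets a top-level entry, and "can be attached to" is recorded as an explicit relation instead of being implied by directory nesting. The class, the field names and the three example entries are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class FormatEntry:
    name: str
    directory: str  # top-level directory name inside the registry
    attaches_to: list = field(default_factory=list)  # formats it can accompany


def build_flat_registry(entries):
    """Index entries by name and reject any nested directory."""
    registry = {}
    for entry in entries:
        if "/" in entry.directory:
            raise ValueError("entry directories must be top-level: " + entry.directory)
        registry[entry.name] = entry
    return registry


def attachments_for(registry, format_name):
    """Names of the formats that can be attached to the given format."""
    return sorted(
        e.name for e in registry.values() if format_name in e.attaches_to
    )


# Hypothetical example: tfw is its own entry and is related to tiff,
# not filed underneath geotiff.
EXAMPLE_REGISTRY = build_flat_registry([
    FormatEntry("tiff", "tiff"),
    FormatEntry("geotiff", "geotiff"),
    FormatEntry("tfw", "tfw", attaches_to=["tiff"]),
])
```

Keeping relations as data means a new pairing can be added by editing one list, without moving any directories.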

Later

Develop method of search, retrieval, update

Begin to develop programmatic interface to LoC Digital Formats or other authoritative/collaborative format registry