National Geospatial Digital Archive

Greg Janée

Greg Janée • May 31, 2005 2

Outline

• Two preservation misadventures
• Digital preservation problems
• Genesis of NGDA
• Project approach/philosophy
• What it will mean to be a data provider


Domesday Book, 1086


Domesday Book, 1986


Meanwhile, back at NASA...

• 1976
  – Viking probes go to Mars
• 1999
  – USC neurobiologist Joseph Miller asks for data
  – tapes coded “in a format so old that the programmers who knew it had died”
  – works off of paper records


Preservation issues

• Physical
  – media
  – systems
• Contextual
  – format
  – semantics
  – authenticity
• Legal
  – copyright


Project genesis

• NDIIPP
  – Library of Congress, 2000
  – $100M
  – http://www.digitalpreservation.gov/
• NGDA
  – UCSB (MIL) & Stanford (Branner Library)
  – $2.6M, 3 years
  – archive geospatial data on a national scale
  – http://www.ngda.org/


Some philosophy

• Archival has to be cheap & easy
  – must be distributed
  – but reality is little incentive, no funding
• Archive definition:
  – offers access now & in the future
  – no mandatory services beyond simple access
• Policy separated from mechanism
• Archive includes data semantics
  – key differentiator from text, audio, video


Philosophy, cont.

• Curatorial, not archeological approach
  – assumption: content comes in discrete, self-contained chunks
• Preservation by format definition, archival & association
  – support for derivative forms, services
• Must support long-term preservation
  – need to migrate archive itself


MIT Media Lab
Stewart Brand, “How Buildings Learn,” p. 53


MIT Building 20
Ibid., p. 26


Typical repository architecture

[Diagram labels: system; database; storage; handle resolver; annotation: “fragile”]


NGDA architecture

[Diagram labels: storage subsystem; standard, public data model; archival system (databases, caches, etc.); ingest: bulk loader; access: ADL, OAI, Web]


Post-NGDA architecture

[Diagram labels: storage subsystem; standard, public data model; Web]


Storage system requirements

• Req’s:
  – associate UUIDs/RIDs with bitstreams
  – retrieve global/local bitstream by UUID/RID
  – determine (parent) UUID of any bitstream
  – list all UUIDs
• Satisfied by:
  – any filesystem
  – tag URIs for UUIDs
    • tag:library.ucsb.edu,2005:identifier
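The four requirements above can be sketched on top of a plain directory tree. This is a minimal illustration, not the NGDA implementation: it assumes one directory per object (named by its percent-encoded tag URI) and one file per component (named by its RID, read here as an identifier local to the object). The function names and the layout are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote, unquote

# Tagging authority and date taken from the slide's example tag URI.
AUTHORITY = "library.ucsb.edu,2005"


def tag_uri(identifier: str) -> str:
    """Build a tag URI to serve as an object's UUID."""
    return f"tag:{AUTHORITY}:{identifier}"


def _encode(name: str) -> str:
    # Percent-encode so ':' and ',' are safe in file names (assumed layout).
    return quote(name, safe="")


def put(root: Path, uuid: str, rid: str, data: bytes) -> Path:
    """Requirement 1: associate a UUID/RID pair with a bitstream."""
    obj_dir = root / _encode(uuid)
    obj_dir.mkdir(parents=True, exist_ok=True)
    path = obj_dir / _encode(rid)
    path.write_bytes(data)
    return path


def get(root: Path, uuid: str, rid: str) -> bytes:
    """Requirement 2: retrieve a bitstream by UUID and RID."""
    return (root / _encode(uuid) / _encode(rid)).read_bytes()


def parent_uuid(bitstream: Path) -> str:
    """Requirement 3: determine the parent UUID of a stored bitstream."""
    return unquote(bitstream.parent.name)


def list_uuids(root: Path) -> list:
    """Requirement 4: list all UUIDs in the store."""
    return sorted(unquote(p.name) for p in root.iterdir() if p.is_dir())
```

Because every operation maps to an ordinary directory or file operation, a store laid out this way stays readable with standard filesystem tools even if the archival software on top of it is gone.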


Archival objects

[Diagram labels: directory (UUID); component (RID); UUID]


Example

[Diagram: USGS DOQQ example. Object x with components x.tiff, x.fgdc, and x.gif; format labels GeoTIFF, FGDC, and TIFF; relation labels “metadata”, “derived”, and “subtypeOf”]
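One possible reading of this example, as a small data model. The roles assigned to each component (x.tiff as data, x.fgdc as metadata, x.gif as a derived form), the GIF format label, the object's tag URI, and all class and function names are assumptions made for illustration; only the component names, the GeoTIFF/FGDC/TIFF labels, and the relation names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    rid: str                      # identifier local to the object
    role: str                     # "data", "metadata", or "derived" (assumed roles)
    format: Optional[str] = None  # name of a format definition


@dataclass
class ArchivalObject:
    uuid: str
    components: List[Component] = field(default_factory=list)


# Format definitions related by subtypeOf (GeoTIFF is a kind of TIFF).
SUBTYPE_OF: Dict[str, str] = {"GeoTIFF": "TIFF"}


def format_chain(fmt: str) -> List[str]:
    """Follow subtypeOf links from a format to its most general ancestor."""
    chain = [fmt]
    # Stop at the top of the hierarchy, or if a link would revisit a format.
    while chain[-1] in SUBTYPE_OF and SUBTYPE_OF[chain[-1]] not in chain:
        chain.append(SUBTYPE_OF[chain[-1]])
    return chain


obj_x = ArchivalObject(
    uuid="tag:library.ucsb.edu,2005:x",  # hypothetical identifier
    components=[
        Component("x.tiff", "data", "GeoTIFF"),
        Component("x.fgdc", "metadata", "FGDC"),
        Component("x.gif", "derived", "GIF"),
    ],
)
```

The subtypeOf chain lets a reader that only understands plain TIFF still open x.tiff, while a GeoTIFF-aware reader can use the georeferencing as well.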


Object types

• Data, other content
• Format definitions
• Semantic definitions
• Providers
• Organizational structures
  – collection
  – series
  – ingest session


Archive-provider agreement

• Defines
  – common structure of objects to be ingested
  – necessary validations
  – associations to other objects
    • assumes pre-loading of semantic definitions
  – policies, rights, etc.
• Represents choke point
  – requires human evaluation