long term data and knowledge preservation for the earth

26
PV 2009, ESAC, Spain, 1-3 Dec Long term data and knowledge preservation for the Earth Sciences Archive S. ALBANI (ESA) D. Giaretta (STFC) PV 2009

Upload: others

Post on 06-Jun-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Long term data and knowledge preservation for the Earth

PV 2009, ESAC, Spain, 1-3 Dec

Long term data and knowledge preservation for the Earth Sciences Archive

S. ALBANI (ESA)D. Giaretta (STFC)

PV 2009

Page 2: Long term data and knowledge preservation for the Earth

Outline

OAIS conformanceInformation ModelMandatory responsibilities

Preservation workflowsKey ComponentsThreats and counters to those threatsTypical scenarios

2

Page 3: Long term data and knowledge preservation for the Earth

ESA and EO introduction• ESA users worldwide have online access to ~5 PB of EO

data– EO data provide global coverage of the Earth– Data volumes are increasing dramatically– Large requirements for accessing historical archives

• This unique dataset has to be preserved!– ESA is promoting a European EO LTDP Strategy– ESA is involved in several international preservation

activities (PARSE.Insight, Alliance…)• ESA has a complex distributed system architecture

– based on the OAIS standard– providing producer oriented services for data archiving,

data retrieval and processing management…

Page 4: Long term data and knowledge preservation for the Earth

4

CASPAR Project

http://www.casparpreserves.eu

EU FP6 Integrated Project

Total spend approx. 16MEuro (8.8 MEuro from EU)

Page 5: Long term data and knowledge preservation for the Earth

OAIS conformanceFrom OAIS:

A conforming OAIS archive implementation shall support the model of information described in 2.2. The OAIS Reference Model does not define or require any particular method of implementation of these concepts.

A conforming OAIS archive shall fulfill the responsibilities listed in 3.1. Subsection 4 provides examples of the mechanisms that may be used to discharge the responsibilities identified in 3.1. These mechanisms are not required for conformance. It is expected that a separate standard, as noted in section 1.5, will be produced on which accreditation and certification processes can be built.

5

Page 6: Long term data and knowledge preservation for the Earth

Preservation workflows

6

Archival

Package

Content

further described by

Package Packaging

derivedfrom

describedby

delimitedby

DataObject

PhysicalObject

DigitalObject

StructureReferenceOther

Interpretedusing

Interpretedusing*

1

11...*

Bit

addsmeaning

to

Provenance Context FixityAccessRights

Page 7: Long term data and knowledge preservation for the Earth

7

Page 8: Long term data and knowledge preservation for the Earth

8

Information Model & Representation Information

The Information Model is keyRecursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY

(this knowledge will change over time and region)

InformationObject

RepresentationInformation

1+

interpretedusing1+Data

Object

interpretedusing

PhysicalObject

DigitalObject

BitSequence

1+

Page 9: Long term data and knowledge preservation for the Earth

USE DATA• Use application to find data in 

Repository• Create DIP with enough RepInfo for the 

user (via DC profile)• Obtain more RepInfo from Registry if 

necessary

Page 10: Long term data and knowledge preservation for the Earth

Key Components

10

DRM

Page 11: Long term data and knowledge preservation for the Earth

Modules and Dependencies:defining the Designated CommunityREADME.txt

TEXT EDITORENGLISH LANGUAGE

WINDOWS XP

FITS FILE

FITS STANDARD

PDF STANDARD

FITSJAVA s/w

JAVA VMPDF s/w

FITS DICTIONARY

DICTIONARYSPECIFICATION

UNICODESPECIFICATION

XMLSPECIFICATION

MULTIMEDIA PERFORMANCE DATA

C3D DirectX MAX/MSP

3D motiondata files

3D scenedata files

motion to musicmapping strategy

Page 12: Long term data and knowledge preservation for the Earth

Modules and Dependencies: Examples (Semantic Web data)

ns4

ns2

ns1

ns3

RDF/S

modules and dependencies

Page 13: Long term data and knowledge preservation for the Earth

Scenario: Intelligibility-aware Packaging

FITS 

FITS STANDARD

PDF STANDARD

FITS DICTIONARY

DICTIONARYSPECIFICATION

UNICODESPECIFICATION

XMLSPECIFICATION

o2o1

P1

P2

C3D DirectX MAX/MSP

o3

P3

ZIP

• Gap(o2,P1) = ∅• Gap(o2,P2) =

– {FITS, FITS_STANDARD, FITS_DICTIONARY, DICTIONARY_SPECIFICATION}

• Gap(o2,P3) =– {FITS, FITS_STANDARD, FITS_DICTIONARY,

DICTIONARY_SPECIFICATION, PDF_STANDARD, XML_SPECIFICATION, UNICODE_SPECIFICATION}

• Gap(o3,P3) = – {ZIP}

• Gap(o3, ∅) = – {ZIP, C3D, DirectX, MAX/MSP}

Page 14: Long term data and knowledge preservation for the Earth

Creating an OAIS Archival Information Package

Page 15: Long term data and knowledge preservation for the Earth

PV 2009, ESAC, Spain, 1-3 Dec

Threat Requirement for solutionUsers may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved

Ability to create and maintain adequate Representation Information

Non-maintainability of essential hardware, software or support environment may make the information inaccessible

Ability to share information about the availability of hardware and software and their replacements/substitutes

The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity

Ability to bring together evidence from diverse sources about the Authenticity of a digital object

Access and use restrictions may make it difficult to reuse data, or alternatively may not be respected in future

Ability to deal with Digital Rights correctly in a changing and evolving environment

Loss of ability to identify the location of data

An ID resolver which is really persistent

The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future

Brokering of organisations to hold data and the ability to package together the information needed to transfer information between organisations ready for long term preservation

The ones we trust to look after the digital holdings may let us down

Certification process so that one can have confidence about whom to trust to preserve data holdings over the long term

RepInfo toolkit, Packager and Registry – to create and store Representation Information.In addition the Orchestration Manager and Knowledge Gap Manager help to ensure that the RepInfo is adequate.

Registry and Orchestration Manager to exchange information about the obsolescence of hardware and software, amongst other changes.The Representation Information will include such things as software source code and emulators.

Authenticity toolkit will allow one to capture evidence from many sources which may be used to judge Authenticity.

Digital Rights and Access Rights tools allow one to virtualise and preserve the DRM and Access Rights information which exist at the time the Content Information is submitted for preservation.

Persistent Identifier system: such a system will allow objects to be located over time.

Orchestration Manager will, amongst other things, allow the exchange of information about datasets which need to be passed from one curator to another.

The Audit and Certification standard to which CASPAR has contributed will allow a certification process to be set up.

Page 16: Long term data and knowledge preservation for the Earth

Accelerated Lifetime tests

As part of the validation the CASPAR tested simulated the following:

• hardware changes • software changes • changes in the environment (including legal

framework)• changes to the knowledge bases of the

Designated Communities

Page 17: Long term data and knowledge preservation for the Earth

Test scenarios vs Threats to digital preservation

Threat STFC ESA UNESCO IRCAM UnivLeeds CIANT INA

Users may be unable to understand or use the data e.g. the semantics, format, processes or algorithms involved

Non-maintainability of essential hardware, software or support environment may make the information inaccessible

The chain of evidence may be lost and there may be lack of certainty of provenance or authenticity

Access and use restrictions may make it diff icult to reuse data, or alternatively may not be respected in future

The current custodian of the data, whether an organisation or project, may cease to exist at some point in the future

Page 18: Long term data and knowledge preservation for the Earth

STFC Testbed – various STP data

Page 19: Long term data and knowledge preservation for the Earth

ESA testbed

Page 20: Long term data and knowledge preservation for the Earth

UNESCO testbed

The Villa Livia dataset is a collection of files used within the "virtual museum of the ancient Via Flaminia" project: a 3D reconstruction of several archaeological

sites along the ancient Via Flaminia, the largest of them being Villa Livia

Page 21: Long term data and knowledge preservation for the Earth

This is an elevation grid (height map) of the area where Villa Liva is located. It is an ASCII file in the ESRI GRID file format

Page 22: Long term data and knowledge preservation for the Earth

Contemporary Art Testbed

Page 23: Long term data and knowledge preservation for the Earth

Performance Viewer: side-by-side comparison and validation of the transformation. From left to right: 3D visualization in Ogre3D, 3D model of the stage including the virtual dancer in VRML.

Page 24: Long term data and knowledge preservation for the Earth

Figure 8 Some aspects of acousmatic production

Page 25: Long term data and knowledge preservation for the Earth

CASPAR Validation

In all cases members of the Designated Community, with appropriate changes to mimic changes over time, verified that the metadata was adequate for the use despite simulated changes of hardware, software, environment and Designated Community over time.

Full details are available in the validation report (CASPAR Validation report, 2009)

Page 26: Long term data and knowledge preservation for the Earth

Links

CASPAR – http://www.casparpreserves.euCASPAR Source code -

http://sourceforge.net/projects/digitalpreserve/OAIS Reference Model -

http://public.ccsds.org/publications/archive/650x0b1.pdfand the updated draft is available from http://public.ccsds.org/sites/cwe/rids/Lists/CCSDS%206500P11/Overview.aspx

CASPAR Validation report http://www.casparpreserves.eu/Members/cclrc/Deliverables/caspar-validation-evaluation-report/at_download/file

26