Automatic Evaluation of Migration Quality in Distributed Networks of Converters

Miguel Ferreira [email protected]

Supervisors: Ana Alice Baptista, José Carlos Ramalho

ECDL 05 Doctoral Consortium, 2005-09-21



Contents

• Introductory concepts
• Research problems
• Proposed system
• Methodology
• Topics for discussion

Introductory concepts

• Digital preservation
  – The set of processes and activities that ensure continued access to information and all kinds of cultural heritage existing in digital formats
• Digital object
  – An information object, of any type of information or any format, that is expressed in digital form
  – Text documents, digital photos, vector graphics, databases, Web pages, software

Strategies for digital preservation

• Emulation
  – Reproduction of the behaviour of a hardware/software platform in a different technological environment
• Encapsulation
  – Storing information about how the objects should be interpreted
• Migration
  – Periodic transfer of digital materials from one hardware/software configuration to another
• Others
  – Computer museums, viewers, Universal Virtual Computer

Migration

• Advantages
  – Updated formats that users can read and edit
• Disadvantages
  – Requires continuous diligence
  – Data loss
• Variants
  – Migration on request
  – Normalisation
  – Distributed migration

Distributed migration

• A network of remote conversion services supported by a semantic layer [Hunter et al.]
• Advantages
  – Platform independent
  – Redundancy
  – Multiple migration paths
  – Cost reduction
  – Compatible with other migration strategies
• Disadvantages
  – Bandwidth
  – Slow
• Examples
  – PANIC
  – MyMorph (NLMed)
  – TOM

[Figure: example conversion network. Formats A, B, C, D and E are linked by the conversion services A-C, B-C, C-E and A-E.]
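The "multiple migration paths" advantage can be illustrated by treating formats as nodes and conversion services as directed edges. This is a minimal sketch that assumes only the four conversions named in the figure (A-C, B-C, C-E, A-E). The function name `find_migration_paths` and the dictionary layout are hypothetical and are not taken from PANIC, MyMorph or TOM.

```python
from typing import Dict, List

# Conversion services from the figure as a directed graph:
# each key is a source format, each value lists the formats it can be converted to.
CONVERSIONS: Dict[str, List[str]] = {
    "A": ["C", "E"],
    "B": ["C"],
    "C": ["E"],
}


def find_migration_paths(source: str, target: str,
                         graph: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every cycle-free chain of conversions from source to target."""
    paths: List[List[str]] = []

    def walk(current: str, visited: List[str]) -> None:
        if current == target:
            paths.append(visited)
            return
        for nxt in graph.get(current, []):
            if nxt not in visited:  # never revisit a format within one path
                walk(nxt, visited + [nxt])

    walk(source, [source])
    return paths


print(find_migration_paths("A", "E", CONVERSIONS))
# prints [['A', 'C', 'E'], ['A', 'E']]
```

Format A reaches format E either directly or through format C, so a failed or lossy converter on one path still leaves an alternative.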

How to choose a preservation strategy?

• Many preservation alternatives
• Lack of universal acceptance
• Distinct preservation requirements
  – Satisfaction of the designated community
  – Characteristics of the collection
  – Budget
• Framework for evaluating preservation strategies [Rauber]
  – Utility Analysis

Evaluation of preservation strategies

1. Definition of objective tree
2. Assignment of measurement units (e.g. millimetre, Mb, Euro)
3. Identification of preservation alternatives
4. Execution of preservation alternatives and evaluation of the outcome
5. Weighting of criteria in the objective tree
6. Calculation of partial and total values
7. Ranking of alternatives
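Steps 5 to 7 can be read as a weighted sum over the leaf criteria of the objective tree. This is a minimal sketch that assumes each criterion has already been measured and normalised to a score between 0 (worst) and 1 (best). The criteria, weights, scores and alternative names are invented for illustration and do not come from the framework in [Rauber].

```python
from typing import Dict, List, Tuple

# Hypothetical leaf criteria of an objective tree and their weights (sum to 1).
WEIGHTS: Dict[str, float] = {
    "content preserved": 0.5,
    "file size": 0.2,
    "cost": 0.3,
}

# Hypothetical normalised scores per preservation alternative.
SCORES: Dict[str, Dict[str, float]] = {
    "migrate to format X": {"content preserved": 0.9, "file size": 0.5, "cost": 0.4},
    "migrate to format Y": {"content preserved": 0.6, "file size": 0.8, "cost": 0.9},
}


def total_value(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Step 6: sum the partial values (weight * score) of every criterion."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_alternatives(all_scores: Dict[str, Dict[str, float]],
                      weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Step 7: order the alternatives from highest to lowest total value."""
    totals = [(name, total_value(s, weights)) for name, s in all_scores.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


for name, value in rank_alternatives(SCORES, WEIGHTS):
    print(f"{name}: {value:.2f}")
```

With these made-up numbers, format Y ranks first (0.73 against 0.67) even though format X preserves more content, because the cost and file size criteria together outweigh the difference.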

Objective tree (example)

Research problems

• Automation of preservation processes
• Authenticity issues
• Cost management
• Evaluation of preservation alternatives

Research questions

• Is it feasible to design and implement a system that is able to automatically:
  – determine the amount of data loss that occurred in a migration and generate detailed migration reports for inclusion in the objects’ preservation metadata?
  – provide recommendations of migration paths or target formats that will best suit users’ requirements?

Proposed System

[Figure: architecture of the proposed system.
Components: User, MetaConverter, Migration Evaluator, Migration Advisor, Migration Knowledge Base (MKB), Migration Network.
Messages: Request migration [Source object]; Invoke migration [Source object]; Evaluate migration [Original object] [Migrated object] [Process metrics]; Store [Migration report] [Migration data]; Request advice [Criteria]; Query MKB [Parameters].
Returned items: [Migrated object], [Migration report], [Migration advice].]
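The following sketch shows one possible reading of the diagram. The slide gives only the components and the message labels, so everything else here is an assumption made for illustration: the call order, all class and function names, and the toy data-loss measure (the relative reduction in text length).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class MigrationReport:
    source_format: str
    target_format: str
    data_loss: float  # assumed scale: 0.0 = nothing lost, 1.0 = everything lost


@dataclass
class KnowledgeBase:
    """Stands in for the Migration Knowledge Base (MKB)."""
    reports: List[MigrationReport] = field(default_factory=list)

    def store(self, report: MigrationReport) -> None:
        self.reports.append(report)

    def query(self, source_format: str) -> List[MigrationReport]:
        return [r for r in self.reports if r.source_format == source_format]


def evaluate(original: str, migrated: str) -> float:
    """Toy evaluator: relative reduction in length after conversion."""
    if not original:
        return 0.0
    return max(0.0, 1 - len(migrated) / len(original))


def request_migration(obj: str, source_format: str, target_format: str,
                      converter: Callable[[str], str],
                      mkb: KnowledgeBase) -> Tuple[str, MigrationReport]:
    migrated = converter(obj)                      # Invoke migration [Source object]
    loss = evaluate(obj, migrated)                 # Evaluate migration
    report = MigrationReport(source_format, target_format, loss)
    mkb.store(report)                              # Store [Migration report]
    return migrated, report                        # [Migrated object] [Migration report]


def request_advice(source_format: str, mkb: KnowledgeBase) -> Optional[str]:
    """Toy advisor: recommend the target format with the lowest recorded loss."""
    candidates = mkb.query(source_format)          # Query MKB [Parameters]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.data_loss).target_format


mkb = KnowledgeBase()
request_migration("hello world", "A", "C", lambda s: s[:5], mkb)  # lossy converter
request_migration("hello world", "A", "E", lambda s: s, mkb)      # lossless converter
print(request_advice("A", mkb))  # prints E
```

In this reading, every migration adds a report to the knowledge base, so the advice draws on the outcomes of past migrations.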


Methodology - proof of concept

The concepts
1. Automatic quantification of the data loss that occurred in a migration, and generation of preservation metadata
2. Automatic recommendation of migration strategies as well as target formats

The proof (empirical validation)
1. Evaluator versus human experts
2. Advisor versus evaluation framework

Key contributions

• For individual preservers, digital archives and libraries:
  – Outsourcing and automation of digital preservation
  – Generation of preservation metadata (authenticity)
  – Ranking of migration alternatives
• For designers and programmers of converters:
  – Possibility of publishing their converters as services
• For metadata creators and users:
  – Increase adoption
  – Help to improve future versions
  – Accelerate the development of XML bindings

Round-up

• Service oriented architecture (SOA)
  – Automatic quantification of data loss
  – Provides recommendations on which migration paths or target formats are best suited for each user
  – Simplifies the creation of preservation metadata
  – Based on migration
• Methodology
  – Proof of concept with empirical validation
    • Evaluator versus human experts
    • Advisor versus evaluation framework

Topics for discussion

• Relevance of research
• Research methodology
• System architecture
• Format registry vocabulary
  – e.g. MIME types, TOM type descriptors, Global Digital Format Registry, PRONOM, etc.
• Preservation metadata schema
  – e.g. PREMIS data dictionary (event entity)