http://www.ukoln.ac.uk/
Integrating metadata schema registries with digital preservation systems to support interoperability
Michael Day, UKOLN, University of Bath, [email protected]
2003 Dublin Core Conference, Seattle, Washington, USA, 28 September - 2 October 2003
DC-2003, Seattle, USA, 29 September 2003
Presentation outline
– Preservation metadata
  • Purpose
  • Standards
– Concepts of interoperability
  • Metadata capture and inheritance
  • Object exchange
– Metadata schema registries
  • Definitions
  • Application to digital preservation systems
– Concluding thoughts
Preservation metadata (1)
• All digital preservation strategies depend - to some extent - on the creation, capture and maintenance of metadata
  – "Preserving the right metadata is key to preserving digital objects" (ERPANET Briefing Paper, 2003)
• Defined as:
  – The various types of data that will allow the re-creation and interpretation of the structure and content of digital data over time (Ludäscher, Marciano & Moore, 2001)
Preservation metadata (2)
• Metadata fulfil various roles, e.g.:
  – "… to find, manage, control, understand or preserve … information over time" (Cunningham, 2000)
  – Descriptive information; technical information about formats and structure; information about provenance and context; administrative information, e.g. for rights management
– Current schemas either very complex or only provide a basic framework (sometimes both!)
– Perception that different strategies and objects will need different metadata
Preservation metadata - standards
– Developed from many different perspectives:
  • Digital libraries:
    – METS, NISO Z39.87 (to support digitisation initiatives)
    – OCLC/RLG Framework, Cedars, NEDLIB, NLA, NLNZ
    – OAIS influence has been greatest in this area
  • Records management and archival description:
    – Pittsburgh BAC, RKMS, NAA, VERS, PRO, EAD, etc.
– Also standards not specifically developed for preservation, but with some overlap:
  • Multimedia:
    – MPEG-7, SMPTE, etc.
  • Rights management:
    – <indecs>, MPEG-21, etc.
The OAIS model
– Reference Model for an Open Archival Information System (OAIS)
– ISO 14721:2003
– Established a common framework of terms and concepts
– Influential on the design of some schemas
  » e.g., OCLC/RLG Metadata Framework
– Identified basic functions:
  » Ingest, Data Management, Archival Storage, Administration, Access, Preservation Planning
OAIS functional model
[Figure: OAIS Functional Entities (Figure 4-1). Functional entities: Ingest, Data Management, Archival Storage, Access, Administration, Preservation Planning. External roles: Producer, Consumer, Management. Labelled flows: SIP, AIP, DIP, descriptive info., queries, result sets, orders.]
OAIS information objects
• Information Object (basic concept)
  – Data Object (bit-stream)
  – Representation Information (permits “the full interpretation of Data Object into meaningful information”)
• Information Object Classes
  – Content Information
  – Preservation Description Information (PDI)
  – Packaging Information
  – Descriptive Information
OAIS information packages
• Information package:
  – Container that encapsulates Content Information and PDI
  – Packages for submission (SIP), archival storage (AIP) and dissemination (DIP)
    » AIP = “... a concise way of referring to a set of information that has, in principle, all of the qualities needed for permanent, or indefinite, Long Term Preservation of a designated Information Object”
  – PDI = other information (metadata) “which will allow the understanding of the Content Information over an indefinite period of time”
    » Reference, Provenance, Context, Fixity
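The package structure described on this slide can be sketched as a small data model. This is only an illustration under stated assumptions: the class and function names (`InformationPackage`, `build_aip`, `fixity_ok`) are hypothetical, and using a SHA-256 digest as the Fixity value is one possible choice rather than anything OAIS prescribes.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class ContentInformation:
    data_object: bytes        # the bit-stream
    representation_info: str  # what is needed to interpret the bit-stream


@dataclass
class PreservationDescriptionInformation:
    # The four PDI categories named on the slide.
    reference: str            # identifier for the content
    provenance: List[str]     # history of custody and processing
    context: str              # relationship to other material
    fixity: str               # assumption: a SHA-256 hex digest


@dataclass
class InformationPackage:
    kind: str                 # "SIP", "AIP" or "DIP"
    content: ContentInformation
    pdi: PreservationDescriptionInformation


def build_aip(data, representation_info, identifier, context=""):
    """Wrap a bit-stream and its PDI in a (hypothetical) AIP."""
    pdi = PreservationDescriptionInformation(
        reference=identifier,
        provenance=["packaged as AIP"],
        context=context,
        fixity=hashlib.sha256(data).hexdigest(),
    )
    content = ContentInformation(data, representation_info)
    return InformationPackage("AIP", content, pdi)


def fixity_ok(package):
    """Recompute the digest and compare it with the stored Fixity value."""
    digest = hashlib.sha256(package.content.data_object).hexdigest()
    return digest == package.pdi.fixity
```

A package built with `build_aip(b"hello", "UTF-8 plain text", "example:0001")` passes `fixity_ok` until its bit-stream is altered, which is the kind of check the Fixity category is meant to support.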
Draft categorisation (1)
[Figure: schemas positioned along an axis running from Conceptual to Practical. Labels shown: PRO, NEDLIB, DCMI, METS, RKMS, PITT, VERS, NLNZ, NLA, CEDARS, OCLC/RLG, MPEG-7, Z39.87, OAIS.]
Draft categorisation (2)
• Earliest schemas were largely conceptual in nature:
  – e.g. Pittsburgh BAC model, Cedars outline specification, OCLC/RLG WG I
• Gradually moving towards a more practical focus:
  – e.g., VERS, NLNZ, METS, PREMIS WG
  – Convergence on XML (DTDs and Schemas)
• But there is an urgent need for all this practical experience to be shared
  – e.g., published schemas, advice on implementation, etc.
Implementation
– We need to prove the practical value of metadata frameworks and 'outline specifications'
– It can be difficult for implementers to use these as a guide to the design of real systems
– We need to move from the conceptual to the practical, and beyond proof-of-concept
– Positive signs:
  • METS/NISO Z39.87
  • OCLC/RLG PREMIS WG looking at implementation strategies for preservation metadata
Sustainability (1)
• Balance risks with costs:
  – There is a perception that metadata creation and maintenance will be expensive
  – But costs associated with data recovery are not trivial
  – Need to balance the risks of data loss with the cost of creating metadata
    » Cost/benefit analysis
    » Robust selection criteria
    » Co-operation between repositories
    » Re-use of existing metadata
Sustainability (2)
• Avoid imposing unnecessary costs:
  – Avoid large schemas (?)
  – Need to identify the right metadata - 'core metadata' (?)
Interoperability (1)
• Heterogeneity
  – The need to cope with a wide (and growing) range of preservation metadata standards, object types, formats, etc.
  – No realistic prospect of a single standard
  – Repositories will need to manage a range of metadata standards, at least within ingest and access functions
Interoperability (2)
• Metadata creation and capture
  – Created by humans or captured automatically?
  – Some metadata already exists, e.g.:
    » Embedded within objects
    » In separate databases
    » Generated by particular processes
  – Need for this metadata to be captured at creation, ingest, migration, and at other appropriate points in object life-cycle
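As a rough illustration of automatic capture alongside inherited metadata, the sketch below records a few technical properties of a file at a named life-cycle event and merges them with metadata that already exists. The function names, the dictionary keys and the rule that inherited values take precedence are all assumptions made for the example, not part of any preservation metadata standard.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone


def capture_technical_metadata(path, event="ingest"):
    """Capture basic technical metadata automatically at a life-cycle event."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Format is guessed from the file name only; a real system would
    # probably inspect the content or consult a format registry.
    mime_type, _encoding = mimetypes.guess_type(path)
    return {
        "event": event,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "format_guess": mime_type or "application/octet-stream",
    }


def merge_metadata(existing, captured):
    """Combine inherited and captured metadata, noting where each value came from.

    Assumption: when both sources supply an element, the inherited value wins.
    """
    merged = {}
    for key, value in captured.items():
        merged[key] = {"value": value, "source": "captured"}
    for key, value in existing.items():
        merged[key] = {"value": value, "source": "inherited"}
    return merged
```

Keeping a `source` note against each value is one simple way to show later which metadata was inherited from the producer and which was generated by the repository.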
Interoperability (3)
• Benefits of interoperability:
  – To support the capture or inheritance of metadata, e.g. on ingest
  – To support the management of multiple formats and metadata schema within a digital preservation system
    » Current metadata specifications not entirely clear on how this should be done
  – To support the exchange of information packages outside the repository, e.g. by converting to standard 'exchange formats'
    » Networks of 'trusted repositories'
Registries (1)
• Registries of metadata schemas may offer one way to deal with this problem
• Parallel concept of format registries
  – There is "… a pressing need to establish reliable, sustained repositories of file format specifications, documentation, and related software" (Lawrence, et al., 2000)
  – DSpace 'bitstream format registry'
  – Typed Object Model (TOM) project
  – Digital Library Federation, et al. recently proposed a Global digital format registry
Registries (2)
• Metadata schema registries:
  – "… formal systems that can disclose authoritative information about the semantics and structure of the data elements that are included within a particular metadata scheme" (Heery, et al., 2000)
  – Existing registries include the XML.org Registry and Repository (OASIS), and metadata registries set up by DCMI and SMPTE
Registry functions (1)
• Provides support for the ingest process
  – Support conversion and metadata capture tools
• May also provide support for the access function
  – The export of Dissemination Information Packages
  – The exchange of information packages (AIPs?) with other repositories; conversion to exchange standards
• Can link metadata where there are multiple instances within the system
• May help to manage schema evolution
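To make the conversion role more concrete, here is a minimal in-memory sketch of a registry that discloses element definitions and holds crosswalks between schemas, so that a record can be converted on ingest or on export. The `SchemaRegistry` class, its method names and the two example schemas are hypothetical; they do not reflect the design of any existing registry.

```python
class SchemaRegistry:
    """Toy registry: element definitions per schema, plus crosswalks between schemas."""

    def __init__(self):
        self._schemas = {}     # schema name -> {element name: definition}
        self._crosswalks = {}  # (source, target) -> {source element: target element}

    def register_schema(self, name, elements):
        self._schemas[name] = dict(elements)

    def describe(self, schema, element):
        """Disclose the registered definition of one element."""
        return self._schemas[schema][element]

    def register_crosswalk(self, source, target, mapping):
        # Only accept mappings between elements that both schemas declare.
        for src, dst in mapping.items():
            if src not in self._schemas[source]:
                raise KeyError(f"{src!r} is not an element of {source!r}")
            if dst not in self._schemas[target]:
                raise KeyError(f"{dst!r} is not an element of {target!r}")
        self._crosswalks[(source, target)] = dict(mapping)

    def convert(self, record, source, target):
        """Return (converted record, list of source elements with no mapping)."""
        mapping = self._crosswalks[(source, target)]
        converted, unmapped = {}, []
        for element, value in record.items():
            if element in mapping:
                converted[mapping[element]] = value
            else:
                unmapped.append(element)
        return converted, unmapped


# Hypothetical schemas: a producer's local schema and an exchange schema.
registry = SchemaRegistry()
registry.register_schema("producer", {"name": "Name of the item", "maker": "Who made it"})
registry.register_schema("exchange", {"title": "Title", "creator": "Creator"})
registry.register_crosswalk("producer", "exchange", {"name": "title", "maker": "creator"})
```

Returning the unmapped elements separately, rather than silently dropping them, leaves the repository free to decide whether to keep them alongside the converted record.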
Registry functions (2)
[Figure: OAIS Functional Entities (Figure 4-1), with the same entities, roles and flows as on the OAIS functional model slide, shown in successive builds; the final builds add a Registry and two labels reading Schemas.]
Registries - organisational issues
• Registries are part of infrastructure
• Distributed vs. centralised approaches:
  – Concept of 'shared services'
  – Experimental distributed registries are based on Resource Description Framework (RDF)
    » CORES Registry
    » Encourage re-use of metadata (the 'application profile' concept)
    » Are other technologies more suitable?
• Who should be responsible for them?
Summing up
• Interoperability supports the reuse of metadata and the exchange of information objects
• There is a possible role for metadata registries to help manage these (and other) processes - but the concept needs extensive scoping and evaluation
• Registries are NOT a panacea
Acknowledgements
UKOLN is funded by Resource: the Council for Museums, Archives and Libraries, the Joint Information Systems Committee (JISC) of the UK higher and further education funding councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.