Funded by:
© AHDS
Preservation in Institutional Repositories
Preliminary conclusions of the SHERPA DP project
Gareth Knight
Digital Preservation Officer, AHDS
25 October 2006
SHERPA DP Project
• Acronym: Securing a Hybrid Environment for Research Preservation and Access: Digital Preservation
• Development Partners: AHDS at King’s College London (Lead), Nottingham, Glasgow, Edinburgh, White Rose Consortium, London Leap Consortium
• Duration: 2 years, March 2005 – February 2007
• Funding: JISC and CURL
• Programme: JISC Digital Preservation and Records Management Programme
SHERPA DP Project
Purpose:
To create a collaborative, shared preservation environment for the SHERPA project framed around the OAIS Reference Model.
Aims:
1. To develop a prototype preservation environment for SHERPA partners, based on the OAIS Reference Model, including a set of protocols and software tools.
2. To establish a workflow and procedures to suit the needs of institutional repositories and the preservation service.
3. To provide guidance on the ingest process, to encourage the deposit of formats that will minimise long-term operational costs.
4. To develop an exemplar for an outsourced preservation service.
5. To create a User Guide that recommends standards, best practice, protocols and processes that may be used in the management, preservation and presentation of e-print repositories.
Why distribute preservation functions?
• In many IRs, there is a scarcity of staff with the necessary preservation skills and expertise
• Institutional repositories lack the time to implement preservation
• Potential cost savings in terms of staff time and equipment?
• Seeking to remove repetition of services
• Preservation is not inherent in most repository software. DSpace and EPrints are primarily about submission, basic storage and access (for the moment)
Repository Landscape
[Figure: partner repositories grouped by repository software (ePrints or DSpace), by content type (published & peer-reviewed papers; electronic theses & dissertations) and by scope (single-institution or multi-institution repositories).]
Repositories shown: ERPA ePrints, Nottingham ePrints, Nottingham eTheses, Edinburgh Research Archive, White Rose Consortium, SOAS Eprint repository, Modern Languages Publications Archive, UCL ePrints, Imperial Eprints, Royal Holloway, LSE Research Online, Kings ePrints, JeLit (Journal of eLiteracy), Glasgow ePrints Service, Birkbeck ePrints
OAIS Functional Model
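[Figure 4-1: OAIS functional entities. Within MANAGEMENT: Ingest, Data Management, Archival Storage, Access, Administration and Preservation Planning. The PRODUCER submits a SIP to Ingest; Ingest passes the AIP to Archival Storage and Descriptive Info to Data Management; the CONSUMER sends queries and orders to Access and receives result sets and the DIP.]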
Distributed OAIS Model
A disaggregated OAIS
SIP = E-print & discovery MD
AIP = E-print, discovery & preservation MD
DIP = E-print and discovery MD
[Figure: disaggregated OAIS split between the Institutional Repository (Content Provider) and the Preservation Service (Service Provider). Ingest, Data Management, Archival Storage, Access and Administration each appear twice; Preservation Planning appears once. Actors: Depositor and Researcher. Packages exchanged: SIP, AIP and DIP.]
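The package definitions on this slide can be sketched as a small data structure. This is a minimal illustration only, not the project's implementation: the names `Package`, `discovery_md`, `preservation_md`, `make_aip` and `make_dip` are hypothetical and introduced here for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Package:
    """Hypothetical OAIS information package: an e-print plus its metadata."""
    eprint: bytes
    discovery_md: dict
    preservation_md: Optional[dict] = None  # present only in an AIP


def make_aip(sip: Package, preservation_md: dict) -> Package:
    """AIP = e-print, discovery MD and preservation MD."""
    return Package(sip.eprint, dict(sip.discovery_md), dict(preservation_md))


def make_dip(aip: Package) -> Package:
    """DIP = e-print and discovery MD; preservation MD is left out."""
    return Package(aip.eprint, dict(aip.discovery_md))
```

In this sketch the only difference between the three packages is whether preservation metadata is attached, which mirrors the SIP, AIP and DIP definitions above.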
Generic Workflow
[Figure: flowchart of 16 numbered steps in two lanes, Content Provider (Institutional Repository) and Service Provider (Preservation Service), linked by a metadata transfer and a data transfer.]
Decision boxes (Yes/No): Validation successful? (No: Request resubmission; appears in both lanes); Is metadata complete?; E-print in appropriate deposit format?; Risk assessment reveals problems?; Deposit format obsolete?
Action boxes: Submit data & metadata; Enhance metadata; Copy SIP to repository store; Migrate to dissemination format; Copy DIP to repository store & dissemination server; Make available in catalogue; Researcher (Consumer) accesses data; Create preservation metadata; Generate AIP; Transfer AIP to preservation store; Implement migration strategy; Schedule obsolescence monitoring; Create new dissemination format; Record details of migration action
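The Content Provider side of this workflow can be sketched as a short decision function. This is one possible reading of the flowchart, not a definitive account of it: the order of the steps and the function and parameter names (`content_provider_steps`, `valid`, `metadata_complete`, `appropriate_format`) are assumptions made for illustration.

```python
def content_provider_steps(valid: bool,
                           metadata_complete: bool,
                           appropriate_format: bool) -> list:
    """Return the steps taken for one submission (assumed ordering)."""
    if not valid:
        # Failed validation sends the deposit back to the depositor.
        return ["request resubmission"]
    steps = []
    if not metadata_complete:
        steps.append("enhance metadata")
    steps.append("copy SIP to repository store")
    if not appropriate_format:
        steps.append("migrate to dissemination format")
    steps.append("copy DIP to repository store and dissemination server")
    steps.append("make available in catalogue")
    return steps
```

A submission that passes every check takes only the three unconditional steps; each "No" branch adds one corrective action before the e-print reaches the catalogue.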
Practical Workflow
The AIP must be prepared prior to ingest into Fedora:
• Accept SIP (harvest metadata, process harvested metadata, extract digital objects)
• Generate AIP (normalise datastreams and create preservation metadata for SIP & AIP)
• Data Management (integrity check, format obsolescence, format migration, AIP additions)
Change Content Provider practices to support appropriate services:
• Ingest policy: encourage preservation formats
• Dissemination policy: encourage distribution of original deposited formats
• Licence agreement
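The integrity check listed under Data Management is commonly done by comparing a stored checksum with a freshly computed one. The sketch below assumes that approach; the slides do not say which method or algorithm the project used, so SHA-256 and the function names `checksum` and `integrity_check` are assumptions.

```python
import hashlib


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Compute a hex digest for a datastream (algorithm is an assumption)."""
    return hashlib.new(algorithm, data).hexdigest()


def integrity_check(data: bytes, recorded: str,
                    algorithm: str = "sha256") -> bool:
    """True if the datastream still matches the digest recorded at ingest."""
    return checksum(data, algorithm) == recorded
```

The digest would be recorded in the preservation metadata when the AIP is generated and recomputed on a schedule; a mismatch signals that the stored copy has changed.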
Minimum Requirements for Preservation
Technical
1. Expose basic metadata to identify new submissions.
2. Provide some method of identifying data objects associated with a metadata record.
3. Provide some method of authenticating data objects associated with a metadata record.
Policy
4. Policies to identify preferred file formats for deposit and inform the Producer (depositor) and preservation service provider of these requirements.
5. Create and implement a deposit licence that:
• Establishes permission for the Content Provider to allocate responsibility for preservation to a third party.
• Establishes permission to transform the submitted resource (e-print) for the purpose of preservation and accessibility.
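Requirement 1 can be illustrated with a small filter over exposed metadata records. This is a sketch under stated assumptions: the record layout (`identifier` and `datestamp` keys) and the function name `new_submissions` are hypothetical, and the slides do not specify how records are exposed.

```python
from datetime import date


def new_submissions(records, last_harvest):
    """Return identifiers of records deposited after the last harvest."""
    return [r["identifier"] for r in records if r["datestamp"] > last_harvest]


# Hypothetical exposed records, for illustration only.
records = [
    {"identifier": "eprint:1", "datestamp": date(2006, 9, 1)},
    {"identifier": "eprint:2", "datestamp": date(2006, 10, 20)},
]
print(new_submissions(records, date(2006, 10, 1)))  # ['eprint:2']
```

A datestamp and a stable identifier are the minimum the preservation service needs to tell which e-prints it has not yet collected.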
Best Practice Requirements for Preservation
Technical
6. Expose a full record of all metadata stored by the IR, including descriptive, administrative and preservation metadata.
7. Provide a detailed description of the metadata schema implemented, including a list of elements and vocabulary.
8. Co-operate with the partner institution to identify methods that may be used to return metadata and data to the institutional repository.
Policy
9. Co-operate with the Preservation Service Provider to review and potentially revise ingest policies to ensure SIPs are deposited in formats appropriate for preservation.
Service Provider Responsibilities
Storage:
• Provide a permanent storage facility and disaster recovery capabilities
• Manage storage hierarchy
Preservation Planning:
• Evaluate contents of archive and undertake risk assessment
• Develop recommendations for preservation standards and policies
• Life cycle management. Monitor changes in technology environment, users’ service requests, and knowledge base
Preservation Action:
• Develop and implement migration plans
• Create and manage multiple copies of content, including off-site storage
• Record appropriate information on any changes
System Architecture
• Fedora Server (initially version 2.1.1).
• FedoraGSearch generic search plug-in (currently a beta version; it will be bundled with Fedora in the future)
• MySQL database server (initially version 5.0).
• Elated web interface to Fedora (used for SHERPA DP web interface)
• JHOVE (initially version 1.1)
• DROID
• Format registry (e.g. GDFR)
Preliminary Conclusions
• There is no out-of-the-box solution to preservation.
• The location of preservation activities is unimportant. However, appropriate repository services must exist.
• Repository interoperability is possible where appropriate standards exist.
• Preservation begins on ingest!
• Further investigation on OAIS-compliant models to represent distributed services is necessary.
Further Information
URL:
http://www.sherpadp.org.uk/
Contact