Ya-ning Arthur Chen, Feng-chien Chung, Computing Centre, Academia Sinica, 11 April, ISGC 2008
A hybrid approach of digital long term preservation to institutional repositories - A case study of DSpace/SRB Integration
Ya-ning Arthur Chen, Feng-chien Chung
Computing Centre, Academia Sinica
11 April, ISGC 2008
Outline
• Background of MAAT
• From Website to Institutional Repository
• Long Term Preservation & OAIS
• The Hybrid Approach
• Future
MAAT – Background
• The Metadata Architecture & Application Team (MAAT) was established in 2002 to conduct metadata research and provide services in support of the National Digital Archives Program (NDAP) in Taiwan
• To date, the MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archive Program (TELDAP, formerly NDAP)
MAAT – Motivation
• A number of documents have been created and can be categorized into
– questionnaires,
– work sheets,
– meeting records,
– metadata mapping tables,
– system specifications,
– best practices of metadata standards,
– technical reports,
– research papers,
– briefings, and
– tutorial materials.
• Most documents on the MAAT website are arranged in a static manner.
MAAT Website
http://www.sinica.edu.tw/~metadata
Academia Sinica
MAAT - Consideration1
• Document management and repository
– Over 1,000 documents and URL links have been arranged and served at the MAAT website.
– The MAAT website needs an effective document management system.
• Access control
– The MAAT website still lacks access control for its documents.
MAAT - Consideration2
• Workflow reengineering
– The MAAT website adopts a centralized model to maintain documents and the website arrangement.
– This model is complicated and labor-intensive, and its overhead cost is high.
• Usage Statistics Report
MAAT - Challenge
• Too many publications,
• Too much change (that is, various document versions),
• Too many contributors, and
• Too many institutions.
Implementation Level
Static Website
Institutional Repository
Phase 1: from website to IR
DSpace - feature
• Captures
– Digital research material in any format
– Directly from creators (e.g. faculty)
– Large-scale, stable, managed long-term storage
• Describes
– Descriptive metadata (Dublin Core)
– Technical metadata (file size, format…)
– Rights metadata (licenses, Creative Commons…)
• Distributes
– Via WWW, with necessary access control
• Preserves
– Persistent ID and Handle
– Bitstream format registry
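The "Describes" feature can be pictured with a small sketch that builds a simple Dublin Core record and a technical metadata entry for one uploaded file. This is only an illustration under stated assumptions: the function name `describe`, the sample values, and the use of SHA-256 are hypothetical and do not reproduce how DSpace itself stores metadata.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def describe(title, creator, payload, mime_type):
    """Return (Dublin Core XML string, technical metadata dict) for one file.

    Hypothetical helper for illustration; not part of DSpace.
    """
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")
    # Descriptive metadata: a few Dublin Core elements.
    for name, value in (("title", title), ("creator", creator), ("format", mime_type)):
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    # Technical metadata: size, format, and a checksum of the content.
    technical = {
        "size_bytes": len(payload),
        "format": mime_type,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return ET.tostring(record, encoding="unicode"), technical


xml_text, tech = describe("Metadata mapping table", "MAAT", b"example bytes", "application/pdf")
print(xml_text)
print(tech)
```

Keeping the checksum next to the size and format is one simple way to let a later fixity check confirm that a stored file has not changed.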
DSpace - Data Model
MAAT – Content1
• Content Type
– Supported Projects (支援計畫): documents from the projects we support
– Publications and Activities (出版與活動): documents of publications and activities
– Project Management (計畫管理): project management documents, restricted
– Research and Development (研究發展): restricted documents
– 48 communities, 110 collections, 783 items
• Document Format
– User uploads: 794 PDF files, 446 MS Word files, 59 MS PowerPoint slides, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files… and others
– System generated: over 1,900 plain text files (mainly DSpace license files)…
MAAT – Content2
• Access Method
– DSpace user browse and search interface
– Search engines (Google, Yahoo…etc.)
– OAI-PMH harvesting
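For the OAI-PMH harvesting route, a harvester issues HTTP requests whose `verb` parameter names the protocol operation. The sketch below only builds such a request URL and does not contact any server. The endpoint `http://repository.example.org/oai/request` and the helper name `oai_request_url` are assumptions for illustration; `ListRecords` and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH request URL; extra arguments become query parameters."""
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"


# Hypothetical endpoint, for illustration only.
BASE_URL = "http://repository.example.org/oai/request"

url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

A harvester would fetch this URL and parse the returned XML, following resumption tokens when the result set is split across several responses.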
MAAT DSpace
http://pl11.sinica.edu.tw:8080/dspace/index.jsp
DSpace - Consideration
• The need to extend DSpace storage capabilities
– The number of documents grows so fast that a very large storage solution is required
• The lack of a risk management mechanism
– Reliable backup and disaster recovery systems are not included in the default DSpace installation
Implementation Level
Static Website
Institutional Repository
Phase 1: from website to IR
Institutional Repository + Grid
Phase 2: from IR to Long Term Preservation
DSpace/SRB Approach1
• In 2004, NARA (with NSF/NPACI) funded a project to integrate DSpace and SRB in order to
– allow DSpace to use the data grid as a storage layer
– permit the exchange of authentic documents between them
• NARA Proposal & Participants
– San Diego Supercomputer Center (SDSC)
• Member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program
– MIT Libraries
– UC San Diego Libraries (UCSD)
– Hewlett Packard Laboratories (HP)
– National Archives and Records Administration (NARA)
DSpace/SRB Approach2
• DSpace can have multiple bitstream stores; each store can be either traditional storage or SRB storage.
• Both traditional and SRB storage are specified by configuration parameters.
• Both traditional and SRB bitstream stores are configured in dspace.cfg
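The idea of numbered bitstream stores selected through configuration can be sketched as follows. This is a toy model, not DSpace code: the property keys in `SAMPLE_CFG` are only loosely modeled on the `dspace.cfg` naming style and should be treated as assumptions (the real keys are listed in the DSpace documentation), and the host name, port, and paths are made up.

```python
from dataclasses import dataclass

# Illustrative configuration text; keys and values are assumptions.
SAMPLE_CFG = """
# Store 0: traditional (local) storage
assetstore.dir = /dspace/assetstore
# Store 1: SRB storage
srb.host.1 = srb.example.org
srb.port.1 = 5544
# Store number that receives newly ingested bitstreams
assetstore.incoming = 1
"""


@dataclass
class Store:
    number: int
    kind: str       # "local" or "srb"
    location: str


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_stores(props):
    """Map store numbers to Store descriptions based on the parsed keys."""
    stores = {}
    for key, value in props.items():
        if key == "assetstore.dir":
            stores[0] = Store(0, "local", value)
        elif key.startswith("assetstore.dir."):
            n = int(key.rsplit(".", 1)[1])
            stores[n] = Store(n, "local", value)
        elif key.startswith("srb.host."):
            n = int(key.rsplit(".", 1)[1])
            port = props.get(f"srb.port.{n}", "")
            stores[n] = Store(n, "srb", f"{value}:{port}")
    return stores


props = parse_properties(SAMPLE_CFG)
stores = load_stores(props)
incoming = stores[int(props.get("assetstore.incoming", "0"))]
print(incoming)
```

In this model, changing the incoming store number directs new bitstreams to a different store, while existing bitstreams stay where they were written.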
Examination of DSpace/SRB
• An Open Archival Information System (OAIS) intends to preserve information for access and use by a Designated Community
OAIS Functional Model
Workflow
1. Common service (Network, OS…)
2. DSpace Ingest
3. SRB Storage
4. DSpace DB/SRB MCAT
5. DSpace & SRB Admin
6. DSpace User Interface
Submit Interface & Batch Import
DSpace/Google User
OAIS Functional Model…Again
DSpace & SRB Administration
DSpace RDBMS & SRB MCAT
DSpace Submit Interface
DSpace User Interface
SRB Mass Storage
DSpace Ingest
DSpace Batch Import
Producer, Management and Consumer
• Producer
– DSpace may play the role of ingesting the SIP from the producer and generating the AIP for Management & Storage
• Management
– SRB may play the role of receiving the AIP, storing and managing the data, and generating the AIP for Access
• Consumer
– DSpace may play the role of processing the access request and generating the proper DIP for dissemination
DSpace RDBMS & SRB MCAT
DSpace Submit Interface
DSpace User Interface
SRB Mass Storage
DSpace Ingest
DSpace Batch Import
SIP AIP AIP DIP
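The SIP, AIP, and DIP flow across the Producer, Management, and Consumer roles can be sketched with a toy archive. This is a minimal model for illustration only: the class and method names are hypothetical, `ingest` and `access` stand in for the roles attributed to DSpace, and the in-memory dictionary stands in for SRB storage.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package handed over by a producer."""
    producer: str
    files: dict      # filename -> bytes
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package kept in storage."""
    identifier: str
    files: dict
    metadata: dict


@dataclass
class DIP:
    """Dissemination Information Package returned to a consumer."""
    identifier: str
    filename: str
    content: bytes
    metadata: dict


class Archive:
    """Toy archive: ingest/access play the DSpace roles, _storage plays SRB."""

    def __init__(self):
        self._storage = {}
        self._next_id = 1

    def ingest(self, sip):
        # Producer side: turn the SIP into an AIP and hand it to storage.
        metadata = dict(sip.metadata, producer=sip.producer)
        aip = AIP(f"item-{self._next_id}", dict(sip.files), metadata)
        self._next_id += 1
        self._storage[aip.identifier] = aip
        return aip.identifier

    def access(self, identifier, filename):
        # Consumer side: read the AIP from storage and build a DIP.
        aip = self._storage[identifier]
        return DIP(identifier, filename, aip.files[filename], dict(aip.metadata))


archive = Archive()
item_id = archive.ingest(SIP("MAAT", {"report.pdf": b"%PDF"}, {"title": "Technical report"}))
dip = archive.access(item_id, "report.pdf")
print(dip.identifier, dip.metadata)
```

Separating the three package types keeps what is submitted, what is stored, and what is delivered independent, so each can change without affecting the others.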
Archives arrangement
• Logical archives structure:
– DSpace allows multi-level communities and one level of collections
– Archival principles
• Principle of provenance
• Principle of respect des fonds
• Physical files arrangement:
– SRB mass storage technology
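The logical structure described above, with communities nested to any depth and a single level of collections under each community, can be sketched as a small data model. The class names mirror DSpace terminology, but the code and the sample community, collection, and item names are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    title: str


@dataclass
class Collection:
    # Collections hold items only; they cannot contain other collections.
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    # Communities may nest to any depth and may own collections.
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)

    def walk(self, path=()):
        """Yield the full path (communities, collection, item) of every item."""
        here = path + (self.name,)
        for collection in self.collections:
            for item in collection.items:
                yield here + (collection.name, item.title)
        for sub in self.subcommunities:
            yield from sub.walk(here)


root = Community(
    "Publications and Activities",
    subcommunities=[
        Community("2008", collections=[Collection("Briefings", [Item("ISGC 2008 slides")])])
    ],
)
paths = list(root.walk())
print(paths)
```

The path produced for each item records the chain of communities it belongs to, which is one way to keep an item's provenance visible in the logical arrangement.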
Future1
• Best Practice & SOP for DSpace/SRB integration
• Deeper check against the activities of OAIS
• Preservation planning and policy
– Monitor Producer/Management/Consumer service requirements and emerging technology; develop an archival strategy & migration plan
Future2
• Feasibility Evaluation
– Migrate from SRB to other advanced technologies, such as SRM, iRODS…
– Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g. structural map, behavior section…)
Thank You