A hybrid approach of digital long term preservation to institutional repositories - A case study of DSpace/SRB Integration Ya-ning Arthur Chen, Feng-chien Chung Computing Centre, Academia Sinica 11 April, ISGC 2008

Page 1: Ya-ning Arthur Chen,  Feng-chien Chung Computing Centre, Academia Sinica 11 April, ISGC 2008

A hybrid approach of digital long term preservation to institutional

repositories - A case study of DSpace/SRB Integration

Ya-ning Arthur Chen, Feng-chien Chung

Computing Centre, Academia Sinica

11 April, ISGC 2008

Page 2

Outline

• Background of MAAT
• From Website to Institutional Repository
• Long Term Preservation & OAIS
• The Hybrid Approach
• Future

Page 3

MAAT – Background

• The Metadata Architecture & Application Team (MAAT) was established in 2002 to conduct metadata research and provide services in support of the National Digital Archives Program (NDAP) in Taiwan

• To date, MAAT has supported over 80 digital library projects of the Taiwan E-Learning & Digital Archives Program (TELDAP, formerly NDAP)

Page 4

MAAT – Motivation

• A number of documents have been created and can be categorized into

– questionnaires,

– work sheets,

– meeting records,

– metadata mapping tables,

– system specifications,

– best practices of metadata standards,

– technical reports,

– research papers,

– briefings, and

– tutorial materials.

• Most documents on the MAAT website are arranged in a static manner.

Page 5

MAAT Website

http://www.sinica.edu.tw/~metadata

Academia Sinica

Page 6

MAAT - Consideration1

• Document management and repository
– Over 1,000 documents and URL links have been arranged and served at the MAAT website.
– The MAAT website needs an effective system of document management.
• Access control
– The MAAT website still lacks access control for documents.

Page 7

MAAT - Consideration2

• Workflow reengineering
– The MAAT website adopts a centralized model to maintain documents and website arrangement.
– This model is very complicated and labor-intensive, and the overhead cost is very high.

• Usage Statistics Report

Page 8

MAAT - Challenge

• Too many publications,
• Too much change (that is, various document versions),
• Too many contributors, and
• Too many institutions.

Page 9

Implementation Level

Static Website → Institutional Repository

Phase 1: from website to IR

Page 10

DSpace - feature

• Captures
– Digital research material in any format
– Directly from creators (e.g. faculty)
– Large-scale, stable, managed long-term storage
• Describes
– Descriptive metadata (Dublin Core)
– Technical metadata (file size, format…)
– Rights metadata (licenses, Creative Commons…)
• Distributes
– Via WWW, with necessary access control
• Preserves
– Persistent ID and Handle
– Bitstream format registry

Page 11

DSpace - Data Model

Page 12

MAAT – Content1

• Content Type
– 支援計畫 (Documents from the projects we support)
– 出版與活動 (Documents of publications and activities)
– 計畫管理 (Project management related; restricted documents)
– 研究發展 (Research & development; restricted documents)
– 48 communities, 110 collections, 783 items
• Document Format
– User uploaded: 794 PDF files, 446 MS Word files, 59 MS PowerPoint slides, 27 XML files, 17 JPEG images, 16 HTML files, 7 MS Excel files, and others
– System generated: over 1,900 plain-text files (mainly DSpace license files)

Page 13

MAAT – Content2

• Access Method
– DSpace user browse and search interface
– Search engines (Google, Yahoo, etc.)
– OAI-PMH harvesting
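As an illustration of the OAI-PMH harvesting route, the sketch below builds the request URLs a harvester would send. The `verb`, `metadataPrefix`, `set`, `from` and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol; the base URL is a placeholder, since the slides give only the DSpace UI address and not the repository's OAI endpoint, and the function names are made up for this example.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real OAI-PMH base URL of the MAAT DSpace is not given in the slides.
BASE_URL = "http://example.org/dspace-oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (first page of a harvest)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a partial result list."""
    # resumptionToken is an exclusive argument: only the verb accompanies it.
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})
```

A harvester would fetch `list_records_url(BASE_URL)`, parse the XML response, and keep calling `resume_url` for as long as the response carries a non-empty resumption token.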

Page 14

MAAT DSpace

http://pl11.sinica.edu.tw:8080/dspace/index.jsp

Page 15

DSpace - Consideration

• The need to extend DSpace storage capabilities
– The number of documents grows so fast that a very large storage solution is required
• The lack of a risk management mechanism
– Reliable backup and disaster recovery systems are not included in the default DSpace installation

Page 16

Implementation Level

Static Website → Institutional Repository

Phase 1: from website to IR

Institutional Repository → Institutional Repository + Grid

Phase 2: from IR to Long Term Preservation

Page 17

DSpace/SRB Approach1

• In 2004, NARA (with NSF/NPACI) funded a project aimed at integrating DSpace and SRB to
– allow DSpace to use the data grid as a storage layer
– permit the exchange of authentic documents between them
• NARA Proposal & Participants
– San Diego Supercomputer Center (SDSC)
  • Member of the National Partnership for Advanced Computational Infrastructure (NPACI), an NSF-sponsored program
– MIT Libraries
– UC San Diego Libraries (UCSD)
– Hewlett Packard Laboratories (HP)
– National Archives and Records Administration (NARA)

Page 18

DSpace/SRB Approach2

• In DSpace, there can be multiple bitstream stores; each store can be either traditional (local file system) storage or SRB storage.

• Both traditional and SRB storage are specified by configuration parameters.

• Both traditional and SRB bitstream stores are configured in dspace.cfg
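To make the configuration idea concrete, here is a rough sketch that reads a `dspace.cfg`-style fragment and reports which store numbers are traditional and which are SRB-backed. The key names (`assetstore.dir`, `srb.host.1`, `srb.port.1`, `assetstore.incoming`) follow the pattern of the DSpace 1.x documentation as best recalled; they and all values are assumptions for illustration and should be checked against the installed release.

```python
import re

# Illustrative fragment only: key names are recalled from DSpace 1.x docs, values are placeholders.
SAMPLE_CFG = """\
# Store 0: traditional file-system asset store
assetstore.dir = /dspace/assetstore
# Store 1: SRB-backed asset store
srb.host.1 = srb.example.org
srb.port.1 = 5544
# Store number that receives newly ingested bitstreams
assetstore.incoming = 1
"""


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def classify_stores(props):
    """Map each store number to 'traditional' or 'srb' based on which keys define it."""
    stores = {}
    for key in props:
        m = re.fullmatch(r"assetstore\.dir(?:\.(\d+))?", key)
        if m:
            stores[int(m.group(1) or 0)] = "traditional"
            continue
        m = re.fullmatch(r"srb\.host(?:\.(\d+))?", key)
        if m:
            stores[int(m.group(1) or 0)] = "srb"
    return stores
```

With the sample above, store 0 is a local directory and store 1 is an SRB resource, while the incoming setting directs new uploads to the SRB store and existing bitstreams stay where they are.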

Page 19

Examination of DSpace/SRB

• An Open Archival Information System (OAIS) is an archive that intends to preserve information for access and use by a Designated Community

Page 20

OAIS Functional Model

Page 21

Workflow

[Figure: workflow diagram with the following numbered components]

1. Common service (Network, OS…)
2. DSpace Ingest
3. SRB Storage
4. DSpace DB/SRB MCAT
5. DSpace & SRB Admin
6. DSpace User Interface

Other labels: Submit Interface & Batch Import; DSpace/Google User

Page 22

OAIS Functional Model…Again

[Figure: OAIS functional model annotated with the DSpace/SRB components: DSpace & SRB Administration; DSpace RDBMS & SRB MCAT; DSpace Submit Interface; DSpace Batch Import; DSpace Ingest; SRB Mass Storage; DSpace User Interface]

Page 23

Producer, Management and Consumer

• Producer
– DSpace may play the role of ingesting the SIP (Submission Information Package) from the producer and generating the AIP (Archival Information Package) for management and storage
• Management
– SRB may play the role of receiving the AIP, storing and managing the data, and generating the AIP for access
• Consumer
– DSpace may play the role of processing the access request and generating the proper DIP (Dissemination Information Package) for dissemination

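[Figure: OAIS functional model with the components DSpace RDBMS & SRB MCAT, DSpace Submit Interface, DSpace Batch Import, DSpace Ingest, SRB Mass Storage and DSpace User Interface, annotated with the package flow SIP → AIP → AIP → DIP]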
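The division of roles on this slide can be sketched as a small in-memory model. All class and method names below are hypothetical and only mirror the SIP → AIP → DIP flow; they are not DSpace or SRB APIs, and a plain dictionary stands in for SRB storage.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    title: str
    files: dict  # filename -> bytes


@dataclass
class AIP:
    """Archival Information Package: what is stored and managed."""
    identifier: str
    metadata: dict
    files: dict


@dataclass
class DIP:
    """Dissemination Information Package: what the consumer receives."""
    identifier: str
    metadata: dict
    filenames: list


class Storage:
    """Stands in for SRB: receives, keeps and returns AIPs."""

    def __init__(self):
        self._aips = {}

    def put(self, aip):
        self._aips[aip.identifier] = aip

    def get(self, identifier):
        return self._aips[identifier]


class Repository:
    """Stands in for DSpace: ingests SIPs and builds DIPs on request."""

    def __init__(self, storage):
        self.storage = storage
        self._next_id = 1

    def ingest(self, sip):
        # Producer side: turn the SIP into an AIP and pass it to storage.
        identifier = f"item-{self._next_id}"
        self._next_id += 1
        self.storage.put(AIP(identifier, {"title": sip.title}, dict(sip.files)))
        return identifier

    def disseminate(self, identifier):
        # Consumer side: fetch the AIP from storage and derive a DIP from it.
        aip = self.storage.get(identifier)
        return DIP(aip.identifier, aip.metadata, sorted(aip.files))
```

The point of the sketch is the boundary: the repository object never keeps file content itself, so the storage layer could be swapped without changing the ingest or dissemination code.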

Page 24

Archives arrangement

• Logical archives structure:
– DSpace allows multi-level communities and one level of collections
– Archival principles
  • Principle of provenance
  • Principle of respect des fonds
• Physical files arrangement:
– SRB mass storage technology
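The logical structure (communities nested to any depth, with a single level of collections beneath them) can be pictured with a small data model. This is a sketch for illustration only: the class names, fields and the example names are assumptions, not the DSpace object model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    """A collection holds items directly and cannot contain other collections."""
    name: str
    items: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A community may nest sub-communities to any depth and own collections."""
    name: str
    subcommunities: List["Community"] = field(default_factory=list)
    collections: List[Collection] = field(default_factory=list)


def count(community):
    """Return (communities, collections, items) for a subtree, root included."""
    n_comm = 1
    n_coll = len(community.collections)
    n_items = sum(len(c.items) for c in community.collections)
    for sub in community.subcommunities:
        sub_comm, sub_coll, sub_items = count(sub)
        n_comm += sub_comm
        n_coll += sub_coll
        n_items += sub_items
    return n_comm, n_coll, n_items


# Hypothetical example: one top-level community with one project beneath it.
example = Community(
    "Supported Projects",
    subcommunities=[
        Community(
            "Project A",
            collections=[
                Collection("Meeting records", ["doc1", "doc2"]),
                Collection("Work sheets", ["doc3"]),
            ],
        )
    ],
)
```

Keeping each project's documents under its own community is one way to respect provenance in the logical view, while the physical placement of files is left to the storage layer.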

Page 25

Future1

• Best practice & SOP for DSpace/SRB integration
• Deeper check against the activities of OAIS
• Preservation planning and policy
– Monitor Producer/Management/Consumer service requirements and emerging technology; develop an archival strategy & migration plan

Page 26

Future2

• Feasibility evaluation
– Migrate from SRB to other advanced technologies, such as SRM, iRODS…
– Adopt a metadata approach to enhance digital preservation, such as PREMIS and METS (e.g. structural map, behavior section…)

Page 27

Thank You