
Posted on 15-Jan-2016


DCAPE Project Update

Richard Marciano, University of North Carolina, SALT

Chien-Yi Hou, University of North Carolina, SALT

Caryn Wojcik, State of Michigan, Records Management Services

NHPRC Issued a Call…

Design a digital preservation service with a business model for the archival community

Fill the needs of archival repositories that cannot build and sustain their own electronic records archive

DCAPE Project

The Distributed Custodial Archival Preservation Environments (DCAPE) Project was funded by NHPRC in 2008 (RE10010-08)

Officially started in December 2008

Project extended through April 2012

http://www.dcape.org/

What is Distributed Custodial Preservation?

Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service

Archival repository retains legal custody

Archival repository remains responsible for archival functions, including preservation and access

Access to collections is controlled by archival repository

DCAPE Partners

28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants

Cultural Entity: Getty Research Institute

Cyberinfrastructure: West Virginia University, Carleton University (Canada)

State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York

State Library: North Carolina

University Archives: Tufts

UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)

DCAPE Goals

Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services. Services are based on policies (rules) that are defined by the archivist.

Over 250 rules have been developed for the iRODS rule library, and DCAPE can leverage them

A series of rules might “look” like this: When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
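The rule series quoted above can be sketched in ordinary code to show what the archivist's plain-language policy asks the system to do. This is a minimal in-memory illustration, not iRODS rule-language code: the location names, the `PreservationStore` class, and the choice of SHA-256 are all assumptions, and the monthly schedule is left to whatever scheduler would call `bit_check`.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical replica locations; the slide does not name DCAPE's storage sites.
LOCATIONS = ("site_a", "site_b", "site_c")


def checksum(data: bytes) -> str:
    """Hex SHA-256 digest (the hash algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PreservationStore:
    # location -> {file name -> stored bytes}
    replicas: dict = field(default_factory=lambda: {loc: {} for loc in LOCATIONS})
    # file name -> checksum recorded at ingest time
    recorded: dict = field(default_factory=dict)
    # every alert raised so far
    alerts: list = field(default_factory=list)

    def ingest(self, name: str, data: bytes) -> None:
        """On ingest: copy the file to all three locations and record its checksum."""
        for loc in LOCATIONS:
            self.replicas[loc][name] = bytes(data)
        self.recorded[name] = checksum(data)

    def bit_check(self) -> list:
        """Monthly bit check: re-hash every replica and alert on any change."""
        new_alerts = [
            f"ALERT: {name} changed at {loc}"
            for loc, files in self.replicas.items()
            for name, data in files.items()
            if checksum(data) != self.recorded[name]
        ]
        self.alerts.extend(new_alerts)
        return new_alerts


store = PreservationStore()
store.ingest("minutes.pdf", b"board minutes")
store.replicas["site_b"]["minutes.pdf"] = b"bit rot"  # simulate silent corruption
print(store.bit_check())  # ['ALERT: minutes.pdf changed at site_b']
```

Keeping the checksum recorded at ingest separate from the replicas is what lets the bit check detect a change in any single copy.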

DCAPE Goals

The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.

The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.

Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.

Project Tasks

Execute service agreements between UNC and partners to govern use of the test collections.

Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.

Ingest test collections into iRODS and validate the rules and services.

Develop business model (including costs) for sustaining a repository service based on iRODS.

Develop model service agreements that define the standard and optional services of the repository.

Role of iRODS

Preservation environment provides rule-based automation of archival functions (repeatable services)

Standard and optional services will be available

Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities

Life Cycle of Data

[Figure: life cycle of data through the Virtual Loading Dock, Preservation Area, and Reference Room, with SIP, AIP, and DIP packages]

DCAPE Framework

[Figure: DCAPE framework built on iRODS: Virtual Loading Dock (V1, V2, V3), Preservation Area (P1, P2, P3), and Reference Room (R1, R2), with SIP, AIP, and DIP packages]

DCAPE Capabilities

[Figure: the DCAPE framework diagram annotated with numbered capabilities across the Virtual Loading Dock, Preservation Area, and Reference Room]

DCAPE Capabilities

Replication

Sample Rule: a delayed job (first run after 1m, repeating every 2m) that finds the child collections of *path tagged DCAPE_COLL_TYPE = AIP, moves each one, logs the move, and replicates the moved collection when the DCAPE_POLICY_REPLICA policy check returns Yes.

sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp, *GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,SamplePath2*c)##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
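The control flow of the sample rule above is easier to follow outside the one-line rule syntax. The sketch below mirrors it step by step against a hypothetical in-memory catalog; it is not the iRODS API. The `delayExec` scheduling is left out, `acCheckPolicy` is modeled as a caller-supplied function, and the destination `/samplePath2/<child>` is an assumption, since the rule itself builds the path as `SamplePath2*c`.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for the iRODS catalog."""
    collections: dict = field(default_factory=dict)  # path -> metadata dict
    replicas: dict = field(default_factory=dict)     # path -> set of resource names
    log: list = field(default_factory=list)          # (event, path) tuples


def move_and_replicate_aips(cat: Catalog, path: str, target_parent: str,
                            needs_replica: Callable[[str], bool], resource: str) -> list:
    # Like the GenQuery: direct children of `path` tagged DCAPE_COLL_TYPE = 'AIP'.
    aips = [c for c, meta in cat.collections.items()
            if posixpath.dirname(c) == path and meta.get("DCAPE_COLL_TYPE") == "AIP"]
    for coll in aips:
        _, child = posixpath.split(coll)                        # like msiSplitPath
        new_path = posixpath.join(target_parent, child)
        cat.collections[new_path] = cat.collections.pop(coll)   # like msiDataObjRename
        cat.log.append(("Move_Collection", coll))
        if needs_replica(new_path):                             # stands in for acCheckPolicy
            cat.replicas.setdefault(new_path, set()).add(resource)  # like msiCollRepl
            cat.log.append(("Replicate_Coll", new_path))
    return aips


cat = Catalog(collections={"/samplePath/box1": {"DCAPE_COLL_TYPE": "AIP"},
                           "/samplePath/box2": {"DCAPE_COLL_TYPE": "SIP"}})
move_and_replicate_aips(cat, "/samplePath", "/samplePath2", lambda p: True, "resource")
print(cat.log)  # [('Move_Collection', '/samplePath/box1'), ('Replicate_Coll', '/samplePath2/box1')]
```

Building the list of matches before moving anything plays the same role as running the query first and then iterating over its result set with `forEachExec`.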

Interface - Requirements

An interface that makes the policies easy to manage!

Hide the technical details

Show the information that archivists want to know

Be able to customize policies easily

Web-based, no installation required

Demo I

[Figure: DCAPE capabilities diagram; policies shown: Checksum, Replication]

Demo II

[Figure: DCAPE capabilities diagram; policies shown: Checksum & Virus Check, No Replication]
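The two demos differ only in which policies are switched on, which is the kind of choice the interface requirements ask to expose without showing rule details. A minimal sketch, assuming a simple set of on/off toggles; the `PolicySet` name, the action names, and the order in which actions run are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySet:
    """Hypothetical archivist-facing policy toggles."""
    checksum: bool = True
    virus_check: bool = False
    replication: bool = False

    def ingest_actions(self) -> list:
        """Translate the toggles into the ordered steps run on each ingested file."""
        actions = []
        if self.virus_check:
            actions.append("virus_check")       # assumed to run first
        if self.checksum:
            actions.append("compute_checksum")
        if self.replication:
            actions.append("replicate")
        return actions


DEMO_I = PolicySet(checksum=True, replication=True)
DEMO_II = PolicySet(checksum=True, virus_check=True, replication=False)
print(DEMO_I.ingest_actions())   # ['compute_checksum', 'replicate']
print(DEMO_II.ingest_actions())  # ['virus_check', 'compute_checksum']
```

The archivist edits only the toggles; generating the underlying rules from them stays hidden, in line with "hide the technical details".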

DCAPE is More

More than a storage service or environment

More than a reference tool

DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records.

DCAPE Interface

DCAPE Metadata

Follow the Dublin Core model

Allow customization

Encourage standardization

Define:

Source: creator, system, archivist

Level: collection, accretion, item

Accessibility: internal vs. public

Fields: required vs. optional
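The metadata definitions above (source, level, accessibility, required vs. optional) can be read as attributes of each field in a schema. The sketch below is one way to encode them, assuming a simple per-field definition; the class and function names are hypothetical, and `processing_note` stands in for a local customization, while `title`, `date`, and `description` are Dublin Core elements.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


@dataclass(frozen=True)
class FieldDef:
    name: str        # field name, e.g. a Dublin Core element
    source: Source   # who supplies the value
    level: Level     # level of description the field applies to
    public: bool     # accessibility: True = public, False = internal only
    required: bool   # required vs. optional


def missing_required(schema: list, record: dict) -> list:
    """Names of required fields that are absent or empty in `record`."""
    return [f.name for f in schema if f.required and not record.get(f.name)]


def public_view(schema: list, record: dict) -> dict:
    """Copy of `record` keeping only the fields the schema marks public."""
    public = {f.name for f in schema if f.public}
    return {k: v for k, v in record.items() if k in public}


SCHEMA = [
    FieldDef("title", Source.CREATOR, Level.ITEM, public=True, required=True),
    FieldDef("date", Source.SYSTEM, Level.ITEM, public=True, required=True),
    FieldDef("description", Source.ARCHIVIST, Level.COLLECTION, public=True, required=False),
    FieldDef("processing_note", Source.ARCHIVIST, Level.ACCRETION, public=False, required=False),
]

record = {"title": "Board minutes", "processing_note": "check scan quality"}
print(missing_required(SCHEMA, record))  # ['date']
print(public_view(SCHEMA, record))       # {'title': 'Board minutes'}
```

Keeping the Dublin Core elements as the common core while letting repositories append their own `FieldDef` entries is one way to balance "allow customization" against "encourage standardization".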

DCAPE Workflow

Define functionality at each stage:

Virtual Loading Dock: pre-accessioning, ingestion

Preservation Area: archival storage, data management, administration, preservation planning

Reference Room: access

Common services: management
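The stage-by-stage functions above can be laid out as a small data structure. This sketch takes the stage names and functions from the workflow slide. The SIP, AIP, and DIP labels come from the life-cycle diagram, and assigning one package form per stage is an OAIS-style assumption. All identifiers are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    LOADING_DOCK = "Virtual Loading Dock"
    PRESERVATION = "Preservation Area"
    REFERENCE = "Reference Room"


# Functions the workflow slide lists for each stage.
FUNCTIONS = {
    Stage.LOADING_DOCK: ["pre-accessioning", "ingestion"],
    Stage.PRESERVATION: ["archival storage", "data management",
                         "administration", "preservation planning"],
    Stage.REFERENCE: ["access"],
}
COMMON_SERVICES = ["management"]  # shared across all stages

# Assumed package form handled at each stage (OAIS-style).
PACKAGE = {Stage.LOADING_DOCK: "SIP", Stage.PRESERVATION: "AIP", Stage.REFERENCE: "DIP"}
NEXT_STAGE = {Stage.LOADING_DOCK: Stage.PRESERVATION, Stage.PRESERVATION: Stage.REFERENCE}


def advance(stage: Stage) -> tuple:
    """Return the next stage and the package form it handles."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} is the last stage")
    nxt = NEXT_STAGE[stage]
    return nxt, PACKAGE[nxt]


def functions_at(stage: Stage) -> list:
    """Stage-specific functions plus the common services."""
    return FUNCTIONS[stage] + COMMON_SERVICES


print(advance(Stage.LOADING_DOCK))    # (<Stage.PRESERVATION: 'Preservation Area'>, 'AIP')
print(functions_at(Stage.REFERENCE))  # ['access', 'management']
```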

DCAPE Business Model

Non-profit

Fees for services

Fees for storage

Storage and disaster prevention services

Software maintenance

Access and connectivity

MetaArchive Cooperative

Encourages organizations to build their own preservation infrastructures rather than outsourcing to external vendors

3 levels of membership; 3-year commitment

Basic costs:

Equipment: $4.6K server purchase in the 1st year

Staffing: 2% of a sys. admin's time + POC admin + software eng. for content ingestion preparation

Storage: $1.00 / GB / year for content stored in the network

Yearly dues:

Sustaining Members: $5.5K / yr

Preservation Members: $3K / yr

Collaborative Members: varies

Cost scenarios: 2 TB of content

Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
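The three MetaArchive scenarios above all follow one formula: (yearly dues + yearly storage) times 3 years, plus the one-time server. The sketch below reproduces them in whole dollars. It assumes 2 TB means 2,000 GB, which is what the $2K-per-year space charge implies, and treats the Collaborative dues as an input because the slide says they vary and uses $4K only for this scenario. Staffing costs are not part of the slide's totals and are not modeled. The function name is hypothetical.

```python
def three_year_cost(dues_per_year: int, content_gb: int, server: int = 4_600,
                    storage_per_gb_year: int = 1, years: int = 3) -> int:
    """Total dollars: (dues + storage) per year, times years, plus one-time server cost."""
    storage_per_year = content_gb * storage_per_gb_year
    return (dues_per_year + storage_per_year) * years + server


for level, dues in [("Sustaining", 5_500), ("Preservation", 3_000), ("Collaborative", 4_000)]:
    print(level, three_year_cost(dues, 2_000))
# Sustaining 27100
# Preservation 19600
# Collaborative 22600
```

Integer dollars are used instead of fractional $K values so the totals come out exact.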

Archive-It

Subscription service from the Internet Archive that allows institutions to build and preserve collections of born-digital content

Allows users to crawl, scope, catalog, manage, and browse their archived collections

Collections are hosted at the IA data center and are available through URL and full-text search

A minimum of 2 copies of each collection are kept online

Cost Scenarios

Storage Cost Model Scenarios

1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?

Answer: $2,900 + $1,400 x 1.5 = $5,000

2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?

Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835

3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?

Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
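The three answers above can be reproduced from a handful of rates: a $2,900 term common to every answer, $1,400 per TB for one disk plus one tape copy, $550 per TB for tape, $870 per million files, and a flat $5,165 that appears in both two-tape answers. The sketch below encodes exactly those formulas; the constant names are hypothetical, and the slide does not say what the $5,165 term pays for. The first scenario's answer has no per-file term, so its 4,000 files do not enter the calculation.

```python
BASE = 2_900                   # term present in every answer
DISK_PLUS_TAPE_PER_TB = 1_400  # scenario 1: one disk copy + one tape copy
TAPE_PER_TB = 550              # scenarios 2 and 3
PER_MILLION_FILES = 870        # scenarios 2 and 3
TWO_TAPE_FLAT = 5_165          # flat term in both two-tape answers (purpose not stated)


def disk_plus_tape_cost(tb: float) -> float:
    """Scenario 1 formula: base + per-TB charge for one disk and one tape copy."""
    return BASE + DISK_PLUS_TAPE_PER_TB * tb


def two_tape_cost(tb: float, files: int) -> float:
    """Scenarios 2 and 3: base + per-TB tape charge + per-million-file charge + flat term."""
    return BASE + TAPE_PER_TB * tb + PER_MILLION_FILES * files / 1_000_000 + TWO_TAPE_FLAT


print(disk_plus_tape_cost(1.5))       # 5000.0
print(two_tape_cost(1, 6_000_000))    # 13835.0
print(two_tape_cost(20, 100_000))     # 19152.0
```

Writing the rates out separately makes it clear that file count dominates the web-crawl scenario (6 x $870), while storage volume dominates the 20 TB scenario (20 x $550).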

DCAPE Project

http://dcape.org
