DCAPE Project Update Richard Marciano Chien-Yi Hou Caryn Wojcik University of University of State of Michigan North Carolina North Carolina Records Management Services SALT SALT

Post on 15-Jan-2016



DCAPE Project Update

Richard Marciano, University of North Carolina, SALT

Chien-Yi Hou, University of North Carolina, SALT

Caryn Wojcik, State of Michigan, Records Management Services


NHPRC Issued a Call…

Design a digital preservation service with a business model for the archival community

Fill the needs of archival repositories that cannot build and sustain their own electronic records archive


DCAPE Project

Distributed Custodial Archival Preservation Environments Project was funded by NHPRC in 2008 (RE10010-08)

Officially started in December 2008

Project extended through April 2012

http://www.dcape.org/


What is Distributed Custodial Preservation?

Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service

Archival repository retains legal custody

Archival repository remains responsible for archival functions, including preservation and access

Access to collections is controlled by archival repository


DCAPE Partners

28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants

Cultural Entity: Getty Research Institute

Cyberinfrastructure: West Virginia University, Carleton University (Canada)

State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York

State Library: North Carolina

University Archives: Tufts

UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)


DCAPE Goals

Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services.

Services are based on policies (rules) that are defined by the archivist

Over 250 rules have been developed for the iRODS library that can be leveraged for DCAPE

A series of rules might “look” like this:

When files are ingested, replicate them in three different locations and run a checksum on each file.

Bit-check files every month.

Send an alert about any changes to the files.
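The rule series described here (replicate on ingest, checksum each copy, re-check later, alert on change) can be sketched in plain Python. This is only an illustration of the policy logic, not iRODS code and not the DCAPE implementation: the function names, the use of SHA-256, and the local folders standing in for three storage locations are all assumptions. Scheduling the monthly run is left to an external scheduler.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest(source: Path, locations: list[Path]) -> dict[Path, str]:
    """Copy `source` into every location and record a checksum per replica."""
    manifest = {}
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        replica = location / source.name
        shutil.copy2(source, replica)
        manifest[replica] = sha256_of(replica)
    return manifest


def bit_check(manifest: dict[Path, str]) -> list[Path]:
    """Return the replicas that are missing or whose checksum has changed."""
    changed = []
    for replica, recorded in manifest.items():
        if not replica.exists() or sha256_of(replica) != recorded:
            changed.append(replica)
    return changed


def alert(changed: list[Path]) -> str:
    """Format the alert text; a real service would send it to the archivist."""
    if not changed:
        return "OK: all replicas match their recorded checksums"
    return "ALERT: changed replicas: " + ", ".join(str(p) for p in changed)


def demo() -> tuple[int, int, str]:
    """Ingest one file, tamper with one replica, and run a bit-check."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        record = root / "record.txt"
        record.write_text("minutes of the board meeting")
        sites = [root / "site_a", root / "site_b", root / "site_c"]
        manifest = ingest(record, sites)
        clean = bit_check(manifest)          # nothing has changed yet
        (root / "site_b" / "record.txt").write_text("tampered")
        changed = bit_check(manifest)        # the tampered replica is reported
        return len(clean), len(changed), alert(changed)


if __name__ == "__main__":
    print(demo())
```

Keeping the checksum manifest separate from the replicas is the design point: the later bit-check compares each copy against what was recorded at ingest rather than against the other copies.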


DCAPE Goals

The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.

The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.

Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.


Project Tasks

Execute service agreements between UNC and partners to govern use of the test collections.

Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.

Ingest test collections into iRODS and validate the rules and services.

Develop business model (including costs) for sustaining a repository service based on iRODS.

Develop model service agreements that define the standard and optional services of the repository.


Role of iRODS

Preservation environment provides rule-based automation of archival functions (repeatable services)

Standard and optional services will be available

Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities


Life Cycle of Data

[Figure: data life cycle across the Virtual Loading Dock, Preservation Area, and Reference Room, with SIP, AIP, and DIP packages]


DCAPE Framework

[Figure: iRODS framework with Virtual Loading Dock (V1, V2, V3), Preservation Area (P1, P2, P3), and Reference Room (R1, R2); packages labeled SIP, AIP, and DIP]


DCAPE Capabilities

[Figure: iRODS framework diagram (Virtual Loading Dock V1 to V3, Preservation Area P1 to P3, Reference Room R1 and R2; SIP, AIP, DIP) annotated with numbered capabilities 1 to 26]


DCAPE Capabilities

[Figure: iRODS framework diagram (Virtual Loading Dock V1 to V3, Preservation Area P1 to P3, Reference Room R1 and R2; SIP, AIP, DIP) annotated with numbered capabilities 1 to 26; Replication highlighted]


Sample Rule

sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp, *GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,SamplePath2*c)##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
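Read step by step, the sample rule appears to do the following: find the collections directly under a source path that carry the metadata DCAPE_COLL_TYPE = AIP, move each one to a second path, log the move, check a replication policy, and replicate and log again when the policy answers Yes. The plain-Python paraphrase below mirrors only that control flow. It does not call iRODS: the `Collection` class, the in-memory catalog, the `/samplePath2` target, and the idea that the policy is stored as a metadata flag are all assumptions made for illustration, and the delayed-execution wrapper is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical in-memory stand-in for a collection and its metadata."""
    path: str
    metadata: dict = field(default_factory=dict)
    replicas: list = field(default_factory=list)


def split_path(path):
    """Split '/a/b/c' into ('/a/b', 'c'), like the rule's path-splitting step."""
    parent, _, child = path.rpartition("/")
    return parent, child


def run_sample_rule(catalog, source, target, resource, log):
    """Move AIP collections under `source` to `target`; replicate on policy."""
    # Query step: child collections of `source` tagged DCAPE_COLL_TYPE = AIP.
    selected = [
        coll for coll in catalog
        if split_path(coll.path)[0] == source
        and coll.metadata.get("DCAPE_COLL_TYPE") == "AIP"
    ]
    for coll in selected:
        old_path = coll.path
        _, child = split_path(old_path)
        coll.path = f"{target}/{child}"            # rename (move) step
        log.append(("Move_Collection", old_path))  # the rule logs the old path
        # Policy step: assumed here to be a metadata flag on the collection.
        if coll.metadata.get("DCAPE_POLICY_REPLICA") == "Yes":
            coll.replicas.append(resource)         # replication step
            log.append(("Replicate_Coll", coll.path))
    return log


if __name__ == "__main__":
    catalog = [
        Collection("/samplePath/a", {"DCAPE_COLL_TYPE": "AIP",
                                     "DCAPE_POLICY_REPLICA": "Yes"}),
        Collection("/samplePath/b", {"DCAPE_COLL_TYPE": "AIP"}),
        Collection("/samplePath/c", {"DCAPE_COLL_TYPE": "SIP"}),
    ]
    for entry in run_sample_rule(catalog, "/samplePath", "/samplePath2",
                                 "resource", []):
        print(entry)
```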


[Figure: iRODS framework diagram (Virtual Loading Dock V1 to V3, Preservation Area P1 to P3, Reference Room R1 and R2; SIP, AIP, DIP) annotated with numbered capabilities 1 to 26; capability 24 called out]

An interface that makes the policies easy to manage! (24)


Interface - Requirements

Hide the technical details

Show the information that archivists want to know

Be able to customize policies easily

Web-based, no installation required


Demo I

[Figure: iRODS framework diagram (Virtual Loading Dock V1 to V3, Preservation Area P1 to P3, Reference Room R1 and R2; SIP, AIP, DIP) annotated with numbered capabilities 1 to 26; Checksum and Replication highlighted]


Demo II

[Figure: iRODS framework diagram (Virtual Loading Dock V1 to V3, Preservation Area P1 to P3, Reference Room R1 and R2; SIP, AIP, DIP) annotated with numbered capabilities 1 to 26; Checksum & Virus Check highlighted, No Replication]


DCAPE is More

More than a storage service or environment

More than a reference tool

DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records


DCAPE Interface


DCAPE Metadata

Follow Dublin Core model

Allow customization

Encourage standardization

Define:

Source: creator, system, archivist

Level: collection, accretion, item

Accessibility: internal vs. public

Fields: required vs. optional
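The four properties listed for each metadata field (source, level, accessibility, required or optional) map naturally onto a small field-definition record. The sketch below is one possible shape, not the DCAPE schema: the class names, the three sample fields (named after Dublin Core elements), and the property values assigned to them are assumptions chosen only to show how such definitions could drive validation and a public view.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    CREATOR = "creator"
    SYSTEM = "system"
    ARCHIVIST = "archivist"


class Level(Enum):
    COLLECTION = "collection"
    ACCRETION = "accretion"
    ITEM = "item"


@dataclass(frozen=True)
class FieldDefinition:
    name: str        # field name, e.g. a Dublin Core element
    source: Source   # who or what supplies the value
    level: Level     # where in the hierarchy the field applies
    public: bool     # False means internal only
    required: bool   # False means optional


# Illustrative definitions only; the assignments are not taken from DCAPE.
SCHEMA = [
    FieldDefinition("title", Source.ARCHIVIST, Level.COLLECTION, True, True),
    FieldDefinition("creator", Source.CREATOR, Level.COLLECTION, True, True),
    FieldDefinition("format", Source.SYSTEM, Level.ITEM, False, False),
]


def missing_required(record: dict, level: Level) -> list:
    """Names of required fields at `level` that the record leaves empty."""
    return [f.name for f in SCHEMA
            if f.level is level and f.required and not record.get(f.name)]


def public_view(record: dict) -> dict:
    """Keep only the fields that are marked public in the schema."""
    public_names = {f.name for f in SCHEMA if f.public}
    return {k: v for k, v in record.items() if k in public_names}


if __name__ == "__main__":
    record = {"title": "Board minutes", "format": "application/pdf"}
    print(missing_required(record, Level.COLLECTION))
    print(public_view(record))
```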


DCAPE Workflow

Define functionality at each stage

Virtual Loading Dock: Pre-accessioning, Ingestion

Preservation Area: Archival storage, Data management, Administration, Preservation planning

Reference Room: Access

Common services: Management


DCAPE Business Model

Non-profit

Fees for services

Fees for storage

Storage and disaster prevention services

Software maintenance

Access and connectivity


MetaArchive Cooperative

Encourage organizations to build their own preservation infrastructures rather than outsourcing to external vendors

3 levels of membership: 3 yr commitment

Basic costs:

Equipment: 1st year, $4.6K server purchase

Staffing: 2% of a sys. admin’s time + POC admin + software eng. for content ingestion preparation

Storage: $1.00 / GB / year for content stored in the network

Yearly dues:

Sustaining Members: $5.5K / yr

Preservation Members: $3K / yr

Collaborative Members: varies

Cost scenarios: 2 TB of content

Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)

Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
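The three scenario totals follow one formula: yearly dues plus yearly storage, times the three-year commitment, plus the one-time server purchase. A minimal sketch of that arithmetic is below. The constant and function names are made up for illustration, 2 TB is treated as 2,000 GB to match the slide's $2K storage figure, the $4K Collaborative dues are the value the slide's own scenario uses (the dues are otherwise listed as "varies"), and staffing is excluded, as it is in the slide's totals.

```python
SERVER_PURCHASE = 4_600        # one-time cost in the first year, in dollars
STORAGE_PER_GB_PER_YEAR = 1    # dollars per GB per year
COMMITMENT_YEARS = 3

YEARLY_DUES = {
    "Sustaining": 5_500,
    "Preservation": 3_000,
    "Collaborative": 4_000,    # "varies"; $4K is the scenario's example value
}


def commitment_cost(level: str, content_gb: int = 2_000) -> int:
    """Total cost in dollars over the three-year commitment, staffing excluded."""
    yearly = YEARLY_DUES[level] + content_gb * STORAGE_PER_GB_PER_YEAR
    return yearly * COMMITMENT_YEARS + SERVER_PURCHASE


if __name__ == "__main__":
    for level in YEARLY_DUES:
        print(level, commitment_cost(level))
```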


Archive-It

Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born digital content

Allows users to crawl, scope, catalog, manage, and browse their archived collections

Collections are hosted at the IA data center and are available through URL and full-text search

A minimum of 2 copies of each collection are kept online

Cost Scenarios


Storage Cost Model Scenarios

1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies: one on disk and one on tape (iRODS)?

Answer: $2,900 + $1,400 x 1.5 = $5,000

2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?

Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835

3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?

Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
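The three answers can be reproduced with a small calculator. The slide gives only the numbers, not what each coefficient covers, so the constant names below are guesses: a base fee that appears in every answer, a per-TB rate for the disk-plus-tape case, and, for the two-tape case, a per-TB rate, a rate per million files, and a fixed amount whose purpose is not stated. The first answer contains no per-file term, and the sketch follows it as written.

```python
from decimal import Decimal

# Coefficients copied from the slide's answers; the names are assumptions.
BASE_FEE = Decimal("2900")               # appears in all three answers
DISK_AND_TAPE_PER_TB = Decimal("1400")   # scenario 1: one disk, one tape copy
TWO_TAPES_PER_TB = Decimal("550")        # scenarios 2 and 3: two tape copies
PER_MILLION_FILES = Decimal("870")       # scenarios 2 and 3
TWO_TAPES_FIXED = Decimal("5165")        # scenarios 2 and 3; purpose not stated


def disk_and_tape_cost(terabytes: str) -> Decimal:
    """Yearly cost with one disk copy and one tape copy (scenario 1)."""
    return BASE_FEE + DISK_AND_TAPE_PER_TB * Decimal(terabytes)


def two_tape_cost(terabytes: str, million_files: str) -> Decimal:
    """Yearly cost with two tape copies (scenarios 2 and 3)."""
    return (BASE_FEE
            + TWO_TAPES_PER_TB * Decimal(terabytes)
            + PER_MILLION_FILES * Decimal(million_files)
            + TWO_TAPES_FIXED)


if __name__ == "__main__":
    print(disk_and_tape_cost("1.5"))
    print(two_tape_cost("1", "6"))
    print(two_tape_cost("20", "0.1"))
```

Quantities are passed as strings and handled as `Decimal` so that fractional inputs such as 0.1 million files do not pick up binary floating-point error.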


DCAPE Project

http://dcape.org