Posted on 15-Jan-2016
DCAPE Project Update
Richard Marciano, University of North Carolina, SALT
Chien-Yi Hou, University of North Carolina, SALT
Caryn Wojcik, State of Michigan Records Management Services
NHPRC Issued a Call…
Design a digital preservation service with a business model for the archival community
Fill the needs of archival repositories that cannot build and sustain their own electronic records archive
DCAPE Project
The Distributed Custodial Archival Preservation Environments (DCAPE) Project was funded by NHPRC in 2008 (RE10010-08)
Officially started in December 2008
Project extended through April 2012
http://www.dcape.org/
What is Distributed Custodial Preservation?
Physical custody of archival collections is distributed outside of the archival repository to a trusted preservation service
Archival repository retains legal custody
Archival repository remains responsible for archival functions, including preservation and access
Access to collections is controlled by archival repository
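The split between physical and legal custody can be illustrated with a minimal Python sketch. It assumes one repository with a simple rule: staff may see every collection, and everyone else sees only public collections. The class and method names are hypothetical and do not come from DCAPE or iRODS. The preservation service holds the bits, but every retrieval is routed through the repository's access decision.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivalRepository:
    """Retains legal custody and makes every access decision."""
    name: str
    public_collections: set = field(default_factory=set)
    staff: set = field(default_factory=set)

    def may_access(self, user: str, collection: str) -> bool:
        # Staff may see everything; other users only see public collections.
        return user in self.staff or collection in self.public_collections


@dataclass
class PreservationService:
    """Holds physical custody of the bits on behalf of repositories."""
    holdings: dict = field(default_factory=dict)

    def deposit(self, repo: ArchivalRepository, collection: str, data: bytes) -> None:
        self.holdings[(repo.name, collection)] = data

    def retrieve(self, repo: ArchivalRepository, user: str, collection: str) -> bytes:
        # The service stores the bits but defers the access decision
        # to the repository that retains legal custody.
        if not repo.may_access(user, collection):
            raise PermissionError(f"{repo.name} denies {user} access to {collection}")
        return self.holdings[(repo.name, collection)]


# Example: a state archives deposits two collections, only one of them public.
archives = ArchivalRepository("State Archives", public_collections={"minutes"}, staff={"archivist"})
service = PreservationService()
service.deposit(archives, "minutes", b"board minutes")
service.deposit(archives, "personnel", b"personnel files")
print(service.retrieve(archives, "visitor", "minutes"))      # b'board minutes'
print(service.retrieve(archives, "archivist", "personnel"))  # b'personnel files'
```

Under this design, changing who may see a collection requires only a change to the repository object. The service's stored copies are not touched.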
DCAPE Partners
28 people across 9 institutions and 2 staff at UNC, for a total of 32 participants
Cultural Entity: Getty Research Institute
Cyberinfrastructure: West Virginia University, Carleton University (Canada)
State Archives: California, Kansas, Michigan, Kentucky, North Carolina, New York
State Library: North Carolina
University Archives: Tufts
UNC: School of Information and Library Science (SILS), Sustainable Archives and Leveraging Technologies (SALT)
DCAPE Goals
Build a preservation environment that meets the needs of archival repositories for trusted archival preservation services
Services are based on policies (rules) that are defined by the archivist
Over 250 rules have been developed for the iRODS library, and they can be leveraged for DCAPE
A series of rules might “look” like this: When files are ingested, replicate them in three different locations and run a checksum on each file. Bit-check files every month. Send an alert about any changes to the files.
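The example policy above can be simulated in plain Python. This is a minimal sketch, not the iRODS rules themselves. The three location names are hypothetical, and SHA-256 is an assumed checksum algorithm. The monthly bit-check is written as a method that a scheduler would call. The sketch records one checksum per file at ingest, writes the file to every location, and on each bit-check raises an alert for any replica whose checksum no longer matches.

```python
import hashlib
from dataclasses import dataclass, field


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; the slide does not name an algorithm.
    return hashlib.sha256(data).hexdigest()


@dataclass
class PolicyEngine:
    # Hypothetical names for the three replica locations.
    locations: tuple = ("site-a", "site-b", "site-c")
    storage: dict = field(default_factory=dict)    # location -> {file name -> bytes}
    recorded: dict = field(default_factory=dict)   # file name -> checksum at ingest
    alerts: list = field(default_factory=list)

    def on_ingest(self, name: str, data: bytes) -> None:
        # Rule: on ingest, replicate to three locations and record a checksum.
        for loc in self.locations:
            self.storage.setdefault(loc, {})[name] = data
        self.recorded[name] = checksum(data)

    def bit_check(self) -> None:
        # Rule: re-verify every replica (a scheduler would call this monthly)
        # and raise an alert for any replica whose checksum has changed.
        for loc in self.locations:
            for name, data in self.storage.get(loc, {}).items():
                if checksum(data) != self.recorded[name]:
                    self.alerts.append(f"{name} changed at {loc}")


engine = PolicyEngine()
engine.on_ingest("report.pdf", b"original bytes")
engine.storage["site-b"]["report.pdf"] = b"damaged bytes"  # simulate bit rot
engine.bit_check()
print(engine.alerts)  # ['report.pdf changed at site-b']
```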
DCAPE Goals
The trusted digital repository infrastructure will be assembled from state-of-the-art rule-based data management systems, commodity storage systems, and sustainable preservation services.
The software infrastructure will automate many of the administrative tasks associated with the management of archival repositories.
Tasks will include: authentication, replication, migration, obsolete file management, preservation metadata management, etc.
Project Tasks
Execute service agreements between UNC and partners to govern use of the test collections.
Define rules and services (organized according to the OAIS framework) for iRODS to perform on test collections.
Ingest test collections into iRODS and validate the rules and services.
Develop business model (including costs) for sustaining a repository service based on iRODS.
Develop model service agreements that define the standard and optional services of the repository.
Role of iRODS
Preservation environment provides rule-based automation of archival functions (repeatable services)
Standard and optional services will be available
Shared service should reduce costs for each archival repository compared to the cost of building in-house preservation capabilities
Life Cycle of Data
[Diagram: Virtual Loading Dock, Preservation Area, and Reference Room, with SIP, AIP, and DIP packages passing between them]
DCAPE Framework
[Diagram: within iRODS, a Virtual Loading Dock (V1, V2, V3), a Preservation Area (P1, P2, P3), and a Reference Room (R1, R2), with SIP, AIP, and DIP packages passing between them]
DCAPE Capabilities
[Diagram: the DCAPE framework annotated with numbered capabilities placed on the Virtual Loading Dock, Preservation Area, and Reference Room]
DCAPE Capabilities
Replication
Sample Rule
sampleRule||delayExec(<PLUSET>1m</PLUSET><EF>2m</EF>,assign(*path,/samplePath)##msiMakeGenQuery("COLL_NAME","COLL_PARENT_NAME = '*path' AND META_COLL_ATTR_NAME = 'DCAPE_COLL_TYPE' AND META_COLL_ATTR_VALUE = 'AIP'",*GenQInp)##msiExecGenQuery(*GenQInp, *GenQOut)##forEachExec(*GenQOut,msiGetValByKey(*GenQOut, "COLL_NAME",*DataObj)##msiSplitPath(*DataObj,*p,*c)##assign(*newpath,/SamplePath2/*c)##msiDataObjRename(*DataObj,*newpath,1,*result)##acAddLog(Move_Collection,"*DataObj")##acCheckPolicy(*newpath,DCAPE_POLICY_REPLICA,*pResult)##ifExec((*pResult == Yes),msiCollRepl(*newpath,destRescName=resource,*status)##acAddLog(Replicate_Coll,"*newpath"),nop,nop,nop),nop),nop)|nop
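The control flow of the sample rule can be traced in Python. This is a simulation, not iRODS code, and each line is annotated with the rule step it stands in for. The sketch makes three assumptions. First, collections are held in a Python list rather than in the iCAT catalog. Second, the destination is `/SamplePath2/<collection name>`. Third, `acCheckPolicy` (a DCAPE-specific rule whose behavior is not shown) is modeled as a hypothetical callback, `policy_allows_replica`. The sketch runs one pass of the rule body; the `<PLUSET>1m</PLUSET><EF>2m</EF>` settings would make iRODS first run it after one minute and then repeat it every two minutes.

```python
import posixpath
from dataclasses import dataclass, field

SOURCE_PARENT = "/samplePath"   # assign(*path,/samplePath)
DEST_PARENT = "/SamplePath2"    # assumed destination parent collection


@dataclass
class Collection:
    path: str
    metadata: dict = field(default_factory=dict)
    replicated_to: list = field(default_factory=list)


def run_sample_rule(collections, policy_allows_replica, resource="resource"):
    """One pass of the rule body; delayExec would repeat it every 2 minutes."""
    log = []
    # msiMakeGenQuery + msiExecGenQuery: direct children of SOURCE_PARENT
    # whose DCAPE_COLL_TYPE attribute is 'AIP'.
    aips = [c for c in collections
            if posixpath.dirname(c.path) == SOURCE_PARENT
            and c.metadata.get("DCAPE_COLL_TYPE") == "AIP"]
    for coll in aips:                                    # forEachExec
        old_path = coll.path
        name = posixpath.basename(old_path)              # msiSplitPath -> *c
        coll.path = posixpath.join(DEST_PARENT, name)    # msiDataObjRename(..., 1, ...)
        log.append(("Move_Collection", old_path))        # acAddLog
        if policy_allows_replica(coll.path):             # acCheckPolicy(..., DCAPE_POLICY_REPLICA, ...)
            coll.replicated_to.append(resource)          # msiCollRepl(..., destRescName=resource, ...)
            log.append(("Replicate_Coll", coll.path))    # acAddLog
    return log


colls = [
    Collection("/samplePath/board-minutes", {"DCAPE_COLL_TYPE": "AIP"}),
    Collection("/samplePath/incoming", {"DCAPE_COLL_TYPE": "SIP"}),
]
for entry in run_sample_rule(colls, policy_allows_replica=lambda path: True):
    print(entry)
# ('Move_Collection', '/samplePath/board-minutes')
# ('Replicate_Coll', '/SamplePath2/board-minutes')
```

Only the collection tagged as an AIP is moved and replicated. The SIP stays where it is, and a second pass finds nothing more to move.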
An interface that makes the policies easy to manage! (capability 24)
Interface Requirements
Hide the technical details
Show the information that archivists want to know
Be able to customize policies easily
Web-based, no installation required
Demo I
[Diagram: the DCAPE framework, with checksum and replication policies applied]
Demo II
[Diagram: the DCAPE framework, with checksum and virus-check policies applied and no replication]
DCAPE is More
More than a storage service or environment
More than a reference tool
DCAPE will provide the capability for all archival repositories to fulfill their responsibility to preserve electronic records
DCAPE Interface
DCAPE Metadata
Follow the Dublin Core model
Allow customization
Encourage standardization
Define:
Source: creator, system, archivist
Level: collection, accretion, item
Accessibility: internal vs. public
Fields: required vs. optional
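The four dimensions on the metadata slide (source, level, accessibility, required/optional) can be attached to each Dublin Core element as a small field profile. In this minimal Python sketch, the element names are real Dublin Core elements. The profile itself is a hypothetical example: which elements are required or public is an assumption, not DCAPE's actual profile. The two helper functions show how such a profile might drive validation of required fields and filtering of internal fields before public access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    element: str    # Dublin Core element name
    source: str     # who supplies it: "creator", "system", or "archivist"
    level: str      # "collection", "accretion", or "item"
    public: bool    # accessibility: public (True) vs. internal (False)
    required: bool  # required vs. optional


# Hypothetical profile: which elements are required or public is an assumption.
PROFILE = [
    FieldSpec("title", "creator", "collection", public=True, required=True),
    FieldSpec("creator", "creator", "collection", public=True, required=True),
    FieldSpec("date", "system", "item", public=True, required=True),
    FieldSpec("format", "system", "item", public=True, required=False),
    FieldSpec("rights", "archivist", "collection", public=False, required=True),
]


def missing_required(record, profile=PROFILE):
    """Names of required elements that are absent or empty in the record."""
    return [f.element for f in profile if f.required and not record.get(f.element)]


def public_view(record, profile=PROFILE):
    """Drop internal-only elements before a record is shown to the public."""
    public = {f.element for f in profile if f.public}
    return {k: v for k, v in record.items() if k in public}


record = {"title": "Board minutes", "creator": "State Board",
          "date": "2009-05-01", "rights": "Internal review pending"}
print(missing_required(record))  # []
print(public_view(record))       # rights is omitted
```

Customization then means editing the profile rather than the code, which fits the slide's goal of allowing customization while encouraging standardization.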
DCAPE Workflow
Define functionality at each stage:
Virtual Loading Dock: Pre-accessioning, Ingestion
Preservation Area: Archival storage, Data management, Administration, Preservation planning
Reference Room: Access
Common services: Management
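The workflow slide can be read as a three-stage pipeline, where each stage has its own functions and management is shared by all stages. The minimal Python sketch below encodes that mapping. Pairing each stage with an OAIS package type (SIP at the Virtual Loading Dock, AIP in the Preservation Area, DIP from the Reference Room) is an assumption inferred from the lifecycle diagram labels and the OAIS framework. The enum and function names are illustrative only.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    VIRTUAL_LOADING_DOCK = "Virtual Loading Dock"
    PRESERVATION_AREA = "Preservation Area"
    REFERENCE_ROOM = "Reference Room"


# Functionality at each stage, as listed on the workflow slide.
FUNCTIONS = {
    Stage.VIRTUAL_LOADING_DOCK: ["pre-accessioning", "ingestion"],
    Stage.PRESERVATION_AREA: ["archival storage", "data management",
                              "administration", "preservation planning"],
    Stage.REFERENCE_ROOM: ["access"],
}
COMMON_SERVICES = ["management"]  # shared by every stage

# Assumed OAIS package handled at each stage.
PACKAGE = {
    Stage.VIRTUAL_LOADING_DOCK: "SIP",
    Stage.PRESERVATION_AREA: "AIP",
    Stage.REFERENCE_ROOM: "DIP",
}

ORDER = list(Stage)  # definition order: dock -> preservation -> reference


def next_stage(stage: Stage) -> Optional[Stage]:
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def services_at(stage: Stage) -> list:
    return FUNCTIONS[stage] + COMMON_SERVICES


stage = Stage.VIRTUAL_LOADING_DOCK
while stage is not None:
    print(f"{stage.value}: {PACKAGE[stage]}, {', '.join(services_at(stage))}")
    stage = next_stage(stage)
```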
DCAPE Business Model
Non-profit
Fees for services
Fees for storage
Storage and disaster prevention services
Software maintenance
Access and connectivity
MetaArchive Cooperative
Encourages organizations to build their own preservation infrastructures rather than outsourcing to external vendors
3 levels of membership; 3-year commitment
Basic costs:
Equipment: $4.6K server purchase in the 1st year
Staffing: 2% of a system administrator's time + POC admin + software engineer for content ingestion preparation
Storage: $1.00 / GB / year for content stored in the network
Yearly dues:
Sustaining Members: $5.5K / yr
Preservation Members: $3K / yr
Collaborative Members: varies
Cost scenarios: 2 TB of content
Sustaining Member: $27.1K / 3 yrs = ($5.5K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
Preservation Member: $19.6K / 3 yrs = ($3K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
Collaborative Member: $22.6K / 3 yrs = ($4K (membership) + $2K (space)) x 3 yrs + $4.6K (server)
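The three MetaArchive scenarios all use the same formula: (yearly dues + yearly storage) x 3 years + one-time server cost. The Python sketch below reproduces them under four assumptions:
- 2 TB is counted as 2,000 GB at $1.00/GB/year, which gives the $2K space figure.
- Collaborative dues are set at the $4K used in the scenario, since the slide says they vary.
- Staffing is left out, as it is in the scenarios.
- All figures are in dollars.

```python
SERVER_COST = 4_600          # one-time server purchase, 1st year ($)
STORAGE_RATE_PER_GB = 1.00   # $ / GB / year
YEARS = 3                    # membership commitment

DUES = {  # $ / year
    "Sustaining": 5_500,
    "Preservation": 3_000,
    "Collaborative": 4_000,  # "varies"; $4K is the figure used in the scenario
}


def three_year_cost(dues_per_year, content_gb):
    storage_per_year = content_gb * STORAGE_RATE_PER_GB
    return (dues_per_year + storage_per_year) * YEARS + SERVER_COST


for level, dues in DUES.items():
    print(f"{level}: ${three_year_cost(dues, 2_000):,.0f} / 3 yrs")
# Sustaining: $27,100 / 3 yrs
# Preservation: $19,600 / 3 yrs
# Collaborative: $22,600 / 3 yrs
```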
Archive-It
Subscription service from the Internet Archive, allowing institutions to build and preserve collections of born-digital content
Allows users to crawl, scope, catalog, manage, and browse their archived collections
Collections are hosted at the Internet Archive data center and are available through URL and full-text search
A minimum of 2 copies of each collection is kept online
Cost Scenarios
Storage Cost Model Scenarios
1. Question: What is the yearly charge for a customer with 4,000 files and 1.5 TB of storage, assuming the need for two copies, one on disk and one on tape (iRODS)?
Answer: $2,900 + $1,400 x 1.5 = $5,000
2. Question: What is the yearly cost of 6 million files (web crawl scenario) and 1 TB of storage, assuming the need for two tape copies (using iRODS)?
Answer: $2,900 + $550 + 6 x $870 + $5,165 = $13,835
3. Question: What is the yearly cost of 100,000 files and 20 TB of storage with two tape copies (using iRODS)?
Answer: $2,900 + 20 x $550 + 0.1 x $870 + $5,165 = $19,152
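The three answers share a consistent set of terms, which can be turned into two small cost functions. The term names in this Python sketch are labels inferred from the arithmetic, not names used by the cost model:
- $2,900 is a base fee that appears in every answer.
- $1,400 per TB applies in the disk-plus-tape case.
- $550 per TB and $870 per million files apply in the two-tape cases.
- $5,165 is a further term in the two-tape cases that the slide does not itemize.

Scenario 1's answer has no per-file term, so its function takes only the storage size.

```python
BASE_FEE = 2_900               # appears in all three answers ($ / year)
DISK_PLUS_TAPE_PER_TB = 1_400  # scenario 1: one disk copy + one tape copy
TAPE_PER_TB = 550              # scenarios 2 and 3
PER_MILLION_FILES = 870        # scenarios 2 and 3
EXTRA_TAPE_TERM = 5_165        # added in scenarios 2 and 3; not itemized on the slide


def disk_and_tape_cost(tb):
    # Scenario 1's answer has no per-file term.
    return BASE_FEE + DISK_PLUS_TAPE_PER_TB * tb


def two_tape_cost(tb, files):
    return (BASE_FEE + TAPE_PER_TB * tb
            + PER_MILLION_FILES * files / 1_000_000 + EXTRA_TAPE_TERM)


print(disk_and_tape_cost(1.5))           # 5000.0
print(two_tape_cost(1, 6_000_000))       # 13835.0
print(two_tape_cost(20, 100_000))        # 19152.0
```

Writing the model this way shows what drives the cost in each scenario. In scenario 2, the file count contributes more than the storage volume ($5,220 vs. $550). In scenario 3, storage volume dominates ($11,000 vs. $87).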
DCAPE Project
http://dcape.org