Migrating from OCLC's Digital Archive to DuraCloud
Upload: government-and-heritage-library-digital-information-management-program
Post on 15-Jan-2015
DESCRIPTION
Presented by Lisa Gregory at Best Practices Exchange, December 2012. This presentation compared OCLC's Digital Archive to DuraCloud and described the process of migrating storage from one to the other.
TRANSCRIPT
![Page 1: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/1.jpg)
![Page 2: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/2.jpg)
State Library of North Carolina
• Part of the North Carolina Department of Cultural Resources
• Work closely/pool resources with the State Archives
• Digital Information Management Program
![Page 3: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/3.jpg)
CONTENT
• State Publications
• Genealogy Research
• North Caroliniana
STAFF
• ~ 4.75 FTE
STORAGE
• Local server (state-supported)
• Offsite storage (vendor)
SYSTEMS
• CONTENTdm
• Connexion Digital Import
![Page 4: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/4.jpg)
[Diagram labels: Digitized; Born-Digital; CONTENTdm Project Client; CONTENTdm; Connexion; Local Storage; Remote Storage. Values shown: .75, 3.25, .75]
![Page 5: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/5.jpg)
CONTENT
• We preserve access and master copies
• 1.27 TB, 162,000+ files
• Mostly .tif, .pdf, .jpg, .txt
![Page 6: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/6.jpg)
CONTENT
File structure by “project”
• admindocs
• fulltext
• images_access
• images_master
• images_processed
• metadata
Naming convention
• pubs_serial_annualreportclean2005.pdf
• gen_statefair_lifecharacterthomasruffin1871_0001.tif
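A naming convention like this lends itself to automated checks. The sketch below is only an illustration: the pattern (a short collection prefix, a descriptor, an optional four-digit page sequence, and an extension) is inferred from the two example names on the slide, and `parse_name` and its field names are hypothetical, not part of any tool described in the presentation.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    collection: str          # leading token, e.g. "pubs" or "gen"
    descriptor: str          # everything between the prefix and the sequence/extension
    sequence: Optional[int]  # trailing page number, if the name has one
    extension: str           # lowercase extension without the dot


# Assumed pattern: <collection>_<descriptor>[_<4-digit sequence>].<ext>
_NAME_RE = re.compile(
    r"^(?P<collection>[a-z0-9]+)_(?P<descriptor>.+?)"
    r"(?:_(?P<sequence>\d{4}))?\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_name(filename: str) -> ParsedName:
    """Split a file name into its parts, or raise if it breaks the assumed pattern."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"does not follow the assumed convention: {filename}")
    seq = match.group("sequence")
    return ParsedName(
        collection=match.group("collection"),
        descriptor=match.group("descriptor"),
        sequence=int(seq) if seq is not None else None,
        extension=match.group("ext").lower(),
    )


if __name__ == "__main__":
    for name in (
        "pubs_serial_annualreportclean2005.pdf",
        "gen_statefair_lifecharacterthomasruffin1871_0001.tif",
    ):
        print(parse_name(name))
```

A check of this kind could flag names that drift from the convention before they reach preservation storage.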
![Page 7: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/7.jpg)
STORAGE
Local storage
• managed by department-wide IT
• includes working & preservation content
• server is shared, but our directory is restricted
• daily incremental backups
![Page 8: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/8.jpg)
OCLC’s Digital Archive
• Began using in 2008
• Web interface for access
• FTP or automatic uploads
• Integrated with CONTENTdm
• Detailed reporting, broken out by CONTENTdm collection
• Fixity checks, virus checks
![Page 9: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/9.jpg)
![Page 10: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/10.jpg)
![Page 11: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/11.jpg)
![Page 12: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/12.jpg)
+
• Integration with CONTENTdm
• Fixity checks and virus scans
• Responsive support
• Extensive reports
−
• Integration with CONTENTdm
• Finding and retrieving items
• Manifest/batch upload requirement
• Vendor-side error reporting
• Verifying storage contents
![Page 13: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/13.jpg)
DuraSpace’s DuraCloud
• Began using in 2012
• Web interface for access
• Web interface or client-side tools for upload
• Content Management System-agnostic
• Fixity checks
![Page 14: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/14.jpg)
![Page 15: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/15.jpg)
+
• Presentation is like a traditional GUI file manager
• Can designate spaces, permissions
• Can make a space public
• Powerful upload tools
• Fixity scans
• Robust reporting
• Easy to get content out
• Choice of storage services
• VERY collaborative support
• Non-profit
−
• Searching
• Sorting
• Verifying storage contents
• Overwriting isn’t hard to do
• Batch delete
• MD5
![Page 16: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/16.jpg)
PREPARATION
[Diagram labels: CONTENTdm; Local server; OCLC DA; ?]
![Page 17: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/17.jpg)
[Diagram labels: CONTENTdm; Local server]
1. Exported metadata from CONTENTdm
2. Exported file names from local server
3. Bashed preservation file names, checksums
4. Identified and recovered missing files
![Page 18: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/18.jpg)
1. Exported metadata from CONTENTdm
Onerous to impossible
2. Exported file names from local server
Easy
3. Bashed preservation file names, checksums
Easy but time consuming
4. Identified and recovered missing files
Easy-ish
![Page 19: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/19.jpg)
1. Exported metadata from CONTENTdm
Onerous to impossible
• OCLC had to provide export for largest & most critical collection
• 363 MB tar file -> 18 x 100+ MB csv files
• Added frustration: metadata for compound objects v. multi-page pdfs
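Exports of this size are awkward to open in a spreadsheet, but they can be read one row at a time. The following is a minimal sketch, assuming the tar archive holds comma-delimited CSV files with a header row in UTF-8; the actual delimiter, encoding, and column names of the OCLC export are not given in the presentation, and the function names are hypothetical.

```python
import csv
import io
import tarfile
from typing import Dict, Iterator, Tuple


def iter_csv_rows(
    tar_path: str, encoding: str = "utf-8"
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (member name, row) for every CSV inside a tar, without unpacking to disk."""
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile() or not member.name.lower().endswith(".csv"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Wrap the binary stream so the csv module reads it as text, row by row.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                for row in csv.DictReader(text):
                    yield member.name, row


def count_rows_per_file(tar_path: str) -> Dict[str, int]:
    """Count data rows in each CSV member; memory use stays flat regardless of file size."""
    counts: Dict[str, int] = {}
    for name, _row in iter_csv_rows(tar_path):
        counts[name] = counts.get(name, 0) + 1
    return counts
```

The same iterator could feed a filter that keeps only the file name and checksum columns, which is usually all a storage comparison needs.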
![Page 20: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/20.jpg)
2. Exported file names from local server
Easy
3. Bashed preservation file names, checksums
Easy but time consuming
• Spreadsheet gymnastics
• Manual review for filename/checksum inconsistencies
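The comparison described here was done in spreadsheets with manual review. As a rough illustration of the same idea, the sketch below reconciles two file-name-to-checksum listings; the `reconcile` function, the dictionary inputs, and the sample checksums are all hypothetical and stand in for whatever the real exports contained.

```python
from typing import Dict, List, NamedTuple


class Reconciliation(NamedTuple):
    only_in_local: List[str]      # on the local server but not in the vendor listing
    only_in_vendor: List[str]     # in the vendor listing but not on the local server
    checksum_mismatch: List[str]  # present in both, but the checksums differ


def reconcile(local: Dict[str, str], vendor: Dict[str, str]) -> Reconciliation:
    """Compare two {file name: checksum} listings and report the differences."""
    local_names = set(local)
    vendor_names = set(vendor)
    shared = local_names & vendor_names
    return Reconciliation(
        only_in_local=sorted(local_names - vendor_names),
        only_in_vendor=sorted(vendor_names - local_names),
        checksum_mismatch=sorted(
            name for name in shared if local[name].lower() != vendor[name].lower()
        ),
    )


if __name__ == "__main__":
    # Made-up listings for illustration only.
    local = {"a_0001.tif": "aaa111", "a_0002.tif": "bbb222", "b.pdf": "ccc333"}
    vendor = {"a_0001.tif": "AAA111", "a_0002.tif": "zzz999", "c.pdf": "ddd444"}
    print(reconcile(local, vendor))
```

Checksums are compared case-insensitively because hexadecimal digests are often exported in mixed case; file names are compared exactly, so naming inconsistencies surface as unmatched entries that still need a human look.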
![Page 21: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/21.jpg)
4. Identified and recovered missing files
Easy-ish
• Missing from CONTENTdm? Added by librarians
• Missing from local server? Request to OCLC or re-download from CONTENTdm
![Page 22: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/22.jpg)
THE MOVE
1. Tested sync and upload tools
2. Discussed spaces
3. Ran sync tool on local preservation storage
4. Ongoing maintenance: upload tool
Local server
DuraCloud
![Page 23: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/23.jpg)
1. Tested sync and upload tools
• Helped determine flags to manage computer resources during sync
• Verified logging output, permissions
• Helped flesh out local workflow
Easy
![Page 24: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/24.jpg)
2. Discussed spaces
• Many spaces or few, to accommodate different workflows?
• Assignment of permissions
Easy, and Interesting
![Page 25: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/25.jpg)
3. Ran sync tool on local preservation storage
• Ran continuously for 5 2/3 days
• 94,177 items
Easy
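For planning a similar migration, the two figures on this slide give a rough average transfer rate. The short calculation below uses only those numbers; it says nothing about file sizes or bandwidth, which the slide does not report.

```python
from fractions import Fraction

# Figures from the slide: 5 2/3 days of continuous sync, 94,177 items.
days = Fraction(5) + Fraction(2, 3)
items = 94_177

hours = days * 24                        # 136 hours
items_per_hour = items / hours           # a little over 690 items per hour
seconds_per_item = hours * 3600 / items  # a little over 5 seconds per item

print(f"{float(hours):.0f} hours total")
print(f"{float(items_per_hour):.1f} items per hour")
print(f"{float(seconds_per_item):.1f} seconds per item")
```

These are averages over the whole run; actual per-file times would vary with file size and network conditions.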
![Page 26: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/26.jpg)
4. Ongoing maintenance: upload tool
• Uploads done weekly and monthly
• Upload tool used to avoid accidental overwriting
• Have to create “mock” file structure
Easy-ish
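The slide does not say how the “mock” file structure was built. One plausible reading, and it is only an assumption, is that new files are copied into a staging folder that mirrors the preservation directory layout, so that an upload of the staging folder places each file at the right path without touching existing content. The sketch below illustrates that reading; `stage_new_files` and its arguments are hypothetical, and it does not call any DuraCloud tool.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Set


def stage_new_files(
    source_root: Path,
    staging_root: Path,
    new_files: Iterable[Path],
    already_stored: Set[str],
) -> List[Path]:
    """Copy new files into a staging tree that mirrors their paths under source_root.

    already_stored holds relative POSIX paths known to be in remote storage;
    a file whose path is in that set is rejected rather than risk an overwrite.
    """
    staged: List[Path] = []
    for path in new_files:
        relative = path.relative_to(source_root)
        key = relative.as_posix()
        if key in already_stored:
            raise FileExistsError(f"already in storage, refusing to stage: {key}")
        target = staging_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 keeps timestamps along with content
        staged.append(target)
    return staged
```

Keeping the inventory check in the staging step means an accidental overwrite is caught locally, before any upload starts.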
![Page 27: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/27.jpg)
[Diagram labels: Working directory (x3); Staging – Limited Access; Local server; DuraCloud; Staging]
![Page 28: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/28.jpg)
Insights
• Room for preservation metadata improvement
• Working with full metadata dumps is problematic
• Need for more automated monitoring for local storage
• Integration with CMS not helpful unless FULL integration
in other words:
• Streamlined ingest = streamlined preservation
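The call for more automated monitoring of local storage could be met in many ways. As one minimal sketch, assuming MD5 is the checksum in use (it is mentioned earlier in the deck) and that a baseline manifest is kept somewhere safe, a scheduled job could rehash the preservation directory and report anything missing, new, or changed. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters do not have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its MD5 digest."""
    return {
        path.relative_to(root).as_posix(): md5_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, expected: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the current state of root against a previously saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(expected) - set(current)),
        "unexpected": sorted(set(current) - set(expected)),
        "changed": sorted(
            name
            for name in expected.keys() & current.keys()
            if expected[name] != current[name]
        ),
    }
```

An empty result in all three lists means the directory still matches the baseline; anything else is a prompt for review rather than proof of loss.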
![Page 29: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/29.jpg)
Still more thoughts
• No, really: manual management and auditing is getting less feasible
• What is acceptable content loss?
• What is acceptable preservation metadata error rate?
• Responsiveness to enhancement requests should be figured into vendor choice
• At 5 years out, PREMIS lite is just fine
![Page 30: Migrating from OCLC's Digital Archive to DuraCloud](https://reader034.vdocument.in/reader034/viewer/2022051514/54b72b0c4a79591b2d8b45d3/html5/thumbnails/30.jpg)