Discoverable, Available, Accessible: Preserving Digital Content
September 14, 2011
Speakers: Amy Kirchhoff, Ido Peled, and Marie-Elise Waltz
http://www.niso.org/news/events/2011/nisowebinars/preservation/
Preservation Standards (& Specifications)
(&& Best Practices)
Discoverable, Available, Accessible: Preserving Digital Content
NISO Webinar
By Amy Kirchhoff
Archive Service Product Manager, Portico, JSTOR
September 14, 2011
Speaker: Amy Kirchhoff
Amy Kirchhoff
Archive Service Product Manager
Portico, JSTOR
609-986-2218
Portico - Third Party Preservation
Working with libraries, publishers, and funders, we preserve e-journals, e-books, and other
electronic scholarly content to ensure researchers and students will have access to it in the future.
Portico is among the largest community-supported digital archives in the world.
Portico – Preserved Content
» E-journal titles: 9,190
» E-book titles: 12,733
» D-collections: 12
» E-journal files: 223,993,405
» E-book files: 869,888
» D-collection files: 83,178,138
» Total archive: 308,729,560
Standards are Great: Everyone Should Have One!
20 Minutes on Standards
Standards Portico Uses
Context: Digital Preservation
Digital preservation is the series of management policies and activities necessary to ensure the enduring usability, authenticity, discoverability, and accessibility of content over the very long term. The key goals of digital preservation include:
• Usability
• Authenticity
• Discoverability
• Accessibility
Context: Content
Context: Preservation Activities
[Diagram: Users; Content Receipt (automated or manual); Processing and Reprocessing; Archive; Delivery; Local Backup; Cloud Storage Backup. Flow labels: deposit & update; export for reprocessing; query & deliver.]
Standards & Specifications
Standards & Specifications: Framework
An Open Archival Information System
[Diagram: OAIS functional model. Entities: Producer, Consumer, Management. Functions: Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning. Flows: SIP into Ingest; AIP into and out of Archival Storage; Descriptive Information into and out of Data Management; Queries, Result Sets, Orders, and DIP between Access and the Consumer.]
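The package flow in the OAIS model (SIP in, AIP stored, DIP out) can be sketched in a few lines of Python. This is only an illustration of the concepts named on the slide: the class names, the `Archive` helper, and the identifier scheme are hypothetical and do not describe how Portico or any other archive is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission Information Package, as delivered by a Producer."""
    producer: str
    files: Dict[str, bytes]


@dataclass
class AIP:
    """Archival Information Package held in Archival Storage."""
    identifier: str
    files: Dict[str, bytes]
    descriptive_info: Dict[str, str] = field(default_factory=dict)


@dataclass
class DIP:
    """Dissemination Information Package returned to a Consumer."""
    source_aip: str
    files: Dict[str, bytes]


class Archive:
    """Toy archive covering Ingest, Archival Storage, Data Management, Access."""

    def __init__(self) -> None:
        self.storage: Dict[str, AIP] = {}              # Archival Storage
        self.catalog: Dict[str, Dict[str, str]] = {}   # Data Management

    def ingest(self, sip: SIP, title: str) -> str:
        """Turn a SIP into an AIP and record its descriptive information."""
        identifier = f"aip-{len(self.storage) + 1}"    # hypothetical ID scheme
        info = {"title": title, "producer": sip.producer}
        self.storage[identifier] = AIP(identifier, dict(sip.files), info)
        self.catalog[identifier] = info
        return identifier

    def query(self, text: str) -> List[str]:
        """Consumer query: return identifiers whose title contains the text."""
        needle = text.lower()
        return [i for i, info in self.catalog.items()
                if needle in info["title"].lower()]

    def order(self, identifier: str) -> DIP:
        """Consumer order: package a copy of the stored files as a DIP."""
        aip = self.storage[identifier]
        return DIP(source_aip=aip.identifier, files=dict(aip.files))
```

A producer would call `ingest` with a SIP, and a consumer would call `query` and then `order` to receive a DIP; Preservation Planning and Administration are left out of the sketch.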
Standards & Specifications: Certification
Standards & Specifications: Transfer
· These TIF files are page images.
· The TIF file named XYZ is page 1. It is a valid TIF and has a checksum of 123456.
· The TIF file named ABC is page 2. It is not a valid TIF and has a checksum of 78910.
...
· These JPG files are figures.
· The JPG file named MNO is the 2nd figure on page 2. It is a valid JPG and has a checksum of 234567.
...
· This PDF file contains page images.
· The page images are built from TIF files XYZ, ABC, etc. and JPG figure graphics MNO, etc.
...
· This XML file contains the full-text of the book.
· It uses the QRS DTD.
· It is named JKL and has a checksum of 555555.
...
· This MARC file is the bibliographic record for the book.
...
· The intellectual unit represented by this metadata file is a digitized book.
· It was scanned by Joe on this date.
· It was ingested into the repository on this other date.
· Jane Smith granted us preservation rights to it on this other date.
...
Preservation and Packaging Metadata File
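The kind of per-file statement listed above (what a file is, whether it is valid, and its checksum) can be sketched as a small manifest builder. This is a minimal sketch under stated assumptions: SHA-256 is assumed as the checksum algorithm, a check of the leading signature bytes stands in for real format validation, and the function names and JSON layout are hypothetical rather than taken from any packaging standard.

```python
import hashlib
import json
from typing import Dict, List

# Leading signature bytes for a few formats. Matching them is only a
# plausibility check, not full format validation.
SIGNATURES = {
    "TIFF": (b"II*\x00", b"MM\x00*"),
    "JPEG": (b"\xff\xd8\xff",),
    "PDF": (b"%PDF-",),
}


def describe_file(name: str, data: bytes, fmt: str, role: str) -> Dict[str, object]:
    """Record the role, apparent validity, and checksum of one file."""
    return {
        "name": name,
        "format": fmt,
        "role": role,
        "looks_valid": data.startswith(SIGNATURES[fmt]),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_manifest(unit: str, files: List[Dict[str, object]]) -> str:
    """Serialize the intellectual unit and its file descriptions as JSON."""
    return json.dumps({"intellectual_unit": unit, "files": files}, indent=2)
```

For example, `describe_file("XYZ.tif", data, "TIFF", "page 1")` yields one entry, and `build_manifest("digitized book", entries)` wraps the entries into a single metadata document. Provenance and rights statements (who scanned it, when it was ingested) would be additional fields.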
Standards & Specifications: Packaging
Standards & Specifications: Preservation Metadata
Standards & Specifications: Format Tech MD
Standards & Specifications: Descriptive MD
» Contributor
» Coverage
» Creator
» Date
» Description
» Format
» Identifier
» Language
» Publisher
» Relation
» Rights
» Source
» Subject
» Title
» Type
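The fifteen elements on this slide match the Dublin Core Metadata Element Set. As a rough illustration, the sketch below builds a simple XML record restricted to those elements using Python's standard library. The wrapping `record` element and the function name are assumptions made for the example; only the element names and the `http://purl.org/dc/elements/1.1/` namespace come from Dublin Core itself.

```python
import xml.etree.ElementTree as ET
from typing import Dict

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements listed on the slide, lower-cased as used in XML.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def dublin_core_record(fields: Dict[str, str]) -> ET.Element:
    """Build a record containing only recognized Dublin Core elements."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in fields:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = fields[name]
    return record
```

Calling `ET.tostring(dublin_core_record({"title": "A Digitized Book"}), encoding="unicode")` serializes the record with `dc:`-prefixed elements.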
Standards & Specifications: File Formats
Standards & Specifications: Identifiers
Resources
Ido Peled
Ex Libris Rosetta Product Manager
Rosetta, A Digital Preservation Solution
September 14, 2011
Agenda
1 Transition to a Digital Library
2 Building a Digital Preservation System
3 Ex Libris Rosetta
Transition to a Digital Library
Speaker: Ido Peled
Role of the Library
“The role of libraries in the print world - collecting, providing access to, and preserving our cultural heritage – does not change when we move to the digital realm. We must take on new challenges in determining how to fulfill this role”
Jay Schafer,
Director of Libraries at UMass Amherst
Traditional Library Services
• Collect Print Materials
• Preserve Physical Content
• Provide Discovery and Delivery
• Catalog Metadata
• Allow online, offsite, on demand access
• Ensure long-term access
• Describe, select and assign copy and access rights
• Collect diverse digital collections
Digital Library Services
• Collect Print Materials
• Preserve Physical Content
• Provide Discovery and Delivery
• Catalog Metadata
• Allow online, offsite, on demand access
• Ensure long-term access
• Describe, select and assign copy and access rights
• Collect diverse digital collections
• Digital: Digitized and Digitally Born
Digital Preservation as Part of the Digital Library
• Bit Preservation
• Migration
• Risk Analysis
• Online Access
• Ongoing Actions
• Sustainable Digital Preservation
• Cost Effective
Digital Preservation vs. Digital Repository
Digital Preservation
• Focus on data integrity and long-term access
• Complete risk analysis, preservation planning and actions component
• Ongoing actions
Digital Repository
• Focus on access
• Strong cataloging
• Usually integrated resource discovery
• Single process
Building a Digital Preservation System
The Strategy – Sustainable Digital Preservation
• CONSOLIDATE: Consolidate all digital collections
• OPTIMIZE: Optimize through automation
• EXTEND: Extend the library offering and collaborations
Consolidate
Faculty Publications
Data Sets
Student Publications
Institutional Photos
Archives
Newsletters
Research Paper
Alumni
Unified Digital System
Optimize
“As the rate of digital information production continues to escalate, it is vitally important to reduce the cost of preservation for all types of digital assets”
Sustainable Economics for a Digital Planet, Final Report of the Blue Ribbon Task Force
Digital information: x75
Staff: x1.5
Taken from ‘The 2011 IDC Digital Universe Study’
Optimize - LIFE Project
• Collaboration between British Library and UCL
• Developed a generic lifecycle costing formula
• See http://www.life.ac.uk/
Optimize Ingest
Automated ingest workflows & quality assurance process
Optimize Ingest
Scalable Infrastructure supporting mass-ingest and mass-processing
Optimize Metadata Creation
Integration with existing cataloging systems
Other ILS
Optimize Metadata Creation
Automated Processes
[Diagram labels: Ingest; Harvest; Descriptive Metadata; Metadata Form; Technical Metadata Extraction; Archive; Permanent Storage]
Optimize Bit Preservation
Automated, scheduled fixity checks; a storage abstraction layer; full replication on disk
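A fixity check recomputes each file's checksum and compares it with the value recorded at ingest. The sketch below shows the idea in Python; it assumes SHA-256 digests and a plain dictionary of recorded values, and the function names are hypothetical. It is not a description of how Rosetta implements or schedules its checks.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(expected: Dict[str, str], root: Path) -> List[str]:
    """Return the relative paths that are missing or whose digest changed."""
    failures = []
    for rel_path, recorded in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != recorded:
            failures.append(rel_path)
    return failures
```

Run on a schedule (for example from cron or a job queue), a non-empty result from `check_fixity` would trigger repair from a replica.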
Optimize Content Preservation
Automated migration action
[Diagram: stages Identify, Evaluate, Execute; components Operational Storage, Permanent Storage, Migration Action ……]
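One possible reading of the Identify, Evaluate, Execute stages is sketched below in Python: find files in at-risk formats, trial the migration action on a small sample, and apply it to every candidate only if the trial is accepted. The stage semantics, the `StoredFile` type, and all function names are assumptions made for illustration; the slide does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass
class StoredFile:
    name: str
    fmt: str
    data: bytes


Action = Callable[[StoredFile], StoredFile]
Acceptance = Callable[[StoredFile, StoredFile], bool]


def identify(files: Iterable[StoredFile], at_risk: Set[str]) -> List[StoredFile]:
    """Identify: select files whose format is considered at risk."""
    return [f for f in files if f.fmt in at_risk]


def evaluate(sample: List[StoredFile], action: Action, accept: Acceptance) -> bool:
    """Evaluate: trial the action on a sample and check every result."""
    return all(accept(original, action(original)) for original in sample)


def execute(candidates: List[StoredFile], action: Action) -> List[StoredFile]:
    """Execute: produce migrated copies; the originals are left untouched."""
    return [action(f) for f in candidates]


def run_migration(files: Iterable[StoredFile], at_risk: Set[str],
                  action: Action, accept: Acceptance,
                  sample_size: int = 2) -> List[StoredFile]:
    """Chain the three stages; return no copies if the trial is rejected."""
    candidates = identify(files, at_risk)
    if not candidates or not evaluate(candidates[:sample_size], action, accept):
        return []
    return execute(candidates, action)
```

Returning new copies rather than modifying the originals keeps the source files available if a migration later proves faulty.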
Optimize Access
Automated rule-based distribution
…
Extend Collaborative Work
Special Collections
Shared Infrastructure
Unique Workflows
Customizations
Extend The Library Reach
Library Digital Preservation Infrastructure
• Students
• Finance
• Alumni
• Other Stakeholders
• Administration
• Faculty
Ex Libris Rosetta
Ex Libris Rosetta
Ex Libris Rosetta helps institutions collect, manage, archive, and preserve their digital collections, ensuring data integrity and access over time.
Collect, Manage, Archive, Deliver, Preserve
Our Customers Around the World
Major Religious Institution
Rosetta Customers
Binghamton, NY, USA
• Background / Key Areas of Collaboration: Part of the SUNY system; FTE: ~14K students; Staff: 1.5 FTE (not dedicated)
• Collections in Rosetta: Special collections (Edwin A. Link collection); Born digital newsletters; University photographs
Munich, Germany
• Background / Key Areas of Collaboration: Service providers for Bavaria; Part of the Google Books project
• Collections in Rosetta: Scanned manuscripts and rare books; Legal deposit documents; Websites
Rosetta Customers
Zurich, Switzerland
• Background / Key Areas of Collaboration: Leading technological institution; DataCite partners
• Collections in Rosetta: Research data; Special collections; Dissertations
Wellington, New Zealand
• Background / Key Areas of Collaboration: Development partner; Mandate for digital preservation
• Collections in Rosetta: Nation’s cultural heritage; Private collections; Websites
OAIS Model
Rosetta Modules
[Diagram: modules Ingest (Manual / Automatic), Preservation, Management, Publishing, Delivery, Search Tools; storage areas Working Area, Operational Repository, Permanent Repository; packages SIP, AIP, DIP.]
Key Features
Scalable
International Community
Extendable, Integrative
Open Access
Complete Preservation Solution
>30%
Why Should You Choose Rosetta?
DIV: Demonstrated Institutional Value
ROI: Return On Investment
TCO: Total Cost of Ownership
Efficiency, Effectiveness, Value
• Ensure online access
• Improved distribution mechanism
• Optimized digital content management
• Conform to funding agencies' requirements
• Extended cross-institution digital preservation service
• Increased collaboration with researchers
• Enhanced emphasis on strategic initiatives
• Sustainable model for digital preservation
• Single enterprise solution
• Minimal staff requirements
Become a Trusted Digital Library
Head of Research
• Complying with grant proposal
• Preserving research data
• Increasing re-use of data
Library & IT Directors
• Rosetta as a strategic initiative (TDR)
• Extending the library offering
• Reducing TCO
• Offering new career paths
Digital Initiatives Librarian
• Focus on curation, not IT
• Consolidate work efforts
• Extend digital collections
• Modern working environment
Meeting the Expectations of the Community: CRL & the Auditing of Digital Repositories
Marie Waltz
Center for Research Libraries
What is the Center for Research Libraries (CRL)?
• A consortium of over 260 college and university libraries, primarily in the U.S. and Canada.
• Our members have an interest in auditing and certification because they are investing in digital repositories.
11/28/2007 Speaker: Marie Waltz Practices & Challenges in Preservation
Research libraries are changing...content is no longer on library shelves
• The collections of 80% of U.S. research libraries are duplicating the contents of other research libraries. Most of what is owned will be digitized within the next ten years.
• Google has digitized more than 12 million volumes.
• Born digital material is now the norm for many types of academic materials (course syllabi, articles, and many manuscripts submitted to publishers).
CRL & Digital Preservation
2002 - 2004 – Political Communications Web Archiving Project
2005 - 2006 – Test audits of TRAC
2008 - 2010 – NSF Case Studies
CRL & Test Audits using TRAC
RLG/CRL Mellon Foundation project to audit digital repositories
– ICPSR, Portico, KB, LOCKSS
– Tested the TRAC metrics
– Gained an understanding of the auditing process
CRL & NSF Case Studies
• Looked at eight organizations of various sizes that house digital content
• Gave us a broader understanding of what makes a “successful” organization
• Allowed us to see how technology decisions affect a repository.
Current Digital Preservation Projects at CRL
• Certifying digital repositories of interest to members.
• Participation in establishment of ISO 16363
• Human Rights Archives & Documentation.
What is a Trusted Digital Repository?
“A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources to its designated community, now and in the future.”
– Trusted Digital Repositories: Attributes and Responsibilities, An RLG-OCLC Report (RLG, 2002)
NISO Presentation
Why Auditing?
An audit establishes the soundness and dependability of a repository.
Auditing a Digital Repository
• Advisory Panel
• Audit Criteria
• Time table
• Standards in auditing
• Certification and reporting to community
Current Advisory Panel
• Martha Brogan (Chair), Director of Collection Development & Management, University of Pennsylvania
• Winston Atkins, Preservation Officer, Duke University
• William Parod, Senior Repository Developer, Northwestern University Libraries
• Mark Phillips, Assistant Dean for Digital Libraries, University of North Texas Libraries
• Anne Pottier, Associate University Librarian, McMaster University
• Oya Y. Rieger, Associate University Librarian for Information Technologies, Cornell University
• Perry Willett, Digital Preservation Services Manager, California Digital Library
Criteria
• Advisory Panel
• ISO 16363 (TDR) / TRAC
• Community feedback
Timetable for an Audit
1. Logistics for the audit
2. Request for Documentation
3. Identify activities, policies etc. of key significance for preservation
4. Evaluation of Repository (includes site visit)
5. Review findings with panel
6. Report findings
Standards Used in Auditing
• Metadata standards: Dublin Core, Content Standard for Digital Geospatial Metadata (CSDGM)
• Technical standards: ISO 27001
• Data and format standards: PDF/A, etc.
Certification
• Means the Repository is “Trusted”
• We report findings to the Research Community
• Considerations for: Length of Certification and Re-Certification
Certification of HathiTrust & Portico
• More information about preservation on their websites
• Assurance they are adhering to standards and preservation strategies
• Assurance that changes will take preservation into account
ISO Standard 16363
• Work on the standard: PTAB Group
• Test audits
• Future plans for auditing using ISO 16363
Test Audits of ISO 16363
• May-July 2011. Tested six digital repositories, three in the U.S. and three in Europe.
• Future plans for auditing using ISO 16363.
– European Framework for Certification and Auditing.
Future Plans for CRL Auditing
• Use ISO 16363
• Audit repositories of interest to academic and independent researchers in the United States and Canada.
• Encourage community feedback
• Become a resource for information about digital repositories
Community involvement
• Without community feedback we will not be successful at targeting our audits
• We want to audit repositories of interest to academic and independent researchers.
CRL is becoming a Resource for Information about Digital Repositories
• Global Resources Forum (GRF) Reviews and Profiles
• Audit reports
• Webinars
Summary
CRL will continue to audit digital repositories of interest on behalf of our members using ISO 16363.
Thank you!
Marie Waltz [email protected]