Digital Repository Preservation Service: LMC Plus, April 16, 2008. Meg Bellinger, AUL; Roy Lechich, Audrey Novak, ILTS

Page 1: Digital Repository Preservation Service

LMC Plus
April 16, 2008
Meg Bellinger, AUL
Roy Lechich, Audrey Novak, ILTS

Page 2: Yale Cyber Infrastructure Architecture

Common Services: Persistent identification, Authentication & Authorization, Registries, Rights Management

Content Provision (Services & Storage): For digital collections, preservation, metadata. From library, museums, research, academic and administrative departments

Users: Yale and global

Fusion (Services, Tools, Applications): Brokers, aggregators, indexes, catalogs, MetaLib, XSearch

Infrastructure Framework and Protocols: Web services, Z39.50, OAI-PMH, RSS, SRU/SRW, OAIS, Fedora

Presentation (Interfaces): Yale uPortal, Classesv2, Google, Personal Information Environment, Discipline specific, gallery, museum and library sites

Based on a graphic created by Lorcan Dempsey

Page 3: Yale University Library Digital Repository Service

[Diagram; labels only]

Content Sources: Full Text Books; Audio & Video; Images and Metadata; Complex Objects; Research Data; Finding Aids; Personal Collections

Digital Repository Service: Preservation Archive; E-Publishing (Institutional Repository); Collections Environment; Content; Metadata

Integration Services

Dissemination: Google, MSN, Yahoo …; Image Commons; University Portal; Library, MetaLib; Collections XSearch; VITAL; Classes*v2 (Sakai)

Page 4: Outline

• Introduction
• Background
• Digital Preservation Repository
  – Phase I
  – Additional Phases
• Within the Larger Landscape

Page 5: Intro: What is Digital Preservation?

“Digital preservation is the whole of the activities and processes involved in the physical and intellectual protection and technical stabilization of digital resources through time in order to reproduce authentic copies of these resources.” (YUL Digital Preservation Policy)

Page 6: Introduction: The Need

At an ever accelerating pace, faculty, students, and staff (e.g., the Library) are creating, sharing, and storing digital information for teaching, learning, research, administrative, and creative purposes.

Information in digital form is now integral to Yale's core mission.

• Mass Digitization
• Statistical Datasets
• Images
• Scientific & Biomedical Data
• Audio, Video, Podcasts
• Web Sites

Page 7: Introduction: The Need

• Digital resources are fragile and the preservation of these resources is complex.
• Digital preservation is dynamic
  – Responses to technological obsolescence or media decay must be taken quickly.
• Digital preservation is proactive rather than reactive
  – The prospects for successfully preserving digital resources rest heavily upon decisions taken at each stage of their life cycle, starting with creation.

Page 8: Introduction: The Need

Digital Landscapes Committee, Cyberinfrastructure Survey (Oct 2006)

Ranking from 19 survey questions posed to faculty:

#1 Easier electronic access to scholarly materials
#2 Providing students with digital access to research and instructional materials
#11 Ensuring the preservation of my scholarly digital output (e.g., datasets, research notes, e-prints)

Page 9: Introduction: The Need

“The coolest thing that will be done with your data someone else will do.” (Open Repositories 08)

Page 10: Background – YUL Related Initiatives

• IAC Rescue Repository
  – 2004 - present
• IAC Digital Preservation Committee
  – Nov 2004 - Jan 2007
• IAC Metadata Committee
  – Nov 2004 - Feb 2007
  – PREMIS - Preservation Metadata Task Force
    • April - Oct 2006

Page 11: Rescue Repository (May 2004 Requirements Report)

“An increasing number of projects in the YUL are generating or acquiring digital content …”

“The digital masters for much of this material are in immediate danger of permanent loss through media decay, physical damage, technological obsolescence, or difficulties in archival management...”

“...in the interim, we propose a flexible and agile/quick short-term solution…”

Page 12: Rescue Repository Description

• Managed, secure storage (disk-to-disk-to-tape).
• Resources are organized according to owning library, collection, subcollection(s), file name.
• Activity is managed by simple ingest and retrieval applications with basic file verification and validation.
• A ~3 year temporary solution (May 2005 +3 yrs).
• Heavily used …
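The organization and basic verification described on this slide could be sketched roughly as follows. This is an illustration only, not the Rescue Repository's actual code: the function names and the use of SHA-256 are assumptions, since the slides name the hierarchy and mention verification but do not say how either was implemented.

```python
import hashlib
from pathlib import PurePosixPath


def storage_path(owning_library, collection, subcollections, file_name):
    """Build owning library / collection / subcollection(s) / file name.

    Hypothetical helper: the slide names the hierarchy, not the exact layout.
    """
    return PurePosixPath(owning_library, collection, *subcollections, file_name)


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source, stored):
    """Basic verification (assumed): the stored copy matches the source digest."""
    return sha256_of(source) == sha256_of(stored)
```

A deposit tool along these lines would compute the target path first, copy the file, and accept the deposit only if `verify_copy` returns `True`.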

Page 13: RR Storage Usage

Storage in TB, by date:

Date      Used    Total   Available
Oct-05    0.419   13.6    13.1
Jan-06    0.698   13.6    13
Jul-06    5.7     13.6    8
Jan-07    8.4     13.6    5.3
Mar-07    9.3     13.6    4.5
Jun-07    9.9     13.6    3.8
Oct-07    14.1    28.8    14.7
Nov-07    19      36      21
Jun-08    36      36      0
Sep-08    43.5
Dec-08    53

Users: BRBL, Div, E-Collections, Geo, LWL, MSS/A, Peabody, Preservation, SSL, VRC, YUAG
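As a rough check on the storage figures on this slide, the sketch below recomputes available space and percent used from the Total and Used rows. It assumes the first nine values in each row line up with the first nine dates, which the slide does not state. Plain subtraction does not reproduce every Available figure shown (for Nov-07, 36 − 19 gives 17, while the slide shows 21), so the slide's own numbers may have been rounded or read off the chart.

```python
# Figures in TB, copied from the slide; Sep-08 and Dec-08 list Used only.
dates = ["Oct-05", "Jan-06", "Jul-06", "Jan-07", "Mar-07",
         "Jun-07", "Oct-07", "Nov-07", "Jun-08"]
total = [13.6, 13.6, 13.6, 13.6, 13.6, 13.6, 28.8, 36, 36]
used = [0.419, 0.698, 5.7, 8.4, 9.3, 9.9, 14.1, 19, 36]


def headroom(total_tb, used_tb):
    """Return (available TB, percent used) for one date."""
    available = total_tb - used_tb
    percent_used = 100.0 * used_tb / total_tb
    return round(available, 1), round(percent_used, 1)


for date, t, u in zip(dates, total, used):
    available, pct = headroom(t, u)
    print(f"{date}: {available} TB available, {pct}% used")
```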

Page 14: Digital Preservation Committee

• Preservation Policy
  – Defines digital preservation; establishes general principles about what is preserved; promulgates our commitment to standards.
• Best Practices
  – A dynamic suite of documents that address current best practices for preservation-related issues such as format validation, registries, etc.

Page 15: Metadata Committee

Preservation Metadata Taskforce (PREMIS) Report

• PREMIS (PREservation Metadata Implementation Strategies) defines the metadata needed to preserve digital information assets for the long term.

Page 16: Preservation Metadata Task Force Recommendations

Two Profiles for YUL’s PREMIS Implementation:

• Base (6 elements) - A sub-set of full PREMIS … that is temporary until the library has developed digital preservation policies.
• Full - A draft that needs to be fine-tuned through experience with actual instances of use at Yale. Experience using PREMIS will determine which elements in the PREMIS model are necessary at Yale.

Page 17: Digital Preservation Need and Related Initiatives Summary

• The demand for a Digital Preservation Repository from faculty, Rescue Repository users, digitization operations and projects is heavy.
• The Rescue Repository and work by the IAC Digital Preservation and Metadata/PREMIS Committees laid the foundation.
• The Rescue Repository is reaching its planned end of life.

Page 18: Digital Preservation Repository: Phase I

• $500,000 funding to establish a Digital Preservation Repository prototype.
  – Provide mechanisms and services for preservation and access to the data.
  – Create the scalable hardware infrastructure.
  – Demonstrate an extensible repository service model.
  – Develop the resource (staff and economic) models.
  – Establish the collaborative campus partnerships.
  – Further the research and scholarship into digital preservation issues.

Page 19: Digital Preservation Repository: Phase I

Working from two Use Cases:

1. YPED (Yale Protein Expression Database)*
   • Protein profiling mass spectrometry data sets generated by the Keck Lab
2. Images from the Rescue Repository
   • Approximately 400,000 individual image files from the Art Gallery, Beinecke, Divinity Library, Lewis Walpole Library, Library Visual Resources Collection, and Manuscripts and Archives department.

* Proteomics is the large-scale study of proteins and is often considered the next step in the study of biological systems, after genomics.

Page 20: Digital Preservation Repository: Phase I

1. Hardware Architecture
2. Software Design
3. Preservation Metadata
4. Use Case: YPED
5. Use Case: Images

Page 21: Phase I - Hardware

• 20TB YPED and Images
• 30TB Microsoft mass digitization
• 10TB non-images (Rescue Repository)
• 40TB Annual growth with Library digitization projects
• 250TB Annual growth with Fortunoff video digitization project
• 1000TBs (a petabyte) within 5 years
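The sizing figures on this slide can be combined into a back-of-the-envelope projection. The sketch below assumes that the three fixed amounts (20, 30 and 10TB) form the starting point and that both annual growth figures apply in full from the first year; the slides do not state either assumption, and the function names are illustrative.

```python
INITIAL_TB = 20 + 30 + 10      # YPED and images, Microsoft mass digitization, non-images
ANNUAL_GROWTH_TB = 40 + 250    # Library digitization projects, Fortunoff video project
TARGET_TB = 1000               # a petabyte, in the slide's units


def projected_storage(years):
    """Cumulative TB after whole years, assuming constant growth from year one."""
    return INITIAL_TB + ANNUAL_GROWTH_TB * years


def years_to_reach(target_tb):
    """Smallest whole number of years at which the projection meets the target."""
    years = 0
    while projected_storage(years) < target_tb:
        years += 1
    return years


for year in range(6):
    print(f"year {year}: {projected_storage(year)} TB")
print("years to reach a petabyte:", years_to_reach(TARGET_TB))
```

Under these assumptions the projection passes 1000TB during the fourth year, which is consistent with the slide's "within 5 years".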

Page 22: Phase I - Hardware

[Chart: Projected Growth in Storage. Vertical axis: TB, 0 to 900; horizontal axis: years 2005 to 2012.]

Page 23: Software Design

Phase I - Core Preservation Functionality
• Deposit, Normalization, Packaging, Validation, Ingest, Storage (multiple copies, geographic separation), Preservation Policy Management, Authorization, OAI-PMH, SRW/SRU, Retrieval
• YPED and Image Use Case Requirements

Additional Phases - Additional Services
• Preservation actions
• All (or almost all) user-facing services
• Enhanced access & delivery through applications
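OAI-PMH, listed among the Phase I functions, is an HTTP-based harvesting protocol in which each request carries a `verb` argument such as `Identify` or `ListRecords`. A minimal sketch of building such request URLs follows. The base URL is a made-up placeholder, not a Yale endpoint, and the helper name is an assumption.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real repository URL.
BASE_URL = "https://repository.example.edu/oai"


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL from a verb and optional arguments."""
    query = {"verb": verb}
    query.update({key: value for key, value in params.items() if value is not None})
    return f"{BASE_URL}?{urlencode(query)}"


print(oai_request_url("Identify"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

`oai_dc` (unqualified Dublin Core) is the one metadata format every OAI-PMH repository is required to support, which makes it a reasonable default for a first harvest.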

Page 24: Repository

Deposit / Ingest (SIP)
• Flexible
• Accept Different Types of Data
• Collect Data and Metadata Components
• Normalize for Ingest Processing
• Verify Integrity
• Add Identifiers
• Add Preservation Metadata

Preservation / Storage (AIP)
• Continuous Integrity Checks
• Format Migrations (e.g. TIFF to JPEG 2000)
• Storage Migrations (to new or different types of physical media)
• Logging
• Reporting

Access (DIP)
• Authorization
• Validation
• OAI-PMH
• SRW/SRU
• Indexing
• Retrieval
• Logging
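The ingest and storage steps on this slide (verify integrity, add identifiers, add preservation metadata, then continuous integrity checks) might be sketched as below. This is a simplified illustration, not the repository's design: the class and field names, the UUID-based identifier, the SHA-256 digest and the metadata keys are all assumptions, and the keys are not PREMIS element names.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SIP:
    """Submission package: content plus depositor-supplied metadata and digest."""
    file_name: str
    content: bytes
    metadata: dict
    declared_sha256: str


@dataclass
class AIP:
    """Archival package: content plus identifier and preservation metadata."""
    identifier: str
    file_name: str
    content: bytes
    metadata: dict
    preservation: dict = field(default_factory=dict)


def ingest(sip):
    """Verify integrity, add an identifier, add preservation metadata."""
    actual = hashlib.sha256(sip.content).hexdigest()
    if actual != sip.declared_sha256:
        raise ValueError(f"integrity check failed for {sip.file_name}")
    return AIP(
        identifier=f"urn:uuid:{uuid.uuid4()}",
        file_name=sip.file_name,
        content=sip.content,
        metadata=dict(sip.metadata),
        preservation={
            "sha256": actual,
            "ingested": datetime.now(timezone.utc).isoformat(),
            "events": ["ingest"],
        },
    )


def integrity_check(aip):
    """Recompute the digest, log the outcome, and report whether it still matches."""
    ok = hashlib.sha256(aip.content).hexdigest() == aip.preservation["sha256"]
    aip.preservation["events"].append("fixity-ok" if ok else "fixity-failed")
    return ok
```

Recording each check in the package's event list reflects the Logging and Reporting items in the storage column: a failed check leaves a trace that a later report can pick up.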

Page 25: Digital Preservation Repository – Phase I Summary

Build:
• Hardware environment
• Core preservation repository services
• Project specific service components needed for YPED and to replace Rescue Repository
• Migration of Rescue Repository image content

Page 26: Additional Phases

Examples:
• Full Rescue Repository migration
• More content (project/use cases)
  – Project specific ingest and access
• More storage (950TBs)
• Preservation actions (integrity checks, format migrations, etc.)
• Reporting
• Rights Management

5 years, 6FTE, ~7 million dollars

Page 27: Larger Landscape

Peer Institutions:
• Stanford, Harvard
• Rutgers
• DAITSS (Florida)
• Michigan
• Columbia

Internationally:
• European National Libraries
• Australia & New Zealand

Page 28: Thank you

Q&A