Harvard’s Digital Repository Service (DRS) Architecture
![Page 1: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/1.jpg)
Harvard’s Digital Repository Service (DRS) Architecture
Harvard University Library (HUL)
Andrea Goethals, Randy Stern
December 10, 2009
![Page 2: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/2.jpg)
Today’s Agenda
1. What is the DRS?
2. DRS 1 Architecture
3. DRS 2 Highlights
4. Questions
![Page 3: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/3.jpg)
1. What is the DRS?
![Page 4: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/4.jpg)
DRS Context
• A core portion of HUL’s mission is to provide current and future access to research materials and resources, with recognition that preserving access to digital content requires different strategies, tools and skills
• Digital Preservation projects and activities (2000-)
• Digital Preservation Program (June 2008-)
  • Centerpiece: the Digital Repository Service (DRS)
![Page 5: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/5.jpg)
What is the DRS?
Set of professionally managed services for preservation and access
[Diagram: services arranged between creation/acquisition and use]
• Creation & format guidelines, training, ingest service
• Metadata and content storage & monitoring service
• Preservation planning & activities, administration, management tools
• Delivery services, access restrictions, persistent names
![Page 6: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/6.jpg)
What’s in the DRS?
![Page 7: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/7.jpg)
What’s in the DRS?
![Page 8: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/8.jpg)
What’s in the DRS?
![Page 9: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/9.jpg)
What’s in the DRS?
![Page 10: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/10.jpg)
What’s in the DRS?
![Page 11: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/11.jpg)
What’s in the DRS?
![Page 12: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/12.jpg)
What’s in the DRS?
![Page 13: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/13.jpg)
What’s in the DRS?
![Page 14: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/14.jpg)
DRS by the numbers
• 103 TB of content
  • 335 TB total (counting all copies)
• 13 M files
  • 10 M image files
  • 21,000 audio files
  • 2.8 M text files
  • 851,000 compressed Google books containing 672 M files
  • 6,300 compressed web harvests containing 14 M web files
![Page 15: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/15.jpg)
DRS growth
• Fueled by large projects
• Recent explosion: mass digitization (Google book project)
[Chart: DRS size in TB (axis 0 to 120), October 2000 to October 2009]
![Page 16: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/16.jpg)
Broadening content and metadata requirements
• New formats and genres, born-digital content
  • Email archiving, more audio, drawing, video
• Descriptive metadata, linkages to catalogs
• Rights management, more access restrictions
• Auxiliary content
  • Contextual material, licenses, donor agreements, collection objects, documentation, repository agents
![Page 17: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/17.jpg)
2. DRS 1 Architecture
![Page 18: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/18.jpg)
DRS System Architecture
[Diagram] Components, with connections labeled TCP/IP and NFS:
• DRS Web Admin Tools
• Ingest Services
• Delivery Services
• Metadata Storage Database
• Consistency Validation Service
• Content Storage Service
![Page 19: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/19.jpg)
Metadata Storage Database
• DRS-1 objects are modeled as related files
• File metadata:
  • Administrative (owners, projects, deposit dates, owner IDs, etc.)
  • Technical (format MIME type & format-specific data)
  • Role, purpose, quality
  • No descriptive metadata
  • Access restrictions (public, Harvard-only, dark)
  • MD5 file digest and byte count
• Relationship triples
  • “is_part_of”, “is_preservation_replacement_for”, etc.
  • 21 relationship types
  • ~13M files, 12.3M relationships
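The file-plus-relationship model on this slide can be pictured roughly as follows. This is a minimal Python sketch, not the DRS schema: the class names `FileRecord` and `RelationshipStore`, the field names, and the sample IDs are hypothetical. Only the metadata categories and the two predicate names come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Per-file metadata categories named on the slide (field names are assumed)."""
    file_id: int
    owner: str        # administrative
    mime_type: str    # technical
    role: str         # role / purpose / quality
    access: str       # "public", "harvard-only", or "dark"
    md5: str          # MD5 file digest
    size_bytes: int   # byte count


class RelationshipStore:
    """Stores (subject, predicate, object) triples between file IDs."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject_id, predicate, object_id):
        # Index each triple from both ends so lookups work in either direction.
        self._by_subject[subject_id].add((predicate, object_id))
        self._by_object[object_id].add((predicate, subject_id))

    def objects_of(self, subject_id, predicate):
        """Files that subject_id points at through the given predicate."""
        return sorted(o for p, o in self._by_subject[subject_id] if p == predicate)

    def subjects_of(self, predicate, object_id):
        """Files that point at object_id through the given predicate."""
        return sorted(s for p, s in self._by_object[object_id] if p == predicate)
```

With this shape, an "object" is simply the set of files reachable through triples: for example, `store.subjects_of("is_part_of", parent_id)` would list the page files that belong to a parent file.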
![Page 20: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/20.jpg)
Content Storage Service
Bit preservation
• Redundancy, heterogeneity, extensibility, scalability, simple file access protocol
• Access demands high availability and high performance delivery
• Functional requirements:
  • At least three copies in three physical locations
  • Two media types
  • Two on-line copies for high availability
  • One near-line copy, one off-line copy
![Page 21: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/21.jpg)
Content Storage Service
Storage provider
• Sun SAM/QFS Storage Archive Manager
• 2 file classes: highuse and lowuse
• Archiving rules
  • High use files
    • Copy 1 on disk at local server center
    • Copy 2 on disk at remote server center
    • Copy 3 on tape in library
    • Copy 4 on tape off line at Harvard Depository
  • Low use files
    • Copy 1 on disk at remote server center
    • Copy 2 on tape in library
    • Copy 3 on tape off line at Harvard Depository
• High speed cache for access
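The archiving rules for the two file classes can be written down as a small table and summarized per class. This is an illustrative Python sketch only: it is not SAM/QFS configuration syntax, the names `Copy`, `ARCHIVING_RULES` and `summarize` are hypothetical, and the "online" / "nearline" / "offline" labels are an assumed reading of the slide (disk as on-line, tape in the library as near-line, tape at the Harvard Depository as off-line).

```python
from collections import namedtuple

# One archive copy: its number, medium, location as named on the slide,
# and an assumed availability label.
Copy = namedtuple("Copy", ["number", "medium", "location", "availability"])

ARCHIVING_RULES = {
    "highuse": [
        Copy(1, "disk", "local server center", "online"),
        Copy(2, "disk", "remote server center", "online"),
        Copy(3, "tape", "tape library", "nearline"),
        Copy(4, "tape", "Harvard Depository", "offline"),
    ],
    "lowuse": [
        Copy(1, "disk", "remote server center", "online"),
        Copy(2, "tape", "tape library", "nearline"),
        Copy(3, "tape", "Harvard Depository", "offline"),
    ],
}


def summarize(file_class):
    """Count copies, distinct media types, and copies per availability label."""
    copies = ARCHIVING_RULES[file_class]
    summary = {
        "copies": len(copies),
        "media_types": len({c.medium for c in copies}),
    }
    for label in ("online", "nearline", "offline"):
        summary[label] = sum(1 for c in copies if c.availability == label)
    return summary
```

Under these assumed labels, `summarize("highuse")` reports four copies with two on disk, while `summarize("lowuse")` reports three copies with one on disk; both classes use two media types.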
![Page 22: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/22.jpg)
Consistency Validation Service
• Continuous monitoring for file system and database consistency
  • Crawls the file system and confirms that every disk file has a DRS metadata record
  • Crawls the DRS metadata records table and confirms that every file referenced exists in the file system
  • Confirms that the MD5 checksum for each file is the same as recorded in the database
  • Reports errors to administrators
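The three checks listed above can be sketched in a few lines of Python. This is a simplified illustration, not the DRS service: the function names `md5_of` and `validate` are hypothetical, a plain dict of relative path to MD5 digest stands in for the metadata records table, and errors are returned as a list rather than reported to administrators.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(root, records):
    """Compare the files under `root` with `records` (relative path -> MD5 hex).

    Returns a list of (error_kind, relative_path) tuples.
    """
    # Crawl the file system and collect every file's path relative to root.
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            on_disk.add(os.path.relpath(os.path.join(dirpath, name), root))

    recorded = set(records)
    errors = []
    # Check 1: every disk file has a metadata record.
    for rel in sorted(on_disk - recorded):
        errors.append(("no_metadata_record", rel))
    # Check 2: every recorded file exists in the file system.
    for rel in sorted(recorded - on_disk):
        errors.append(("missing_file", rel))
    # Check 3: the stored checksum matches the file's current checksum.
    for rel in sorted(on_disk & recorded):
        if md5_of(os.path.join(root, rel)) != records[rel]:
            errors.append(("checksum_mismatch", rel))
    return errors
```

An empty list from `validate` means the directory tree and the records agree. At the file counts quoted in this deck, a real service would have to run such checks incrementally rather than in one pass, which this sketch does not attempt.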
![Page 23: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/23.jpg)
Delivery and Access Services
• Real time web delivery
  • Image delivery service
    • JPEG, JPEG 2000, TIF, GIF
  • Page turned object delivery service
    • METS + page images + page text
  • Streaming delivery service
    • Real Audio
  • File delivery service
    • PDFs
  • Web Archiving Service
• Asynchronous delivery service
  • Archival masters
![Page 24: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/24.jpg)
Administrative Services
• DRS Web Administrator
  • Searching, reporting, file operations, archival master download
• Page Turned Object Maintenance
  • METS structure editor
• Name Resolution Service Maintenance
  • URN create/update/report
![Page 25: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/25.jpg)
DRS System Architecture
[Diagram] Components, with connections labeled TCP/IP and NFS:
• DRS Web Admin Tools
• Ingest Services
• Delivery Services
• Metadata Storage Database
• Consistency Validation Service
• Content Storage Service
![Page 26: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/26.jpg)
DRS System Architecture
Ingest Services
[Diagram] Ingest components shown:
• Depositors
• BatchBuilder
• SFTP Drop Boxes
• DRS Loader
• Web Archiving Service
Other components shown: DRS Web Admin Tools, Delivery Services, Metadata Storage Database, Consistency Validation Service, Content Storage Service (connections labeled TCP/IP and NFS)
![Page 27: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/27.jpg)
DRS System Architecture
Delivery Services
[Diagram] Delivery components shown:
• Load Balanced Delivery Services (shown twice)
• Catalogs, Web Sites, Google
Other components shown: DRS Web Admin Tools, Metadata Storage Database, Consistency Validation Service, Content Storage Service, DRS Loader, SFTP Drop Boxes, BatchBuilder, Depositors, Web Archiving Service (connections labeled TCP/IP and NFS)
![Page 28: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/28.jpg)
DRS System Architecture
Persistent Naming and Access Services
[Diagram] Naming and access components shown:
• Access Management Service
• Name Resolution Service
Other components shown: Load Balanced Delivery Services (shown twice), Catalogs, Web Sites, Google, DRS Web Admin Tools, Metadata Storage Database, Consistency Validation Service, Content Storage Service, DRS Loader, SFTP Drop Boxes, BatchBuilder, Depositors, Web Archiving Service (connections labeled TCP/IP and NFS)
![Page 29: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/29.jpg)
DRS System Architecture
Storage Services
[Diagram] SAM/QFS storage across three sites (Site 1 Cambridge, Site 2 Boston, Site 3 Westborough):
• Disk archive (high use, copy 1)
• Disk archive (high use, copy 2; low use, copy 1)
• Tape archive (high use, copy 3; low use, copy 2)
• Media-only tape archive at Site 3 Westborough (high use, copy 4; low use, copy 3)
Other components shown: Load Balanced Delivery Services (shown twice), Catalogs, Web Sites, Google, Access Management Service, Name Resolution Service, DRS Web Admin Tools, Metadata Storage Database, Consistency Validation Service, DRS Loader, SFTP Drop Boxes, BatchBuilder, Depositors, Web Archiving Service (connections labeled TCP/IP and NFS)
![Page 30: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/30.jpg)
Storage Services
Implementation
• Sun SAM-QFS 4.6
  • Rule-based automatic archiving, no “backups”
  • Unified file name space
• Dual Sun T2000 Solaris SAM servers
  • Redundant servers at site 1, DR failover at site 2
  • Nightly samfsdump from site 1, samfsrestore at site 2
• EMC CLARiiON disk storage arrays
  • RAID 1+0 FC cache / RAID 5 SATA disk archives
  • 35 TB CX3-40 at site 1, 109 TB CX3-80 at site 2
• StorageTek SL500 tape library
  • LTO-4
• In production since Feb 2008
![Page 31: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/31.jpg)
Storage Services
Redundancy
[Diagram] Labels, grouped in the order they appear:
• Public TCP/IP: web server (HTTP), app server (NFS)
• Private TCP/IP between the SAM servers; each SAM server exports NFS
• Two Sun T2000 servers (Solaris 10, SAM-QFS), two FC switches
• EMC CX3-40 (FC / SATA, RAID 1+0 / RAID 5), two SPs with 4 GB cache each: staging cache, disk archive (high use, copy 1)
• One Sun T2000 server (Solaris 10, SAM-QFS)
• EMC CX3-80 (FC / SATA, RAID 1+0 / RAID 5), two SPs with 8 GB cache each: disk archive (high use, copy 2; low use, copy 1)
• StorageTek SL500, LTO-4 (robot, four drives): tape archive (high use, copy 3; low use, copy 2)
• Media only, LTO-4 (off-site, HD): tape archive (high use, copy 4; low use, copy 3)
• Site labels: On-site, UIS; Off-site, HBSP; Off-site, HD
![Page 32: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/32.jpg)
Metadata Storage Service
Implementation
• DRS metadata storage
  • Oracle 10G
  • Live production server: copy 1
  • Dataguard failover copy: copy 2
  • Legato tape backups: copy 3
![Page 33: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/33.jpg)
Ingest Services
Implementation
• Batch deposit of SIPs to SFTP drop boxes
• DRS Batch Loader operates 8AM-8PM
• 51 object owners (libraries, museums)
• ~12 depositors
• 234 project codes
• Daily weekday deposits average ~60 GB/day
![Page 34: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/34.jpg)
Delivery Services
Implementation
• High availability design
  • Redundant public access servers
    • Delivery, access management, name resolution
  • Cisco Content Switch
    • Load balancing, sticky sessions
  • MRTG monitoring
• Change control: no downtime on updates
• RHEL (Red Hat Enterprise Linux), Java 1.5, Tomcat
• Tomcat and log4j logging and statistics
![Page 35: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/35.jpg)
3. DRS 2 Highlights
![Page 36: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/36.jpg)
Scope of work
• Builds on the early 2008 storage upgrade
• 2008-~2013
• Affects every part of the DRS!
  • Expanded data model
  • New and different metadata
  • Object descriptors
  • Content models
  • Preservation plans
  • Enhanced deposit tools
  • New management applications
  • New backend services
• First major release: Summer 2011
![Page 37: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/37.jpg)
Object descriptors
• A METS metadata file per object on the file system alongside content files
• Descriptive, administrative, preservation, technical and structural metadata
• Describes the object, all its files and bitstreams and related significant events
• Gives the metadata the same secure storage as the content files
• Self-contained, portable objects
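To give a feel for what a per-object METS descriptor involves, the following Python sketch builds a skeletal METS document with a file section and a structural map. It is an assumption-laden illustration and not the DRS 2 descriptor profile, which these slides do not specify: the function name `build_descriptor`, the object ID and the file IDs are hypothetical, and the descriptive, administrative and preservation sections are omitted entirely.

```python
import hashlib
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_descriptor(object_id, files):
    """Build a skeletal METS document as a string.

    files: list of (file_id, relative_path, mime_type, content_bytes).
    """
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})

    # File section: one <file> per content file, carrying fixity and size.
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    for file_id, path, mime_type, content in files:
        file_el = ET.SubElement(file_grp, f"{{{METS_NS}}}file", {
            "ID": file_id,
            "MIMETYPE": mime_type,
            "SIZE": str(len(content)),
            "CHECKSUM": hashlib.md5(content).hexdigest(),
            "CHECKSUMTYPE": "MD5",
        })
        # The content file sits alongside the descriptor, so use a relative path.
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat", {
            "LOCTYPE": "URL",
            f"{{{XLINK_NS}}}href": path,
        })

    # Structural map: a single division pointing at every file.
    struct_map = ET.SubElement(mets, f"{{{METS_NS}}}structMap")
    div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for file_id, _path, _mime_type, _content in files:
        ET.SubElement(div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})

    return ET.tostring(mets, encoding="unicode")
```

Because the checksum and byte count travel inside the descriptor, a copy of the object directory can be checked for integrity without consulting a database, which is one way to read the "self-contained, portable objects" point.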
![Page 38: Harvard’s Digital Repository Service (DRS) Architecture](https://reader036.vdocument.in/reader036/viewer/2022081505/56815ae8550346895dc8ac8a/html5/thumbnails/38.jpg)
Some technical challenges
• Amount of metadata to store
  • Bitstream description
  • Many elements (esp. MODS, MIX)
• Efficient, scalable search implementation
  • Database, index, combination?
• Keeping metadata in sync
  • Database, object descriptors on file system
• Effect on system of continued growth
  • Consistency checks, migrations, format analysis, etc.
• HRCI requirements
• Email archiving