September 17, 2018
Library of Congress Storage Environment Update 2018
Carl Watts
Information Technology Specialist
IT Services Operations / Operations and Maintenance / Unix Systems
September 2018
Converged Storage Tiers
Content Storage
Content is equal to a single copy of a digital object and its associated derivative(s)
Preservation Copies (currently)
  Standard Collections – two (2) copies distributed across two (2) datacenters
  Special Collections – two (2) different platforms holding two (2) copies distributed across two (2) datacenters
Presentation Copies
  Currently – single online copy
  Near future – two (2) copies across two (2) datacenters
  Future – multiple copies across datacenters and “cloud” providers
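The preservation copy rules above can be written down as a small policy check. This is only an illustrative sketch: the `Copy` record, the `POLICY` table, and the datacenter and platform names are hypothetical and do not describe the Library's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a content item (hypothetical record)."""
    datacenter: str
    platform: str


# Minimums taken from the slide: both collection types need two copies in
# two datacenters; special collections also need two different platforms.
POLICY = {
    "standard": {"copies": 2, "datacenters": 2, "platforms": 1},
    "special": {"copies": 2, "datacenters": 2, "platforms": 2},
}


def meets_policy(collection_type, copies):
    """Return True if the copies satisfy the minimums for the collection type."""
    rule = POLICY[collection_type]
    return (
        len(copies) >= rule["copies"]
        and len({c.datacenter for c in copies}) >= rule["datacenters"]
        and len({c.platform for c in copies}) >= rule["platforms"]
    )
```

For example, two copies on the same platform in two datacenters would pass the standard rule but fail the special collections rule.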
Content Growth – Preservation
Unique File Count: 410M Total Files

Year   Annual Growth (in TB)   Longterm Storage (single copy in TB)
2014   (not given)             6,856.40
2015   1,447.43                8,303.83
2016   3,061.85                11,365.67
2017   2,709.44                14,075.11
2018   1,906.93                15,982.04
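The annual growth figures follow from the single-copy totals by subtracting each year's total from the next. A minimal sketch of that arithmetic, using the totals from the slide (the function name `annual_growth` is just for illustration):

```python
# Single-copy long-term storage totals in TB, as shown on the slide.
longterm_tb = {
    2014: 6856.40,
    2015: 8303.83,
    2016: 11365.67,
    2017: 14075.11,
    2018: 15982.04,
}


def annual_growth(totals):
    """Return year -> growth in TB over the previous year, rounded to 0.01 TB."""
    years = sorted(totals)
    return {
        year: round(totals[year] - totals[prev], 2)
        for prev, year in zip(years, years[1:])
    }


growth = annual_growth(longterm_tb)
for year, tb in growth.items():
    print(year, tb)
```

The computed differences agree with the slide's annual growth values for 2015 through 2018 to within 0.01 TB.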
Content Growth – Preservation

Year   Overall Long-Term Storage Growth (all copies in TB)
2014   14,286
2015   21,438
2016   25,032
2017   28,085
2018   34,361
Content Growth – Presentation
Unique File Count: 344M Total Files

Year   Access Storage (in TB)   Annual Terabyte Growth
2013   236.40                   236.40
2014   546.60                   310.20
2015   1,095.10                 548.50
2016   1,620.70                 525.60
2017   2,086.30                 465.60
2018   2,572.46                 504.19
Migrations Continue
Consolidating Preservation Storage
  Combining resources to reduce cost
Migrating Data Centers
  Completed migration of presentation storage to new location and system
  Preparing to replicate data to new data center (2019)
Migrating Tape Technology
  Preparing to migrate IBM TS1140 tape to TS1155 tape (2019)
Quad ‘P’ Dataflow (Current)
[Diagram. Stages: Procure, Preserve, Process, Present, with Workflow Engine(s) alongside.]
Components labeled in the diagram:
  esubmit.loc.gov (external push)
  Media Exchange (external push)
  Signiant Workflow (internal pull)
  Media Shuttle (push/pull)
  CTS via ingest servers
  Fetcher (internal pull)
  Transitory Storage Pool(s) (shown at several points)
  External Client, Client
  Delivered Content (portable HD)
  sFTP
  Web Site
  In House Digitization
  Processing VM
  CTS Workflow Engine
  Signiant Manager
  Oracle HSM
  IBM LTFS (Special Collections)
  CTS VMs
  Processing Storage Pools
  CTS Scheduler
  Processing VMs
  Online Content Storage (xSTOR)
  CDN
  WebArchive
  ChronAm
  Web Server(s)
  Other
  DMS Workflow
  PCWA
  House Video Encoders
  House Recording Studio
Looking to add Content Abstraction Layer
The Content Abstraction Layer (CAL) would:
  Manage the procurement of data from multiple sources
  Manage the preservation of content:
    File fixity checking
    File validation / usability
  Manage the automation of content processing
  Manage the movement / orchestration of data across multiple:
    Systems
    Data centers
    Cloud providers
    External entities
  Provide a persistent namespace and access method to data
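File fixity checking, as listed above, generally means recomputing a file's checksum and comparing it with a value recorded earlier. The sketch below assumes SHA-256 and a stored hex digest; the slides do not say which algorithm or tooling is used, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, expected_hex):
    """Return True if the file's current digest matches the recorded one."""
    return sha256_of(path) == expected_hex.lower()
```

Reading in chunks keeps memory use flat, which matters for large preservation files.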
Quad ‘P’ Dataflow (Proposed)
[Diagram. Stages: Procure, Preserve, Process, Present, plus System Backup, with Workflow Engine(s) alongside and the Content Abstraction Layer in the middle.]
Components labeled in the diagram:
  esubmit.loc.gov (external push)
  Media Shuttle (push/pull)
  CTS via ingest servers
  Fetcher (internal pull)
  Transitory Storage Pool(s) (shown at several points)
  Delivered Content (portable HD)
  Client
  sFTP
  Web Site
  In House Digitization
  Processing VM
  On-Prem Object Storage (Storage-as-a-Service)
  Processing Storage Pools
  Processing VMs
  CDN
  Web Capture
  ChronAmer.
  Web Server(s)
  Other
  DMS Workflow
  PCWA
  House Video Encoders
  House Recording Studio
  Long-term Storage (Large File and Special Collections)
  Tape Tech
  Off-Site Cloud Storage (DC5) [AWS, Azure, Google, other…]
  Off-Site Cold Cloud Storage (DC5) [AWS, Azure, Google, other…]
  Public Datasets Cloud Storage (DC5) [AWS, Azure, Google, other…]
  Shared Datasets [Agency, Academia, other...]
  eCO NAS
  eCO Submitter
  Server
  VMs
  DB
  Backup Server
Content Abstraction Layer functions:
  Policy Management
  Object Discovery & Classification
  Quota Management
  Storage Analytics
  Object Audit
  Workflow Engine
  Data Tiering
  Data Validation and Verification
Access methods: sFTP, NFS, S3, SMB/CIFS, HTTPS, REST
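One way to picture the persistent namespace and data tiering roles of the Content Abstraction Layer is a catalog that maps a stable object ID to its copies on different tiers and returns the most accessible one. This is a rough sketch under assumptions: the tier names, their ordering, the `Catalog` class, and the example URIs are all invented for illustration and are not taken from the slides.

```python
from dataclasses import dataclass, field

# Hypothetical tiers, ordered from most to least readily accessible.
TIER_ORDER = ["on_prem_object", "long_term_tape", "cloud", "cold_cloud"]


@dataclass
class Catalog:
    """Maps a persistent object ID to the URI of its copy on each tier."""
    locations: dict = field(default_factory=dict)

    def register(self, object_id, tier, uri):
        """Record that a copy of object_id lives at uri on the given tier."""
        if tier not in TIER_ORDER:
            raise ValueError(f"unknown tier: {tier}")
        self.locations.setdefault(object_id, {})[tier] = uri

    def resolve(self, object_id):
        """Return the URI on the most accessible tier; KeyError if unknown."""
        copies = self.locations[object_id]
        best_tier = min(copies, key=TIER_ORDER.index)
        return copies[best_tier]


catalog = Catalog()
catalog.register("obj-0001", "cold_cloud", "s3://example-cold/obj-0001")
catalog.register("obj-0001", "on_prem_object", "s3://example-onprem/obj-0001")
print(catalog.resolve("obj-0001"))
```

The point of the design is that clients keep using the same object ID while copies move between tiers behind the catalog.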
[Diagram. The Content Abstraction Layer spanning storage locations and environments.]
Storage locations:
  Data Center 1 Storage
  Data Center 2 Storage
  Data Center 3 Storage
  Data Center 4 Storage
  DC5 Cloud Provider A
  DC5 Cloud Provider B
  DC5 Cloud Provider ...
Environments and systems:
  Web Services Environment
  Back-up Environment
  Preservation Systems
  Procurement Systems
  Processing Systems
Thank you