Long-Term Archive and Digital Preservation at TACC

Donna Harland

Oracle Optimized Solutions: Solutions Architect

June 20, 2011


CHALLENGES OF TODAY’S ARCHIVE


Challenges of Today’s Archive

Challenge: Results

•  Bit Rot: data loss; data corruption
•  Obsolescence: can no longer access or read the data
•  Natural Disaster: data loss
•  Economic Failure: loss of data access; data loss
•  Organizational Failure: loss of data access; data loss; inappropriate use
•  Information Attack: data corruption or loss
•  Human Error: data loss or loss of data access


Challenges of Today’s Archive

Challenge: Results

•  Lack of context: data is available, but with no access, pointers, or metadata
•  Ambiguous IP state (copyright, licensing): loss of data access
•  Distribution and dissipation: loss of data access
•  Migrations and transitions of people (2-20 yrs), software (5-10 yrs), and hardware (3-5 yrs): data loss and loss of data access


CHARACTERISTICS OF ARCHIVE SOLUTIONS


Availability

•  Searchable

•  Retrievable
–  Dynamic access
–  What went in is what comes out

•  Deliverable to new environments, in new contexts

•  Over time… a VERY long time


Integrity

•  Fixity of the original object
–  No data loss
–  No data corruption
–  No data “augmentation”

•  Wholeness
–  Contains all of its essential bits
–  Transformed content is documented
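Fixity is commonly checked by recording a cryptographic digest at ingest and recomputing it later. The sketch below is one minimal way to do that, assuming SHA-256 and a digest stored somewhere outside the object; the function names `sha256_of` and `fixity_ok` are illustrative and do not come from the presentation.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True if the file's current digest matches the one recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

A mismatch does not say which of loss, corruption, or "augmentation" occurred, only that the object is no longer bit-identical to what was deposited.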


Authenticity

•  Assure that an object is what it purports to be…

•  Include a description of the object in its original state as well as transformations

•  Include provenance – where an object came from and the chain of custody and processes from its point of origin
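One way to make a chain of custody tamper-evident is an append-only log in which each entry includes a hash of the previous entry. The sketch below assumes that approach; the class `ProvenanceLog`, its field names, and the sample agents are hypothetical and are not described in the slides.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLog:
    """Append-only chain-of-custody log; each entry hashes the previous one."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _hash(body: dict) -> str:
        # Hash every field except the stored hash itself.
        payload = {k: v for k, v in body.items() if k != "entry_hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def record(self, agent: str, action: str, object_digest: str) -> dict:
        """Append one custody event and link it to the previous entry."""
        previous = self.entries[-1]["entry_hash"] if self.entries else ""
        body = {"agent": agent, "action": action,
                "object_digest": object_digest, "previous": previous}
        body["entry_hash"] = self._hash(body)
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """True if no entry was altered, removed, or reordered."""
        previous = ""
        for entry in self.entries:
            if entry["previous"] != previous or entry["entry_hash"] != self._hash(entry):
                return False
            previous = entry["entry_hash"]
        return True


log = ProvenanceLog()
log.record("scanner-01", "created", "digest-of-original")
log.record("archivist", "migrated to new format", "digest-of-derivative")
print(log.verify())
```

Recording the object digest with every event ties each transformation to the exact bits it produced, which covers both the original state and later transformations.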


Reusability

•  Collaboration

•  May require the object in its original form or format

•  May require a derived form, suitable for a specific purpose
–  Case study: what’s more useful, an image of a newspaper page, or the full text of a newspaper page?

•  Requires clear understanding of business purpose


Security

•  Secure against leakage

•  Secure against tampering

•  A primary design consideration

•  A vital element in trust


Sustainability

•  Technically feasible & maintainable

•  Economically viable and maintainable

•  Organizational alignment and commitment

•  Able to adapt
–  Technically: changes in technology and scale; have a migration plan that is non-disruptive
–  Economically: changes in costs, funding (recessions…)
–  Organizationally: layoffs, staff changes, mergers, strategy shifts


Trustworthiness

•  Perception of competence, security, long-term commitment

•  Prerequisite for confidence by
–  Depositors
–  Funders
–  Content Consumers


ARCHITECTING AN ARCHIVE SOLUTION


Data Archive Layers

[Diagram of layers: Data Preservation and Content Management Applications (manage content); Storage Archive Manager; Flash, Disk, Tape]


Preservation Mindset & Strategies

•  Resist the temptation to think of preserved objects as “static”
–  Migrations, versions, audits & disseminations all require constant attention
–  New access to old data, old access to new data
–  The content will not change but its home will

–  Awareness of retention requirements

•  Remember that preservation is a journey, not a destination


Technological Considerations

•  Minimize dependencies
–  Encapsulate your metadata with your objects
–  Storage preservation should not depend on specific storage
–  Applications should not depend on specific storage

•  Minimize the effect of errors
–  Embrace redundancy
–  Embrace diversity
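Redundancy only limits the effect of errors if copies are audited and damaged ones are replaced. The sketch below assumes several copies of one object plus a digest recorded at ingest; `audit_and_repair` and the example directory names are hypothetical and stand in for whatever storage locations an archive actually uses.

```python
import hashlib
from pathlib import Path


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(copies: list[Path], recorded_digest: str) -> list[Path]:
    """Rewrite damaged or missing copies from an intact one; return repaired paths."""
    def intact(path: Path) -> bool:
        return path.is_file() and digest(path.read_bytes()) == recorded_digest

    good = [path for path in copies if intact(path)]
    if not good:
        raise RuntimeError("no intact copy left; restore from another source")
    source = good[0].read_bytes()
    repaired = []
    for path in copies:
        if path not in good:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(source)
            repaired.append(path)
    return repaired
```

Diversity would mean placing the copies on different media, sites, or vendors (for example one path on disk and two on tape-backed mounts), so that a single failure mode cannot damage every copy at once.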


Design

•  Don’t overspec; don’t overbuild
–  Design a scalable architecture
–  Build in ability to grow non-disruptively with customer demand

•  Monolithic systems don’t meet requirements
–  Complex, expensive, inflexible
–  Migration costs can capsize you

•  Components should not depend on each other but should be proven to work together

•  Keep it simple; have an exit plan for every component


Know Your Designated Community

-  Who will be using the content?
-  Are there data connectivity requirements?
-  How will they be using the data?
-  Latency
-  Delivery formats
-  Security
-  Offer (appropriate) access from the start
-  Remain flexible as the community changes and grows


Basic Architecture of an Unstructured Data Archive Solution

•  Application
–  Captures Data
–  Creates Content Metadata; optionally stored in DB
–  Stores Content in a File Store
–  Provides Search Engine
–  Provides data preservation features

•  Database Server
–  Content Metadata
–  Security
–  Improved search performance

•  File Store

[Diagram: Application, Database Server (Metadata), File Store]
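The three components above can be illustrated with a toy archive: the application captures content, writes it to a file store, and keeps searchable metadata in a database. This is a sketch under stated assumptions, not any product's design; the class `Archive`, its table layout, and the use of SQLite and a content digest as the object id are all choices made for the example.

```python
import hashlib
import sqlite3
from pathlib import Path


class Archive:
    """Toy archive: content goes to a file store, metadata to a database."""

    def __init__(self, store_dir: Path, db_path: str = ":memory:"):
        self.store_dir = store_dir
        self.store_dir.mkdir(parents=True, exist_ok=True)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            "object_id TEXT PRIMARY KEY, title TEXT)"
        )

    def capture(self, title: str, content: bytes) -> str:
        """Store content in the file store and record its metadata."""
        object_id = hashlib.sha256(content).hexdigest()
        (self.store_dir / object_id).write_bytes(content)
        self.db.execute(
            "INSERT OR REPLACE INTO metadata VALUES (?, ?)", (object_id, title)
        )
        self.db.commit()
        return object_id

    def search(self, term: str) -> list[str]:
        """Return ids of objects whose title contains the term."""
        rows = self.db.execute(
            "SELECT object_id FROM metadata WHERE title LIKE ? ORDER BY object_id",
            (f"%{term}%",),
        )
        return [row[0] for row in rows]

    def retrieve(self, object_id: str) -> bytes:
        """Read content back and check it against its digest."""
        content = (self.store_dir / object_id).read_bytes()
        if hashlib.sha256(content).hexdigest() != object_id:
            raise ValueError(f"fixity check failed for {object_id}")
        return content
```

Because the application only sees a directory path, the file store behind it could be swapped without changing the application, which is the independence from specific storage that the technological considerations call for.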


SAM QFS As The File Store

•  SAM-QFS
–  Dynamically maintains data on defined tiers of storage
–  Dynamically stages data for access when requested by application
–  Standard file access via FC, NFS, CIFS

[Diagram: Application, Database Server, File Store (SAM-QFS Managed Tiered Storage)]
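The general idea of tiering and staging can be shown with a small simulation. The code below is not SAM-QFS, its commands, or its API; `TieredStore` is a hypothetical in-memory model that assumes a two-tier layout, an archive copy made on every write, and least-recently-used release from the disk tier.

```python
class TieredStore:
    """Toy two-tier store: a small disk cache in front of an archive tier."""

    def __init__(self, disk_capacity: int):
        self.disk_capacity = disk_capacity   # max number of files kept on disk
        self.disk: dict[str, bytes] = {}     # insertion order doubles as LRU order
        self.archive: dict[str, bytes] = {}  # stands in for tape or disk archive
        self.stage_count = 0                 # reads that had to come from archive

    def write(self, name: str, data: bytes) -> None:
        """Make the archive copy first, then keep the file on disk."""
        self.archive[name] = data
        self._cache(name, data)

    def read(self, name: str) -> bytes:
        """Serve from disk if present; otherwise stage from the archive tier."""
        if name in self.disk:
            data = self.disk[name]
        else:
            data = self.archive[name]
            self.stage_count += 1
        self._cache(name, data)
        return data

    def _cache(self, name: str, data: bytes) -> None:
        """Mark the file most recently used and release the oldest if over capacity."""
        self.disk.pop(name, None)
        self.disk[name] = data
        while len(self.disk) > self.disk_capacity:
            oldest = next(iter(self.disk))
            del self.disk[oldest]   # safe: an archive copy already exists
```

The application calls `read` and `write` the same way regardless of which tier currently holds the data, which mirrors the point that access stays standard while placement is managed underneath.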


Oracle Storage Appropriate for an Archive

SAM QFS File System and Metadata: FC Array Storage (S6180, S6580, S6780)
•  High Speed FC Drives
•  FC Access
•  High Availability

Disk Archive: High Capacity Disk Storage (S6180, S6580, S6780, 7120, 7320, 7420, 7720)
•  SATA Drives
•  FC or IP access
•  High Capacity
•  High Availability

Tape Archive: Tape and Libraries (SL8500, SL3000; LTO, T10K)
•  T10KC
•  Highest capacity
•  DIV
•  LTO 5


Oracle Enterprise Content Management

•  Content Management
–  Geared toward business data and workflow
–  Customizable for different data types

•  Oracle Optimized Solution

•  Fully tested, integrated solution (HW, SW, Storage SW)

•  Expanding into industry data
–  Health Sciences
–  Media and Entertainment


What Solutions Integrated SAM QFS?

•  Third Party Applications and SAM QFS

−  Scalable On-line Archive Repository (S.O.A.R.) from Moca/Arrow and their Channel Partners (see Mark Legott preso)
   −  Sun tested and partner marketed
   −  Uses Open Source Software Drupal and Fedora
   −  Fully supported by Oracle partner for implementation and 1st call

−  Ex Libris
   −  New Zealand National Library implementation and validated solution

−  Storage Resource Broker (SRB)
   −  Customer implementation at DOD
   −  Tight integration with SAM

−  PACS Applications
   −  In production at many sites since STK was STK

−  Home-Grown Application
   −  Norwegian National Library: “it just works”
   −  6PB under SAM management (1 on disk archive 2 on tape archive)


Questions?
