
Long-Term Archive and Digital Preservation at TACC

Donna Harland, Oracle Optimized Solutions: Solutions Architect

June 20, 2011

3

CHALLENGES OF TODAY’S ARCHIVE

4

Challenges of Today’s Archive

Challenge: Results

Bit Rot: data loss; data corruption

Obsolescence: can no longer access the data or read the data

Natural Disaster: data loss

Economic Failure: data access loss; data loss

Organizational Failure: data access loss; data loss; inappropriate use

Information Attack: data corruption or loss

Human Error: data loss or data access loss

5

Challenges of Today’s Archive

Challenge: Results

Lack of context: data is available, but with no access, pointers, or metadata

Ambiguous IP State (copyright, licensing): loss of data access

Distribution and Dissipation: loss of data access

Migrations and Transitions (people: 2-20 yrs; software: 5-10 yrs; hardware: 3-5 yrs): data loss and loss of data access

6

CHARACTERISTICS OF ARCHIVE SOLUTIONS

7

Availability

•  Searchable

•  Retrievable

–  Dynamic access

–  What went in is what comes out

•  Deliverable to new environments, in new contexts

•  Over time… a VERY long time

8

Integrity

•  Fixity of the original object

–  No data loss

–  No data corruption

–  No data “augmentation”

•  Wholeness

–  Contains all of its essential bits

–  Transformed content is documented
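
One common way to check fixity is to record a cryptographic digest when an object is ingested and recompute it during later audits. The sketch below assumes SHA-256 and a throwaway temporary file; the function names `sha256_of` and `fixity_ok` are hypothetical and not part of any product named in these slides.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """True when the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


# Demonstration on a temporary file with made-up content.
with tempfile.TemporaryDirectory() as workdir:
    obj = Path(workdir) / "object.bin"
    obj.write_bytes(b"original archived content")
    recorded = sha256_of(obj)                 # digest stored at ingest time
    intact_before = fixity_ok(obj, recorded)

    # Simulate "augmentation": one extra byte appended to the object.
    obj.write_bytes(b"original archived content!")
    intact_after = fixity_ok(obj, recorded)
```

A digest comparison only detects loss, corruption, or augmentation; recovering the object still depends on having another good copy.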

9

Authenticity

•  Assure that an object is what it purports to be…

•  Include a description of the object in its original state as well as transformations

•  Include provenance – where an object came from and the chain of custody and processes from its point of origin
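
Provenance and chain of custody can be illustrated as an append-only event log in which each entry carries a hash of the previous one, so later edits to the history become detectable. This is a minimal sketch under assumptions: the `ProvenanceLog` class, its field names, and the sample events are hypothetical and do not come from the presentation.

```python
import hashlib
import json
from dataclasses import dataclass, field


def _hash(body: dict) -> str:
    """Hash a log entry deterministically (sorted keys, UTF-8 JSON)."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class ProvenanceLog:
    object_id: str
    events: list = field(default_factory=list)

    def append(self, actor: str, action: str, detail: str = "") -> dict:
        """Add a custody event linked to the hash of the previous event."""
        previous = self.events[-1]["entry_hash"] if self.events else ""
        body = {
            "object_id": self.object_id,
            "actor": actor,
            "action": action,
            "detail": detail,
            "previous_hash": previous,
        }
        body["entry_hash"] = _hash(body)  # hash is computed before the key is added
        self.events.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every link; False if any entry was altered or reordered."""
        previous = ""
        for event in self.events:
            body = {k: v for k, v in event.items() if k != "entry_hash"}
            if body["previous_hash"] != previous:
                return False
            if _hash(body) != event["entry_hash"]:
                return False
            previous = event["entry_hash"]
        return True


log = ProvenanceLog("newspaper-page-0001")
log.append("scanning-lab", "ingest", "original TIFF master")
log.append("archive-staff", "transform", "derived full text via OCR")
chain_valid = log.verify()

log.events[0]["detail"] = "altered after the fact"   # simulate tampering
chain_valid_after_tamper = log.verify()
```

The log records both the original state and each transformation, which matches the two description requirements listed above; it does not by itself prove who wrote an entry.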

10

Reusability

•  Collaboration

•  May require the object in its original form or format

•  May require a derived form, suitable for a specific purpose

–  Case study: what’s more useful, an image of a newspaper page, or the full text of a newspaper page?

•  Requires clear understanding of business purpose

11

Security

•  Secure against leakage

•  Secure against tampering

•  A primary design consideration

•  A vital element in trust

12

Sustainability

•  Technically feasible & maintainable

•  Economically viable and maintainable

•  Organizational alignment and commitment

•  Able to adapt

–  Technically: changes in technology and scale; have a migration plan that is non-disruptive

–  Economically: changes in costs, funding (recessions…)

–  Organizationally: layoffs, staff changes, mergers, strategy shifts

13

Trustworthiness

•  Perception of competence, security, long-term commitment

•  Prerequisite for confidence by

–  Depositors

–  Funders

–  Content Consumers

14

ARCHITECTING AN ARCHIVE SOLUTION

15

Data Archive Layers

[Diagram: Data Preservation and Content Management Applications (manage content); Storage Archive Manager; Flash, Disk, and Tape storage]

16

Preservation Mindset & Strategies

•  Resist the temptation to think of preserved objects as “static”

–  Migrations, versions, audits & disseminations all require constant attention

–  New access to old data, old access to new data

–  The content will not change but its home will

–  Awareness of retention requirements

•  Remember that preservation is a journey, not a destination

17

Technological Considerations

•  Minimize dependencies

–  Encapsulate your metadata with your objects

–  Storage preservation should not depend on specific storage

–  Applications should not depend on specific storage

•  Minimize effect of errors

–  Embrace redundancy

–  Embrace diversity
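
Redundancy limits the effect of an error only if copies are compared and repaired. The sketch below assumes three in-memory replicas and a simple majority vote on SHA-256 digests; the replica names and the `audit_and_repair` function are hypothetical illustrations, not a described product feature.

```python
import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a replica's content."""
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(replicas: dict) -> list:
    """Overwrite replicas that disagree with the majority; return their names."""
    counts = Counter(digest(data) for data in replicas.values())
    majority_digest, votes = counts.most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority among replicas; manual recovery needed")

    good_copy = next(d for d in replicas.values() if digest(d) == majority_digest)
    repaired = []
    for name, data in list(replicas.items()):
        if digest(data) != majority_digest:
            replicas[name] = good_copy
            repaired.append(name)
    return repaired


# Hypothetical replicas on diverse media and sites; one has a flipped character.
replicas = {
    "disk-site-a": b"payload",
    "tape-site-a": b"payload",
    "tape-site-b": b"pay1oad",
}
repaired = audit_and_repair(replicas)
```

Diversity matters here because copies on different media, software, or sites are less likely to fail in the same way at the same time, which is what keeps a majority available.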

18

Design

•  Don’t overspec; don’t overbuild

–  Design a scalable architecture

–  Build in ability to grow non-disruptively with customer demand

•  Monolithic systems don’t meet requirements

–  Complex, expensive, inflexible

–  Migration costs can capsize you

•  Components should not depend on each other but should be proven to work together

•  Keep it simple; have an exit plan for every component

19

Know Your Designated Community

-  Who will be using the content?

-  Are there data connectivity requirements?

-  How will they be using the data?

-  Latency

-  Delivery formats

-  Security

-  Offer (appropriate) access from the start

-  Remain flexible as the community changes and grows

20

Basic Architecture of an Unstructured Data Archive Solution

•  Application

–  Captures Data

–  Creates Content Metadata; Optionally stored in DB

–  Stores Content in a File Store

–  Provides Search Engine

–  Provides data preservation features

•  Database Server

–  Content Metadata

–  Security

–  Improved search performance

•  File Store

[Diagram: Application, Database Server (Metadata), File Store]
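
The three components above can be sketched in a few lines to show how they divide the work. This is an illustration under assumptions, not the architecture of any specific product: the `ArchiveApp` class is hypothetical, an in-memory SQLite database stands in for the database server, and a temporary directory stands in for the file store.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class ArchiveApp:
    """Application layer: captures data, records metadata, stores content."""

    def __init__(self, file_store: Path):
        self.file_store = file_store
        self.db = sqlite3.connect(":memory:")  # stand-in for the database server
        self.db.execute(
            "CREATE TABLE metadata ("
            "object_id TEXT PRIMARY KEY, title TEXT, sha256 TEXT, path TEXT)"
        )

    def capture(self, object_id: str, title: str, content: bytes) -> None:
        """Store content in the file store and its metadata in the database."""
        path = self.file_store / object_id
        path.write_bytes(content)
        self.db.execute(
            "INSERT INTO metadata VALUES (?, ?, ?, ?)",
            (object_id, title, hashlib.sha256(content).hexdigest(), str(path)),
        )

    def search(self, term: str) -> list:
        """Search titles in the database without touching the file store."""
        rows = self.db.execute(
            "SELECT object_id FROM metadata WHERE title LIKE ? ORDER BY object_id",
            (f"%{term}%",),
        )
        return [row[0] for row in rows]

    def retrieve(self, object_id: str) -> bytes:
        """Read content back and verify it against the recorded digest."""
        row = self.db.execute(
            "SELECT path, sha256 FROM metadata WHERE object_id = ?", (object_id,)
        ).fetchone()
        if row is None:
            raise KeyError(object_id)
        content = Path(row[0]).read_bytes()
        if hashlib.sha256(content).hexdigest() != row[1]:
            raise IOError(f"fixity check failed for {object_id}")
        return content


with tempfile.TemporaryDirectory() as store_dir:
    app = ArchiveApp(Path(store_dir))
    app.capture("page-0001", "Newspaper page, 1911", b"scanned image bytes")
    app.capture("notebook-0002", "Lab notebook scan", b"other bytes")
    hits = app.search("newspaper")
    fetched = app.retrieve("page-0001")
    app.db.close()
```

Because the application only sees a directory path, the file store behind it could be replaced without changing the capture, search, or retrieve logic.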

21

SAM QFS As The File Store

•  SAM-QFS

–  Dynamically maintains data on defined tiers of storage

–  Dynamically stages data for access when requested by application

–  Standard file access via FC, NFS, CIFS

[Diagram: Application, Database Server, File Store (SAM-QFS Managed Tiered Storage)]
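
The idea of keeping data on tiers and staging it back on request can be shown with a toy model. The sketch below is not SAM-QFS code and uses no SAM-QFS commands or APIs; the `TieredStore` class and its method names are hypothetical, and the terms archive, release, and stage are used loosely.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStore:
    """Toy two-tier file store: a disk cache in front of a tape archive."""

    disk_cache: dict = field(default_factory=dict)    # online copies
    tape_archive: dict = field(default_factory=dict)  # archive copies
    stage_count: int = 0                              # times data was staged back

    def write(self, name: str, data: bytes) -> None:
        """New data lands on the fast tier first."""
        self.disk_cache[name] = data

    def archive(self, name: str) -> None:
        """Copy the file to the archive tier; the disk copy stays online."""
        self.tape_archive[name] = self.disk_cache[name]

    def release(self, name: str) -> None:
        """Free disk space, but only once an archive copy exists."""
        if name not in self.tape_archive:
            raise RuntimeError(f"refusing to release unarchived file: {name}")
        self.disk_cache.pop(name, None)

    def read(self, name: str) -> bytes:
        """Serve from disk, staging from the archive tier when needed."""
        if name not in self.disk_cache:
            self.disk_cache[name] = self.tape_archive[name]
            self.stage_count += 1
        return self.disk_cache[name]


store = TieredStore()
store.write("survey.dat", b"observations")
store.archive("survey.dat")
store.release("survey.dat")
first_read = store.read("survey.dat")    # staged back from the archive tier
second_read = store.read("survey.dat")   # served from disk, no new stage
```

The application calling `read` never needs to know which tier held the data, which is the property the bullets above describe as standard file access.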

22

Oracle Storage Appropriate for an Archive

SAM QFS File System and Metadata: FC Array Storage (S6580, S6780, S6180)

•  High Speed FC Drives

•  FC Access

•  High Availability

Disk Archive: High Capacity Disk Storage (S6580, S6780, S6180, 7720, 7420, 7320, 7120)

•  SATA Drives

•  FC or IP access

•  High Capacity

•  High Availability

Tape Archive: Tape and Libraries (SL8500, SL3000; LTO, T10K)

•  T10KC

–  Highest capacity

–  DIV

•  LTO 5

23

Oracle Enterprise Content Management

•  Content Management

–  Geared toward business data and workflow

–  Customizable for different data types

•  Oracle Optimized Solution

•  Fully tested, integrated solution (HW, SW, Storage SW)

•  Expanding into industry data

–  Health Sciences

–  Media and Entertainment

24

What Solutions Integrated SAM QFS?

•  Third Party Applications and SAM QFS

−  Scalable On-line Archive Repository (S.O.A.R.) from Moca/Arrow and their Channel Partners (see Mark Legott preso)

    −  Sun tested and partner marketed

    −  Uses Open Source Software Drupal and Fedora

    −  Fully supported by Oracle partner for implementation and 1st call

−  Ex Libris

    −  New Zealand National Library implementation and validated solution

−  Storage Resource Broker (SRB)

    −  Customer implementation at DOD

    −  Tight integration with SAM

−  PACS Applications

    −  Been in production in many sites since STK was STK

−  Home-Grown-Application

    −  Norwegian National Library “it just works”

    −  6PB under SAM management (1 on disk archive, 2 on tape archive)

25

Questions?
