Digital Preservation: Rethinking System Data Models for Information Structures
Andrew French
Director of Solutions Architecture
May 20, 2020
What’s the problem?
• Preservation and curation are manual
– Need to remember to do it
– Takes time and effort
– Little Automation…
• Needs expert input
– Little sharing of good practice
– No advice on what to do and when to do it
– Lots of reinventing the wheel
– No cross product collaboration
– Cumbersome updates of information
What’s the other problem?
• Cardboard boxes are great for paper archives
– Preservica V1-V5 modelled the Deliverable Unit
– Suitable for Manual Active Preservation
– Inefficient for Automation…
– Our users told us to change
• Needs to support the beginner and the expert
– Simple as “Folders and Assets”…
But…
– More Granularity than PREMIS
How do we enable automation?
• Enable Information and actions to be at the Asset level
• New users want a simpler paradigm
• Existing users voted to remove DU concept
• Advanced Users want PREMIS (or better)
Data model terms compared (diagram):
• v5: DU, Manifestation, Collection, DU Component, Manifestation File, File, Bitstream
• Windows: Directory/Folder, File
• PREMIS: Intellectual Entity, Representation, File, Bitstream
• v6: Folder (Structural Object), Asset (Information Object), Representation, Content Object, Generation, Bitstream
Preservica v6.0
• Asset (Information Object): the logical piece of information the person deals with, e.g. document, image, video, email, book
• Representation: a view of that Asset, i.e. the same information shown in a different way, e.g. preservation-master, access…
• Content Object: the parts of content that make up a whole Representation, e.g. attachments in an email
• Generation: the same logical content migrated to new formats for preservation reasons
• FileStream / BitStream: the physical 0s and 1s that make up the Generation of the Content Object
Example (diagram): two Representations, each with one Content Object; one Generation is a .jp2 and the other a .jpg, and each Generation is backed by its own Bitstream.
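A minimal sketch of how the v6 terms above nest, written as plain Python dataclasses. It is an illustration only, not Preservica's schema or API: every class, field and function name is hypothetical, and pairing the .jp2 with the preservation Representation and the .jpg with the access Representation is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """The physical 0s and 1s of one file."""
    filename: str
    data: bytes


@dataclass
class Generation:
    """The same logical content in one particular format."""
    format_name: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class ContentObject:
    """One part of a Representation, e.g. an attachment in an email."""
    title: str
    generations: List[Generation] = field(default_factory=list)

    def latest_generation(self) -> Generation:
        # Assumption: generations are appended in migration order.
        return self.generations[-1]


@dataclass
class Representation:
    """A view of the Asset, e.g. preservation-master or access."""
    kind: str
    content_objects: List[ContentObject] = field(default_factory=list)


@dataclass
class Asset:
    """The logical item a person deals with (document, image, email, ...)."""
    title: str
    representations: List[Representation] = field(default_factory=list)

    def representation(self, kind: str) -> Representation:
        for rep in self.representations:
            if rep.kind == kind:
                return rep
        raise KeyError(kind)


def build_example_asset() -> Asset:
    """Build an image Asset with two Representations, as in the diagram."""
    # Placeholder bytes only; real bitstreams would hold the file content.
    jp2 = Generation("jp2", [Bitstream("image.jp2", b"0100111001")])
    jpg = Generation("jpg", [Bitstream("image.jpg", b"0100111001")])
    return Asset(
        title="Example image",
        representations=[
            Representation("preservation", [ContentObject("image", [jp2])]),
            Representation("access", [ContentObject("image", [jpg])]),
        ],
    )
```

Because information and actions hang off the Asset and its Content Objects, a format migration in this sketch would only append a new Generation to one Content Object and leave the rest of the tree untouched.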
More on Assets
Reflects changes to the Data Model:
– Explorer view
– Access Representation
– Access Generation
Preservica User Survey - Feb 2020
Q7: What single current capability/feature/service do you most like about Preservica and why?
Answered: 49 Skipped: 6
Bar chart of responses (count axis 0 to 9) across: Fixity, Scalability, Metadata, Reporting, Versatility, Custom Workflows, Storage Adapters, Easy to Use, Data Model, End to End, Catalog Sync, Custom Service, Ingest Tools, Access, Standards, Migrations, Comprehensive Preservation Actions, Secure Storage
API availability
Editions compared: Cloud Edition Essentials, Cloud Edition Professional, Enterprise Private Cloud, Enterprise Private Cloud Perform, Enterprise on Premise
Deployment type: Cloud, On Premise
Tenancy type: Shared-Instance, Enterprise Specific Instance
API and description:
• OAI-PMH: Open, standard metadata harvesting
• CMIS: Standard for content management systems
• Content: Preservica Content Access
• Entity (Content Read & Write): To retrieve and update information
• Access Token: Managing user authentication
• S3: Simple Storage Service compatible API
• Workflow (Control of Tasks & Admin): Programmable control/monitoring
• Progress Token: Monitoring of long-running tasks
• PAR (Preservation Actions Registry): Retrieving and updating business rules
• SIP Creator Command Line: Control tasks and administration
Note: Transaction limit applies for CE Pro
developers.preservica.com
Preparation and Upload
▪ Browser-based
▪ Drag & drop interface
▪ Support for large file uploads
▪ Pre-ingest preparation area
#ourcovid19story
As a community we are in a unique position to capture and preserve the story of this pandemic for future generations
Preservica would like to support you in capturing the unique stories of your institution and communities
Preservica includes all the tools needed for automatic web page harvesting, preservation, and rendering
Take control of critical long-term content across the enterprise
▪ Active digital preservation
▪ Provenance – fixity and audit trails
▪ Mimic hierarchy and metadata structure
▪ Advanced security
▪ Automated records disposition
Information managers
Content consumers
Intelligent Archiving Engine™
Secure Access Portal
Content contributors
Who benefits from Automated Format Preservation?
Existing Users
• Help you make more appropriate DP decisions and allow actions to be automated
• Make sure your content is in step with your policy
• Because you asked for it at the User Group
New Users
• Allow non-expert users access to DP functionality using automated actions
• Leverage community best practice
Wider Community
• Share DP best practice with the whole community so everyone learns
• Benefit from the PAR framework
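To make "content in step with your policy" concrete, here is a small sketch of a policy check that plans format migrations automatically. It is a simplified illustration under stated assumptions, not how Preservica or PAR implements it: the `ContentItem` type, the `plan_migrations` function and the rule table are all hypothetical, and the format pairs are made-up placeholders rather than recommendations.

```python
from typing import Dict, List, NamedTuple, Tuple


class ContentItem(NamedTuple):
    """Hypothetical record: a content object and its current format."""
    name: str
    current_format: str


# Hypothetical policy: source format -> format the policy asks for.
EXAMPLE_POLICY: Dict[str, str] = {
    "bmp": "png",
    "wav": "flac",
}


def plan_migrations(
    items: List[ContentItem], policy: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (name, source_format, target_format) for items out of step."""
    plan = []
    for item in items:
        target = policy.get(item.current_format)
        # Items with no matching rule are already in step with the policy.
        if target is not None and target != item.current_format:
            plan.append((item.name, item.current_format, target))
    return plan


if __name__ == "__main__":
    holdings = [
        ContentItem("scan-001", "bmp"),
        ContentItem("scan-002", "png"),
        ContentItem("interview-01", "wav"),
    ]
    for name, source, target in plan_migrations(holdings, EXAMPLE_POLICY):
        print(f"{name}: migrate {source} to {target}")
```

The point of the design is that the expert decision lives in the shared rule table, so a non-expert user only has to run the check and approve the plan.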
Online documentation and “try it out”
Tweet: multi-part (API response plus image)
3D – multi-part geometry, material, texture
3D – displayed as a single object
GIS display using backgrounds
Active Digital Preservation
APIs
Advanced security and administration
Storage and deployment choice
Flexible content and metadata management
Easy and automated content acquisition
Integrated secure access and discovery
Metadata storage
AWS & Azure
Enhanced search
Advanced UA
Browser-based upload
Preparation area
Auto archiving
APIs
v6.0 APIs
Active Digital Preservation
Preservica v6.0
OCR
Language support
Explorer v6.0
Catalog synchronization
Thank you
preservica.com
@preservica
@dPreservation
Active Preservation
Auto Preservation
Migration Pathways & Rendering
Data Model
Scalable Performance
Discovery & Access
Planned v6.2: features and user benefits
Innovation
Digital Preservation
Features:
• Retention Management
• Enhanced APIs
• Multi-part Asset Processing and Rendering
• Technology Refresh
• Reference Metadata
• Intelligent SharePoint Connector
• Intelligent Archiving v6
• Intelligent archiving and optimized ingest
• Preparation Upload and Transfer
• Access (UA) User Driven Enhancements
• Auto-Preservation PoC
Description:
• Setting, applying and executing retention and disposition Policy (singular) per Asset. Includes rudimentary legal hold
• The APIs needed to create a great User Journey and experience
• Adding management and rendering of multi-part assets that DMI enable - page turners, management of books, emails, CADCAM files
• Keeping the enabling technology that Preservica is built upon up to date
• Ability to keep basic referential / linked information about an asset
• Retention Management
• Pilots and roll out
• Asynchronous harvesting from a v6 system
• Rounding out Bulk Loading, adding multithreading
• Enhanced, including scoping for validation and transfer client
• User-led enhancements
User benefits:
• Simplified Ability to manage Records (not just files)
• Ability to manage Records (not just files)
• Compelling and easy to use entry into Digital Preservation
• Easier access to books, emails…
• Higher functionality of the release, fewer bugs
• Assets can be managed more easily…
• ECM TCO reduction, safer preservation of ECM content
• Federated and offline content, ESCROW, clear exit strategy (if ever wanted)
• Higher performance harvesting and ingest
• Simpler and quicker ingest, ingest package validating and editing.
• Allows you to focus on your role and specialist tasks
• User Driven...