ACE: A Novel Software Platform to Ensure the Integrity of Digital Archives

Sangchul Song and Joseph JaJa
Institute for Advanced Computer Science Studies, Department of Electrical and Computer Engineering
University of Maryland, College Park
Sponsored by Library of Congress and NSF

Archiving 2007, May 23, 2007




Main Threats to Integrity of Digital Archives

• Hardware/media degradation
• Hardware/software malfunction
• Operational errors
• Technology evolution
• Object transformation (format obsolescence)
• Infrequent access to most data
• Evolution of cryptographic schemes
• Security breaches, malicious alterations


Existing Methodologies

• Core Techniques
  – Replication: mirroring
  – Coding techniques: parity checking (RAID), erasure codes
  – Cryptographic one-way hashing: checksum
• Techniques for Digital Archives
  – Hashing only
  – Replication + voting scheme
  – Hashing + replication
  – Digital Signatures
  – Time Stamping (PKI vs. hash-linking)
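As an illustration of the "hashing + replication" combination listed above, the following is a minimal sketch, not taken from the slides. It assumes SHA-256 as the one-way hash and a strict-majority rule for the vote; the function names `digest` and `vote` are hypothetical.

```python
import hashlib
from collections import Counter
from typing import List, Optional, Tuple


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of one replica as a hex string."""
    return hashlib.sha256(data).hexdigest()


def vote(replicas: List[bytes]) -> Tuple[Optional[str], List[int]]:
    """Majority vote over replica digests (assumed rule: strict majority).

    Returns the winning digest, or None when no digest has a strict
    majority, together with the indices of the replicas that disagree.
    """
    digests = [digest(r) for r in replicas]
    winner, count = Counter(digests).most_common(1)[0]
    if count * 2 <= len(digests):
        # No strict majority: every replica is suspect.
        return None, list(range(len(digests)))
    return winner, [i for i, d in enumerate(digests) if d != winner]


# Three replicas, the third one silently altered.
replicas = [b"archived object", b"archived object", b"archived 0bject"]
winner, damaged = vote(replicas)
```

Voting alone cannot tell which copy is right when the replicas split evenly, which is one reason to anchor the expected digest somewhere independent of the replicas.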


ACE – Assumptions

• Basic Assumption on the archive
  – Each object has a persistent identifier
  – In the presence of multiple copies, one is designated as master.
• No other assumptions – architecture can be centralized, distributed, or peer-to-peer; policies can be centralized, distributed, or federated.


ACE – Base Methodology

• Three-tiered Cryptographic Information.
• Each tier is periodically audited separately according to policies set by managers.

Integrity Token (IT)
  • 1 IT/object
  • ~1KB
Cryptographic Summary Information (CSI)
  • 1 CSI/time window
  • Or 1 CSI / (n) objects
  • ~100MB/year
Witness
  • 1 Witness/week
  • ~2-3KB/year

Relationships: Integrity Token to Cryptographic Summary Information is k:1; Cryptographic Summary Information to Witness is l:1.


Three-Tiered Cryptographic Information

[Figure: hash-linking diagram along a time axis. Labels: Summary Information 0, 1 and 2 (CSI0, CSI1, CSI2) joined by "+" with Aggregated Hash Values (AHV0, AHV1, AHV3); h() nodes and Intermediate Hash Values (IHV) over the hash values of IT Req5, IT Req6, IT Req7 and IT Req8 within the aggregation time frame for AHV1; Witness Value 0 (WV0) built through IHVs over summary information values such as SI0 and SI3 within the aggregation time frame for WV1.]

Shaded values are proof to AHV1 for the integrity token issued for IT Req5.

Striped values are proof to WV0 for SI0.
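The "proof" in the figure can be read as an authentication path in a hash tree: the sibling values needed to recompute the aggregated value from one leaf. The sketch below is an illustration under assumptions the slides do not state: a binary tree, SHA-256 for h(), and duplication of the last node when a level has an odd count. The names `build_levels`, `proof_for` and `verify` are hypothetical.

```python
import hashlib
from typing import List, Tuple


def h(data: bytes) -> bytes:
    """One-way hash used for every tree node (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, leaves first, single-root level last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:  # odd count: pair the last node with itself
            cur = cur + [cur[-1]]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def proof_for(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the right."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its proof and compare."""
    node = leaf
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root


# Four requests in one aggregation round, as in the figure (Req5 to Req8).
leaves = [h(f"IT Req{n}".encode()) for n in (5, 6, 7, 8)]
levels = build_levels(leaves)
root = levels[-1][0]          # plays the role of the aggregated value
proof = proof_for(levels, 0)  # proof for IT Req5
```

With n leaves the proof holds about log2(n) hashes, so a token stays small even when a round aggregates many requests.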


ACE – System Architecture

[Figure: ACE system architecture. Layers: Archiving System; Bridging Components between Archiving System and ACE Integrity Management System; ACE Integrity Management System. Components shown: Archiving Nodes (hdd, cd-rom, tape drive) with a Token Registry and Archiving System Middleware; ACE Audit Manager (Audit Trigger, Notifier, Audit Queues); ACE Integrity Management System (Aggregator, Linker, Summary Information Post & Validator); External Witness Storages (cd-roms, news groups).]


ACE – Overview

[Figure: overview diagram with the labels object, Client, Hash (obj), ACE-IMS, Integrity Token, ACE-AM, and 3rd Party Auditor.]


ACE – Registration

[Figure: registration flow between the Archivist's Deposit Process and the ACE Integrity Management System. Labels: Web Services (TSS_Stamp, TSS_ITRequest, TSS_CompareCSI); Message Digest of TS_Req; Aggregator (RHVi); Linker (CSIi, CSIi-1, RHVi); CSI Registry; Post & Validator; integrity token fields Time Stamp, CSI, Time Frame ID, Proof. Numbered arrows correspond to the steps below.]

1. A request containing the hash of the object is made to ACE.
2. When the aggregation round closes, the Aggregator builds an authentication tree.
3. A receipt is returned.
4~5. A new cryptographic summary is computed and the integrity token for each request is constructed.
6~8. Each object retrieves its integrity token.
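The Linker box in the registration diagram relates CSIi, CSIi-1 and RHVi. One plausible reading is a hash chain in which each new summary depends on the previous summary and the value produced by the round that just closed. The sketch below illustrates that reading only; the combination rule (concatenate, then SHA-256) and the all-zero starting value `GENESIS` are assumptions, not details from the slides.

```python
import hashlib
from typing import List

GENESIS = b"\x00" * 32  # assumed starting value; the slides do not give one


def link(prev_csi: bytes, round_hash: bytes) -> bytes:
    """Assumed linking rule: CSI_i = SHA-256(CSI_{i-1} || RHV_i)."""
    return hashlib.sha256(prev_csi + round_hash).digest()


def chain(round_hashes: List[bytes], start: bytes = GENESIS) -> List[bytes]:
    """Fold a sequence of per-round hash values into a linked list of summaries."""
    csis = []
    prev = start
    for rhv in round_hashes:
        prev = link(prev, rhv)
        csis.append(prev)
    return csis


# Four rounds; the forged history replaces the value of the second round.
rounds = [hashlib.sha256(f"round {i}".encode()).digest() for i in range(4)]
original = chain(rounds)
forged = chain([rounds[0], hashlib.sha256(b"forged").digest()] + rounds[2:])
```

Under this rule, changing any round's value changes that summary and every later one, so an alteration cannot be confined to a single time window.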


ACE Witness Publication

[Figure: several Cryptographic Summary Information values feeding into one Witness.]

Once a week, a witness is computed from the cryptographic summaries generated during the week.

The witness of the week is widely published on the Internet; currently, it is posted to the newsgroups at Google, Yahoo and MSN.

The witness is also stored on a CD-ROM.


ACE – Demo

Modify


ACE Audit

[Figure: audit chain with the labels Object, Integrity Token, Cryptographic Summary Information, and Witness.]

1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
2. The integrity management system periodically audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary using the published witness values.
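Step 1 above can be pictured as a trigger that selects the objects whose audit interval has elapsed and then compares each object's current digest with the one recorded at registration. This is a hedged sketch only: the `Entry` fields, the per-object `period` policy, and the use of SHA-256 are all assumptions made for illustration, not the ACE Audit Manager's actual design.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    object_id: str
    token_digest: str  # digest recorded in the integrity token (hypothetical field)
    last_audit: float  # time of the last audit, seconds since epoch
    period: float      # audit interval chosen by the local manager (assumed policy)


def due(entries: List[Entry], now: float) -> List[Entry]:
    """Audit trigger: select entries whose audit interval has elapsed."""
    return [e for e in entries if now - e.last_audit >= e.period]


def audit(entry: Entry, store: Dict[str, bytes], now: float) -> bool:
    """Recompute the object's digest and compare it with the recorded one."""
    ok = hashlib.sha256(store[entry.object_id]).hexdigest() == entry.token_digest
    entry.last_audit = now
    return ok


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


store = {"obj-1": b"intact", "obj-2": b"c0rrupted", "obj-3": b"recent"}
entries = [
    Entry("obj-1", sha(b"intact"), last_audit=0.0, period=100.0),
    Entry("obj-2", sha(b"corrupted"), last_audit=0.0, period=100.0),
    Entry("obj-3", sha(b"recent"), last_audit=950.0, period=100.0),  # not yet due
]
now = 1000.0
results = {e.object_id: audit(e, store, now) for e in due(entries, now)}
```

A failed comparison would be handed to a notifier; what happens next (for example, repair from another copy) is outside this sketch.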


Auditing Cryptographic Summaries

[Figure: several Cryptographic Summary Information values feeding into one Witness.]

The system collects all the summaries that share the same Time Frame ID, and builds a validation witness.

The system retrieves the published witness of the Time Frame ID from the newsgroups. The published witness is then compared to the validation witness.
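The comparison described above can be sketched as follows. This is an illustration under assumptions: the witness is taken to be the root of a binary SHA-256 hash tree over the week's summaries, the published witnesses are supplied as a plain dictionary instead of being fetched from newsgroups, and the Time Frame ID format (`"2007-W20"`) is made up for the example.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary hash tree; the last node is duplicated on odd levels."""
    if not leaves:
        raise ValueError("at least one summary is required")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def audit_time_frames(
    csis: List[Tuple[str, bytes]], published: Dict[str, bytes]
) -> Dict[str, bool]:
    """Group summaries by Time Frame ID, rebuild each witness, and compare."""
    by_frame = defaultdict(list)
    for frame_id, csi in csis:
        by_frame[frame_id].append(csi)
    return {f: merkle_root(v) == published.get(f) for f, v in by_frame.items()}


# Stored summaries for two weeks (hypothetical IDs and contents).
stored = [("2007-W20", h(b"csi-a")), ("2007-W20", h(b"csi-b")),
          ("2007-W20", h(b"csi-c")), ("2007-W21", h(b"csi-d")),
          ("2007-W21", h(b"csi-e"))]
published = {
    "2007-W20": merkle_root([c for f, c in stored if f == "2007-W20"]),
    "2007-W21": merkle_root([c for f, c in stored if f == "2007-W21"]),
}
# Simulate tampering with one stored summary after publication.
stored[4] = ("2007-W21", h(b"csi-e altered"))
report = audit_time_frames(stored, published)
```

Because the published witness lives outside the system being audited, a mismatch points to the stored summaries, not to the publication.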


ACE Update – Obsolete Hash Functions

• Objects are registered again with the information on the old integrity token (IT).

• The new IT is constructed using this information.

• The object's integrity between the previous registration and the new registration can still be verified with the old IT, whereas the new IT covers the period from the new registration onward.
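A minimal sketch of such a re-registration, assuming SHA-1 as the outgoing function and SHA-256 as its replacement (the slides name neither). The `Token` layout and the way the old token is folded into the new one are hypothetical; the point is only that the new token commits to the old token under the new hash function.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    algorithm: str                                # hash function used for this token
    object_digest: str                            # digest of the object under that function
    previous_token_digest: Optional[str] = None   # commitment to the old token, if any


def token_digest(token: Token, algorithm: str) -> str:
    """Digest of a token's fields under the given hash function (assumed encoding)."""
    payload = "|".join(
        [token.algorithm, token.object_digest, token.previous_token_digest or ""]
    )
    return hashlib.new(algorithm, payload.encode()).hexdigest()


def register(data: bytes, algorithm: str, old: Optional[Token] = None) -> Token:
    """Register an object; on re-registration, bind the old token into the new one."""
    prev = token_digest(old, algorithm) if old else None
    return Token(algorithm, hashlib.new(algorithm, data).hexdigest(), prev)


def verify(data: bytes, token: Token) -> bool:
    return hashlib.new(token.algorithm, data).hexdigest() == token.object_digest


data = b"archived object"
old_token = register(data, "sha1")               # original registration
new_token = register(data, "sha256", old_token)  # re-registration
```

Both tokens verify the same bytes; the old one vouches for the period before the re-registration and the new one for the period after it.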


ACE Update – Object Transformation

• The new object is registered again. However, the registration request contains information on the old integrity token.

• The new integrity token is constructed using this information.

• With this information, a future audit can trace the current version back to the previous version.
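Tracing a transformed object back through its earlier versions amounts to following the reference each token holds to its predecessor. The sketch below assumes a simple in-memory registry keyed by token identifier; the `Token` fields, the identifiers, and the example formats are hypothetical and serve only to illustrate the traversal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    token_id: str
    object_format: str                 # format of the registered version (example data)
    previous_id: Optional[str] = None  # token of the version this one was derived from


def lineage(registry: Dict[str, Token], token_id: str) -> List[Token]:
    """Walk from the current token back to the first registration."""
    chain: List[Token] = []
    seen = set()
    current: Optional[str] = token_id
    while current is not None:
        if current in seen:
            raise ValueError("cycle in token references")
        seen.add(current)
        token = registry[current]
        chain.append(token)
        current = token.previous_id
    return chain


# Hypothetical history: one object migrated twice.
registry = {
    "t1": Token("t1", "format A"),
    "t2": Token("t2", "format B", previous_id="t1"),
    "t3": Token("t3", "format C", previous_id="t2"),
}
history = [t.object_format for t in lineage(registry, "t3")]
```

An auditor holding only the newest token can therefore reach every earlier registration, provided the older tokens are retained.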


ACE Performance

• Preliminary performance evaluation
  – Setup: Audits on the NARA EAP Image Collection, consisting of 126,548 files totaling over 1.1TB.
  – Results: All files were audited in about 15 hours.
  – Note 1: Most of the time was spent in moving the data between the separate machines.
  – Note 2: Registration on the same collection took almost the same time.
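The reported figures imply a rough sustained rate, worked out below. The inputs are the rounded values from the slide ("over 1.1TB", "about 15 hours"), and 1 TB is taken as 10^12 bytes, which is an assumption, so the results are order-of-magnitude estimates only.

```python
TOTAL_BYTES = 1.1e12   # "over 1.1TB", assuming 1 TB = 10**12 bytes
FILES = 126_548        # file count from the slide
HOURS = 15             # "about 15 hours"

seconds = HOURS * 3600
throughput_mb_s = TOTAL_BYTES / seconds / 1e6  # roughly 20 MB/s
files_per_s = FILES / seconds                  # roughly 2.3 files/s
mean_file_mb = TOTAL_BYTES / FILES / 1e6       # roughly 8.7 MB per file

print(f"{throughput_mb_s:.1f} MB/s, {files_per_s:.2f} files/s, {mean_file_mb:.1f} MB/file")
```

A rate near 20 MB/s is consistent with Note 1: the audit appears bound by data movement between machines, not by hashing.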


ACE Summary

• Third-party auditable
• Cryptographically rigorous yet cost-effective
• Update-aware
• Highly interoperable
• Scalable
• High Performance