Lifecycle Metadata for Digital Objects

October 18, 2004

Transfer / Authenticity Metadata

Review of metadata seen

Creation metadata
Appraisal, records management, scheduling
Transfer / authenticity: not really covered except in terms of the ingest process

Transferring paper records I

Metaphor for electronic process
Metadata generated throughout
Records Center Storage Approval Form
– Agency approval signature
– Description of materials
Initial steps are significant for:
– Setting up for secure transfer
– Defining required metadata to make sense of records in storage
Approval Number received for transmission
– This step embeds schedule metadata

Transferring paper records II

This stage defines formatting for:
– Wrapper
– Materials inside
Pack and label correctly (agreed standard)
– Use proper boxes
– Label with identifiers (RM descriptors)
– Pack in original order and approved arrangement
– Number boxes in batch
– Stack correctly
Transmittal Form for batch
– “Digest” of contents (this step is a “handshake”)
– Generates metadata for the transfer itself
Access Codes received for boxes

The central problem: Security guaranteeing Authenticity

Guarding the object (authenticity, integrity)
Proving the identities of the people responsible for transferring the object (authentication, non-repudiation)
Transferring the object in a secure way

Completeness and the moment of “recordness”

Assertion that the object is complete (cf. UBC)
Assertion that it is an archivable object
Assertion that the asserter has the authority to create the record or archive it
All these assertions may be system-supplied in the digital environment:
– user logins
– user role ID
– identity of the workstation on the network
– Creator’s action in performing a save

What is transfer about?

First: it is a COPY
What is a digital copy? What qualifies?
– Data compression issues
– Data segmentation issues
– Creating application vs. file-management application
How can a digital copy be guaranteed accurate? Compare with the original:
– Digital object as a string of bits
– Message digest of the object as math on the bits
– Ship the message digest with the object
– Recalculate and compare at the other end
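
The ship-and-recompare procedure can be sketched in a few lines. This is a minimal illustration, not a transfer protocol: the "package" is just a hypothetical Python dict, and SHA-256 is used in place of the MD5 named later in the deck because MD5 is no longer considered collision-resistant.

```python
import hashlib

def make_package(data: bytes) -> dict:
    """Sender side: bundle the object with a digest calculated on its bits."""
    return {"object": data, "sha256": hashlib.sha256(data).hexdigest()}

def verify_package(package: dict) -> bool:
    """Receiver side: recalculate the digest and compare with the shipped one."""
    recalculated = hashlib.sha256(package["object"]).hexdigest()
    return recalculated == package["sha256"]

pkg = make_package(b"Minutes of the board meeting")
print(verify_package(pkg))                                  # True
print(verify_package(dict(pkg, object=b"Minutes of the board meeting!")))  # False
```

A matching digest shows only that the copy is bit-for-bit identical to what was digested; it says nothing about who sent it, which is why the later steps add signatures.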

Moving from user to repository

Using the public network securely
Sending from user to repository
– Virtual Private Network (VPN)
– Secure Sockets Layer (SSL)
“Secure drop-box” technology
– Separate “hardened” server (between “DMZ”s)
– Only A can deposit, only B can withdraw
Repository harvests objects from user’s drop-box
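
The access rule of the drop-box ("only A can deposit, only B can withdraw") and the repository's harvest step can be modeled as below. `SecureDropBox` is a hypothetical class for illustration only; on a real hardened server the separation would be enforced by operating-system accounts and network controls rather than by application code like this.

```python
class SecureDropBox:
    """Hypothetical drop-box: one designated depositor, one designated withdrawer."""

    def __init__(self, depositor: str, withdrawer: str) -> None:
        self._depositor = depositor      # party A, e.g. the creating agency
        self._withdrawer = withdrawer    # party B, e.g. the repository
        self._items = {}                 # object name -> bytes

    def deposit(self, user: str, name: str, data: bytes) -> None:
        """Only the depositor may place objects in the box."""
        if user != self._depositor:
            raise PermissionError(f"{user!r} may not deposit")
        self._items[name] = data

    def harvest(self, user: str) -> dict:
        """Only the withdrawer may take objects out; the box is emptied."""
        if user != self._withdrawer:
            raise PermissionError(f"{user!r} may not withdraw")
        harvested, self._items = self._items, {}
        return harvested
```

Usage: `box = SecureDropBox("agency", "repository")`, then `box.deposit("agency", "r1", b"...")`; a call to `box.harvest("agency")` raises `PermissionError`, while `box.harvest("repository")` returns the deposited objects.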

Proving the identity of the sender (Authentication I: Identity)

Asymmetric encryption
– Private/public keys: reverse purposes
• Private = used by one juridical person
• Public = used by many persons
Digital signature
– Calculate message digest
– Use one key of the asymmetric key pair to transform the digest
• If recipient’s public key, only recipient can decode (using own private key)
• If sender’s private key, only sender can have sent (proved by sender’s public key)
– Use the other key of the asymmetric key pair to decrypt
– Check the decrypted digest against a digest recalculated from the message
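
The sender's-private-key path can be sketched with textbook RSA. This is a toy for illustration only: the key uses two small known primes (the Mersenne primes 2^61 − 1 and 2^89 − 1), and there is no padding. Real systems use keys of 2048 bits or more, a padding scheme such as RSA-PSS, and a vetted cryptographic library.

```python
import hashlib

# TOY key pair, insecure by design: two known Mersenne primes.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent: usable by many persons
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent: held by one juridical person (Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Calculate the message digest and reduce it into the key's number range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Transform the digest with the sender's PRIVATE key."""
    return pow(digest_as_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Undo the transform with the sender's PUBLIC key; compare with a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)

sig = sign(b"record")
print(verify(b"record", sig))            # True
print(verify(b"record (altered)", sig))  # False
```

Only the holder of `D` can produce a value that `E` turns back into the message's digest, which is what lets the public key "prove" the sender.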

Proving the identity of the sender (Authentication II: Non-repudiation)

Certification (PKI, “XKI”)
– Connecting keys with juridical persons: third-party certifiers (certification authorities, CAs)
– External or internal (PKI can be managed for internal business, e.g. a state)
– Endurance over time: What does the CA say?
System permissions and activity
– Data collected from system/network operations logs
– Necessity of collecting these logs as archival records!
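
The "endurance over time" question can be made concrete: verifying a signature years later means asking whether the CA vouched for that key at the moment of signing. The sketch below uses a hypothetical, heavily simplified `Certificate` record; real X.509 certificates carry similar validity dates (notBefore/notAfter), and revocation status comes from the CA, but the field names and the `key_valid_at` helper here are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Certificate:
    """Hypothetical CA statement binding a public key to a juridical person."""
    subject: str                          # the juridical person named by the CA
    public_key: int                       # toy key value, illustration only
    issuer: str                           # the certification authority
    not_before: datetime                  # start of the validity period
    not_after: datetime                   # end of the validity period
    revoked_at: Optional[datetime] = None # set if the CA withdrew the binding

def key_valid_at(cert: Certificate, signed_at: datetime) -> bool:
    """Did the CA vouch for this key at the moment the signature was made?"""
    if not (cert.not_before <= signed_at <= cert.not_after):
        return False
    return cert.revoked_at is None or signed_at < cert.revoked_at

if __name__ == "__main__":
    utc = timezone.utc
    cert = Certificate("Records Officer, Agency X", 12345, "Example CA",
                       datetime(2004, 1, 1, tzinfo=utc),
                       datetime(2006, 1, 1, tzinfo=utc))
    print(key_valid_at(cert, datetime(2004, 10, 18, tzinfo=utc)))  # True
```

The check is only as good as the signing time it is given, which is one reason the transfer itself must be time-stamped and logged.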

Authenticity of the object (Authentication III: Integrity)

Object as open or secret: two issues
– Must we disguise/encrypt the object?
– Can we move it around in clear?
(Cryptographic) Message Digest (MD5)
– Creates a single 128-bit number, written as 32 hexadecimal digits: a “one-way hash”
– Number will change with the slightest change in the object on which it was calculated
– Not suitable for encryption (a one-way hash cannot be decrypted)
Encryption (Confidentiality)
– Asymmetric (now dominant)
– Symmetric (issues of exchanging keys)
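
Both properties of the message digest listed above (a fixed 32-hex-digit value, and a completely different value after a one-character change) can be observed with Python's standard `hashlib`. The two sample strings are invented for the demonstration. MD5 is shown because the deck names it; for new systems a SHA-2 digest is the usual choice.

```python
import hashlib

original = b"Minutes of the board meeting, 18 October 2004"
altered = b"Minutes of the board meeting, 19 October 2004"   # one character changed

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(altered).hexdigest()

print(len(h1))   # 32 hexadecimal digits = 128 bits
print(h1)
print(h2)        # bears no visible relation to h1
```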

Proving the identity of the receiver

How is this done in the paper/physical case?
– Locations
– Signatures
– Other signs and proofs
How is it done in the digital case?
– Digital signature
– System permissions
– Recorded as part of repository operations records

Documenting the actual transfer

Time-stamps on the copy
System logs of the underlying transmitting and receiving systems
– Desktop Windows systems have system logs, but they are still fairly primitive
– Server logs can be extremely elaborate
– Repository/digital library logs can be designed to any requirement
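
Because repository logs "can be designed to any requirement", one possible design is a structured, append-only entry per transfer that ties the time-stamp, both parties, and the object's digest together. The JSON-lines format and every field name below are hypothetical choices for illustration, not a standard.

```python
import json
from datetime import datetime, timezone

def transfer_log_entry(sender: str, receiver: str, object_id: str, sha256: str) -> str:
    """Build one JSON line documenting a transfer (hypothetical field names)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC time-stamp
        "event": "transfer",
        "sender": sender,
        "receiver": receiver,
        "object_id": object_id,
        "sha256": sha256,            # digest as received, for later re-checking
    }
    return json.dumps(entry, sort_keys=True)

line = transfer_log_entry("agency-x", "repository", "r1", "ab12...")
print(line)   # append this line to the repository's operations log
```

Keeping the digest in the log lets the archive later show that the object it holds is the one that arrived.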

Verifying the transfer

Quality control: compare with paper process
Verifying the message digest
Checking the object against the wrapper
– Use metadata to make sure you have all of what was sent and in the proper format
– This is the most fundamental process carried out during ingest
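
Checking the object against the wrapper amounts to comparing what arrived with what the wrapper's metadata says was sent. The sketch assumes, hypothetically, that the wrapper supplies a manifest mapping each object name to its SHA-256 digest; a format check (e.g. on file type) would be a further test added alongside these three.

```python
import hashlib

def verify_against_manifest(manifest: dict, received: dict) -> dict:
    """Compare delivered objects with the wrapper's manifest.

    manifest: object name -> expected SHA-256 hex digest (from wrapper metadata)
    received: object name -> bytes actually delivered
    """
    missing = sorted(set(manifest) - set(received))      # listed but not sent
    unexpected = sorted(set(received) - set(manifest))   # sent but not listed
    corrupted = sorted(                                   # sent, but digest differs
        name for name in set(manifest) & set(received)
        if hashlib.sha256(received[name]).hexdigest() != manifest[name]
    )
    return {"missing": missing, "unexpected": unexpected, "corrupted": corrupted}
```

An ingest step would accept the batch only when all three lists come back empty, much as a paper transmittal form is checked box by box.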

XML and digital signatures

XML wrapper for a set of objects permits individual or multiple objects to be signed: “subtree signing”
– Objects can potentially be signed by different people in the workflow
– Thus a born-digital XML-wrapped object may already contain several digital signatures from different sources
May require verification and re-signing as a single object by the record-asserting entity before transfer
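
Subtree signing works because each object in the wrapper can be digested on its own, so that a change to one object leaves the digests (and thus signatures) of the others intact. The sketch below shows only that digest step, using the standard library's `ElementTree`; the `<package>`/`<object>` wrapper is a hypothetical example, and `ET.canonicalize` (Python 3.8+) implements C14N 2.0, not the C14N 1.0 that XML Signature commonly references.

```python
import hashlib
import xml.etree.ElementTree as ET

WRAPPER = """<package>
  <object id="memo-1"><title>Memo</title><body>Approved.</body></object>
  <object id="memo-2"><title>Reply</title><body>Noted.</body></object>
</package>"""

def subtree_digests(wrapper_xml: str) -> dict:
    """Digest each <object> subtree separately, so each could be signed on its own."""
    root = ET.fromstring(wrapper_xml)
    digests = {}
    for obj in root.iter("object"):
        # Canonicalize the subtree first so surface differences do not change the digest.
        canonical = ET.canonicalize(ET.tostring(obj, encoding="unicode"))
        digests[obj.get("id")] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digests

before = subtree_digests(WRAPPER)
after = subtree_digests(WRAPPER.replace("Noted.", "Rejected."))
print(before["memo-1"] == after["memo-1"])  # True: memo-1's signature would still hold
print(before["memo-2"] == after["memo-2"])  # False: only memo-2 needs re-signing
```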

XML Signature

<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
  <SignedInfo>
    <CanonicalizationMethod Algorithm="URI"/>
    <SignatureMethod Algorithm="URI"/>
    <Reference URI="URI">
      <Transforms>
        <Transform Algorithm="URI"/>
      </Transforms>
      <DigestMethod Algorithm="URI"/>
      <DigestValue>base64-encoded digest value here</DigestValue>
    </Reference>
  </SignedInfo>
  <SignatureValue>base64-encoded signature value here</SignatureValue>
  <KeyInfo>info about key here</KeyInfo>
</Signature>

What is canonicalization?

Two XML documents may differ in their entity structure, attribute ordering, and character encoding, because the XML standard treats these differences as insignificant
But a valid XML document has a precise logical structure related to its DTD or schema, no matter how it looks or what order things are in

Canonicalization means processing the XML file to a single standard form (as defined by W3C): see http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Intro

What does this mean for “authenticity”?
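
One consequence for authenticity is that a digest of the raw text would treat two logically identical documents as different objects, whereas a digest of the canonical form would not. The sketch uses Python's `xml.etree.ElementTree.canonicalize` (Python 3.8+), which implements C14N 2.0 rather than the 2001 C14N 1.0 recommendation cited above; the two sample documents are invented.

```python
import hashlib
import xml.etree.ElementTree as ET

# Same logical content; attribute order, quoting, spacing and
# empty-element syntax all differ on the surface.
doc_a = '<record id="r1" status="final"/>'
doc_b = "<record  status='final' id='r1'></record>"

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

canon_a = ET.canonicalize(doc_a)
canon_b = ET.canonicalize(doc_b)

print(sha256_hex(doc_a) == sha256_hex(doc_b))      # False: raw texts differ
print(sha256_hex(canon_a) == sha256_hex(canon_b))  # True: same canonical form
```

This is why XML Signature names a CanonicalizationMethod: signer and verifier must digest the same canonical bytes, or a harmless re-serialization would break the signature.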