Digital Object Architecture: an Advanced Architecture for Managing Digital Information Presentation by Robert E. Kahn President & CEO Corporation for National Research Initiatives WSIS Forum 2011 May 19, 2011

Page 1: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Presentation by

Robert E. Kahn

President & CEO

Corporation for National Research Initiatives

WSIS Forum 2011, May 19, 2011

Page 2: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Origins of the Internet

• Multiple Different Packet Networks

• Open Architecture

• Implemented via the TCP/IP Protocols

• Standards Processes

• Sustained Research Support

• Eventually resulting in
  – Commercialization
  – Widespread Dissemination
  – Global Acceptance

Page 3: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Three Initial Networks

• DARPA originally funded three seminal packet networks – ARPANET, Packet Radio, Packet Satellite

• The Internet came about from a desire to enable users and their computers to communicate efficiently, independent of the network they were using

• Initial challenges were in areas such as:
  – Addressing
  – Routing
  – Congestion Control
  – Host Protocols

• Addressing (16 bits to the wire, 32-bit IPv4 addresses; later, 128-bit IPv6 addresses and URLs)

Page 4: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Key Initial Decisions

• Global Addresses (IP) freed us from ARPANET addressing of the wires

• Gateways introduced for IP routing and for Network “Impedance Matching” – now called routers

• TCP dealt with network-related concerns
  – different packet sizes, duplicates, error detection, losses due to tunnels, mountains, jamming, etc.

• Enabled separate network administration

• Global information system based on an open architecture

Page 5: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

From Packet Communication to Information Management

• The Internet did not start out with a primary goal of assisting users in managing information.

• Fast, efficient, reliable, global connectivity was the main goal
  – Information management was limited to ensuring proper information flows in the Internet
  – The World Wide Web was an important step in simplifying user access to information
  – Other alternatives are now emerging.

• We now present an open architecture approach to information management that
  – Makes use of existing Internet capabilities
  – Allows different types of information management systems to be developed and interoperate.

Page 6: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Digital Object Architecture

• To reformulate the Internet architecture to focus more specifically on managing information rather than just communicating bits

• Making use of its world-wide connectivity, but independent of current technology choices

• Enabling existing and new types of information to be reliably managed and accessed in the Internet environment, including over very long periods of time

• Providing mechanisms to stimulate dynamic new forms of expression and to manifest older forms

• Support for multi-lingual identifier names in most native/local scripts

• While supporting privacy, security, intellectual property protection, managed access and well-formed business practices

Page 7: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Digital Object Architecture

• Technical Components
  – Digital Objects (DOs)
    • Structured data with a unique persistent identifier
  – Resolution of the Unique Identifiers
    • To “state information” about the DOs
  – Repositories
    • To deposit DOs
    • To access DOs with security
  – Registries
    • To create and store metadata
    • For secure searching
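The four components listed above can be pictured with a minimal in-memory sketch. This is an illustration only, not CNRI's implementation: the class names, the `readers` access list standing in for "access with security", and the exact-match `search` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """Structured data with a unique persistent identifier."""
    identifier: str   # "prefix/suffix", e.g. "10.1234/Conf_Summary"
    data: bytes


class ResolutionSystem:
    """Maps an identifier to "state information" about the DO."""

    def __init__(self):
        self._state = {}

    def register(self, identifier, state):
        self._state[identifier] = dict(state)

    def resolve(self, identifier):
        return dict(self._state[identifier])


class Repository:
    """Accepts deposits and returns DOs only to permitted users."""

    def __init__(self, name):
        self.name = name
        self._objects = {}
        self._readers = {}   # hypothetical access model: identifier -> user names

    def deposit(self, obj, readers):
        self._objects[obj.identifier] = obj
        self._readers[obj.identifier] = set(readers)

    def access(self, identifier, user):
        if user not in self._readers.get(identifier, set()):
            raise PermissionError(f"{user} may not access {identifier}")
        return self._objects[identifier]


class Registry:
    """Stores metadata about DOs and answers simple searches."""

    def __init__(self):
        self._metadata = {}

    def register(self, identifier, metadata):
        self._metadata[identifier] = dict(metadata)

    def search(self, **criteria):
        # Return identifiers whose metadata matches every criterion exactly.
        return sorted(
            ident for ident, md in self._metadata.items()
            if all(md.get(key) == value for key, value in criteria.items())
        )
```

A typical flow would be: search the registry to find an identifier, resolve the identifier to learn which repository holds the object, then ask that repository for access. The sketch leaves out the secure-search aspect of registries.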

Page 8: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Digital Object Architecture

[Diagram with the following labeled elements: User, Client, Resource Discovery, Resolution System, Repositories / Collections]

Resource Discovery: Metadata Registries in lieu of traditional Search Engines, Metadata Databases, Catalogues, Guides, etc.

Page 9: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Selected Digital Object Types

• Documents, Books, Music, Videos, Spreadsheets

• Personal data (coordinates, financial, medical)

• Observational data (climate, radio astronomy)

• Networking Information (operations, provisioning, forecasting)

• Commerce and Business Information (contracts, bills of lading, letters of credit, etc.)

• Software (programs, running processes & distributed systems)

• Information about “Things”

Page 10: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Repositories

Any Hardware & Software Configuration

Logical External Interface

Store and Access Digital Objects on the Net

Digital Object Protocol

Page 11: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Digital Object Protocol

• Uniform interface for accessing repositories and their digital objects

• Based on the use of identifiers

• Provides authentication of both users and servers upon request or where required

• Uses identity management based on the use of public keys

• Key means of implementing interoperability

Page 12: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

The Digital Object Protocol is a Meta-Level, Extensible Interface

<input sequence><H1> <H2> <Params> <output sequence>

H1 is a handle for the operation applied to the Target DO H2. Similarly, both A and B are known by their Handles HA and HB. The steps of the protocol are:

Establish a connection from A to B

{Optionally} A asks B to authenticate himself

If successful, A provides an input string to B

{Optionally} B asks A to authenticate herself

B provides the results of the operation

Either party may choose to continue or close
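The six steps above can be traced in a small in-process sketch. It is not the Digital Object Protocol wire format: `Request`, `Server`, and `run_exchange` are hypothetical names, the connection is modelled as a function call, and checking a handle against a trusted set is only a stand-in for the public-key authentication the slides describe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    operation: str                  # H1: handle naming the operation
    target: str                     # H2: handle of the target digital object
    params: List[str] = field(default_factory=list)
    payload: bytes = b""            # the input sequence


class Server:
    """Party B, known by its handle HB."""

    def __init__(self, handle, trusted_clients=(), require_client_auth=False):
        self.handle = handle
        self.trusted_clients = set(trusted_clients)
        self.require_client_auth = require_client_auth
        self.operations = {}        # operation handle -> callable

    def register_operation(self, op_handle, func):
        self.operations[op_handle] = func

    def handle_request(self, client_handle, request):
        # Step 4 (optional): B asks A to authenticate.
        if self.require_client_auth and client_handle not in self.trusted_clients:
            raise PermissionError("client failed authentication")
        # Step 5: B performs the operation and provides the results.
        func = self.operations[request.operation]
        return func(request.target, request.params, request.payload)


def run_exchange(client_handle, server, request, trusted_servers=None):
    """Party A (handle HA) runs one request against party B."""
    # Step 1: establish a connection (here, simply holding a reference to B).
    # Step 2 (optional): A asks B to authenticate.
    if trusted_servers is not None and server.handle not in trusted_servers:
        raise PermissionError("server failed authentication")
    # Step 3: A provides the input string to B; steps 4 and 5 happen inside B.
    result = server.handle_request(client_handle, request)
    # Step 6: either party may continue or close; this sketch closes after one.
    return result
```

Because the operation is itself named by a handle, new operations can be added by registering another handle and callable, which is one way to read the "meta-level, extensible" description in the slide title.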

Page 13: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Metadata Registry

• Registers the existence and access conditions for Digital Objects
  – Enables collections to be defined with appropriate access controls

• Provides a user interface to browse and search the registry, and an API for other programs to search the registry

• Integrates existing technologies
  – Handle System for identification and access
  – Digital Object Repository for metadata object storage and access
  – XML for object description and submission
  – Specification of Metadata Schemas

Page 14: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

CORDRA

[Diagram of a CORDRA federation. Labels recoverable from the figure: CORDRA Community, CORDRA Registry, Content Repositories, Federation Level Metadata, Intermediate Registry of Registries, Master Registry of Registries.]

Page 15: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

What are Handles? Why Resolution Systems?

• CNRI uses the name “Handles” to denote digital object identifiers

• Others may prefer to use their own descriptors

• Existing identifier schemes are accommodated

• Identifiers provide a way to identify data structures independent of their physical form or location, if any

• Identifiers can be of many forms, and may contain randomly generated strings, date-time stamps as well as semantics

• The identifier itself will not usually contain useful information about the digital object

• The resolution system is intended to make available the useful information

Page 16: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Why Are Identifiers Important?

• For global addressing
  – and possibly routing

• For long-term information preservation

• For building linkages
  – In lieu of attachments
  – To create virtual structures

• For accessing related metadata
  – To convey search results
  – To authenticate/validate

• Connectivity

• Individual Digital Objects

• Identity

Page 17: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Structure of the Identifiers

• Digital Object Identifiers are structured as “prefix/suffix”

• They may be conveyed in various forms, such as:
  – 10.1234/Conf_Summary
  – HDL:10.1234/Conf_Summary
  – hdl.handle.net/10.1234/Conf_Summary

• Each prefix has its own administrator with PKI access to the system for creation, change and deletion.

• Resolution of an identifier results in a returned resolution record – generally within a fraction of a second
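As a rough illustration of the “prefix/suffix” structure, the sketch below normalizes the three written forms shown on this slide to a single (prefix, suffix) pair. The function name `parse_identifier` is hypothetical, and tolerating an optional `http://` or `https://` scheme in front of the proxy form is an assumption made for the example. The split happens at the first slash, so a suffix may itself contain slashes.

```python
PROXY_HOST = "hdl.handle.net/"


def parse_identifier(text):
    """Return (prefix, suffix) for a handle written in any of the slide's forms."""
    s = text.strip()
    lowered = s.lower()
    if lowered.startswith("hdl:"):
        # Form: HDL:10.1234/Conf_Summary
        s = s[len("hdl:"):]
    else:
        # Form: hdl.handle.net/10.1234/Conf_Summary, optionally with a scheme.
        for scheme in ("http://", "https://"):
            if lowered.startswith(scheme):
                s = s[len(scheme):]
                break
        if s.lower().startswith(PROXY_HOST):
            s = s[len(PROXY_HOST):]
    prefix, separator, suffix = s.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix identifier: {text!r}")
    return prefix, suffix


if __name__ == "__main__":
    for form in (
        "10.1234/Conf_Summary",
        "HDL:10.1234/Conf_Summary",
        "hdl.handle.net/10.1234/Conf_Summary",
    ):
        print(form, "->", parse_identifier(form))
```

All three forms yield the same pair, which is what lets the prefix be used to find the administrator and service responsible for the identifier.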

Page 18: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Resolution Mechanism

[Diagram: multiple workstations, distributed globally, send a DO Identifier to the Handle System <www.handle.net> and receive a Resolution Record.]

• System is non-nodal

• Scalable & Distributed

• Supports global (and local) resolution

Page 19: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Handle System Features

• Supports both Resolution and Administration

• Internationalized character sets

• Secured resolution service

• Provides for Unique Persistent Identifiers

• Current Users include: DOI System, Open Archives Initiative, Library of Congress, CNNIC, Office of European Publications, DataCite, EIDR, DSpace Community and others

Page 20: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Handle Resolution

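The Handle System is a collection of handle services, each of which consists of one or more replicated sites, each of which may have one or more servers.

[Diagram: a Client queries the Handle System, which is made up of the Global Handle Registry (GHR) and several Local Handle Services (LHS). Each service is drawn as Site 1, Site 2, Site 3, ... Site n, and each site as servers #1, #2, #3, #4, ... #n.]

Example handle record returned to the client:

Handle: 123.456/abc
  Type URL, index 4: http://www.acme.com/
  Type URL, index 8: http://www.ideal.com/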
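The division of labour between the Global Handle Registry and the Local Handle Services can be sketched as a two-step lookup: the client first learns from the GHR which service is responsible for a prefix, then asks that service for the handle's values. This is a toy in-memory model, not the Handle System protocol. The class and function names are hypothetical, replication across sites and servers is omitted, and the sample record reuses the handle and URL values from the slide's example.

```python
class LocalHandleService:
    """Holds the handle records for the prefixes assigned to it."""

    def __init__(self, name):
        self.name = name
        self._records = {}   # handle -> list of (index, type, data)

    def add_record(self, handle, values):
        self._records[handle] = list(values)

    def lookup(self, handle):
        return list(self._records[handle])


class GlobalHandleRegistry:
    """Knows which service is responsible for each prefix."""

    def __init__(self):
        self._services = {}  # prefix -> LocalHandleService

    def assign_prefix(self, prefix, service):
        self._services[prefix] = service

    def service_for(self, handle):
        prefix = handle.split("/", 1)[0]
        return self._services[prefix]


def resolve(ghr, handle, value_type=None):
    """Two-step resolution: GHR finds the service, the service returns values."""
    service = ghr.service_for(handle)
    values = service.lookup(handle)
    if value_type is not None:
        values = [v for v in values if v[1] == value_type]
    return values


if __name__ == "__main__":
    lhs = LocalHandleService("example-lhs")
    lhs.add_record("123.456/abc", [
        (4, "URL", "http://www.acme.com/"),
        (8, "URL", "http://www.ideal.com/"),
    ])
    ghr = GlobalHandleRegistry()
    ghr.assign_prefix("123.456", lhs)
    for index, vtype, data in resolve(ghr, "123.456/abc", value_type="URL"):
        print(index, vtype, data)
```

Keeping only prefix-level information in the global registry is what allows the bulk of the records to live in many independently run local services.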

Page 21: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Mirroring the Global Handle Registry

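[Diagram: Global Handle Registry sites shown as M M P M M (one primary, P, and its mirrors, M), with labels for Administration and for users.]

• The Global Handle Registry contains System Handle Records

• Non-System Handle Records are in lots of Local Handle Services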

Page 22: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Planned Deployment of a Multi-Primary Global Registry

[Diagram: Global Handle Registry sites shown as P P P P P (primaries), plus mirrors, with users connecting to them.]

• A limited number of primaries, each administered separately

• The Global Registry contains System Handle Records

• Non-System Handle Records are in lots of Local Handle Services

Page 23: Digital Object Architecture: an Advanced Architecture for Managing Digital Information

Observations

• Identifiers provide the glue that holds complex distributed systems together

• Security can be provided at a very fine level of granularity in the system

• Repositories enable reliable long-term access to digital objects over generations of technology change

• Registries enable digital objects to be made known and findable using multiple metadata schemas

• The Multi-primary Global Registry enables distributed administration on a collaborative basis by multiple parties around the world.

• Finally, DONA will provide a framework for the management of the DO Architecture in the future.