An Overview of Persistent Identifiers

Page 1: An Overview of Persistent Identifiers

IT Support for SMTA Implementation, FAO - Rome, 14-Feb-2007

An Overview of Persistent Identifiers

George M. Garrity

Microbiology and Molecular Genetics

Michigan State University

Page 2: An Overview of Persistent Identifiers

My assignment

The phone call from Rome…
  To provide the Stakeholders with an overview of persistent identifiers and digital objects
  Explore both the technical and social/policy issues
  Provide some perspective on how persistent identifiers have been applied in two settings
    Mature application - CrossRef
    Evolving application - NamesforLife
  Offer some thoughts on how PIDs might be applied to implementing standard material transfer agreements

Disclaimers
  An end-user of persistent identifiers
  Dual interests and IP in this space

Page 3: An Overview of Persistent Identifiers

So, what’s the problem?

“…link heterogeneous electronic libraries. The difficulties inherent in this third objective ultimately led to this paper.”

“But for the bioinformatician concerned with integrating and computing upon distributed information… In second place is perhaps naming (identifying), with all the gloriously idiosyncratic embedded semantics of local identifiers in disparate forms.”

Kahn and Wilensky 1993

Clark 2003

Page 4: An Overview of Persistent Identifiers

So, what’s the problem?

“Even well-formed and properly applied names can serve as a source of confusion and considerable frustration. This is hardly a new problem.”

Garrity and Lyons 2003

“Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.”

Report of the NISO Identifiers Round-Table 2006

“And now, a much more succinct way to say this: our systems are autistic. They don’t make inferences. When we learn something in one system or one area, it doesn’t carry over to other areas.”

McComb 2006

Page 5: An Overview of Persistent Identifiers

Let’s start with some working definitions

Digital objects
  An instance of an abstract data type that has two components: data and key metadata
  Key metadata includes a handle
  A handle is a globally unique identifier that is bound to the digital object

Other key properties
  Digital objects
    differ from database records and files,
    are stored in network accessible repositories,
    and are accessed using a repository access protocol.

From: Kahn and Wilensky 2006 Int. J. Digit. Libr. 6: 115-123
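The working definition above can be sketched as a small data structure. This is a minimal illustration only: the class names, the `extra` field, the in-memory `Repository`, and the sample handle are all assumptions made for the example, and the payload component is called `data` here, following Kahn and Wilensky's wording.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyMetadata:
    """Key metadata; per the definition it includes the handle."""
    handle: str  # globally unique identifier bound to the object
    extra: dict = field(default_factory=dict)  # hypothetical further key metadata


@dataclass(frozen=True)
class DigitalObject:
    """A digital object with its two components."""
    data: bytes
    key_metadata: KeyMetadata


class Repository:
    """Hypothetical in-memory stand-in for a network-accessible repository."""

    def __init__(self):
        self._objects = {}

    def deposit(self, obj: DigitalObject) -> str:
        handle = obj.key_metadata.handle
        if handle in self._objects:
            # A handle is bound to exactly one digital object.
            raise ValueError(f"handle already bound: {handle}")
        self._objects[handle] = obj
        return handle

    def access(self, handle: str) -> DigitalObject:
        # Stand-in for a repository access protocol: lookup by handle only.
        return self._objects[handle]


repo = Repository()
repo.deposit(DigitalObject(b"outline text", KeyMetadata("10.1234/example-1")))
print(repo.access("10.1234/example-1").data)
```

The point of the sketch is that callers never address the object by file path or record number, only by its handle.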

Page 6: An Overview of Persistent Identifiers

Some more definitions

Identifiers
  Essential elements in
    Human - machine communications
    Machine - machine communications
  Ideally…
    Exist as an unambiguous string
    Context and application dependent
    Actionable
    Resolvable
  Other points to consider
    Semantically opaque
    Global or local
    Unique or non-unique
    Unanticipated uses

Page 7: An Overview of Persistent Identifiers

and some more definitions…

Persistent identifiers
  A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
  Use of a well managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.

From: Diana Dack, Persistence is a Virtue Information Online Conference, Sydney. January 2001

Page 8: An Overview of Persistent Identifiers

…and yet another definition.

Name resolution
  The process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).

[Diagram: name resolution. A table pairs PID1, PID2, PID3 with URL1, URL2, URL3; the PID identifies the resource and the URL locates it.]
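The definitions of a persistent identifier and of name resolution can be sketched together as a lookup table with two rules: a PID is never reassigned, and only its location may change. The class, its method names, and the `example.org` URLs are hypothetical; a real resolver is a networked service, not an in-memory dictionary.

```python
class PidRegistry:
    """Hypothetical in-memory registry mapping persistent identifiers to URLs."""

    def __init__(self):
        self._locations = {}

    def register(self, pid: str, url: str) -> None:
        # A PID is never reassigned to another resource, so re-registration fails.
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = url

    def relocate(self, pid: str, new_url: str) -> None:
        # The resource moved: only the location changes, the PID does not.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Name resolution: map the PID to the URL that currently locates the resource.
        return self._locations[pid]


registry = PidRegistry()
registry.register("PID1", "http://example.org/old/resource")
registry.relocate("PID1", "http://example.org/new/resource")
print(registry.resolve("PID1"))  # http://example.org/new/resource
```

Anything that cited `PID1` before the move still resolves afterwards, which is the sense in which links "remain actionable".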

Page 9: An Overview of Persistent Identifiers

Inherent in the design of such systems…

[Diagram: name registration and name resolution. Labels: Authority; Global registry; PID-to-URL table (PID1, PID2, PID3; URL1, URL2, URL3); Identifies; Locates; Resource; Metadata; Key metadata; User.]

Page 10: An Overview of Persistent Identifiers

[Diagram: DOI directory. Labels: Assigner; DOI; DOI directory; URL; Content.]

Source: Norman Paskin, International DOI Foundation

Page 11: An Overview of Persistent Identifiers

Comparing identifiers

A single unambiguous string
  A label that identifies an entity
    ISBN 0-387-98771-1
    ATCC 27126*
    L-681,572-001

A numbering scheme
  A method of providing consistent syntax to denote class membership of an entity.
  A formal standard or industry convention
  An arbitrary internal system
  Key point is establishing a 1:1 correspondence between labels and members
  Enumeration
  The number or label are simply strings

Page 12: An Overview of Persistent Identifiers

Comparing identifiers (cont.)

An infrastructure specification
  A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.
  Actionable identifiers
    URI (URN and URL)
    ISBN numbers as UPC/EAN identifiers
  Does not mandate a method of creating labels
  Does not create a managed environment
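The "ISBN numbers as UPC/EAN identifiers" case can be sketched in a few lines: the same label is re-expressed in the syntax of the barcode infrastructure. The function names are illustrative assumptions; the check-digit rules are the standard ISBN-10 and EAN-13 ones, and the input is the ISBN quoted earlier in the talk.

```python
def isbn10_is_valid(isbn10: str) -> bool:
    """Check an ISBN-10: weights 10..1, weighted sum divisible by 11."""
    chars = isbn10.replace("-", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch == "X" and weight == 1:
            value = 10  # "X" stands for 10, allowed only as the check character
        elif ch in "0123456789":
            value = int(ch)
        else:
            return False
        total += weight * value
    return total % 11 == 0


def isbn10_to_ean13(isbn10: str) -> str:
    """Prefix 978 to the first nine digits and recompute the EAN check digit."""
    if not isbn10_is_valid(isbn10):
        raise ValueError(f"not a valid ISBN-10: {isbn10}")
    body = "978" + isbn10.replace("-", "")[:9]
    # EAN-13 weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)


print(isbn10_to_ean13("0-387-98771-1"))  # 9780387987712
```

Note that the conversion changes only the expression of the identifier. It neither creates new labels nor says who manages them, which matches the two "does not" points on the slide.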

Page 13: An Overview of Persistent Identifiers

Comparing identifiers (cont.)

A fully implemented identifier system
  Includes
    Unique identifiers
    A formalized infrastructure
    Management policies for registration, structured interoperable metadata, policy, and governance mechanisms.
  Examples
    UPC/EAN barcodes and RFID tags
    Digital object identifiers (digital identifiers of objects)

Page 14: An Overview of Persistent Identifiers

Desired properties of a candidate PID

Semantically opaque - avoid the pitfalls of embedded meaning
Governance - is there a technical and social framework overseeing the development, implementation and “marketing” of the PID?
Persistence - is there a mechanism in place to guarantee persistence of issued PIDs, when so desired?
Registration - is there a mechanism for global registration of the PIDs or can anyone issue PIDs?
Metadata - is there a minimal requirement for metadata associated with each identified object?
Accepted standard - is there evidence that the PID is an accepted standard?
Globally unique - are the PIDs globally unique?
Widespread usage - how many PIDs have been issued and what is the rate of issuance of new PIDs?

Page 15: An Overview of Persistent Identifiers

Desired properties of a candidate PID (cont.)

Object/location - what does the PID identify?
Actionable - are network services attached/embedded?
Unique - does the resolution service check for uniqueness at the local level?
Interoperability - can the identifiers be readily incorporated into other applications without modification or permission?
Granularity - can the identifiers be assigned to subcomponents (nesting of entities within entities)?
Business model - is there a compelling business need for the PIDs to ensure that the infrastructure can be maintained in a self-supporting manner?

Page 16: An Overview of Persistent Identifiers

Comparison of identifier properties

Identifier         Opaque  Governance  Persistent  Registration  Metadata  Accepted standard  Global  Widespread use  Object  Actionable  Unique  Interoperable
Accession numbers  -       -           V           -             V         -                  -       +               +       -           -       -
LSID               -       -           ?           -             V         ?                  V       ?               -       +           +       ?
Gene names         V       -           -           -             -         +                  -       +               +       -           -       -
PURL               -       -           -           -             +         ?                  -       -               +       +           +       +
Taxid              +       -           -           -             +         -                  -       ?               +       V           +       ?
DNS                -       +           -           +             -         +                  +       +               -       +           +       +
Taxonomic names    -       +           +           V             -         +                  +       +               +       -           -       -
Handle             +       -           +           +             +         -                  +       ?               +       +           +       +
DOI                +       +           +           +             +         +                  +       +               +       +           +       +
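One way to use the comparison is to treat it as data and filter on the properties a given application needs. The sketch below transcribes the marks from the table; the function name and the rule that only "+" counts as satisfying a property are assumptions made for the example, since the slide does not define "V" or "?".

```python
PROPERTIES = [
    "Opaque", "Governance", "Persistent", "Registration", "Metadata",
    "Accepted standard", "Global", "Widespread use", "Object",
    "Actionable", "Unique", "Interoperable",
]

# Marks transcribed from the comparison table, one per property, in order.
RAW_SCORES = {
    "Accession numbers": "- - V - V - - + + - - -",
    "LSID":              "- - ? - V ? V ? - + + ?",
    "Gene names":        "V - - - - + - + + - - -",
    "PURL":              "- - - - + ? - - + + + +",
    "Taxid":             "+ - - - + - - ? + V + ?",
    "DNS":               "- + - + - + + + - + + +",
    "Taxonomic names":   "- + + V - + + + + - - -",
    "Handle":            "+ - + + + - + ? + + + +",
    "DOI":               "+ + + + + + + + + + + +",
}

TABLE = {
    name: dict(zip(PROPERTIES, marks.split()))
    for name, marks in RAW_SCORES.items()
}


def candidates(required):
    """Identifiers marked '+' for every required property (assumed rule)."""
    return [
        name for name, marks in TABLE.items()
        if all(marks[prop] == "+" for prop in required)
    ]


print(candidates(PROPERTIES))                                # ['DOI']
print(candidates(["Opaque", "Persistent", "Registration"]))  # ['Handle', 'DOI']
```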

Page 17: An Overview of Persistent Identifiers

What does a Digital Object Identifier look like?

The prefix is assigned to the content provider by a DOI Registration Agency, or the Handle System directly.
The suffix is an opaque string supplied by the content provider.
Handle software stores a mapping of the Handle to one or more locations (or services).
In virtually all cases today, the Handle is mapped to a location (URL).

http://dx.doi.org/10.1007/bergeysoutline
resolves to
http://141.150.157.80/bergeysoutline/main.htm
Which used to be:
http://www.springer-ny.com/bergeysoutline

10.1234/myownnumbers-123.00001
prefix suffix subsuffix
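The prefix/suffix structure can be illustrated with a short parser. This is a sketch under stated assumptions: the function names are hypothetical, the split is taken at the first "/", and the resolver address is the `http://dx.doi.org/` proxy shown in the example above. No network request is made.

```python
from typing import Tuple

RESOLVER = "http://dx.doi.org/"  # proxy address taken from the slide's example


def split_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first '/'."""
    prefix, sep, suffix = doi.partition("/")
    # DOI prefixes begin with the directory code "10.", as in both slide examples.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the actionable form of a DOI; the suffix is passed through opaquely."""
    split_doi(doi)  # validate only; the suffix is never interpreted
    return RESOLVER + doi


print(split_doi("10.1234/myownnumbers-123.00001"))
print(resolver_url("10.1007/bergeysoutline"))
```

The parser deliberately does nothing with the suffix beyond carrying it along, in keeping with its description as an opaque string.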

Page 18: An Overview of Persistent Identifiers

Syntax of some other PIDs in “common” use

<Handle>::=<Handle Prefix> "/"<Handle Suffix>
http://hdl.handle.net/10.1099/ijs.0.64483-0

Persistent URLs
<purl>::=<protocol>/<resolver>/<name>
http://purl.oclc.org/OCLC/OCLC/PURL/FAQ

LSID Life Science Identifiers
urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>
http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
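The Handle and LSID patterns above can be read as simple parsing rules. The sketch below applies them to the two examples from the slide; the type and function names are hypothetical, and treating the LSID revision as optional is an assumption of this example rather than something the slide states.

```python
from typing import NamedTuple, Optional, Tuple


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(urn: str) -> Lsid:
    """Parse urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>."""
    parts = urn.split(":")
    if (len(parts) not in (5, 6)
            or parts[0].lower() != "urn"
            or parts[1].lower() != "lsid"):
        raise ValueError(f"not an LSID: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None  # assumed optional
    return Lsid(parts[2], parts[3], parts[4], revision)


def parse_handle(handle: str) -> Tuple[str, str]:
    """Parse <Handle Prefix> "/" <Handle Suffix>, splitting at the first '/'."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a Handle: {handle!r}")
    return prefix, suffix


print(parse_lsid("urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2"))
print(parse_handle("10.1099/ijs.0.64483-0"))
```

Unlike a Handle or DOI suffix, the LSID carries readable parts (authority, namespace, revision), which is one reason it is not scored as opaque in the comparison table.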

Page 19: An Overview of Persistent Identifiers

Two implementations using DOIs

CrossRef
  Independent membership association, founded and directed by STM publishers.
  Mission is to connect users to primary research literature through a DOI Registration Agency (RA) that performs reference cross-linking, subject to publisher-access controls.
  The largest and most successful implementation of DOI services.

NamesforLife is a proprietary semantic resolution service developed at MSU. It provides a method for persistently linking the occurrence of a biological name or other technical term in third party content to managed information about its origins, formal definition, current usage, and related goods and services.

Page 20: An Overview of Persistent Identifiers

The knowledge gradient

[Diagram: the knowledge gradient, with regions labeled Unknown unknowns, Known knowns, Unknown knowns, and Known unknowns]

Basic and applied research advances knowledge

Knowledge bleed results in a loss of knowledge that has already been gained

Semantic resolution provides a mechanism to combat knowledge bleed

Page 21: An Overview of Persistent Identifiers

Page 22: An Overview of Persistent Identifiers

Page 23: An Overview of Persistent Identifiers

Ramifications of misunderstanding a name

Highly significant
  Wrong assumptions, assertions, or hypotheses
  Misdiagnosis of infectious diseases
  Misapplication of public policies

Significant
  Lost opportunities
  Failure to reach potential customers potentially interested in marketed content, goods, and services at point of need.
  The long-tail phenomenon*

Names trigger specific responses
  But, the concepts to which names apply are not static
  May not always map 1:1
  May require expertise for accurate interpretation

Page 24: An Overview of Persistent Identifiers

Some thoughts on selecting a PID for SMTAs

The intended use of the identifier
Syntactic rules governing the form of the identifier
What the identifier resolves to
The technical infrastructure that is available to support the identifier and the parties operating that infrastructure
Policies governing creation, maintenance, support, and persistence of the identifier
Information about any metadata related to the identifier that is or must be made available
A history about the identifier, including any changes in any of the above points over time.

Source: Report of the NISO Identifiers Roundtable 2006

Questions?