An Overview of Persistent Identifiers
George M. Garrity
Microbiology and Molecular Genetics
Michigan State University
IT Support for SMTA Implementation, FAO - Rome, 14-Feb-2007
The phone call from Rome…
My assignment
  To provide the Stakeholders with an overview of persistent identifiers and digital objects
  Explore both the technical and social/policy issues
  Provide some perspective on how persistent identifiers have been applied in two settings
    Mature application - CrossRef
    Evolving application - NamesforLife
  Offer some thoughts on how PIDs might be applied to implementing standard material transfer agreements
Disclaimers
  An end-user of persistent identifiers
  Dual interests and IP in this space
So, what’s the problem?
  “…link heterogeneous electronic libraries. The difficulties inherent in this third objective ultimately led to this paper.”
    Kahn and Wilensky 1993
  “But for the bioinformatician concerned with integrating and computing upon distributed information… In second place is perhaps naming (identifying), with all the gloriously idiosyncratic embedded semantics of local identifiers in disparate forms.”
    Clark 2003
So, what’s the problem?
  “Even well-formed and properly applied names can serve as a source of confusion and considerable frustration. This is hardly a new problem.”
    Garrity and Lyons 2003
  “Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.”
    Report of the NISO Identifiers Round-Table 2006
  “And now, a much more succinct way to say this: our systems are autistic. They don’t make inferences. When we learn something in one system or one area, it doesn’t carry over to other areas.”
    McComb 2006
Let’s start with some working definitions
  Digital objects
    An instance of an abstract data type that has two components: metadata and key metadata
    Key metadata includes a handle
    A handle is a globally unique identifier that is bound to the digital object
  Other key properties: digital objects
    differ from database records and files,
    are stored in network accessible repositories,
    and are accessed using a repository access protocol.
  From: Kahn and Wilensky 2006 Int J. Digit. Lib 6: 115-223
Some more definitions
  Identifiers
    Essential elements in
      Human - machine communications
      Machine - machine communications
    Ideally…
      Exist as an unambiguous string
      Context and application dependent
      Actionable
      Resolvable
    Other points to consider
      Semantically opaque
      Global or local
      Unique or non-unique
      Unanticipated uses
and some more definitions…
  Persistent identifiers
    A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
    Use of a well managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.
  From: Diana Dack, Persistence is a Virtue, Information Online Conference, Sydney, January 2001
…and yet another definition.
  Name resolution
    The process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).
  [Diagram labels: Name resolution; PID → URL (PID1, PID2, PID3; URL1, URL2, URL3); Identifies; Locates; Resource]
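A minimal sketch of name resolution, assuming an in-memory table in place of a real resolution service. The class name `Resolver`, its methods, and the example.org URLs are hypothetical and used only for illustration. The point it tries to show is the indirection: the PID stays fixed while the URL behind it can change.

```python
class Resolver:
    """Toy name-resolution table: persistent identifier (PID) -> current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, pid, url):
        # A PID is never reassigned, so registering it twice is an error.
        if pid in self._locations:
            raise ValueError(f"PID already registered: {pid}")
        self._locations[pid] = url

    def relocate(self, pid, new_url):
        # Only the location changes; the PID itself stays the same.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid):
        # Name resolution: map the PID to the URL that locates the resource.
        return self._locations[pid]


resolver = Resolver()
resolver.register("PID1", "http://old.example.org/resource")  # hypothetical URL
first = resolver.resolve("PID1")
resolver.relocate("PID1", "http://new.example.org/resource")  # resource moved
second = resolver.resolve("PID1")
print(first)
print(second)
```

Links that cite "PID1" keep working after the move because only the table entry was updated.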
Inherent in the design of such systems….
  Name registration & name resolution
  [Diagram labels: Authority; User; Assigner; PID → URL (PID1, PID2, PID3; URL1, URL2, URL3); Identifies; Locates; Resource; Metadata; Key metadata; Global registry; Content; DOI directory (DOI → URL)]
  Source: Norman Paskin, International DOI Foundation
Comparing identifiers
  A single unambiguous string
    A label that identifies an entity
      ISBN 0-387-98771-1
      ATCC 27126*
      L-681,572-001
  A numbering scheme
    A method of providing consistent syntax to denote class membership of an entity.
      A formal standard or industry convention
      An arbitrary internal system
    Key point is establishing a 1:1 correspondence between labels and members
      Enumeration
      The number or label is simply a string
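One way to picture the 1:1 correspondence between labels and members is a small check over (label, member) pairs. This is an illustrative sketch only; the function name `is_one_to_one` and the sample pairs are assumptions, not part of any numbering standard.

```python
def is_one_to_one(assignments):
    """Return True if no label names two members and no member has two labels."""
    label_to_member = {}
    member_to_label = {}
    for label, member in assignments:
        # setdefault stores the first pairing and returns it on later lookups.
        if label_to_member.setdefault(label, member) != member:
            return False  # same label used for two different members
        if member_to_label.setdefault(member, label) != label:
            return False  # same member carries two different labels
    return True


good = is_one_to_one([("A-001", "strain x"), ("A-002", "strain y")])
reused_label = is_one_to_one([("A-001", "strain x"), ("A-001", "strain y")])
two_labels = is_one_to_one([("A-001", "strain x"), ("A-002", "strain x")])
print(good, reused_label, two_labels)
```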
Comparing identifiers (cont.)
  An infrastructure specification
    A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure.
    Actionable identifiers
      URI (URN and URL)
      ISBN numbers as UPC/EAN identifiers
    Does not mandate a method of creating labels
    Does not create a managed environment
Comparing identifiers (cont.)
  A fully implemented identifier system
    Includes
      Unique identifiers
      A formalized infrastructure
      Management policies for registration, structured interoperable metadata, policy, and governance mechanisms.
    Examples
      UPC/EAN barcodes and RFID tags
      Digital object identifiers (digital identifiers of objects)
Desired properties of a candidate PID
  Semantically opaque - avoid the pitfalls of embedded meaning
  Governance - is there a technical and social framework overseeing the development, implementation and “marketing” of the PID?
  Persistence - is there a mechanism in place to guarantee persistence of issued PIDs, when so desired?
  Registration - is there a mechanism for global registration of the PIDs or can anyone issue PIDs?
  Metadata - is there a minimal requirement for metadata associated with each identified object?
  Accepted standard - is there evidence that the PID is an accepted standard?
  Globally unique - are the PIDs globally unique?
  Widespread usage - how many PIDs have been issued and what is the rate of issuance of new PIDs?
Desired properties of a candidate PID (cont.)
  Object/location - what does the PID identify?
  Actionable - are network services attached/embedded?
  Unique - does the resolution service check for uniqueness at the local level?
  Interoperability - can the identifiers be readily incorporated into other applications without modification or permission?
  Granularity - can the identifiers be assigned to subcomponents (nesting of entities within entities)?
  Business model - is there a compelling business need for the PIDs to ensure that the infrastructure can be maintained in a self-supporting manner?
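Comparison of identifier properties
  Identifier        | Opaque | Governance | Persistent | Registration | Metadata | Accepted standard | Global | Widespread use | Object | Actionable | Unique | Interoperable
  Accession numbers | -      | -          | V          | -            | V        | -                 | -      | +              | +      | -          | -      | -
  LSID              | -      | -          | ?          | -            | V        | ?                 | V      | ?              | -      | +          | +      | ?
  Gene names        | V      | -          | -          | -            | -        | +                 | -      | +              | +      | -          | -      | -
  PURL              | -      | -          | -          | -            | +        | ?                 | -      | -              | +      | +          | +      | +
  Taxid             | +      | -          | -          | -            | +        | -                 | -      | ?              | +      | V          | +      | ?
  DNS               | -      | +          | -          | +            | -        | +                 | +      | +              | -      | +          | +      | +
  Taxonomic names   | -      | +          | +          | V            | -        | +                 | +      | +              | +      | -          | -      | -
  Handle            | +      | -          | +          | +            | +        | -                 | +      | ?              | +      | +          | +      | +
  DOI               | +      | +          | +          | +            | +        | +                 | +      | +              | +      | +          | +      | +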
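The comparison above can also be held as data and queried. The following is a sketch, not part of the original slide: the names `PROPERTIES`, `RATINGS`, `profile`, and `missing` are hypothetical, and because the slide does not define the symbols "V" and "?", the code simply assumes that anything other than "+" counts as not confirmed.

```python
PROPERTIES = [
    "Opaque", "Governance", "Persistent", "Registration", "Metadata",
    "Accepted standard", "Global", "Widespread use", "Object",
    "Actionable", "Unique", "Interoperable",
]

# Ratings copied from the slide, one symbol per property, in column order.
RATINGS = {
    "Accession numbers": "- - V - V - - + + - - -",
    "LSID":              "- - ? - V ? V ? - + + ?",
    "Gene names":        "V - - - - + - + + - - -",
    "PURL":              "- - - - + ? - - + + + +",
    "Taxid":             "+ - - - + - - ? + V + ?",
    "DNS":               "- + - + - + + + - + + +",
    "Taxonomic names":   "- + + V - + + + + - - -",
    "Handle":            "+ - + + + - + ? + + + +",
    "DOI":               "+ + + + + + + + + + + +",
}


def profile(identifier):
    """Return {property: symbol} for one identifier type."""
    symbols = RATINGS[identifier].split()
    assert len(symbols) == len(PROPERTIES)
    return dict(zip(PROPERTIES, symbols))


def missing(identifier):
    """Properties not rated '+' (assumption: 'V' and '?' count as unconfirmed)."""
    return [prop for prop, symbol in profile(identifier).items() if symbol != "+"]


all_plus = [name for name in RATINGS if not missing(name)]
print(all_plus)
print(missing("Handle"))
```

Under that assumption, only the DOI row is rated "+" for every property, and the Handle row falls short on Governance, Accepted standard, and Widespread use.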
What does a Digital Object Identifier look like?
  The prefix is assigned to the content provider by a DOI Registration Agency, or the Handle System directly.
  The suffix is an opaque string supplied by the content provider.
  Handle software stores a mapping of the Handle to one or more locations (or services)
  In virtually all cases today, the Handle is mapped to a location (URL).
    http://dx.doi.org/10.1007/bergeysoutline
    resolves to
    http://141.150.157.80/bergeysoutline/main.htm
    Which used to be:
    http://www.springer-ny.com/bergeysoutline
  10.1234/myownnumbers-123.00001
    [Diagram labels on this example: prefix, suffix, subsuffix]
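A short sketch of how the example DOI might be taken apart and turned into a resolvable URL. The helper names `split_doi` and `proxy_url` are hypothetical. Two assumptions go beyond the slide: that DOI prefixes begin with "10.", and that the split happens at the first slash because a suffix may itself contain slashes. How the "subsuffix" is delimited is not stated, so the sketch leaves the suffix whole.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, slash, suffix = doi.partition("/")
    # Assumption: DOI prefixes start with the directory indicator "10."
    if not slash or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a well-formed DOI: {doi!r}")
    return prefix, suffix


def proxy_url(doi, proxy="http://dx.doi.org/"):
    """Build a resolver URL of the form shown on the slide."""
    split_doi(doi)  # validate before building the URL
    return proxy + doi


parts = split_doi("10.1234/myownnumbers-123.00001")
url = proxy_url("10.1007/bergeysoutline")
print(parts)
print(url)
```

The sketch does no percent-encoding, so suffixes with characters that are unsafe in URLs would need extra handling.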
Syntax of some other PIDs in “common” use
  Handle
    <Handle>::=<Handle Prefix> "/" <Handle Suffix>
    http://hdl.handle.net/10.1099/ijs.0.64483-0
  Persistent URLs
    <purl>::=<protocol>/<resolver>/<name>
    http://purl.oclc.org/OCLC/OCLC/PURL/FAQ
  LSID Life Science Identifiers
    urn:<LSID>:<AuthorityID>:<Namespace>:<Object>:<Rev>
    http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
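The LSID pattern above can be read as colon-separated fields. The following is a sketch under that reading only: the `LSID` tuple and `parse_lsid` are hypothetical names, and treating the revision field as optional is an assumption. The input is the URN portion of the example, without the resolver address in front of it.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(urn):
    """Parse 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]'."""
    fields = urn.split(":")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected number of fields: {urn!r}")
    if fields[0].lower() != "urn" or fields[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {urn!r}")
    # Assumption: the revision is optional and absent when only 5 fields exist.
    revision = fields[5] if len(fields) == 6 else None
    return LSID(fields[2], fields[3], fields[4], revision)


example = parse_lsid("urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2")
print(example)
```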
Two implementations using DOIs
  CrossRef
    Independent membership association, founded and directed by STM publishers.
    Mission is to connect users to primary research literature through a DOI RA that performs reference cross-linking, subject to publisher-access controls.
    The largest and most successful implementation of DOI services.
  NamesforLife
    NamesforLife is a proprietary semantic resolution service developed at MSU. It provides a method for persistently linking the occurrence of a biological name or other technical term in third party content to managed information about its origins, formal definition, current usage, and related goods and services.
The knowledge gradient
  [Figure labels: Unknown unknowns; Known knowns; Unknown knowns; Known unknowns]
  Basic and applied research advances knowledge
  Knowledge bleed results in a loss of knowledge that has already been gained
  Semantic resolution provides a mechanism to combat knowledge bleed
Ramifications of misunderstanding a name
  Highly significant
    Wrong assumptions, assertions, or hypotheses
    Misdiagnosis of infectious diseases
    Misapplication of public policies
  Significant
    Lost opportunities
    Failure to reach potential customers interested in marketed content, goods, and services at point of need.
    The long-tail phenomenon*
  Names trigger specific responses
    But, the concepts to which names apply are not static
    May not always map 1:1
    May require expertise for accurate interpretation
Some thoughts on selecting a PID for SMTAs
  The intended use of the identifier
  Syntactic rules governing the form of the identifier
  What the identifier resolves to
  The technical infrastructure that is available to support the identifier and the parties operating that infrastructure
  Policies governing creation, maintenance, support, and persistence of the identifier
  Information about any metadata related to the identifier that is or must be made available
  A history of the identifier, including any changes in any of the above points over time.
  Source: Report of the NISO Identifiers Roundtable 2006
Questions?