Persistent identifiers for digitized specimens

Post on 01-Nov-2014


DESCRIPTION

Persistent identifiers (PIDs) for digitized museum collections. Presented at the European GBIF meeting at Digitarium in Joensuu, Finland, 6 March 2013. A proposed model for assigning UUID PIDs using QR codes during the imaging and digitization process.

TRANSCRIPT

GBIF European Regional Nodes Meeting, 6 to 8 March, 2013, Joensuu, Finland

Globally unique identifiers for digitized specimens
Comparison of alternatives

Dag Endresen
GBIF Norway, NHM-UiO
Natural History Museum, University of Oslo (NHM-UiO)
Global Biodiversity Information Facility (GBIF)

6 March 2013

Topics

• Darwin Core (DwC) & Identifiers
• Persistent Identifiers
• UUIDs
• PID and the digitization workflow

Darwin Core – a vocabulary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Term name: occurrenceID

Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID

Class: http://rs.tdwg.org/dwc/terms/Occurrence

Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.

Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]".

Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732".

For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
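The fallback rule in the occurrenceID comment can be sketched in a few lines of Python. This is only an illustration: the function name `build_occurrence_id` and its parameters are hypothetical and are not part of Darwin Core or any GBIF tool. It assumes that an existing global identifier, when present, is used unchanged, and that the "urn:catalog:" form is built only when none exists.

```python
def build_occurrence_id(institution_code, collection_code, catalog_number, guid=None):
    """Return an occurrenceID for a specimen record (hypothetical helper).

    If the record already has a global unique identifier, use it as-is.
    Otherwise fall back to the form suggested in the Darwin Core comment:
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]
    """
    if guid:
        return guid
    parts = []
    for value in (institution_code, collection_code, catalog_number):
        text = "" if value is None else str(value).strip()
        if not text:
            # A missing part would make the identifier ambiguous.
            raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
        parts.append(text)
    return "urn:catalog:" + ":".join(parts)


# Reproduces the second example from the slide.
print(build_occurrence_id("FMNH", "Mammal", 145732))  # urn:catalog:FMNH:Mammal:145732
```

Note that an identifier built this way is only as stable as the codes it is built from. If a collection is renamed or a catalog number is reassigned, the identifier changes with it.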

Record-level Terms
dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties

Occurrence
occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa

Event
eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks

dcterms:Location
locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks

GeologicalContext
geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed

Identification
identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus

Taxon
taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks

ResourceRelationship (Auxiliary Terms)
resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks

MeasurementOrFact (Auxiliary Terms)
measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks

Semantic MediaWiki: a forum for discussion and development of terminology.

http://terms.gbif.org/


• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Life Science Identifier (LSID)
• Digital Object Identifier (DOI)
• Handle system (Handle)
• Archival Resource Key (ARK)
• Universally Unique Identifier (UUID)

• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source

• A UUID is a 16-octet (128-bit) number.

• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373

• The probability of one duplicate would be about 50% if every person on earth owned 600 million UUIDs.

• Allows for easy generation at source in a distributed network.
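The points above can be tried out with Python's standard `uuid` module. The sketch below rests on two assumptions that the slide does not state: that the UUIDs are random (version 4) UUIDs, which carry 122 random bits, and that the duplicate probability is estimated with the usual birthday-bound approximation. The helper names are made up for illustration.

```python
import math
import uuid

# Assumption: random (version 4) UUIDs. Of the 128 bits, 6 are fixed
# (version and variant), leaving 122 random bits.
RANDOM_BITS = 122


def new_specimen_uuid() -> str:
    """Generate a UUID locally, with no central issuer (hypothetical helper)."""
    return str(uuid.uuid4()).upper()


def collision_probability(n_ids: float, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: p = 1 - exp(-n^2 / (2 * 2^bits))."""
    return -math.expm1(-(n_ids ** 2) / (2.0 * 2.0 ** bits))


def ids_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Number of random IDs at which the duplicate probability reaches p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


# The example on the slide parses as a version 4 UUID.
example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
print(example.version)  # 4

print(new_specimen_uuid())

# Roughly 2.7e18 UUIDs for a 50% chance of at least one duplicate.
print(f"{ids_for_probability(0.5):.2e}")
```

Under these assumptions the 50% point lies at a few times 10^18 identifiers, the same order of magnitude as 600 million UUIDs for every person on earth. No realistic digitization project comes anywhere near that number, which is why distributed generation at the source is considered safe.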


• Quick Response Code (QR code).

• A type of matrix barcode (or two-dimensional code).

• Popular due to its fast readability and large storage capacity.

• The use of QR Codes is free of any license.

• The QR Code is clearly defined and published as an ISO standard.

• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.

QR codes for all museum objects at NHM-UiO would provide:

• Machine readability using an ordinary smartphone (or a barcode reader).

• New and efficient workflows for collection management.

• Stable identifiers suitable for databasing.
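One way such a workflow could look in code is sketched below, using only the Python standard library. Everything here is an assumption made for illustration, not the NHM-UiO implementation: the "urn:uuid:" payload format, the helper names, and the in-memory registry standing in for a collection database. Rendering the payload as an actual QR image and decoding it from a photograph would need a third-party library and is left out.

```python
import uuid

# Assumption: the QR code on each label carries the UUID as a URN string.
URN_PREFIX = "urn:uuid:"


def mint_label_payload() -> str:
    """Create the text to encode in a new QR label (hypothetical helper)."""
    return URN_PREFIX + str(uuid.uuid4())


def parse_label_payload(payload: str) -> uuid.UUID:
    """Validate text decoded from a scanned label and return the UUID.

    Raises ValueError if the text is not a well-formed UUID.
    """
    text = payload.strip()
    if text.lower().startswith(URN_PREFIX):
        text = text[len(URN_PREFIX):]
    return uuid.UUID(text)


def attach_image(registry: dict, payload: str, image_path: str) -> str:
    """Link an image file to the specimen whose label appears in it."""
    key = str(parse_label_payload(payload))
    registry.setdefault(key, []).append(image_path)
    return key


# Usage: one label, two images of the same specimen.
registry = {}
label = mint_label_payload()
attach_image(registry, label, "specimen_front.tif")  # hypothetical file names
attach_image(registry, label, "specimen_back.tif")
print(registry)
```

Keying the registry on the parsed UUID rather than on the raw scanned text means that differences in case or surrounding whitespace between scans do not create duplicate records.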


dwc:datasetID DOI?

Furthermore, I think that we need persistent identifiers!

Cato the Elder ended all his speeches in the senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").

GBIF Norge

Dag Endresen
dag.endresen@nhm.uio.no

Christian Svindseth
christian.svindseth@nhm.uio.no
