Persistent identifiers for digitized specimens


Upload: dag-endresen

Posted on 01-Nov-2014


DESCRIPTION

Persistent identifiers (PIDs) for digitized museum collections. Presented at the European GBIF meeting at Digitarium in Joensuu, Finland, 6 March 2013. Includes a proposed model for assigning UUID PIDs using QR codes during the imaging and digitization process.

TRANSCRIPT

Page 1: Persistent identifiers for digitized specimens

GBIF European Regional Nodes Meeting, 6 to 8 March, 2013, Joensuu, Finland

Globally unique identifiers for digitized specimens
Comparison of alternatives

Dag Endresen
GBIF Norway, NHM-UiO
Natural History Museum, University of Oslo (NHM-UiO)
Global Biodiversity Information Facility (GBIF)

6 March 2013

Page 2

Topics

• Darwin Core (DwC) & Identifiers
• Persistent Identifiers
• UUIDs
• PID and the digitization workflow


Page 6

Darwin Core – a vocabulary of terms

Wieczorek J, Bloom D, Guralnick R, Blum S, Döring M, De Giovanni R, Robertson T, and Vieglais D (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715. doi:10.1371/journal.pone.0029715

Page 7

Term name: occurrenceID

Identifier: http://rs.tdwg.org/dwc/terms/occurrenceID

Class: http://rs.tdwg.org/dwc/terms/Occurrence

Definition: An identifier for the Occurrence (as opposed to a particular digital record of the occurrence). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the occurrenceID globally unique.

Comment: For a specimen in the absence of a bona fide global unique identifier, for example, use the form: "urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber]".

Examples: "urn:lsid:nhm.ku.edu:Herps:32", "urn:catalog:FMNH:Mammal:145732".

For discussion see http://code.google.com/p/darwincore/wiki/Occurrence
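The fallback form in the comment above can be sketched as a small helper. This is a minimal illustration, not part of the Darwin Core standard: the function name `catalog_urn` is hypothetical, and rejecting empty parts or parts containing a colon is an assumption added here so that the three components stay unambiguous.

```python
def catalog_urn(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Build a fallback occurrenceID of the form
    urn:catalog:[institutionCode]:[collectionCode]:[catalogNumber].

    Hypothetical helper: the empty/colon check is an added assumption so the
    three parts can be split apart again without ambiguity.
    """
    parts = (institution_code, collection_code, catalog_number)
    for part in parts:
        if not part or ":" in part:
            raise ValueError(f"invalid component for urn:catalog: {part!r}")
    return "urn:catalog:" + ":".join(parts)


print(catalog_urn("FMNH", "Mammal", "145732"))  # urn:catalog:FMNH:Mammal:145732
```

Such a URN is only as unique and stable as the three codes it is built from, which is why the Darwin Core comment reserves it for cases where no bona fide global unique identifier exists.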

Page 8

Record-level Terms
dcterms:type | dcterms:modified | dcterms:language | dcterms:rights | dcterms:rightsHolder | dcterms:accessRights | dcterms:bibliographicCitation | dcterms:references | institutionID | collectionID | datasetID | institutionCode | collectionCode | datasetName | ownerInstitutionCode | basisOfRecord | informationWithheld | dataGeneralizations | dynamicProperties

Occurrence
occurrenceID | catalogNumber | occurrenceRemarks | recordNumber | recordedBy | individualID | individualCount | sex | lifeStage | reproductiveCondition | behavior | establishmentMeans | occurrenceStatus | preparations | disposition | otherCatalogNumbers | previousIdentifications | associatedMedia | associatedReferences | associatedOccurrences | associatedSequences | associatedTaxa

Event
eventID | samplingProtocol | samplingEffort | eventDate | eventTime | startDayOfYear | endDayOfYear | year | month | day | verbatimEventDate | habitat | fieldNumber | fieldNotes | eventRemarks

dcterms:Location
locationID | higherGeographyID | higherGeography | continent | waterBody | islandGroup | island | country | countryCode | stateProvince | county | municipality | locality | verbatimLocality | verbatimElevation | minimumElevationInMeters | maximumElevationInMeters | verbatimDepth | minimumDepthInMeters | maximumDepthInMeters | minimumDistanceAboveSurfaceInMeters | maximumDistanceAboveSurfaceInMeters | locationAccordingTo | locationRemarks | verbatimCoordinates | verbatimLatitude | verbatimLongitude | verbatimCoordinateSystem | verbatimSRS | decimalLatitude | decimalLongitude | geodeticDatum | coordinateUncertaintyInMeters | coordinatePrecision | pointRadiusSpatialFit | footprintWKT | footprintSRS | footprintSpatialFit | georeferencedBy | georeferencedDate | georeferenceProtocol | georeferenceSources | georeferenceVerificationStatus | georeferenceRemarks

GeologicalContext
geologicalContextID | earliestEonOrLowestEonothem | latestEonOrHighestEonothem | earliestEraOrLowestErathem | latestEraOrHighestErathem | earliestPeriodOrLowestSystem | latestPeriodOrHighestSystem | earliestEpochOrLowestSeries | latestEpochOrHighestSeries | earliestAgeOrLowestStage | latestAgeOrHighestStage | lowestBiostratigraphicZone | highestBiostratigraphicZone | lithostratigraphicTerms | group | formation | member | bed

Identification
identificationID | identifiedBy | dateIdentified | identificationReferences | identificationVerificationStatus | identificationRemarks | identificationQualifier | typeStatus

Taxon
taxonID | scientificNameID | acceptedNameUsageID | parentNameUsageID | originalNameUsageID | nameAccordingToID | namePublishedInID | taxonConceptID | scientificName | acceptedNameUsage | parentNameUsage | originalNameUsage | nameAccordingTo | namePublishedIn | namePublishedInYear | higherClassification | kingdom | phylum | class | order | family | genus | subgenus | specificEpithet | infraspecificEpithet | taxonRank | verbatimTaxonRank | scientificNameAuthorship | vernacularName | nomenclaturalCode | taxonomicStatus | nomenclaturalStatus | taxonRemarks

ResourceRelationship (Auxiliary Terms)
resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipAccordingTo | relationshipEstablishedDate | relationshipRemarks

MeasurementOrFact (Auxiliary Terms)
measurementID | measurementType | measurementValue | measurementAccuracy | measurementUnit | measurementDeterminedDate | measurementDeterminedBy | measurementMethod | measurementRemarks
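To show how the identifier terms sit alongside the other terms in a flat record, here is a minimal sketch that writes one occurrence as a CSV row with Darwin Core term names as column headers. All field values, including the UUID, are invented for illustration, and using a `urn:uuid:` URN as occurrenceID is an assumption, not a Darwin Core requirement.

```python
import csv
import io

# Hypothetical example values; only the keys are Darwin Core term names.
record = {
    "occurrenceID": "urn:uuid:c37e3f9b-bcaf-4479-8eb7-3346a2db2373",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "INST",
    "collectionCode": "COLL",
    "catalogNumber": "12345",
    "scientificName": "Example species",
}


def to_csv(rows: list[dict[str, str]]) -> str:
    """Serialise records to CSV, using the keys of the first row as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv([record]), end="")
```

Every class in the list above has its own `…ID` term (institutionID, occurrenceID, eventID, locationID, taxonID and so on); this example fills in only occurrenceID.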

Page 9

Semantic MediaWiki: a forum for discussion and development of terminology.

http://terms.gbif.org/



Page 11

• Persistent Identifier (PID)
• Globally Unique Identifier (GUID)
• Uniform Resource Identifier (URI)
• Persistent Uniform Resource Locator (PURL)
• Life Science Identifier (LSID)
• Digital Object Identifier (DOI)
• Handle System (Handle)
• Archival Resource Key (ARK)
• Universally Unique Identifier (UUID)
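The concrete schemes in this list are easiest to tell apart by their syntax. The sketch below guesses the scheme of an identifier string from its prefix. The patterns are simplified assumptions (for example, only the `purl.org` domain is recognised for PURLs, and only the `hdl:` and `hdl.handle.net` forms for Handles), so treat it as a heuristic, not a validator.

```python
import re

# Simplified, assumed patterns; real identifiers come in more variants than these.
PATTERNS = [
    ("LSID", re.compile(r"^urn:lsid:[^:]+:[^:]+:[^:]+(:[^:]+)?$", re.IGNORECASE)),
    ("UUID", re.compile(
        r"^(urn:uuid:)?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
        re.IGNORECASE)),
    # DOIs are checked before Handles because every DOI is also a Handle.
    ("DOI", re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE)),
    ("Handle", re.compile(r"^(hdl:|https?://hdl\.handle\.net/)\S+/\S+$", re.IGNORECASE)),
    ("ARK", re.compile(r"^(https?://[^/]+/)?ark:/?\d+/\S+$", re.IGNORECASE)),
    ("PURL", re.compile(r"^https?://purl\.org/\S+$", re.IGNORECASE)),
]


def guess_scheme(identifier: str) -> str:
    """Return the first scheme whose pattern matches, or 'unknown'."""
    text = identifier.strip()
    for name, pattern in PATTERNS:
        if pattern.match(text):
            return name
    return "unknown"


for example in ("urn:lsid:nhm.ku.edu:Herps:32",
                "doi:10.1371/journal.pone.0029715",
                "C37E3F9B-BCAF-4479-8EB7-3346A2DB2373",
                "urn:catalog:FMNH:Mammal:145732"):
    print(example, "->", guess_scheme(example))
```

PID, GUID and URI are umbrella terms rather than concrete syntaxes, so the sketch has no pattern for them; an LSID or a `urn:uuid:` URN is itself a URI.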


Page 12

• Scalability, number of IDs
• Community acceptance
• Long-term life-cycle
• Resolvable, resolution service(s)
• Cost per identifier
• People-friendly or machine-friendly
• Generation of IDs
  – Central generation, PID issuer
  – Distributed generation at source
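The last criterion, where identifiers are generated, can be illustrated with two toy issuers. Both are hypothetical: `CentralIssuer` stands in for a PID service that hands out sequential numbers under one prefix (the prefix string is invented), while `mint_at_source` lets any digitization station create a random version 4 UUID without contacting anyone.

```python
import itertools
import threading
import uuid


class CentralIssuer:
    """Toy central PID issuer: one shared counter, so every request must reach it."""

    def __init__(self, prefix: str) -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._lock = threading.Lock()  # serialise concurrent requests

    def mint(self) -> str:
        with self._lock:
            return f"{self._prefix}{next(self._counter)}"


def mint_at_source() -> str:
    """Distributed generation: no coordination, relies on 122 random bits."""
    return str(uuid.uuid4())


issuer = CentralIssuer("example-prefix/")  # hypothetical prefix
print(issuer.mint(), issuer.mint())  # example-prefix/1 example-prefix/2
print(mint_at_source())  # a different random UUID on every call
```

The central issuer guarantees uniqueness by construction but is a single point of coordination (and, for a paid service, a cost per identifier). The UUID route removes the coordination at the price of long, machine-friendly rather than people-friendly identifiers whose uniqueness rests on probability rather than bookkeeping.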


Page 13

• A UUID is a 16-octet (128-bit) number.
• Example: C37E3F9B-BCAF-4479-8EB7-3346A2DB2373

• For randomly generated (version 4) UUIDs, the probability of a single duplicate would reach about 50% only if every person on Earth owned 600 million UUIDs.

• Allows for easy generation at source in a distributed network.
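The figures on this slide can be checked with Python's standard `uuid` module. The sketch parses the example UUID, confirms that it is 16 bytes long and a version 4 (random) UUID, and estimates the chance of at least one duplicate among n random version 4 UUIDs with the usual birthday-problem approximation p ≈ 1 - exp(-n² / (2 · 2^122)). The exponent is 122 rather than 128 because six bits are fixed by the version and variant fields.

```python
import math
import uuid

example = uuid.UUID("C37E3F9B-BCAF-4479-8EB7-3346A2DB2373")
print(len(example.bytes) * 8)  # 128
print(example.version, example.variant == uuid.RFC_4122)  # 4 True

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def duplicate_probability(n: float) -> float:
    """Birthday approximation for at least one collision among n random v4 UUIDs."""
    space = 2.0 ** RANDOM_BITS
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-(n * n) / (2.0 * space))


print(duplicate_probability(2.0 ** 61))  # about 0.393
```

For n = 2^61 (about 2.3 × 10^18 identifiers) the estimate is roughly 39%, and the 50% point lies near 2.7 × 10^18. The exact multiplier quoted on the slide depends on the world population assumed, but a single museum's collection sits many orders of magnitude below this range.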


Page 14

• Quick Response Code (QR code).
• A type of matrix barcode (or two-dimensional code).
• Popular due to its fast readability and large storage capacity.
• The use of QR Codes is free of any license.
• The QR Code is clearly defined and published as an ISO standard (ISO/IEC 18004).
• Invented in Japan by the Toyota subsidiary Denso Wave in 1994.
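Whether a UUID fits comfortably in a small QR code depends on how it is encoded. QR codes have an alphanumeric mode for digits, upper-case letters and a few symbols (including `-` and `:`) that packs two characters into 11 bits, and a byte mode at 8 bits per byte. The sketch computes the payload size of a 36-character UUID string in each mode and picks the smallest QR version that holds it at error-correction level M. The capacity table covers only versions 1 to 3, and its figures are an assumption to be checked against ISO/IEC 18004 before relying on them.

```python
from typing import Optional

# Characters allowed in QR alphanumeric mode.
ALNUM = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")

# Assumed data capacities in bits at error-correction level M
# (data codewords x 8) for QR versions 1-3; verify against ISO/IEC 18004.
CAPACITY_M_BITS = {1: 16 * 8, 2: 28 * 8, 3: 44 * 8}


def payload_bits(text: str) -> int:
    """Segment size in bits for QR versions 1-9: mode + count + data."""
    if set(text) <= ALNUM:
        n = len(text)
        # 4-bit mode indicator, 9-bit count, 11 bits per pair, 6 for a leftover char
        return 4 + 9 + 11 * (n // 2) + 6 * (n % 2)
    data = text.encode("utf-8")
    # Byte mode: 4-bit mode indicator, 8-bit count, 8 bits per byte
    return 4 + 8 + 8 * len(data)


def smallest_version(text: str) -> Optional[int]:
    """Smallest version (1-3) at level M whose capacity holds the payload."""
    bits = payload_bits(text)
    for version in sorted(CAPACITY_M_BITS):
        if bits <= CAPACITY_M_BITS[version]:
            return version
    return None


upper = "C37E3F9B-BCAF-4479-8EB7-3346A2DB2373"
lower = upper.lower()
print(payload_bits(upper), smallest_version(upper))  # 211 2
print(payload_bits(lower), smallest_version(lower))  # 300 3
```

Under these assumptions, printing the UUID in upper case lets it use the denser alphanumeric mode and a smaller symbol, while the lower-case form needs byte mode.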

Page 15

QR code for all museum objects at NHM-UiO would provide:

• Machine readability with an ordinary smartphone (or a barcode reader).

• New and efficient workflows for collection management.

• A way to deploy stable identifiers suitable for databasing.
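One way the proposed QR workflow could look in software: during imaging, the QR label in each photograph is decoded, the UUID is validated, and the image file is linked to that specimen identifier. The sketch assumes the QR payload has already been decoded to text by a barcode reader (decoding is out of scope here). The functions `parse_label` and `link_images`, the record layout and the file names are hypothetical, and joining image names with ` | ` follows the Darwin Core convention for list values such as associatedMedia.

```python
import uuid
from collections import defaultdict


def parse_label(payload: str) -> uuid.UUID:
    """Turn decoded QR text into a UUID. uuid.UUID accepts upper or lower case
    and an optional urn:uuid: prefix, and raises ValueError on anything else."""
    return uuid.UUID(payload.strip())


def link_images(scans: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group image files by the specimen UUID read from each image's QR label."""
    links: dict[str, list[str]] = defaultdict(list)
    for image_file, payload in scans:
        occurrence_id = f"urn:uuid:{parse_label(payload)}"
        links[occurrence_id].append(image_file)
    return dict(links)


# Hypothetical scans: (image file, decoded QR text)
scans = [
    ("IMG_0001.jpg", "C37E3F9B-BCAF-4479-8EB7-3346A2DB2373"),
    ("IMG_0002.jpg", "urn:uuid:c37e3f9b-bcaf-4479-8eb7-3346a2db2373"),
]
for occurrence_id, images in link_images(scans).items():
    print(occurrence_id, " | ".join(images))
```

Both label spellings resolve to the same specimen because `uuid.UUID` normalises case and accepts the `urn:uuid:` prefix, so the two images end up under a single occurrenceID.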


Page 20

dwc:datasetID DOI?

Page 21

Furthermore, I think that we need persistent identifiers!

Cato the Elder ended all his speeches in the Senate of Rome with: "Ceterum autem censeo Carthaginem esse delendam" (English: "Furthermore, I think Carthage must be destroyed").


Page 22

GBIF Norge

Dag [email protected]

Christian [email protected]

GBIF European Regional Nodes Meeting, 6 to 8 March, 2013.