Persistent identifiers for museum specimens, NeIC workshop, August 2015


Upload: dag-endresen

Post on 23-Jan-2018

TRANSCRIPT

  1. Persistent Identifiers, NeIC workshop, August 2015 in Oslo. Dag Endresen, GBIF Norway, UiO Natural History Museum
  2. The purpose of identifiers is to name things, making it possible to refer to them.
  3. Name ambiguity: Many things (in GBIF) are named 123
      - Catalog number: 123, GBIF ID: 543392241, urn:catalog:CAS:BOT:123, Bigelowia juncea
      - Catalog number: 123, GBIF ID: 1030591721, UAMb:Herb:123, Sphagnum girgensohnii
      - Catalog number: 123, GBIF ID: 893477175, Parides erithalion
      - Catalog number: 123, GBIF ID: 1050327334, Cinchona ledgeriana
      - Catalog number: 123, GBIF ID: 231564351, Umbrina canariensis
      - Catalog number: 123, GBIF ID: 931031820, Bromus kalmii
      - Catalog number: 123, GBIF ID: 283363, urn:occurrence:Arctos:MVZ:Egg:123:164, Mercurialis ovata
      - Catalog number: 123, GBIF ID: 896547722, urn:occurrence:Arctos:MVZ:Egg:123:164, Contopus sordidulus veliei
  4. When is the identifier good enough? Unique and persistent, within a given context. The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context (Karen Coyle 2006). Expanding context:
      1. Within one museum collection (catalog number).
      2. Within a network between museum collections (collection code + catalogue number).
      3. Within a biodiversity information network (institution code + collection/dataset code + catalogue number).
      4. On the Internet (e.g. http URI, DOI, LSID, etc.).
      5. Larger contexts are possible to imagine in the future!
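
The expanding contexts on this slide can be sketched in a few lines of Python. This is only an illustration: the `Specimen` class, the `identifier_in_context` helper, and the context names are hypothetical, and the codes `CAS`/`BOT` and `UAMb`/`Herb` are taken from the two example records on slide 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    institution_code: str
    collection_code: str
    catalog_number: str


def identifier_in_context(s: Specimen, context: str) -> str:
    """Return the shortest identifier that is unique within the given context."""
    if context == "collection":
        return s.catalog_number
    if context == "museum_network":
        return f"{s.collection_code}:{s.catalog_number}"
    if context == "biodiversity_network":
        return f"{s.institution_code}:{s.collection_code}:{s.catalog_number}"
    raise ValueError(f"unknown context: {context}")


# Two specimens that share the catalog number 123 (slide 3).
a = Specimen("CAS", "BOT", "123")
b = Specimen("UAMb", "Herb", "123")

# Inside their own collections both are simply "123" ...
same_locally = identifier_in_context(a, "collection") == identifier_in_context(b, "collection")
# ... and only the wider context tells them apart.
distinct_globally = (
    identifier_in_context(a, "biodiversity_network")
    != identifier_in_context(b, "biodiversity_network")
)
```

Each step outward adds a qualifier, so an identifier that was good enough locally has to be rewritten whenever the context grows. A globally unique identifier avoids that rewriting.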
  5. Expanding context [figure labels: Internet, Museum, Identifier]
  6. Identifiers for museum collections. The longevity of museums leads to the need to use identifiers from our past in the current highly-networked digital systems (Karen Coyle 2006 [talking about libraries]).
      - Specify a namespace for the identifiers?
          - URI: uniform resource identifier (unique in the context of the web).
          - URN: uniform resource name (name not tied to location).
          - URL: uniform resource locator (network location as identifier).
          - PURL: persistent URL (commitment to service longevity).
      - Something else?
          - DOI: digital object identifier
          - ARK: archival resource key
          - UUID: universally unique identifier
  7. Persistent Identifier (PID); Globally Unique Identifier (GUID); Uniform Resource Identifier (URI); Persistent Uniform Resource Locator (PURL); Life Science Identifier (LSID); Digital Object Identifier (DOI); Handle system (Handle); Archival Resource Key (ARK, EZID); Universally Unique Identifier (UUID)
  8. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila. PURL. Reuse existing identifiers
  9. Globally unique; Scalability, number of IDs; Community acceptance; Long-term life-cycle; Resolvable, resolution service(s); Cost per identifier; People-friendly or machine-friendly; Solution for the generation of new IDs: central generation (PID issuer) or distributed generation at source
  10. A UUID is a 16-octet (128-bit) number, written as 36 characters in its canonical text form. Example: 41d9cbb4-4590-4265-8079-ca44d46d27c3. The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs. Allows for easy generation at source in a distributed network.
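
A short sketch of generation at source, using Python's standard `uuid` module. It assumes random (version 4) UUIDs, which the slide does not state explicitly. The helper `uuids_for_collision_probability` is a hypothetical name and uses the usual birthday-bound approximation, so its result is an estimate rather than an exact count.

```python
import math
import uuid

# Any node in a distributed network can mint an identifier locally,
# without asking a central issuer.
specimen_id = uuid.uuid4()
text = str(specimen_id)  # canonical form: 32 hex digits plus 4 hyphens

# A version-4 UUID has 122 random bits (6 of the 128 are fixed).
RANDOM_BITS = 122


def uuids_for_collision_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of random IDs needed for collision probability p.

    Birthday-bound approximation: n ~ sqrt(2 * N * ln(1 / (1 - p))), N = 2**bits.
    """
    return math.sqrt(2 * (2 ** bits) * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)
```

Under these assumptions `n_half` comes out at roughly 2.7 × 10^18 UUIDs, the same order of magnitude as the per-person figure quoted on the slide.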
  11. http PURL UUID: http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  12. Diagram labels: Identifier, Resolver, Location, Specimen. The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes. http://purl.org/nhmuio/id/[UUID] redirects (http 303) to http://gbif.no/resolver/[UUID]. Non-information object (http redirect).
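
The redirect step on this slide can be sketched as a pure function. This is an illustration under stated assumptions, not the actual purl.org or gbif.no implementation: the `resolve` function and the 404 fallback are hypothetical, and only the two URL prefixes and the 303 status come from the slide.

```python
import uuid

PURL_PREFIX = "http://purl.org/nhmuio/id/"   # identifier namespace (from the slide)
RESOLVER_BASE = "http://gbif.no/resolver/"   # current location (from the slide)


def resolve(purl: str):
    """Map a PURL to (status, location), mimicking an HTTP 303 redirect.

    Returns (404, None) when the input is not a PURL in this namespace
    or does not end in a well-formed UUID (assumed behaviour).
    """
    if not purl.startswith(PURL_PREFIX):
        return 404, None
    candidate = purl[len(PURL_PREFIX):]
    try:
        specimen_uuid = uuid.UUID(candidate)
    except ValueError:
        return 404, None
    # 303 See Other: the identifier names a physical specimen, so the
    # client is sent to a document describing it.
    return 303, RESOLVER_BASE + str(specimen_uuid)


status, location = resolve(PURL_PREFIX + "41d9cbb4-4590-4265-8079-ca44d46d27c3")
```

If the data moves, only `RESOLVER_BASE` changes. The PURL printed on labels and stored in databases stays the same.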
  13. http://purl.org/nhmuio/id/UUID → http://gbif.no/resolver/UUID; http://purl.org/gbifnorway/id/UUID → http://gbif.no/resolver/UUID
  14. Including machine readable formats
  15. Catalog number: O-L-000014, http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
  16. UUID QR codes for museum objects at NHM-UiO provide: Machine-readable identifiers (using a simple smartphone or a barcode reader). New and efficient workflows for collection management. Deployment of stable identifiers appropriate for data-basing.
  17. http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3 (machine friendly). Catalog number: O-L-000014 (human friendly). Efficient workflow routines
  18. http://gbif.no/transcribe/
  19. Some key challenges for the group work
      - Many of the original source datasets indexed by GBIF are regularly updated and re-indexed by the GBIF portal. Without stable and persistent identifiers, information on the same herbarium specimen (or species observation) is sometimes included more than once, leading to duplicated information: duplicated in the sense of more than one (unlinked) data record for the same Real World entity.
      - Without stable and persistent identifiers for herbarium specimens (and species observations) it is difficult to link the same data record indexed at different re-indexing cycles of the GBIF portal. When a previously indexed data record is not re-identified in a new version of a given dataset, the record is deleted from the portal, and the link to previous versions of this data record is lost.
      - A composite key identifier (such as the Darwin Core triplet), based on a combination of the metadata attributes for institution code (dwc:institutionCode), collection code (dwc:collectionCode), and the local specimen identifier (dwc:catalogNumber), is generally used as the specimen identifier in GBIF. However, all three metadata attributes can (and do) sometimes change.
      - What could be a best practice guideline for identifier resolution? Is it useful to define and agree on a (set of) common and well-defined response format(s)? Is it useful to provide recommendations for a set of metadata profiles with a clear set of defined metadata attributes? Or would more general principles and more open recommendations be more likely to stand the test of time and remain relevant with the emergence of new information infrastructure technologies?
      - Challenges, pros and cons of reusing object identifiers and metadata attribute terms declared by others without full control of how these objects and terms are maintained. Objects and concepts declared for a particular purpose will often not exactly match the needs of another purpose. How to optimally reuse each other's OWL ontologies, metadata vocabularies and data object models?
      - Identifiers identifying the Real World physical objects, the entities that the collection curators and users of the information care about. Or should the identifier be assigned to database records? Real World entities will not have a signature byte-sequence and will rely on interpretation of when an object is considered to be the same thing.
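
The re-indexing problem described above can be illustrated with a small sketch. The records, the split of catalog number O-L-000014 into codes `O` and `L`, the later collection code `Lichens`, and the helper names are all hypothetical. `occurrenceID` is the Darwin Core term assumed here to carry the persistent identifier.

```python
PID = "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"


def triplet(record: dict) -> str:
    """Darwin Core triplet: institution code, collection code, catalog number."""
    return ":".join(
        (record["institutionCode"], record["collectionCode"], record["catalogNumber"])
    )


def persistent_id(record: dict) -> str:
    return record["occurrenceID"]


def rematched(key_fn, old_cycle: list, new_cycle: list) -> list:
    """Records in new_cycle that can be linked back to a record in old_cycle."""
    old_keys = {key_fn(r) for r in old_cycle}
    return [r for r in new_cycle if key_fn(r) in old_keys]


# Same physical specimen in two indexing cycles; the museum has renamed
# its collection code in between (hypothetical values).
cycle_1 = [{"occurrenceID": PID, "institutionCode": "O",
            "collectionCode": "L", "catalogNumber": "000014"}]
cycle_2 = [{"occurrenceID": PID, "institutionCode": "O",
            "collectionCode": "Lichens", "catalogNumber": "000014"}]

linked_by_triplet = rematched(triplet, cycle_1, cycle_2)   # link is lost
linked_by_pid = rematched(persistent_id, cycle_1, cycle_2)  # link survives
```

Matching on the triplet treats the renamed record as new and the old one as deleted. Matching on the persistent identifier keeps the two versions linked.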
  20. [email protected]; Dag Endresen, [email protected]; Christian Svindseth, [email protected]. Gary Larson, 1987. Workshop in Oslo, 26th Aug