Globally Unique Identifiers in Biodiversity Informatics Kevin Richards Landcare Research NZ TDWG 2008

Upload: melvin-robbins

Post on 20-Jan-2016


Globally Unique Identifiers in Biodiversity Informatics

Kevin Richards, Landcare Research NZ

TDWG 2008

Introduction

GUID (Globally Unique IDentifier)

– What, Why, Which, How
– LSIDs
– Issues

What are GUIDs

Globally Unique IDentifier
• A short name for a complex entity on the web
• Each name identifies only one entity
• Examples:
  – UUID, e.g. 3E9D6B68-A08C-4F15-BC8A-1265F15D30E2
  – DOI, e.g. doi:10.1006/jmbi.1998.2354
  – Handle, e.g. hdl:123.456/abc
  – LSID, e.g. urn:lsid:indexfungorum.org:names:213645
  – PURL, e.g. http://purl.oclc.org/abc/123
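
The example identifiers above differ mainly in their leading syntax, so a consumer can often guess the scheme before doing anything else. The following is a minimal Python sketch under stated assumptions: the function name guid_scheme and the patterns are illustrative, and the PURL check (a host name starting with "purl.") is only a heuristic, not a rule from any specification.

```python
import re

# Illustrative patterns only; real-world identifiers may need stricter checks.
_PATTERNS = [
    ("LSID", re.compile(r"^urn:lsid:", re.IGNORECASE)),
    ("DOI", re.compile(r"^doi:", re.IGNORECASE)),
    ("Handle", re.compile(r"^hdl:", re.IGNORECASE)),
    # Heuristic assumption: PURL hosts start with "purl."
    ("PURL", re.compile(r"^https?://purl\.", re.IGNORECASE)),
    ("UUID", re.compile(
        r"^\{?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\}?$",
        re.IGNORECASE)),
]


def guid_scheme(identifier: str) -> str:
    """Return a best-guess scheme name for an identifier, or 'unknown'."""
    text = identifier.strip()
    for name, pattern in _PATTERNS:
        if pattern.search(text):
            return name
    return "unknown"


if __name__ == "__main__":
    for example in (
        "3E9D6B68-A08C-4F15-BC8A-1265F15D30E2",
        "doi:10.1006/jmbi.1998.2354",
        "hdl:123.456/abc",
        "urn:lsid:indexfungorum.org:names:213645",
        "http://purl.oclc.org/abc/123",
    ):
        print(guid_scheme(example), example)
```

A bare local key such as 50163035 falls through to "unknown", which is the point: it carries no indication of who issued it.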

What is a GUID

– Properties
  • Persistent
  • Opaque
  • Resolvable (sometimes): useful for locating information about the entity

Why use GUIDs

Data at Provider 1
BOOK: “The three little pigs”, 3 copies

Data at Provider 2
BOOK: “Three little pigs”, 2 copies

Data Consumer
BOOKS:
“Three little pigs” … (2)
“The three little pigs” … (3)

… but with GUIDs …

Data at Provider 1 (ID = P1)
BOOK: “The three little pigs”, ID (e.g. ISBN) = A123, 3 copies

Data at Provider 2 (ID = P2)
BOOK: “Three little pigs”, ID (e.g. ISBN) = A123, 2 copies

Data Consumer
BOOKS:
ID : A123 : “The three little pigs” … (5)

BOOK Titles:
ID A123 : Provider P1 : “The three little pigs”
ID A123 : Provider P2 : “Three little pigs”
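
The two book slides can be sketched in a few lines of Python. This is an illustration only: the record layout and the function names merge_by_title and merge_by_id are assumptions made for the example, and the data are the toy values from the slides.

```python
from collections import defaultdict

# Toy records taken from the slides; the dict layout is an assumption.
provider_records = [
    {"provider": "P1", "id": "A123", "title": "The three little pigs", "copies": 3},
    {"provider": "P2", "id": "A123", "title": "Three little pigs", "copies": 2},
]


def merge_by_title(records):
    """Without shared IDs the consumer can only group on the title string."""
    totals = defaultdict(int)
    for record in records:
        totals[record["title"]] += record["copies"]
    return dict(totals)


def merge_by_id(records):
    """With a shared ID, copies add up and each provider's title is kept."""
    merged = {}
    for record in records:
        entry = merged.setdefault(record["id"], {"copies": 0, "titles": {}})
        entry["copies"] += record["copies"]
        entry["titles"][record["provider"]] = record["title"]
    return merged


if __name__ == "__main__":
    print(merge_by_title(provider_records))  # two separate books
    print(merge_by_id(provider_records))     # one book, five copies
```

Grouping on the title yields two unrelated entries, while grouping on the ID yields one book with five copies and still records which provider uses which spelling.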

Example in our domain

Consensus
Id   : urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81
Name : Anthemis gaudium-solis Velen.

Provider   Id                                       Taxon Name
IPNI       urn:lsid:ipni.org:names:177325-1:1.1     Anthemis gaudium-solis Vel.
Tropicos   50163035                                 Anthemis goudium-solis Velen.
Euro+Med   133202                                   Anthemis gaudium-solis Velen.
Govaerts   {29FFBEDC-19F5-4899-BCB3-05EE2C7816C8}   Anthemis gaudiumsolis Velen.

GUIDs are vital to TDWG architecture

Which GUID

• GUID Subgroup Recommendations:
• Use LSIDs for identifying biodiversity data
• Reuse GUIDs where they already exist
  – GUID type
  – Existing assignments
• See GUID Report - http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
• Also Canberra LSID Workshop report - http://www.tdwg.org/fileadmin/subgroups/guid/LSID_policy_workshop_Report_Canberra.pdf

What is an LSID?

• Life Science IDentifier
• Developed by The Object Management Group & W3C
• Implemented by the team at IBM
• Used for: data objects, datasets, images, files

LSID Format

urn:lsid:bioguid.org:taxon:1122:v1

• Prefix (urn) - indicates that this is a URN
• URN type (lsid) - indicates that it’s an LSID-type URN
• Authority (bioguid.org) - the authority who issued the LSID
• Namespace (taxon) - internal to that authority
• Object identifier (1122) - within that authority
• Version (v1) - optional
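
The component list above maps directly onto a colon-separated split. The following Python sketch shows one way to take an LSID apart; the names LSID and parse_lsid are illustrative and do not come from any LSID library, and the validation is deliberately minimal.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Illustrative container for the parts named on the slide."""
    authority: str
    namespace: str
    object_id: str
    version: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:version]' into its parts."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    prefix, urn_type, authority, namespace, object_id, *rest = parts
    if prefix.lower() != "urn" or urn_type.lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in: {text!r}")
    # The version is optional; treat a trailing empty field as absent.
    version = rest[0] if rest and rest[0] else None
    return LSID(authority, namespace, object_id, version)


if __name__ == "__main__":
    print(parse_lsid("urn:lsid:bioguid.org:taxon:1122:v1"))
    print(parse_lsid("urn:lsid:indexfungorum.org:names:213645"))
```

For the IPNI identifier urn:lsid:ipni.org:names:177325-1:1.1, the same split gives object identifier 177325-1 and version 1.1.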

LSID Rules

• Data doesn’t change (byte identical)

• Always available for resolution
  – Hand over to another authority if necessary

• At least some basic metadata

Pros of LSIDs

• Not tied to physical addresses (as URLs are)
• Comparison can be done without resolving the ID
  – e.g. for cases like “does object a = object b”
• Do not require any central registration or central service
• Quick to adopt
• Encourage thought and planning before they are allocated
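
The "does object a = object b" case needs no network access, only a string comparison. The Python sketch below assumes, as a reading of the LSID specification that should be checked against it, that the urn:lsid: prefix and the authority are case-insensitive while the remaining parts are case-sensitive. The names lsid_key and same_object are made up for this example.

```python
def lsid_key(lsid: str) -> str:
    """Normalise an LSID for comparison.

    Assumption: 'urn', 'lsid' and the authority compare case-insensitively;
    namespace, object identifier and version are left untouched.
    """
    parts = lsid.strip().split(":")
    if len(parts) < 5:
        raise ValueError(f"not an LSID: {lsid!r}")
    head = [part.lower() for part in parts[:3]]
    return ":".join(head + parts[3:])


def same_object(a: str, b: str) -> bool:
    """Compare two LSIDs without resolving either of them."""
    return lsid_key(a) == lsid_key(b)


if __name__ == "__main__":
    print(same_object("URN:LSID:IndexFungorum.org:names:213645",
                      "urn:lsid:indexfungorum.org:names:213645"))
    print(same_object("urn:lsid:indexfungorum.org:names:213645",
                      "urn:lsid:indexfungorum.org:names:213649"))
```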

Cons of LSIDs

However …

• Requires a DNS SRV record
• Requires specialised software to resolve an LSID (not built in to most software)
• The restriction “LSID data cannot change” can be difficult
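
To make the DNS requirement concrete: a resolving client looks up an SRV record for the authority part of the LSID to find the host and port of the LSID service. The sketch below only builds the name to query and performs no lookup. The _lsid._tcp label follows the usual SRV naming pattern and is stated here as an assumption to be checked against the LSID specification; srv_query_name is an illustrative name.

```python
def srv_query_name(lsid: str) -> str:
    """Return the DNS SRV name a client would query for this LSID's authority.

    Assumption: the service label is '_lsid._tcp', prepended to the authority.
    No DNS query is made here.
    """
    parts = lsid.strip().split(":")
    if len(parts) < 5 or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2].lower()
    return f"_lsid._tcp.{authority}"


if __name__ == "__main__":
    print(srv_query_name("urn:lsid:ipni.org:names:177325-1:1.1"))
```

The authority therefore has to control the DNS zone for the domain it uses in its LSIDs, which is part of why issuing LSIDs takes some planning.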

How

• Decide what data/objects to apply IDs to
• Decide on
  – Authority
  – Namespace
  – Local IDs (new vs existing)
• Issue LSIDs
• Set up resolver
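
Once the authority, namespace and local IDs are decided, issuing LSIDs is mostly string assembly. The Python sketch below is a minimal illustration: make_lsid is a made-up helper, example.org stands in for a real authority, and the checks (no empty parts, no colons, no whitespace) are simple assumptions rather than the full rules of the specification.

```python
def make_lsid(authority, namespace, local_id, version=None):
    """Build an LSID string from an authority, a namespace and a local ID."""
    parts = [str(authority).lower(), str(namespace), str(local_id)]
    if version is not None:
        parts.append(str(version))
    for part in parts:
        # A colon or whitespace inside a part would break the LSID structure.
        if not part or ":" in part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid LSID component: {part!r}")
    return "urn:lsid:" + ":".join(parts)


if __name__ == "__main__":
    # Reusing existing local keys (here: hypothetical database IDs).
    for key in (213645, 213649):
        print(make_lsid("example.org", "names", key))
```

Reusing existing database keys as the object identifier keeps the mapping trivial; minting new opaque IDs (for example UUIDs) avoids exposing internal keys. That is the "new vs existing" choice on the slide.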

LSID Code

• Current Code Stacks
  – Open Source (sourceforge.net)
  – Java, C++, Perl (IBM)
  – Microsoft .NET (myself)
  – TAPIR LSID configuration

LSID Tools

• IBM LSID Launchpad
• Firefox LSID Browser
• LSID Tester (Rod Page)
• Web based resolver - http://lsid.tdwg.org/
  – http://lsid.tdwg.org/urn:lsid... to get LSID metadata
  – http://lsid.tdwg.org/summary/urn:lsid... to get summary info of LSID object
• Example LSID servers:
  – Index Fungorum - urn:lsid:indexfungorum.org:names:213649
  – IPNI - urn:lsid:ipni.org:names:30000959-2:1.1.2.1
  – uBio - urn:lsid:ubio.org:namebank:11815
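
The two URL patterns for the web based resolver can be produced mechanically from any LSID. The Python sketch below only builds the URLs as they appear on the slide and makes no request; the function names are illustrative, and whether the service still answers at that address is not something the slides can confirm.

```python
TDWG_RESOLVER = "http://lsid.tdwg.org/"  # base URL as given on the slide


def metadata_url(lsid: str) -> str:
    """URL pattern from the slide for fetching LSID metadata."""
    return TDWG_RESOLVER + lsid.strip()


def summary_url(lsid: str) -> str:
    """URL pattern from the slide for a summary of the LSID object."""
    return TDWG_RESOLVER + "summary/" + lsid.strip()


if __name__ == "__main__":
    lsid = "urn:lsid:indexfungorum.org:names:213649"
    print(metadata_url(lsid))
    print(summary_url(lsid))
```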

Issues to think about

• Who assigns new LSIDs?

• Who maintains LSID resolvers?

• What to assign LSIDs to:
  – Physical or Digital
  – Granularity
  – Only objects that need to be resolved / identified externally
  – Is there any data, or only metadata?

Issues to think about

• When to resolve LSIDs
  – Every time an LSID is encountered, or only when a client requests it?
• TDWG standards for metadata
  – Which ones?
  – Consistent application
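
On the "when to resolve" question, one possible answer is to resolve lazily and cache the result, which is reasonable because LSID data is required not to change. The Python sketch below shows the idea under stated assumptions: LazyResolver is a made-up class, and the fetch function is injected so that no real resolver or network call is implied.

```python
from typing import Callable, Dict


class LazyResolver:
    """Resolve an LSID only when asked, and at most once per LSID."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch              # hypothetical function doing the real lookup
        self._cache: Dict[str, str] = {}
        self.fetch_count = 0             # how many real lookups were made

    def resolve(self, lsid: str) -> str:
        if lsid not in self._cache:
            self._cache[lsid] = self._fetch(lsid)
            self.fetch_count += 1
        return self._cache[lsid]


if __name__ == "__main__":
    # Stand-in for a real resolver call; returns placeholder text.
    def fake_fetch(lsid: str) -> str:
        return f"metadata for {lsid}"

    resolver = LazyResolver(fake_fetch)
    resolver.resolve("urn:lsid:ubio.org:namebank:11815")
    resolver.resolve("urn:lsid:ubio.org:namebank:11815")
    print(resolver.fetch_count)
```

Metadata may be allowed to change even where the data is not, so a real cache would probably need an expiry policy for metadata. That is left out of this sketch.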

References

• LSID Source Forge - http://lsids.sourceforge.net/
• LSID .NET Source Forge - http://sourceforge.net/projects/lsid-dotnet
• LSID Tutorial - http://www-128.ibm.com/developerworks/opensource/library/os-lsid/
• LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01
• LSID Tester - http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/
• LSID Launchpad - http://www-124.ibm.com/developerworks/downloads/detail.php?group_id=124&what=rele&id=553
• GUID Subgroup - http://www.tdwg.org/activities/guid/
• GUID Subgroup Reports
  – http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1
  – http://wiki.tdwg.org/twiki/pub/TIP/TipDocuments/GUID1Report.pdf
• Firefox LSID developer site - http://lsid.mozdev.org/