
Strings to things: a user-friendly framework for data reconciliation

Nicky Nicolson, RBG Kew / @nickynicolson

Biodiversity Information Standards (TDWG) annual meeting, Nairobi, Kenya / 28th September – 1 October 2015

Reconciliation

• Turns a string representation of an entity into an actionable identifier.

e.g. Tahina spectabilis

will reconcile to: http://ipni.org/urn:lsid:ipni.org:names:77086615-1
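To make the idea concrete, here is a minimal sketch (not the framework's implementation) that treats reconciliation as a lookup from a name string to an actionable identifier. The `NAME_INDEX` table and the `reconcile` function are hypothetical and hold only the example above; a real service would be backed by a full names dataset.

```python
from typing import Optional

# Hypothetical lookup table: name string -> actionable (resolvable) identifier.
# Only the single example from the slide is included.
NAME_INDEX = {
    "Tahina spectabilis": "http://ipni.org/urn:lsid:ipni.org:names:77086615-1",
}


def reconcile(name: str) -> Optional[str]:
    """Return the identifier for a name string, or None if it is unknown.

    Only leading/trailing whitespace is tolerated; everything else must match exactly.
    """
    return NAME_INDEX.get(name.strip())


print(reconcile("Tahina spectabilis"))  # http://ipni.org/urn:lsid:ipni.org:names:77086615-1
print(reconcile("tahina spectabilis"))  # None
```

Exact lookup breaks on trivial variations such as capitalisation, accents or typos, which is why the framework separates standardising the data from matching it.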

Maximise reuse: a two-stage process

1. Standardise data

- Package of 40+ “transformers”
- All accept a string input and produce a string output
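Because every transformer shares the same string-in, string-out signature, transformers can be chained freely. The sketch below assumes nothing about the actual StringTransformers package: the three transformer functions and `apply_chain` are hypothetical examples written for illustration.

```python
import re
import unicodedata
from functools import reduce
from typing import Callable, List

# A transformer is any function that takes a string and returns a string.
Transformer = Callable[[str], str]


def collapse_whitespace(s: str) -> str:
    """Trim the ends and replace internal runs of whitespace with a single space."""
    return re.sub(r"\s+", " ", s).strip()


def strip_diacritics(s: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def lower_case(s: str) -> str:
    return s.lower()


def apply_chain(value: str, transformers: List[Transformer]) -> str:
    """Apply each transformer in order, feeding each output into the next."""
    return reduce(lambda acc, t: t(acc), transformers, value)


chain = [collapse_whitespace, strip_diacritics, lower_case]
print(apply_chain("  Tahina   spectabílis ", chain))  # tahina spectabilis
```

Keeping one uniform signature is what makes reuse cheap: any subset of transformers can be configured per dataset without changing code.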

Examples of transformers

OpenRefine screenshot

Open source

http://github.com/RBGKew/StringTransformers

Maximise reuse: a two-stage process

2. Match the data

- Package of 20+ “matchers”
- All accept two inputs and return a flag indicating whether they match
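Matchers can likewise be sketched as functions from two strings to a boolean. The names below (`exact_matcher`, `edit_distance_matcher`) are hypothetical and not taken from the framework; the fuzzy matcher uses standard Levenshtein edit distance as one plausible matching technique.

```python
from typing import Callable

# A matcher takes two strings and returns True if they are considered a match.
Matcher = Callable[[str, str], bool]


def exact_matcher(a: str, b: str) -> bool:
    return a == b


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_distance_matcher(max_edits: int) -> Matcher:
    """Build a matcher that tolerates up to max_edits single-character edits."""
    def match(a: str, b: str) -> bool:
        return levenshtein(a, b) <= max_edits
    return match


fuzzy = edit_distance_matcher(1)
print(exact_matcher("tahina spectabilis", "tahina spectabilis"))  # True
print(fuzzy("tahina spectabilis", "tahina spectabilus"))          # True
print(fuzzy("tahina spectabilis", "tahina robusta"))              # False
```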

Configuring a service

1) Read tabular data (file or DB)
2) Configure transformers
3) Configure matchers
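The three configuration steps can be sketched as one object that loads rows, pre-transforms them and applies matchers to incoming queries. `ReconciliationConfig`, the `id`/`name` column names and the rule that every configured matcher must accept a candidate are all assumptions for illustration, not the framework's actual configuration format.

```python
import csv
import io
from typing import Callable, Dict, List

Transformer = Callable[[str], str]
Matcher = Callable[[str, str], bool]


def normalise(s: str) -> str:
    """Hypothetical transformer: collapse whitespace and lower-case."""
    return " ".join(s.split()).lower()


def exact(a: str, b: str) -> bool:
    return a == b


class ReconciliationConfig:
    """Hypothetical configuration: 1) tabular data, 2) transformers, 3) matchers."""

    def __init__(self, rows: List[Dict[str, str]],
                 transformers: List[Transformer], matchers: List[Matcher]):
        self.transformers = transformers
        self.matchers = matchers
        # Transform every stored name once, so a query only transforms its input.
        self.records = [(self._transform(r["name"]), r["id"]) for r in rows]

    def _transform(self, value: str) -> str:
        for t in self.transformers:
            value = t(value)
        return value

    def reconcile(self, query: str) -> List[str]:
        """Return the ids of all records that every configured matcher accepts."""
        q = self._transform(query)
        return [rid for name, rid in self.records
                if all(m(q, name) for m in self.matchers)]


# 1) Read tabular data (an in-memory CSV here, standing in for a file or DB).
rows = list(csv.DictReader(io.StringIO("id,name\n77086615-1,Tahina spectabilis\n")))
# 2) and 3) Configure transformers and matchers.
service = ReconciliationConfig(rows, transformers=[normalise], matchers=[exact])
print(service.reconcile("TAHINA  spectabilis"))  # ['77086615-1']
```

Transforming both the stored data and the query with the same chain is what lets simple matchers succeed on messy input.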

Run it…

1) Service description
2) Three service endpoints
3) JavaScript query interface
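The service description is a JSON document that an OpenRefine client reads before sending queries. The sketch below shows the general shape used by the OpenRefine reconciliation API (`name`, `identifierSpace`, `schemaSpace`, `defaultTypes`, `view`); every value, including the URLs, is a placeholder rather than the IPNI service's real metadata, and `service_description` is a hypothetical helper.

```python
import json

# Hypothetical service description; all values are placeholders.
SERVICE_METADATA = {
    "name": "Example name reconciliation service",
    "identifierSpace": "http://example.org/identifiers",
    "schemaSpace": "http://example.org/schema",
    "defaultTypes": [{"id": "name", "name": "Plant name"}],
    # The client substitutes {{id}} with a matched identifier to build a link.
    "view": {"url": "http://example.org/view/{{id}}"},
}


def service_description() -> str:
    """Serialise the metadata as the JSON body returned by the service URL."""
    return json.dumps(SERVICE_METADATA, indent=2)


print(service_description())
```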

IPNI Reconciliation Service

3 service endpoints


Flexible web service

• OpenRefine compatible
• But underneath it’s JSON over HTTP
• … so call it from any programming language
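Since the service is plain JSON over HTTP, a client in any language only needs to form-encode a JSON batch of queries. This Python sketch follows the OpenRefine reconciliation convention of a `queries` parameter holding an object keyed `q0`, `q1` and so on; the endpoint URL is a placeholder and `build_reconcile_request` is a hypothetical helper.

```python
import json
import urllib.parse
import urllib.request
from typing import List


def build_reconcile_request(service_url: str, names: List[str],
                            limit: int = 3) -> urllib.request.Request:
    """Build a batch query: a JSON object keyed q0, q1, ... sent as the
    form-encoded `queries` parameter of an HTTP POST."""
    queries = {f"q{i}": {"query": name, "limit": limit}
               for i, name in enumerate(names)}
    body = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder endpoint; substitute the real service address.
req = build_reconcile_request("http://example.org/reconcile", ["Tahina spectabilis"])
print(req.get_method())  # POST
# Sending it would be: json.load(urllib.request.urlopen(req))
```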

Service metadata

Service call

Service response
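On the response side, an OpenRefine-style service maps each query key to a `result` list of candidates carrying `id`, `name`, `score` and a boolean `match` flag. The helper below is a hypothetical sketch that picks out the flagged candidate per query; the sample response is made up to show the shape and is not real IPNI output.

```python
import json
from typing import Dict, Optional


def best_matches(response_text: str) -> Dict[str, Optional[str]]:
    """For each query key, return the id of the first candidate flagged as a
    match, or None if the service flagged no candidate."""
    response = json.loads(response_text)
    best: Dict[str, Optional[str]] = {}
    for key, payload in response.items():
        flagged = [c for c in payload.get("result", []) if c.get("match")]
        best[key] = flagged[0]["id"] if flagged else None
    return best


# Illustrative response in the reconciliation shape (values made up).
sample = json.dumps({
    "q0": {"result": [{"id": "77086615-1", "name": "Tahina spectabilis",
                       "score": 100, "match": True}]},
    "q1": {"result": []},
})
print(best_matches(sample))  # {'q0': '77086615-1', 'q1': None}
```

Relying on the service's `match` flag rather than re-thresholding `score` on the client keeps the matching policy in one place: the configured matchers.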

Open source

https://github.com/RBGKew/Reconciliation-and-Matching-Framework

What we’ll work on in the future

Reconciliation services on different data types

• Specimens
  – Add DwCA (Darwin Core Archive) as a readable data store
  – Collections-focussed transformers & matchers
  – Resolve & link specimen duplicates

• People

• Trait glossaries

Integration with GitHub

Thanks to:

• Biodiversity Informatics team (Abigail Barker, Matt Blissett, James Crowe, John Iacona, Rob Turner, Alecs Gueder)

• Plant & fungal name curation team (Christine Barker, Irina Belyaeva, Katherine Challis, Rafael Govaerts, Paul Kirk, Heather Lindon, Emma Williams)

• Data improvement team (Anna Lynch, Rachel Witherow, Malin Rivers, Esther Wainwright-Deri)

@nickynicolson / n.nicolson@kew.org

http://bit.ly/k-names-service

http://github.com/RBGKew
