829 tdwg-2015-nicolson-kew-strings-to-things

Strings to things: a user-friendly framework for data reconciliation. Nicky Nicolson, RBG Kew (@nickynicolson). Biodiversity Information Standards (TDWG) annual meeting, Nairobi, Kenya / 28th September – 1 October 2015

Upload: nickyn

Post on 10-Apr-2017

TRANSCRIPT

Page 1: 829 tdwg-2015-nicolson-kew-strings-to-things

Strings to things: a user-friendly framework for data reconciliation

Nicky Nicolson, RBG Kew / @nickynicolson

Biodiversity Information Standards (TDWG) annual meeting, Nairobi, Kenya / 28th September – 1 October 2015

Page 2: 829 tdwg-2015-nicolson-kew-strings-to-things

Reconciliation

• Turns a string representation of an entity into an actionable identifier.

e.g.: Tahina spectabilis

Will reconcile to: http://ipni.org/urn:lsid:ipni.org:names:77086615-1

Page 3: 829 tdwg-2015-nicolson-kew-strings-to-things
Page 4: 829 tdwg-2015-nicolson-kew-strings-to-things

Maximise reuse, two stage process

1. Standardise data

- Package of 40 plus “transformers”
- All accept a string input, produce a string output
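The "string in, string out" contract is what makes transformers easy to chain. The following is a minimal Python sketch of that idea only; the function names (`strip_accents`, `collapse_whitespace`, `lowercase`, `apply_transformers`) are hypothetical and are not taken from the StringTransformers package.

```python
import re
import unicodedata
from typing import Callable, Iterable

# A transformer is any function that takes a string and returns a string.
Transformer = Callable[[str], str]


def strip_accents(value: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_whitespace(value: str) -> str:
    """Trim the string and squeeze internal runs of whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


def lowercase(value: str) -> str:
    """Fold the string to lower case."""
    return value.lower()


def apply_transformers(value: str, transformers: Iterable[Transformer]) -> str:
    """Apply each transformer in order, feeding each output into the next."""
    for transform in transformers:
        value = transform(value)
    return value


# Illustrative chain; the choice and order of steps is an assumption.
chain = [strip_accents, collapse_whitespace, lowercase]
print(apply_transformers("  Tahina   spectabilis ", chain))
```

Because every step shares one signature, a chain can be reordered or extended without touching the code that runs it.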

Page 5: 829 tdwg-2015-nicolson-kew-strings-to-things

Examples of transformers

Page 6: 829 tdwg-2015-nicolson-kew-strings-to-things

Open Refine screenshot

Page 7: 829 tdwg-2015-nicolson-kew-strings-to-things

Open source

http://github.com/RBGKew/StringTransformers

Page 8: 829 tdwg-2015-nicolson-kew-strings-to-things

Maximise reuse, two stage process

2. Match the data

- Package of 20 plus “matchers”
- All accept two inputs and return a flag if they match
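A matcher, as described here, compares two values and answers yes or no. The sketch below illustrates that contract in Python with three hypothetical matchers (`exact_match`, `initial_letter_match`, `within_edit_distance`); these names and behaviours are assumptions for illustration, not the matchers shipped with the framework.

```python
from typing import Callable

# A matcher takes two strings and returns True when they are considered a match.
Matcher = Callable[[str, str], bool]


def exact_match(left: str, right: str) -> bool:
    """Match only when both strings are identical."""
    return left == right


def initial_letter_match(left: str, right: str) -> bool:
    """Match when both strings are non-empty and share a first character."""
    return bool(left) and bool(right) and left[0] == right[0]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def within_edit_distance(max_distance: int) -> Matcher:
    """Build a matcher that tolerates up to max_distance edits."""
    def matcher(left: str, right: str) -> bool:
        return levenshtein(left, right) <= max_distance
    return matcher


# A misspelt epithet fails the exact matcher but passes the tolerant one.
tolerant = within_edit_distance(1)
print(exact_match("spectabilis", "spectabillis"))
print(tolerant("spectabilis", "spectabillis"))
```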

Page 9: 829 tdwg-2015-nicolson-kew-strings-to-things

Configuring a service

1) Read tabular data (file or DB)
2) Configure transformers
3) Configure matchers
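The three configuration steps can be sketched in Python as follows. This is not the framework's configuration format, which the slides do not show; `ColumnRule`, `prepare` and `find_matches` are hypothetical names, and the second data row is an invented placeholder (only the Tahina spectabilis identifier comes from the talk).

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# 1) Read tabular data. An in-memory CSV stands in for a file or database.
TABLE = """id,genus,species
urn:lsid:ipni.org:names:77086615-1,Tahina,spectabilis
placeholder-id-2,Examplea,ficta
"""
records = list(csv.DictReader(io.StringIO(TABLE)))


@dataclass
class ColumnRule:
    """Per-column configuration: how to standardise values, then how to compare them."""
    column: str
    matcher: Callable[[str, str], bool]
    transformers: List[Callable[[str], str]] = field(default_factory=list)


def prepare(value: str, rule: ColumnRule) -> str:
    """Run the column's transformers over a value, in order."""
    for transform in rule.transformers:
        value = transform(value)
    return value


def find_matches(query: Dict[str, str],
                 rows: List[Dict[str, str]],
                 rules: List[ColumnRule]) -> List[Dict[str, str]]:
    """Return the rows for which every configured column matches the query."""
    return [
        row for row in rows
        if all(rule.matcher(prepare(query.get(rule.column, ""), rule),
                            prepare(row.get(rule.column, ""), rule))
               for rule in rules)
    ]


def equals(left: str, right: str) -> bool:
    return left == right


# 2) and 3) Configure transformers and a matcher for each column.
rules = [
    ColumnRule("genus", equals, [str.strip, str.lower]),
    ColumnRule("species", equals, [str.strip, str.lower]),
]

matches = find_matches({"genus": " tahina", "species": "SPECTABILIS "}, records, rules)
print([row["id"] for row in matches])
```

The same transformers are applied to both the query and the stored value, so the comparison is always made on standardised strings.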

Page 10: 829 tdwg-2015-nicolson-kew-strings-to-things

Run it…

1) Service description
2) Three service endpoints
3) Javascript query interface

Page 11: 829 tdwg-2015-nicolson-kew-strings-to-things

IPNI Reconciliation Service

Page 12: 829 tdwg-2015-nicolson-kew-strings-to-things

3 service endpoints

Page 13: 829 tdwg-2015-nicolson-kew-strings-to-things

IPNI Reconciliation Service

Page 14: 829 tdwg-2015-nicolson-kew-strings-to-things

Flexible web service

• Open Refine compatible
• But underneath it’s JSON over HTTP
• … so call it from any programming language
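As an illustration of "JSON over HTTP", the Python sketch below builds a batch request in the style used by Open Refine reconciliation services, where a `queries` parameter carries a JSON object keyed by query id. The transcript does not show the actual request, so this shape is an assumption about the Kew service, and `SERVICE_URL` is a placeholder rather than the real endpoint. No network call is made.

```python
import json
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; substitute the address of a real reconciliation service.
SERVICE_URL = "http://example.org/reconcile/names"


def build_query_url(service_url: str, names: List[str]) -> str:
    """Encode a batch of name strings as q0, q1, ... in a 'queries' parameter."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    return service_url + "?" + urlencode({"queries": json.dumps(queries)})


url = build_query_url(SERVICE_URL, ["Tahina spectabilis"])
print(url)

# Decoding the URL again shows the JSON payload the service would receive.
payload = json.loads(parse_qs(urlsplit(url).query)["queries"][0])
print(payload)
```

Any language with an HTTP client and a JSON library can build the same request, which is the point of the slide.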

Page 15: 829 tdwg-2015-nicolson-kew-strings-to-things

Service metadata

Page 16: 829 tdwg-2015-nicolson-kew-strings-to-things

Service call

Page 17: 829 tdwg-2015-nicolson-kew-strings-to-things

Service response
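The response screenshot is not preserved in this transcript. As a hedged sketch, the Python below parses a response in the shape that Open Refine reconciliation services conventionally return (a `result` list of candidates per query key, each with `id`, `name`, `score` and `match`). `SAMPLE_RESPONSE` is hand-written for illustration: the identifier is the one quoted earlier in the talk, while the score and the empty `q1` result are invented. `confident_matches` is a hypothetical helper.

```python
import json
from typing import Dict, Optional

# Illustrative payload only; not captured from the live service.
SAMPLE_RESPONSE = """
{
  "q0": {"result": [
    {"id": "urn:lsid:ipni.org:names:77086615-1",
     "name": "Tahina spectabilis",
     "score": 100,
     "match": true}
  ]},
  "q1": {"result": []}
}
"""


def confident_matches(response_text: str) -> Dict[str, Optional[str]]:
    """Map each query key to the id of its first candidate flagged as a match, else None."""
    response = json.loads(response_text)
    resolved: Dict[str, Optional[str]] = {}
    for key, body in response.items():
        hits = [c for c in body.get("result", []) if c.get("match")]
        resolved[key] = hits[0]["id"] if hits else None
    return resolved


resolved = confident_matches(SAMPLE_RESPONSE)
print(resolved)
```

Returning `None` for unmatched queries keeps the output aligned with the input batch, so unresolved strings can be routed to manual review.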

Page 19: 829 tdwg-2015-nicolson-kew-strings-to-things

Open source

https://github.com/RBGKew/Reconciliation-and-Matching-Framework

Page 20: 829 tdwg-2015-nicolson-kew-strings-to-things

What we’ll work on in the future

Page 21: 829 tdwg-2015-nicolson-kew-strings-to-things

Reconciliation services on different data types

• Specimens
  – Add DwCA as a readable data store
  – Collections focussed transformers & matchers
  – Resolve & link specimen duplicates
• People
• Trait glossaries

Page 22: 829 tdwg-2015-nicolson-kew-strings-to-things

Integration with github

Page 23: 829 tdwg-2015-nicolson-kew-strings-to-things
Page 24: 829 tdwg-2015-nicolson-kew-strings-to-things

Thanks to:

• Biodiversity Informatics team (Abigail Barker, Matt Blissett, James Crowe, John Iacona, Rob Turner, Alecs Gueder)

• Plant & fungal name curation team (Christine Barker / Irina Belyaeva / Katherine Challis / Rafael Govaerts / Paul Kirk / Heather Lindon / Emma Williams)

• Data improvement team (Anna Lynch, Rachel Witherow, Malin Rivers, Esther Wainwright-Deri)

Page 25: 829 tdwg-2015-nicolson-kew-strings-to-things

@nickynicolson / [email protected]

http://bit.ly/k-names-service

http://github.com/RBGKew

Biodiversity Information Standards (TDWG) annual meeting, Nairobi, Kenya / 28th September – 1 October 2015