DigiBird: on the fly collection integration using crowdsourcing

Chris Dijkshoorn On the fly collection integration supported by the crowd

Upload: vu-university-amsterdam

Post on 15-Apr-2017


TRANSCRIPT

Chris Dijkshoorn

On the fly collection integration supported by the crowd

‣ Crowdsourcing tasks are undertaken in isolation

‣ It takes time to collect data

‣ It demands continuous promotional effort

‣ It is challenging for institutions to incorporate the results of crowdsourcing into their existing infrastructure

Crowdsourcing Challenges

valorisation project

May 2016 to November 2016

DigiBird project

Chris Dijkshoorn Cristina Bucur Lora Aroyo

Maarten Brinkerink Sander Pietersen Saskia Scheltjens

Crowdsourced collections

Collections

Crowdsourced metadata

Sounds Artworks Images Videos

‣ Every institution has its own system

‣ No visibility of similar initiatives

DigiBird solution

‣ Create a hub

‣ Provide on the fly integration

‣ Use a shared vocabulary

Challenge 1: Crowdsourcing tasks are undertaken in isolation

Why use vocabulary terms instead of text?

Grote trap (Dutch name of the great bustard; read literally it also means "large staircase")

Thesauri can bridge collections

IOC World Bird List

‣ 33,801 terms

‣ Structured using Simple Knowledge Organization System (SKOS)

‣ (Semi-)persistent identifiers

Importance of a shared vocabulary
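The value of vocabulary terms over free text can be sketched as a label lookup. This is a minimal illustration, not the DigiBird implementation: the two-concept vocabulary is a hypothetical stand-in for the IOC World Bird List, the `ioc:Otis_tarda` identifier is assumed to follow the `ioc:Turdus_merula` pattern used in the slides, and `build_label_index` and `resolve` are names invented for this example.

```python
# Hypothetical miniature vocabulary in a SKOS-like shape. Identifiers are
# assumed to follow the ioc:Genus_species pattern; labels are illustrative.
VOCABULARY = {
    "ioc:Turdus_merula": {
        "prefLabel": {"la": "Turdus merula"},
        "altLabel": {"en": "Common Blackbird", "nl": "Merel"},
    },
    "ioc:Otis_tarda": {
        "prefLabel": {"la": "Otis tarda"},
        "altLabel": {"en": "Great Bustard", "nl": "Grote trap"},
    },
}


def build_label_index(vocabulary):
    """Map every label, case-insensitively, to its concept identifier."""
    index = {}
    for concept_id, concept in vocabulary.items():
        for label_type in ("prefLabel", "altLabel"):
            for label in concept.get(label_type, {}).values():
                index[label.casefold()] = concept_id
    return index


def resolve(label, index):
    """Return the concept identifier for a label, or None if it is unknown."""
    return index.get(label.strip().casefold())


LABEL_INDEX = build_label_index(VOCABULARY)
print(resolve("Merel", LABEL_INDEX))
print(resolve("Grote trap", LABEL_INDEX))
```

Once a label resolves to a concept, collections that describe the same species in different languages can be joined on the identifier instead of on ambiguous text.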

Goals

‣ Make results available on the fly

‣ Provide insight into progress

DigiBird pipeline

Data retrieval

Request formulation

Data integration

Response formulation

Query filter: Merel (Dutch for blackbird)

Request search: Merel

Request parameter: Turdus merula

Query concept: ioc:Turdus_merula

DigiBird pipeline example: retrieve information about a blackbird
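The request formulation step in the blackbird example can be sketched as follows. This is an illustration under stated assumptions, not the DigiBird code: the source names, the `example.org` endpoints, the parameter names `search` and `species`, and the SPARQL patterns (shown without their PREFIX declarations) are all hypothetical. Only the four request styles and their values come from the slide.

```python
from urllib.parse import urlencode

# The species to look up, in the three forms used on the slide.
BLACKBIRD = {
    "concept": "ioc:Turdus_merula",
    "scientific": "Turdus merula",
    "dutch": "Merel",
}


def formulate_requests(species):
    """Translate one species into one request per (hypothetical) source."""
    label_query = (
        "SELECT ?object WHERE { ?object dc:title ?title . "
        'FILTER(CONTAINS(LCASE(?title), "%s")) }' % species["dutch"].lower()
    )
    concept_query = (
        "SELECT ?object WHERE { ?object dc:subject %s . }" % species["concept"]
    )
    return {
        # SPARQL source without vocabulary links: filter on the Dutch label.
        "query_filter": label_query,
        # Web API with free-text search on the Dutch name.
        "request_search": "https://example.org/search?"
        + urlencode({"search": species["dutch"]}),
        # Web API with a scientific-name parameter.
        "request_parameter": "https://example.org/api?"
        + urlencode({"species": species["scientific"]}),
        # SPARQL source already annotated with the shared vocabulary.
        "query_concept": concept_query,
    }


for name, request in formulate_requests(BLACKBIRD).items():
    print(name, "->", request)
```

Only the last request uses the shared concept directly. The other three fall back on labels, which is where a vocabulary that links labels to concepts earns its keep.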

[Diagram: fields from each source are aligned to shared properties, e.g. rec = dc:creator and creator = dc:creator. The three sources return a JSON result list, a SPARQL result list and a SPARQL result list.]
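The data integration step can be sketched as a per-source field mapping. This is a minimal sketch that assumes the diagram means each source's own field (`rec`, `creator`) is aligned to `dc:creator`. The source names, sample records and the `integrate` function are hypothetical.

```python
# Hypothetical per-source mappings from local field names to shared properties.
FIELD_MAPPINGS = {
    "sound_archive": {"rec": "dc:creator"},
    "image_collection": {"creator": "dc:creator"},
}


def integrate(results_by_source):
    """Merge result lists from several sources into one list of shared fields."""
    merged = []
    for source, records in results_by_source.items():
        mapping = FIELD_MAPPINGS[source]
        for record in records:
            item = {"source": source}
            for field, value in record.items():
                # Fields without a mapping are dropped in this sketch.
                if field in mapping:
                    item[mapping[field]] = value
            merged.append(item)
    return merged


# Made-up sample results, one record per source.
SAMPLE_RESULTS = {
    "sound_archive": [{"rec": "A. Recordist", "length": "0:42"}],
    "image_collection": [{"creator": "B. Photographer"}],
}

for item in integrate(SAMPLE_RESULTS):
    print(item)
```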

Return JSON, JSON-LD, N-Quads or Turtle

JSON result list
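The response formulation step, which returns the same integrated records in different formats, might look like the sketch below. It is an illustration only: it assumes `dc` abbreviates the Dublin Core elements namespace, uses a made-up record and IRI, and hand-writes a simplified Turtle serialisation. N-Quads is omitted, and a real service would more likely rely on an RDF library.

```python
import json

# Assumption: "dc" stands for the Dublin Core elements namespace.
DC = "http://purl.org/dc/elements/1.1/"

# Made-up integrated record with a hypothetical IRI.
RECORDS = [{"id": "http://example.org/object/1", "dc:creator": "A. Recordist"}]


def formulate_response(records, fmt="json"):
    """Serialise records that each carry an "id" IRI and a "dc:creator"."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "json-ld":
        graph = [{"@id": r["id"], "dc:creator": r["dc:creator"]} for r in records]
        return json.dumps({"@context": {"dc": DC}, "@graph": graph})
    if fmt == "turtle":
        lines = [f"@prefix dc: <{DC}> ."]
        for r in records:
            # json.dumps yields a double-quoted, escaped string literal.
            literal = json.dumps(r["dc:creator"], ensure_ascii=False)
            lines.append(f"<{r['id']}> dc:creator {literal} .")
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")


print(formulate_response(RECORDS, "json-ld"))
print(formulate_response(RECORDS, "turtle"))
```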

‣ Crowdsourcing relies on voluntary contributions

‣ Unpredictable when people will contribute

How DigiBird helps

‣ Monitor progress

Challenge 2: It takes time to collect data
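Monitoring progress could be as simple as counting contributions per day. This is a minimal sketch, assuming each annotation carries an ISO 8601 `created` timestamp. The field name, the sample data and the `contributions_per_day` function are hypothetical.

```python
from collections import Counter
from datetime import datetime


def contributions_per_day(annotations):
    """Count annotations per calendar day, ordered by date."""
    days = Counter(
        datetime.fromisoformat(a["created"]).date().isoformat()
        for a in annotations
    )
    return dict(sorted(days.items()))


# Made-up annotations; only the timestamp matters for this sketch.
SAMPLE_ANNOTATIONS = [
    {"created": "2016-06-02T09:30:00"},
    {"created": "2016-06-01T10:15:00"},
    {"created": "2016-06-02T17:05:00"},
]

print(contributions_per_day(SAMPLE_ANNOTATIONS))
```

A per-day count like this makes quiet periods visible, which helps decide when an event or a promotional push is needed.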

‣ Organise events

‣ Market initiatives

DigiBird solution

‣ Generate challenging tasks (2.0?)

Challenge 3: It demands continuous promotional effort

‣ Data siloes

‣ Trust in data

DigiBird solutions

‣ Provide a way to directly access data

‣ Different output formats

‣ Refine and review contributions (2.0?)

Challenge 4: It is challenging for institutions to incorporate the results of crowdsourcing into their existing infrastructure

Monitoring

Species view

Annotation wall

Source code is available

‣ https://github.com/rasvaan/digibird_api

‣ https://github.com/rasvaan/digibird_client

DigiBird website

‣ Use standardised vocabularies

‣ Get persistent identifiers

‣ Document how to access your data

‣ Realise effort is required to create a mature codebase

‣ Some code does not age well

How to make the life of a programmer easier