Annotating Research Datasets
DESCRIPTION
A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging. Letting data consumers annotate data is an important mitigation. It harnesses "the crowd" to make it easier for everyone to discover and re-use data.
TRANSCRIPT
Annotating Research Datasets
11 April 2013
University of California Curation Center
California Digital Library
Term skew
Annotation: The act of adding a note by way of comment or explanation.
Genome annotation: The process of attaching biological information to sequences. E.g.,
• Protein Data Bank annotation manual: 247 pages
Research data annotation: (?!) Adding to opaque data to make it visible, sensible, and valuable.
The Long Tail
[Figure, built up over several slides: a long-tail curve relating "Size of dataset" to "# datasets", with "# researchers", "# grants", and "grant ($)" added as further labels. Two regions of the curve are labeled "With data managers and fancy tools" and "Do-it-yourself tools".]
From Flickr By puck90
UGLY TRUTH
Many researchers…
• have limited funding for data services
• are not taught data management
• don’t know what metadata or data centers are
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data
The research data problem
Journal article | Research data
Uniquely and persistently identified | Nope
Concept of “publish” | Not really
Multiple copies | Typically one
Easily findable | Difficult
Impact metrics, etc. | Nope
Curation funding | Barely
Research data is ripe for crowd-sourced annotation