Annotating Research Datasets

Annotating Research Datasets, 11 April 2013, University of California Curation Center, California Digital Library

Uploaded by john-kunze

Posted on 11 May 2015


DESCRIPTION

A huge amount of incredibly diverse research data remains beyond the reach of internet search engines, peer review processes, and systematic cataloging. The ability of consumers to annotate data is an important mitigation, harnessing "the crowd" to make it easier for everyone to discover and re-use data.

TRANSCRIPT

Page 1: Annotating Research Datasets

Annotating Research Datasets

11 April 2013

University of California Curation Center

California Digital Library

Page 2

Term skew

Annotation: The act of adding a note by way of comment or explanation.

Genome annotation: The process of attaching biological information to sequences. E.g.,

• Protein Data Bank annotation manual: 247 pages

Research data annotation: (?!) Adding to opaque data to make it visible, sensible, and valuable.

Pages 3 to 7: The Long Tail

[Figure built up over five slides under the title "The Long Tail". Axis and series labels: Size of dataset; # datasets; # researchers (added on page 4); # grants (added on page 5); grant ($) (added on page 6). Page 7 adds two labels: "With data managers and fancy tools" and "Do-it-yourself tools".]

Page 8

UGLY TRUTH

(Image from Flickr, by puck90)

Many researchers…

• have limited funding for data services
• are not taught data management
• don't know what metadata or data centers are
• don't share data publicly or store it in an archive
• aren't convinced they should share data

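Page 9

The research data problem

Journal article compared with research data:

• Uniquely and persistently identified: journal article, yes; research data, nope
• Concept of "publish": journal article, yes; research data, not really
• Multiple copies: journal article, yes; research data, typically one
• Easily findable: journal article, yes; research data, difficult
• Impact metrics, etc.: journal article, yes; research data, nope
• Curation funding: journal article, yes; research data, barely

Research data is ripe for crowd-sourced annotation.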