Data Sharing Practices: Implications for Curation and Re-use
Carole L. Palmer & Tiffany Chao
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
GSLIS Research Showcase
30 March 2012
Team members:
Carole Palmer
Tiffany Chao
Nic Weber
Karen Baker
Andrea Thomer
• small science
• complex, heterogeneous data
• implications for data curation
• value for re-use across disciplines
- Data Practices team
Comparative analysis of researchers in the earth and life sciences
• Qualitative analysis of worksheets and interviews conducted with scientists.
• Investigation of data production and use in relation to curation needs, cultures of sharing, and re-use potential.
Curation Profiles Project: What can be shared when?

| Field | Specific research area | Form to be shared | Formats | Type of data set | Size | Shared when? |
| --- | --- | --- | --- | --- | --- | --- |
| Agronomy | water quality, drainage, and plant growth | cleaned, reviewed sensor; hand-collected samples | .xls | approx. 100 files | ~1 MB each, up to 20 MB | After publication |
| Geology | rock, water and microbes | averaged sensor; hand-collected samples; photographs | .xls; jpg | 1 file; images | < 1 MB | After publication |
| Civil Engineering | traffic movement | cleaned, normalized sensor | MySQL; PostgreSQL | 1 database | approx. 1000 K/day | 1 month to 1 year embargo |
Production vs. reuse / wholes and parts

| | Geobiology | Volcanology | Soil ecology | Sensor science |
| --- | --- | --- | --- | --- |
| Data unit | Time series (site specific): spreadsheets; microscopy images; annotated digital “field photos” | Rock profile: physical rock; thin section; chemical analysis; photographs; field notes | Database: multiple abiotic soil measures; associated metadata | Database: soil data; sensor data |
| User communities | Geobiology; Geology; Chemistry; Microbiology; U.S. Park Service | Geology – igneous petrology; Geophysics; Geochemistry | Biochemistry; Earthworm ecology | Network Science; Computer Science |
| Sharing conventions | by request; no repository; mostly post-pub, some unpublished | by request; no repository | public resource collection | Reference data industry; Limits – customization “vertical” dev. |
Far from collective, shared data infrastructure
Curation of functional groupings:
• Scholarly record of data collected and analyzed
• Unit for long-term preservation
• Organization for retrieval
• Raw material for future research
Exposing data very different from supplying by request. Complex mis-use concerns:
• Misinterpretation – presumed problems
• Misappropriation – actual premature re-use
• Disregard of good faith practices – how used, what referenced