data stewardship for spatial/isocamp 2014

Data Stewardship Carly Strasser California Digital Library [email protected] SPATIAL / IsoCamp June 2014 Tips & Tools

Upload: carly-strasser

Post on 16-Apr-2017


Page 1: Data Stewardship for SPATIAL/IsoCamp 2014

Data Stewardship

Carly Strasser California Digital Library [email protected] SPATIAL / IsoCamp

June 2014

Tips & Tools

Page 2: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr via librarianinsta.tumblr.com

I am not a librarian. But I do work at a library.

Page 3: Data Stewardship for SPATIAL/IsoCamp 2014
Page 4: Data Stewardship for SPATIAL/IsoCamp 2014

•  Enable data sharing
•  Encourage new incentives
•  Think about code sharing
•  Work with libraries, publishers and researchers
•  Explore new tools to help change system
•  Build tools

Page 5: Data Stewardship for SPATIAL/IsoCamp 2014

Why are you here?

Science: you’re (probably) doing it wrong

Page 6: Data Stewardship for SPATIAL/IsoCamp 2014

Back in the day…

Da Vinci

Curie Newton

classicalschool.blogspot.com

Darwin

Page 7: Data Stewardship for SPATIAL/IsoCamp 2014

Research has changed

Better

Page 8: Data Stewardship for SPATIAL/IsoCamp 2014

From wikimedia

Such Internet!

So many tools!

From Flickr by John Jobby

So much data!

Page 9: Data Stewardship for SPATIAL/IsoCamp 2014

Research has changed Worse

Page 10: Data Stewardship for SPATIAL/IsoCamp 2014

Digital data

From Flickr by Flickmor

From Flickr by US Army Environmental Command

From Flickr by DW0825

C. Strasser

Courtesy of WHOI

From Flickr by deltaMike

Page 11: Data Stewardship for SPATIAL/IsoCamp 2014

Digital data +

Complex workflows

Page 12: Data Stewardship for SPATIAL/IsoCamp 2014

Scientists are bad at data management.

Page 13: Data Stewardship for SPATIAL/IsoCamp 2014

An embarrassing example…

From Flickr by lincolnblues

Page 14: Data Stewardship for SPATIAL/IsoCamp 2014
Page 15: Data Stewardship for SPATIAL/IsoCamp 2014
Page 16: Data Stewardship for SPATIAL/IsoCamp 2014

?

Page 17: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by ransomtech

•  Didn’t share the data
•  Didn’t document the data (metadata)
•  Didn’t document provenance/workflow

Page 18: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by ransomtech

No reproducibility, no transparency, no reuse.

Page 19: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by johntrainor

Why should I care?

Page 20: Data Stewardship for SPATIAL/IsoCamp 2014

Because reproducibility* is one of the fundamental tenets of science.

*Reproducibility here means being able to go from data to figures/results. It does not mean independent verification by following the same techniques.

Page 21: Data Stewardship for SPATIAL/IsoCamp 2014

Because reproducibility is one of the fundamental tenets of science.

Because we need to be credible.

Page 22: Data Stewardship for SPATIAL/IsoCamp 2014
Page 23: Data Stewardship for SPATIAL/IsoCamp 2014

Because reproducibility is one of the fundamental tenets of science.

Because we need to be credible.

Because Fox News, creationism, and the war on science.

Page 24: Data Stewardship for SPATIAL/IsoCamp 2014

“Help us identify grants that are wasteful or that you don’t think are a good use of taxpayer dollars.”

Rep. Adrian Smith (R-Nebraska), a member of the House Committee on Science and Technology

Page 25: Data Stewardship for SPATIAL/IsoCamp 2014

Because reproducibility is one of the fundamental tenets of science.

Because we need to be credible.

Because Fox News, creationism, and the war on science

Because it means faster progress.

Page 26: Data Stewardship for SPATIAL/IsoCamp 2014
Page 27: Data Stewardship for SPATIAL/IsoCamp 2014

Because you are a good person.

Page 28: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by Redden-McAllister

From Flickr by Ken Cowell

From Flickr Brandi Jordan

Page 29: Data Stewardship for SPATIAL/IsoCamp 2014

Open Science: making data, research, and dissemination available to all

Page 30: Data Stewardship for SPATIAL/IsoCamp 2014

flowingdata.com

Map of Scientific Collaborations

Page 31: Data Stewardship for SPATIAL/IsoCamp 2014

Because you have to.

Page 32: Data Stewardship for SPATIAL/IsoCamp 2014

•  Journals
•  Institutions
•  Funders

From Flickr by Eva Rinaldi Celebrity and Live Music Photographer

Page 33: Data Stewardship for SPATIAL/IsoCamp 2014

… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”

Feb 2013

Page 34: Data Stewardship for SPATIAL/IsoCamp 2014

1.  Maximize free public access
2.  Ensure researchers create data management plans
3.  Allow costs for data preservation and access in proposal budgets
4.  Ensure evaluation of data management plan merits
5.  Ensure researchers comply with their data management plans
6.  Promote data deposition into public repositories
7.  Develop approaches for identification and attribution of datasets
8.  Educate folks about data stewardship

From Flickr by Joe Crimmings Photography

Page 35: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by Michael Tinkler

Page 36: Data Stewardship for SPATIAL/IsoCamp 2014

Data management

Best Practices

From Flickr by Big Swede Guy

Page 37: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by Mark Sardella

Plan before data collection

Page 38: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Design sample naming scheme

•  Create a key (data dictionary)
•  Make sure names are unique
•  Define codes

From Flickr by zebbie

Page 39: Data Stewardship for SPATIAL/IsoCamp 2014

PhDcomics.com

Planning Design file naming scheme

Page 40: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Design file naming scheme

Use descriptive file names
•  Unique
•  Reflect contents

Bad: Mydata.xls, 2001_data.csv, best version.txt

Better: Eaffinis_nanaimo_2010_counts.xls
•  Eaffinis: study organism
•  nanaimo: site name
•  2010: year
•  counts: what was measured

*Not for everyone

From R Cook, ESA Best Practices Workshop 2010
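A naming scheme like the one above is easier to keep consistent when the file name is assembled by a small helper rather than typed by hand. The sketch below is only one way to do it: the function name `build_filename` and the rule of stripping everything except letters and digits are assumptions made for illustration, not part of the original slides.

```python
import re


def build_filename(organism, site, year, measured, ext="csv"):
    """Join organism, site, year and measurement into one descriptive file name."""
    parts = [organism, site, str(year), measured]
    # Drop spaces and punctuation so the name stays portable across systems.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every component needs at least one letter or digit")
    return "_".join(cleaned) + "." + ext


# Rebuilds the "Better" example from the slide.
print(build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
```

Because every name passes through the same function, the order of the components cannot drift from file to file.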

Page 41: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Design file organization

Biodiversity
    Lake
        Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
        Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
    Grassland

Consider…
•  Dependencies?
•  File formats?
•  Time of collection?
•  Order of analysis?

From S. Hampton

Page 42: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Design your spreadsheet

•  Constrain entries
•  Atomize
•  Break down spreadsheets

Page 43: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Consider a database

A relational database is
•  A set of tables
•  Relationships among the tables
•  A language to specify & query the tables

A RDB provides
•  Scalability: millions+ records
•  Features for sub-setting, querying, sorting
•  Reduced redundancy & entry errors

From Mark Schildhauer
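To make the three ingredients concrete (tables, relationships, a query language), here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and values are all made up for illustration; they do not come from the talk.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# A set of tables, with a relationship: counts.site_id refers to sites.site_id.
conn.executescript("""
CREATE TABLE sites (
    site_id   TEXT PRIMARY KEY,
    site_name TEXT NOT NULL
);
CREATE TABLE counts (
    count_id      INTEGER PRIMARY KEY,
    site_id       TEXT NOT NULL REFERENCES sites(site_id),
    year          INTEGER NOT NULL,
    n_individuals INTEGER NOT NULL CHECK (n_individuals >= 0)
);
""")

# Hypothetical records: each site name is stored once, not repeated per row.
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [("NAN", "Nanaimo"), ("LAK", "Lake site")])
conn.executemany(
    "INSERT INTO counts (site_id, year, n_individuals) VALUES (?, ?, ?)",
    [("NAN", 2010, 12), ("NAN", 2011, 7), ("LAK", 2010, 30)],
)

# A language to query the tables: total individuals per site.
rows = conn.execute("""
    SELECT s.site_name, SUM(c.n_individuals)
    FROM counts AS c
    JOIN sites AS s ON s.site_id = c.site_id
    GROUP BY s.site_name
    ORDER BY s.site_name
""").fetchall()
print(rows)
```

Storing the site name in one place is what reduces redundancy and entry errors: a misspelled site can only happen once, and the `CHECK` constraint rejects negative counts at entry time.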

Page 44: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Consider a database

You should invest time in learning databases if
•  your data sets are large or complex

Consider investing time in learning databases if
•  your data are small and humble
•  you ever intend to share your data
•  you are < 30 years old

From Mark Schildhauer

Page 45: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Pick a data repository

Store your data in a repository
•  Institutional archive
•  Discipline/specialty archive

Ask a librarian

Repos of repos: databib.org, re3data.org

From Flickr by torkildr

Page 46: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Decide on preservation/backup

What software? What hardware? What personnel?

How often? Set up reminders!

Test system

From Flickr by sepa synod

From Flickr by taberandrew

From Flickr by withassociates

Page 47: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: Write a data management plan!

…document that describes what you will do with your data throughout the research project

From Flickr by Barbies Land

Page 48: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: DMP components

•  What will be collected
•  Methods
•  Standards
•  Metadata
•  Sharing/access
•  Long-term storage

But they all have different requirements and express them in different ways

From Flickr by Barbies Land

Page 49: Data Stewardship for SPATIAL/IsoCamp 2014

Planning: dmptool.org

•  Step-by-step wizard for generating DMP
•  create | edit | re-use | share
•  Free & open to community

Page 50: Data Stewardship for SPATIAL/IsoCamp 2014

During Data Collection & Entry

From Flickr by Julia Manzerova

Page 51: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Keep raw data raw

Realistically:
•  Archive .csv version of raw data
•  Make a “raw” tab in working data file
•  Do all work on other tabs

Page 52: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Keep raw data raw

Ideally:
•  Use scripts to process data
•  Save them with data

Raw data as .csv

R script for processing & analysis
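The slide pairs a raw .csv with an R script. The same idea is sketched here in Python instead, purely as an illustration: the file names, the `temp_c` column, and the helper `process_raw` are hypothetical. The raw file is marked read-only and only ever opened for reading; all cleaning goes into a separate output file.

```python
import csv
import os
import stat
import tempfile


def process_raw(raw_path, out_path):
    """Copy rows that have a temperature value to out_path; raw_path is only read."""
    kept = 0
    with open(raw_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["temp_c"].strip() == "":
                continue  # skip rows with no measurement
            writer.writerow(row)
            kept += 1
    return kept


# Set up a small made-up raw file in a temporary directory.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw_temperature.csv")
clean_path = os.path.join(workdir, "clean_temperature.csv")
with open(raw_path, "w", newline="") as f:
    f.write("sample_id,temp_c\nA1,12.5\nA2,\nA3,13.1\n")

# Mark the raw file read-only so accidental edits fail loudly.
os.chmod(raw_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

with open(raw_path) as f:
    raw_before = f.read()
kept = process_raw(raw_path, clean_path)
with open(raw_path) as f:
    raw_after = f.read()

print(kept, raw_before == raw_after)
```

Saving a script like this next to the data means the cleaned file can always be regenerated from the untouched original.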

Page 53: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Document your workflow

Workflow: how you get from the raw data to the final products of your research

Simple workflow: flow chart

Temperature data + Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production

Page 54: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Document your workflow

Workflow: how you get from the raw data to the final products of your research

Simple workflow: commented script
•  R, SAS, MATLAB…
•  Well-documented code is
    -  Easier to review
    -  Easier to share
    -  Easier to use for repeat analysis
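As a rough picture of what a commented script can look like, the sketch below walks through quality control and summary statistics for temperature and salinity. It is written in Python rather than R, SAS, or MATLAB, and the measurements and the `-999.0` missing-value code are invented for illustration.

```python
import statistics

# Step 1: raw measurements (made-up values standing in for an imported file).
raw = {
    "temperature_c": [12.5, 13.1, -999.0, 12.8],
    "salinity_psu": [31.2, 30.9, 31.0, -999.0],
}

# Assumed missing-value code; a real project would define it in the data dictionary.
MISSING = -999.0

# Step 2: quality control. Drop the missing-value code, keep everything else.
clean = {name: [v for v in values if v != MISSING] for name, values in raw.items()}

# Step 3: analysis. Mean and sample standard deviation for each variable.
summary = {
    name: (statistics.mean(values), statistics.stdev(values))
    for name, values in clean.items()
}

# Step 4: report the summary statistics that would feed the graphs.
for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
```

Each comment names the workflow step it implements, so the script doubles as the workflow documentation.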

Page 55: Data Stewardship for SPATIAL/IsoCamp 2014

Fancy schmancy workflows Resulting output

https://kepler-project.org

During collection Document your workflow

Page 56: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Document your workflow

Workflows enable
•  Reproducibility
•  Transparency
•  Reuse

From Flickr by merlinprincesse

Page 57: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Constrain data entries

•  Excel lists
•  Data validation
•  Google docs forms

Modified from K. Vanderbilt
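Outside a spreadsheet, the same constraint idea can be expressed as a small validation function that checks each row against an allowed list. This is a sketch under stated assumptions: the site list, the `site` and `count` field names, and the function `validate_row` are hypothetical.

```python
# Hypothetical controlled vocabulary, playing the role of an Excel drop-down list.
ALLOWED_SITES = {"nanaimo", "lake", "grassland"}


def validate_row(row):
    """Return a list of problems found in one data-entry row (empty if none)."""
    errors = []
    if row.get("site") not in ALLOWED_SITES:
        errors.append(f"unknown site: {row.get('site')!r}")
    try:
        count = int(row.get("count", ""))
    except (TypeError, ValueError):
        errors.append(f"count is not an integer: {row.get('count')!r}")
    else:
        if count < 0:
            errors.append("count must be non-negative")
    return errors


print(validate_row({"site": "nanaimo", "count": "12"}))
print(validate_row({"site": "Nanaimo ", "count": "twelve"}))
```

The second row fails on both fields: the site has stray capitalization and a trailing space, and the count is spelled out rather than numeric.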

Page 58: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Atomize

One piece of information per cell
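A common violation of this rule is a cell that holds both a number and its unit. The sketch below splits such a cell into two fields; the function name `atomize_measurement` and the accepted pattern (a number followed by a short unit) are assumptions chosen for illustration.

```python
import re


def atomize_measurement(cell):
    """Split a combined cell such as '12.5 C' into a numeric value and a unit."""
    match = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%]+)\s*", cell)
    if match is None:
        raise ValueError(f"cannot split {cell!r} into value and unit")
    return {"value": float(match.group(1)), "unit": match.group(2)}


print(atomize_measurement("12.5 C"))
print(atomize_measurement("31psu"))
```

Once value and unit live in separate columns, the values can be sorted, averaged, and validated as numbers.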

Page 59: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Break down spreadsheets

Fake a relational database
•  Create a site table
•  Create parameter table

From doi:10.3334/ORNLDAAC/777

From R Cook, ESA Best Practices Workshop 2010

Page 60: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Create metadata

Why are you promoting Excel?

Page 61: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Create metadata

Metadata: data reporting
•  WHO created the data?
•  WHAT is the content of the data set?
•  WHEN was it created?
•  WHERE was it collected?
•  HOW was it developed?
•  WHY was it developed?

From Flickr by /\/\ichael Patric|{
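One lightweight way to capture these six questions is a small metadata record saved next to the data file. The sketch below checks that every question has an answer before serializing the record to JSON. The field names, the helper `missing_fields`, and the placeholder answers are all hypothetical; a formal metadata standard would replace this in practice.

```python
import json

# The six reporting questions, used as required keys.
REQUIRED = ("who", "what", "when", "where", "how", "why")


def missing_fields(record):
    """Return the reporting questions that the record leaves unanswered."""
    return [key for key in REQUIRED if not str(record.get(key, "")).strip()]


# Placeholder answers only; fill these in for a real data set.
record = {
    "who": "name of the person who created the data",
    "what": "short description of the data set contents",
    "when": "date or date range of creation",
    "where": "collection location",
    "how": "methods used to develop the data",
    "why": "reason the data were developed",
}

problems = missing_fields(record)
if problems:
    raise ValueError(f"metadata incomplete, missing: {problems}")

# Serialize as a sidecar document that can travel with the data file.
sidecar = json.dumps(record, indent=2)
print(sidecar)
```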

Page 62: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Create metadata

Digital context
•  Name of the data set
•  The name(s) of the data file(s) in the data set
•  Date the data set was last modified
•  Example data file records for each data type file
•  Pertinent companion files
•  List of related or ancillary data sets
•  Software (including version number) used to prepare/read the data set
•  Data processing that was performed

Personnel & stakeholders
•  Who collected
•  Who to contact with questions
•  Funders

Scientific context
•  Scientific reason why the data were collected
•  What data were collected
•  What instruments (including model & serial number) were used
•  Environmental conditions during collection
•  Temporal & spatial resolution
•  Standards or calibrations used

Information about parameters
•  How each was measured or produced
•  Units of measure
•  Format used in the data set
•  Precision & accuracy if known

Information about data
•  Definitions of codes used
•  Quality assurance & control measures
•  Known problems that limit data use (e.g. uncertainty, sampling problems)

Page 63: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Create metadata

What is metadata?

Metadata standards…
•  Provide structure to describe data: common terms | definitions | language | structure
•  Come in many flavors: EML, FGDC, ISO19115, DarwinCore,…
•  Can be met using software tools: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)

Page 64: Data Stewardship for SPATIAL/IsoCamp 2014

During collection: Back up daily

•  Original
•  Near
•  Far

From Flickr by lippo

From Flickr by see phar
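A backup is only useful if the copy matches the original. The sketch below copies a file to a "near" and a "far" location and verifies each copy with a SHA-256 checksum. The two temporary directories merely stand in for real locations (for example an external drive and an off-site server), and the helper names `sha256_of` and `back_up` are assumptions for illustration.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, backup_dir):
    """Copy original into backup_dir and confirm the copy is byte-identical."""
    os.makedirs(backup_dir, exist_ok=True)
    copy_path = shutil.copy2(original, backup_dir)
    if sha256_of(copy_path) != sha256_of(original):
        raise OSError(f"backup of {original} to {copy_path} does not match")
    return copy_path


# Made-up original file in a temporary directory.
root = tempfile.mkdtemp()
original = os.path.join(root, "original", "counts.csv")
os.makedirs(os.path.dirname(original))
with open(original, "w") as f:
    f.write("site,count\nnanaimo,12\n")

# Stand-ins for the "near" and "far" copies.
near_copy = back_up(original, os.path.join(root, "near"))
far_copy = back_up(original, os.path.join(root, "far"))
print(near_copy)
print(far_copy)
```

Running a script like this on a schedule is one way to act on "back up daily" without relying on memory.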

Page 65: Data Stewardship for SPATIAL/IsoCamp 2014

During collection

From Flickr by Barbies Land

Remember that data management plan?

Revisit Review Revise

Page 66: Data Stewardship for SPATIAL/IsoCamp 2014

During collection

Schedule a time each week or month

Revisit Review Revise

From Flickr by purplemattfish

Page 67: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by celikins

Where to start?

Page 68: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by Andy Graulund

Make a resolution
•  Triage on current projects
•  Get advisor, lab mates, collaborators on board
•  Do better next time

Page 69: Data Stewardship for SPATIAL/IsoCamp 2014

Start working online

From Flickr by karindalziel

Page 70: Data Stewardship for SPATIAL/IsoCamp 2014

http://datapub.cdlib.org

Reproducibility, E-notebooks, Online science

Page 71: Data Stewardship for SPATIAL/IsoCamp 2014

Write a DMP: dmptool.org

•  Step-by-step wizard for generating DMP
•  create | edit | re-use | share
•  Free & open to community

Page 72: Data Stewardship for SPATIAL/IsoCamp 2014

databib.org

Where should I put my data?

Find a repository

Page 73: Data Stewardship for SPATIAL/IsoCamp 2014

Get help

From Flickr by thewmatt

Page 74: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by North Carolina Digital Heritage Center

From Flickr by Madison Guy

Get help from your library

Page 75: Data Stewardship for SPATIAL/IsoCamp 2014

Learn new skills

Software Carpentry: www.software-carpentry.org

Page 76: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by Micah Taylor

Other Fun Stuff

Page 77: Data Stewardship for SPATIAL/IsoCamp 2014

Altmetrics?

Impact Factors

+ Citation Counts

Credit in academia…

Page 78: Data Stewardship for SPATIAL/IsoCamp 2014

Altmetrics Article-level metrics Altmetrics for alt-products

Data Code Slides Blogs

Downloads Tweets

Mentions Views

From Flickr by Skakerman

Page 79: Data Stewardship for SPATIAL/IsoCamp 2014

Altmetrics Article-level metrics Altmetrics for alt-products

Page 80: Data Stewardship for SPATIAL/IsoCamp 2014

Researcher Identification

Page 81: Data Stewardship for SPATIAL/IsoCamp 2014

BIG initiatives…

Page 82: Data Stewardship for SPATIAL/IsoCamp 2014

NSF funded DataNet Project Office of Cyberinfrastructure

www.dataone.org

Page 83: Data Stewardship for SPATIAL/IsoCamp 2014
Page 84: Data Stewardship for SPATIAL/IsoCamp 2014

New partners…

Page 85: Data Stewardship for SPATIAL/IsoCamp 2014

Better methods…

Page 86: Data Stewardship for SPATIAL/IsoCamp 2014

Better methods…

Page 87: Data Stewardship for SPATIAL/IsoCamp 2014

Science is changing.

Embrace it.

Page 88: Data Stewardship for SPATIAL/IsoCamp 2014

From Flickr by dotpolka

Manage & share your data!

Page 89: Data Stewardship for SPATIAL/IsoCamp 2014

Website: carlystrasser.net
Email: [email protected]
Twitter: @carlystrasser
Slides: slideshare.net/carlystrasser