How to Share Your Data Lisa Spiro February 2020

Post on 24-Jul-2020


Page 1: How to Share Your Data · 2020-02-20 · Open Data Commons Attribution License You are free: To Share: To copy, distribute and use the database. To Create: To produce works from the

How to Share Your Data

Lisa Spiro

February 2020

Data Sharing Sped Human Genome Project

Bermuda Principles (1996): “all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.”

https://www.genome.gov/about-nhgri/Policies-Guidance/Genomic-Data-Sharing/data-submission

I. Why Share Data?

Overview of Data Sharing

● Benefits researchers

● Benefits research fields

● Benefits society

https://www.ands.org.au/working-with-data/articulating-the-value-of-open-data/open-data

To Conform to Journal Requirements

● PLOS: “require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication”

● Science: data must be available

● SAGE: encourages, expects, or requires data sharing

● SpringerNature: encourages, expects, or requires

● Wiley: encourages, expects, or mandates

● American Geophysical Union: make data available whenever possible

To Meet Funding Agency Requirements

Most federal agencies require data management plans that include a section on data access, such as:

● NIH

● NSF

● NOAA

● DOE

To Increase Your Own Citations

“a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.” (Christensen et al, 2019)

Articles with data availability statements and URLs for data “can have up to 25.36% higher citation impact on average” (Colavizza et al, 2019 [preprint])

To Benefit Research Communities

● Support future research studies
-Enables new insights to be generated from existing datasets
-Supports larger scale studies (e.g. astronomy, ecology)
-Lowers costs
-Speeds research
-Sparks new collaborations

● Provide resources to be used in teaching

To Facilitate Reproducibility

● “Reproducibility crisis” (especially in psychology) raises serious questions about the credibility of research 

● Sharing data allows
-researchers to present evidence for their results
-others to use data and methods

To Facilitate Data Quality

● Data sharing policies may discourage use of fraudulent or flawed data.

● “With raw data we have the possibility to find and correct mistakes. On top of that, the probability of making a mistake is likely to be lower once you have gone to the effort of archiving your data in such a way that another person can understand it.” (Nuijten)

To Preserve Data

● Offload responsibilities for providing access to and preserving data to professionals

● Future You will be grateful…

Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark

Activity: Data Sharing Policies

Look up the data sharing policy for a funding agency or journal/publisher in your discipline.

What are the requirements for data sharing (if any)?

What justification (if any) do they use for the policy?

II. Concerns about Sharing Data

Concerns About Misuse of Data

● Licenses can be attached to data.

● Scientific norms such as citation still govern data usage.

● Data can be shared through a repository such as ICPSR that places restrictions on sensitive data.

Fear of Getting Scooped

● If your data is used, it will be cited.

● You can embargo data, delaying its release. 

https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary-cartoons.html

Not Receiving Appropriate Acknowledgement

● Make sure your dataset has a DOI (encouraged by NSF)

● Make it easy to cite using data citation standards, such as DataCite:

Creator (PublicationYear). Title. Publisher. Identifier

Example: Barclay, Janet Rice (2013) Stream Discharge from Harford, NY. Cornell University Library eCommons Repository. http://hdl.handle.net/1813/34425
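
The citation pattern above can be assembled programmatically, which may help when a lab maintains many datasets. The following is a minimal sketch, assuming the five elements are already known; the function name `format_data_citation` is hypothetical and not part of any DataCite tool.

```python
def format_data_citation(creator, year, title, publisher, identifier):
    """Join the elements in the order:
    Creator (PublicationYear). Title. Publisher. Identifier
    """
    # Strip a trailing period from the title so the output has no "..".
    title = title.rstrip(".")
    return f"{creator} ({year}). {title}. {publisher}. {identifier}"


# Values taken from the example on this slide.
citation = format_data_citation(
    creator="Barclay, Janet Rice",
    year=2013,
    title="Stream Discharge from Harford, NY",
    publisher="Cornell University Library eCommons Repository",
    identifier="http://hdl.handle.net/1813/34425",
)
print(citation)
```

Including a ready-made citation string in the readme or on the dataset landing page lowers the effort required to cite the data correctly.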

Costs of Sharing Data

● Data costs can be written into most grants.
NSF: data deposit fees for repositories “are allowable expenses in proposal and award budgets.”

● Many repositories are free (although there is certainly time involved in sharing data). 

Lack of Time to Deposit Data

● Organizing and describing data does take time.

● But practicing good data management from the beginning can save you time in the long run.

https://www.data.cam.ac.uk/intro-data-champions/data-champions-cartoons

Risk of Sharing Sensitive Information

● This is a legitimate concern. You should make sure that you aren’t sharing private or proprietary information; most data policies make exceptions for sharing this kind of data.

● But…
-You can protect confidentiality by removing or obscuring direct, indirect, and geographic identifiers.
-You should include data sharing in your IRB documents.
-You can use repositories like ICPSR for restricted use data.
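
As an illustration of removing or obscuring identifiers, here is a minimal sketch in Python. The column names (`name`, `email`, `zip`, `age`) and the coarsening rules are assumptions made for this example only; which fields count as identifying, and how far to coarsen them, depends on your study and should follow your IRB protocol.

```python
import csv
import io

# Hypothetical direct identifiers: columns dropped entirely.
DIRECT_IDENTIFIERS = {"name", "email"}


def deidentify(row):
    """Return a copy of one record with identifiers removed or coarsened."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    # Geographic identifier: keep only the first three digits of the ZIP code.
    if "zip" in cleaned:
        cleaned["zip"] = cleaned["zip"][:3] + "XX"
    # Indirect identifier: replace the exact age with a ten-year band.
    if "age" in cleaned:
        decade = int(cleaned["age"]) // 10 * 10
        cleaned["age"] = f"{decade}-{decade + 9}"
    return cleaned


# A tiny in-memory CSV stands in for a real data file.
raw = io.StringIO(
    "name,email,zip,age,score\n"
    "Ada Smith,ada@example.org,77005,34,88\n"
)
records = [deidentify(r) for r in csv.DictReader(raw)]
print(records)
```

Dropping and coarsening columns is only a starting point: combinations of the remaining fields can still identify individuals in small samples, which is one reason restricted-use repositories exist.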

Discussion: Pros and Cons of Data Sharing

If a colleague asked you about whether to share their research data, what would you tell them?

III. Where Can You Share Your Data?

Selecting a Research Data Repository

● Research data repository: “a subtype of a sustainable information infrastructure which provides long-term storage and access to research data that is the basis for a scholarly publication. Research data means information objects generated by scholarly projects...” (Re3data)

● Why use a data repository rather than providing data on request:
-Easier to discover
-Easier to access
-Built in support for preservation
-Relieves researchers from long-term management of data

Finding Appropriate Repositories

● Consult lists of recommended repositories
-SpringerNature
-PLOS Recommended Repositories

● Use directories like https://www.re3data.org/

Disciplinary Data Repositories

● Contain data from researchers in a particular research domain

● Examples: GenBank (genetics) or PANGAEA (earth systems)

Advantages
-More recognized by peers
-Metadata and data formats better for discipline

Disadvantages
-May limit what they accept
-May be less visible to researchers in other areas

General Data Repositories

● Not restricted by discipline

● Examples: Zenodo, FigShare or Harvard Dataverse

Advantages
-Recognized across disciplines; indexed by search engines
-Robust platforms for data publishing; good features
-Widely adopted

Disadvantages
-May be more difficult for researchers to find your data
-May lack specific metadata and data support for your discipline
-May charge a fee or be tied to a for-profit company

Institutional Repository

● General repository associated with research institution

● Example: Rice Digital Scholarship Archive

Advantages
-Associated with Rice
-Assistance from library in depositing data
-Assurances of long-term support

Disadvantages
-May not carry same weight w/ peers
-May lack metadata and data support for your discipline
-User interface currently not so great

IV. What criteria should you consider in selecting a data repository?

Make Your Research Data FAIR

https://library.cuni.cz/services/openaccess/open-research-data/

Selecting a Data Repository

https://blogs.lse.ac.uk/impactofsocialsciences/2013/11/29/how-to-find-an-appropriate-research-data-repository/

Questions to Ask in Evaluating a Data Repository

● How well will the data be preserved? How stable is the repository?

● What kind of reputation does the archive have in the community?

● Does the repository facilitate citation of the data? Does it offer DOIs?

More Questions to Ask about Repositories

● Does the repository allow you to describe the data fully and make it discoverable?

● What features does it offer? Connections with GitHub? Download stats? Landing page for dataset?

● Are there curators who can help to deposit the data?

● What are the costs of deposit, if any?

Get Help Selecting a Repository

● Consult with the Research Data Services team

● See Assessing General Data Repositories

Activity

● Find an appropriate repository for your data.

● Assess the repository using the criteria we’ve discussed.

● Would you use this repository?

V. How to Prepare Your Data for Sharing

Compile Data to Be Shared

Share data needed to replicate the study

Don’t share identifiable human subjects data, proprietary data, etc.

https://datadryad.org/docs/QuickstartGuideToDataSharing.pdf

Ensure That Files Are Well Organized and In Open Formats

● Open formats (csv, txt, tiff, etc) facilitate long term usage

● Good organization enables future users (including you) to understand data
-Meaningful filenames
-Clear grouping of files
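
One way to put these points into practice is to export tabular data as CSV under a descriptive, consistently built filename. The sketch below is illustrative only: the naming pattern (project, content, collection date), the `data/raw` folder, and the sample values are all assumptions, not a standard.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def build_filename(project, content, collected, ext="csv"):
    """Combine project, content, and ISO date into one descriptive filename."""
    parts = [project, content, collected.isoformat()]
    slug = "_".join(p.lower().replace(" ", "-") for p in parts)
    return f"{slug}.{ext}"


def write_csv(path, header, rows):
    """Write rows to a plain-text CSV file, an open format."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# A temporary directory stands in for the project folder; raw data gets
# its own subfolder so related files are clearly grouped.
out_dir = Path(tempfile.mkdtemp()) / "data" / "raw"
out_dir.mkdir(parents=True)

name = build_filename("Stream Study", "discharge", date(2013, 6, 1))
write_csv(out_dir / name, ["site", "discharge_cfs"], [["A", 12.5], ["B", 9.1]])
print(name)
```

Putting the date in ISO form (year-month-day) keeps files sorted chronologically in a directory listing.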

Document and Share Your Data

Typical sections of a readme include:
• Title
• Author
• Date
• Citation of related study
• Description
• Restrictions
• Methods
• Codes & variables
• File description
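
A readme skeleton with these sections can be generated once and filled in for each dataset. This is a minimal sketch; the plain-text layout and the `TODO` placeholder are assumptions, and a repository may prescribe its own readme template.

```python
# Section names taken from the list on this slide.
README_SECTIONS = [
    "Title",
    "Author",
    "Date",
    "Citation of related study",
    "Description",
    "Restrictions",
    "Methods",
    "Codes & variables",
    "File description",
]


def build_readme(values):
    """Return readme text with one block per section.

    Sections missing from `values` get a TODO placeholder so that gaps
    are easy to spot before deposit.
    """
    lines = []
    for section in README_SECTIONS:
        lines.append(section.upper())
        lines.append(values.get(section, "TODO"))
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Hypothetical values for illustration.
readme = build_readme({"Title": "Example dataset", "Author": "A. Researcher"})
print(readme)
```

Saving the result as a plain-text file alongside the data keeps the documentation in an open format as well.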

Selecting a License

● Since facts can’t be copyrighted, data can’t be, either.

● However, a database does have some intellectual property protections, such as how the data is organized, what data is included, etc.

● In any case, it’s best to be explicit about reuse rights.

● More open licenses tend to be preferred because they facilitate reuse.

Creative Commons Zero

● Releases data into the public domain, removing barriers to use.

● Advantages:
-Human readable
-Interoperable
-Universal

● Used by Dryad, Dataverse (default license), et al

Open Data Commons Attribution License

● You are free:
-To Share: To copy, distribute and use the database.
-To Create: To produce works from the database.
-To Adapt: To modify, transform and build upon the database.

● As long as you:
-Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the license.

>>Choose this license if you want to make sure that your dataset is cited.

Open Data Commons Open Database License

● You are free:
-To Share: To copy, distribute and use the database.
-To Create: To produce works from the database.
-To Adapt: To modify, transform and build upon the database.

● As long as you:
-Attribute
-Share-Alike: “…offer that adapted database under the ODbL.”
-Keep open

>>Choose this license if you want to guarantee that derivative datasets also remain open.

Case Study: Sharing Data with Zenodo

Activity

● Describe the process of sharing data through a particular repository.

● Is the process straightforward?

● How would you prepare to share your data using this repository?

Questions to Ask Before You Share Data

● Are there privacy or intellectual property concerns?

● Are your collaborators on board?

● When should you share?

● Is your data well-organized and well-described?

What Does Research Data Services Offer?

https://library.rice.edu/research-data-services

• Workshops on R, Python, Excel, etc. (including upcoming 2 day workshops from Software Carpentry)
• Consulting on finding, analyzing, managing, and visualizing data, including during office hours
• Publishing and preserving data through the Rice Digital Scholarship Archive; providing DOIs
• Reviewing data management plans

Evaluate This Workshop

● Please fill out the brief evaluation form at http://library.rice.edu/requests/course-evaluation-form

Resources

● Borer, Elizabeth T., et al. “Some Simple Guidelines for Effective Data Management.” Bulletin of the Ecological Society of America (2009): 205–14.

● Dataverse, Data Management Plans, http://best-practices.dataverse.org/data-management/

● ICPSR Guide to Social Science Data Preparation and Archiving, http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/

● Svend Juul et al, “Take good care of your data,” http://www.epidata.dk/downloads/takecare.pdf

● UK Data Archive, Managing and Sharing Data: Best Practices for Researchers, http://www.data-archive.ac.uk/media/2894/managingsharing.pdf