How to Share Your Data · 2020-02-20
How to Share Your Data
Lisa Spiro, February 2020
Data Sharing Sped Human Genome Project
Bermuda Principles (1996): “all human genomic sequence information, generated by centres funded for large-scale human sequencing, should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society.”
https://www.genome.gov/about-nhgri/Policies-Guidance/Genomic-Data-Sharing/data-submission
I. Why Share Data?
Overview of Data Sharing
● Benefits researchers
● Benefits research fields
● Benefits society
https://www.ands.org.au/working-with-data/articulating-the-value-of-open-data/open-data
To Conform to Journal Requirements
● PLOS: “require authors to make all data necessary to replicate their study’s findings publicly available without restriction at the time of publication”
● Science: data must be available
● SAGE: encourages, expects, or requires data sharing
● SpringerNature: encourages, expects, or requires
● Wiley: encourages, expects, or mandates
● American Geophysical Union: make data available whenever possible
To Meet Funding Agency Requirements
Most federal agencies require data management plans that include a section on data access, such as:
● NIH
● NSF
● NOAA
● DOE
To Increase Your Own Citations
“a) authors who share data may be rewarded eventually with additional scholarly citations, and b) data-posting policies alone do not increase the impact of articles published in a journal unless those policies are enforced.” (Christensen et al., 2019)
Articles with data availability statements and URLs for data “can have up to 25.36% higher citation impact on average” (Colavizza et al., 2019 [preprint])
To Benefit Research Communities
● Support future research studies
  - Enables new insights to be generated from existing datasets
  - Supports larger scale studies (e.g. astronomy, ecology)
  - Lowers costs
  - Speeds research
  - Sparks new collaborations
● Provide resources to be used in teaching
To Facilitate Reproducibility
● “Reproducibility crisis” (especially in psychology) raises serious questions about the credibility of research
● Sharing data allows:
  - researchers to present evidence for their results
  - others to reuse the data and methods
To Facilitate Data Quality
● Data sharing policies may discourage use of fraudulent or flawed data.
● “With raw data we have the possibility to find and correct mistakes. On top of that, the probability of making a mistake is likely to be lower once you have gone to the effort of archiving your data in such a way that another person can understand it.” (Nuijten)
To Preserve Data
● Offload responsibilities for providing access to and preserving data to professionals
● Future You will be grateful…
Illustration by Jørgen Stamp digitalbevaring.dk CC BY 2.5 Denmark
Activity: Data Sharing Policies
Look up the data sharing policy for a funding agency or journal/publisher in your discipline.
What are the requirements for data sharing (if any)?
What justification (if any) do they use for the policy?
II. Concerns about Sharing Data
The State of Open Data Report 2019
Concerns About Misuse of Data
● Licenses can be attached to data.
● Scientific norms such as citation still govern data usage.
● Data can be shared through a repository such as ICPSR that places restrictions on sensitive data.
Fear of Getting Scooped
● If others use your data, scholarly norms call for them to cite it.
● You can embargo data, delaying its release.
https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary-cartoons.html
Not Receiving Appropriate Acknowledgement
● Make sure your dataset has a DOI (encouraged by NSF)
● Make it easy to cite using data citation standards, such as DataCite:
Creator (PublicationYear). Title. Publisher. Identifier
Example: Barclay, Janet Rice (2013). Stream Discharge from Harford, NY. Cornell University Library eCommons Repository. http://hdl.handle.net/1813/34425
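If you keep dataset metadata in a script or spreadsheet, the DataCite pattern above (Creator (PublicationYear). Title. Publisher. Identifier) can be generated rather than typed by hand. The sketch below is a minimal illustration, not DataCite's own tooling; the `DatasetRecord` class and the "; " separator between multiple creators are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    # Hypothetical container; fields mirror the DataCite citation elements.
    creators: List[str]      # "Family, Given" strings, in citation order
    publication_year: int
    title: str
    publisher: str
    identifier: str          # ideally a DOI; a handle URL also works


def datacite_citation(record: DatasetRecord) -> str:
    # Creator (PublicationYear). Title. Publisher. Identifier
    # Assumption: multiple creators are joined with "; ".
    creators = "; ".join(record.creators)
    return (f"{creators} ({record.publication_year}). "
            f"{record.title}. {record.publisher}. {record.identifier}")


example = DatasetRecord(
    creators=["Barclay, Janet Rice"],
    publication_year=2013,
    title="Stream Discharge from Harford, NY",
    publisher="Cornell University Library eCommons Repository",
    identifier="http://hdl.handle.net/1813/34425",
)
print(datacite_citation(example))
```

Keeping the fields separate makes it easy to reuse the same record for a readme, a repository deposit form, or a data availability statement.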
Costs of Sharing Data
● Data costs can be written into most grants.
  - NSF: data deposit fees for repositories “are allowable expenses in proposal and award budgets.”
● Many repositories are free (although there is certainly time involved in sharing data).
Lack of Time to Deposit Data
● Organizing and describing data does take time.
● But practicing good data management from the beginning can save you time in the long run.
https://www.data.cam.ac.uk/intro-data-champions/data-champions-cartoons
Risk of Sharing Sensitive Information
● This is a legitimate concern. You should make sure that you aren’t sharing private or proprietary information; most data policies make exceptions for sharing this kind of data.
● But…
  - You can protect confidentiality by removing or obscuring direct, indirect, and geographic identifiers.
  - You should include data sharing in your IRB documents.
  - You can use repositories like ICPSR for restricted use data.
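Removing or obscuring identifiers can be scripted so that it is repeatable and documented. The sketch below assumes a CSV with hypothetical columns `name`, `email`, `zip`, and `age`: it drops direct identifiers, truncates ZIP codes to three digits, and top-codes ages above 89 (thresholds modeled on the HIPAA Safe Harbor approach). It only illustrates the technique; real de-identification should follow your IRB protocol and, where needed, a repository such as ICPSR.

```python
import csv
import io

# Hypothetical direct identifiers; adapt to your own codebook.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def deidentify_row(row: dict) -> dict:
    # Drop direct identifiers entirely.
    clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    # Obscure a geographic identifier: keep only the first three ZIP digits.
    if "zip" in clean:
        clean["zip"] = clean["zip"][:3] + "XX"
    # Top-code an indirect identifier (assumes "age" holds whole numbers).
    if "age" in clean and int(clean["age"]) > 89:
        clean["age"] = "90+"
    return clean


def deidentify_csv(text: str) -> str:
    reader = csv.DictReader(io.StringIO(text))
    rows = [deidentify_row(r) for r in reader]
    out = io.StringIO()
    if rows:
        writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out.getvalue()


raw = ("name,email,zip,age,score\n"
       "Ada,ada@example.org,77005,93,12\n"
       "Ben,ben@example.org,77030,41,9\n")
print(deidentify_csv(raw))
```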
Discussion: Pros and Cons of Data Sharing
If a colleague asked you whether to share their research data, what would you tell them?
III. Where Can You Share Your Data?
Selecting a Research Data Repository
● Research data repository: “a subtype of a sustainable information infrastructure which provides long-term storage and access to research data that is the basis for a scholarly publication. Research data means information objects generated by scholarly projects...” (Re3data)
● Why use a data repository rather than providing data on request:
  - Easier to discover
  - Easier to access
  - Built-in support for preservation
  - Relieves researchers from long-term management of data
Finding Appropriate Repositories
● Consult lists of recommended repositories:
  - SpringerNature
  - PLOS Recommended Repositories
● Use directories like https://www.re3data.org/
Disciplinary Data Repositories
● Contain data from researchers in a particular research domain
● Examples: GenBank (genetics) or PANGAEA (earth systems)
Advantages:
● More recognized by peers
● Metadata and data formats better for the discipline
Disadvantages:
● May limit what they accept
● May be less visible to researchers in other areas
General Data Repositories
● Not restricted by discipline
● Examples: Zenodo, FigShare, or Harvard Dataverse
Advantages:
● Recognized across disciplines; indexed by search engines
● Robust platforms for data publishing; good features
● Widely adopted
Disadvantages:
● May be more difficult for researchers to find your data
● May lack specific metadata and data support for your discipline
● May charge a fee or be tied to a for-profit company
Institutional Repository
● General repository associated with research institution
● Example: Rice Digital Scholarship Archive
Advantages:
● Associated with Rice
● Assistance from the library in depositing data
● Assurances of long-term support
Disadvantages:
● May not carry the same weight with peers
● May lack metadata and data support for your discipline
● User interface currently not so great
IV. What criteria should you consider in selecting a data repository?
Make Your Research Data FAIR
https://library.cuni.cz/services/openaccess/open-research-data/
Selecting a Data Repository
https://blogs.lse.ac.uk/impactofsocialsciences/2013/11/29/how-to-find-an-appropriate-research-data-repository/
Questions to Ask in Evaluating a Data Repository
● How well will the data be preserved? How stable is the repository?
● What kind of reputation does the archive have in the community?
● Does the repository facilitate citation of the data? Does it offer DOIs?
More Questions to Ask about Repositories
● Does the repository allow you to describe the data fully and make it discoverable?
● What features does it offer? Connections with GitHub? Download stats? Landing page for dataset?
● Are there curators who can help to deposit the data?
● What are the costs of deposit, if any?
Get Help Selecting a Repository
● Consult with the Research Data Services team
● See Assessing General Data Repositories
Activity
● Find an appropriate repository for your data.
● Assess the repository using the criteria we’ve discussed.
● Would you use this repository?
V. How to Prepare Your Data for Sharing
Compile Data to Be Shared
● Share data needed to replicate the study
● Don’t share identifiable human subjects data, proprietary data, etc.
https://datadryad.org/docs/QuickstartGuideToDataSharing.pdf
Ensure That Files Are Well Organized and in Open Formats
● Open formats (csv, txt, tiff, etc.) facilitate long-term usage
● Good organization enables future users (including you) to understand data
  -- Meaningful filenames
  -- Clear grouping of files
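One way to apply both points is to generate filenames from a fixed convention and save tables as CSV in folders grouped by stage. The sketch below assumes a hypothetical convention (ISO date, site, description, zero-padded version) and hypothetical folder names such as `data/raw`; it uses only Python's standard library.

```python
import csv
import re
from datetime import date
from pathlib import Path


def slugify(text: str) -> str:
    # Lowercase and collapse any run of non-alphanumeric characters into "-".
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def meaningful_filename(site: str, description: str, collected: date,
                        version: int, ext: str = "csv") -> str:
    # ISO date first so files sort chronologically; zero-padded version last.
    return (f"{collected.isoformat()}_{slugify(site)}_"
            f"{slugify(description)}_v{version:02d}.{ext}")


def save_as_csv(header, rows, folder: Path, filename: str) -> Path:
    # Group files by stage (e.g. data/raw vs. data/processed).
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    # CSV is plain text and non-proprietary; UTF-8 keeps it portable.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


name = meaningful_filename("Harford, NY", "Stream discharge", date(2013, 6, 1), 1)
print(name)
```

A call such as `save_as_csv(["date", "discharge"], rows, Path("data/raw"), name)` would then write the table to `data/raw/2013-06-01_harford-ny_stream-discharge_v01.csv` (column names and date are illustrative only).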
Document and Share Your Data
Typical sections of a readme include:
• Title
• Author
• Date
• Citation of related study
• Description
• Restrictions
• Methods
• Codes & variables
• File description
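A readme template makes it harder to forget a section. The sketch below uses the section names from the list above and writes a plain-text readme, marking empty sections as TODO; the layout and the helper names `build_readme` and `missing_sections` are assumptions for illustration, not a repository requirement.

```python
# Section names from the readme list; the plain-text layout is one possible choice.
README_SECTIONS = [
    "Title", "Author", "Date", "Citation of related study", "Description",
    "Restrictions", "Methods", "Codes & variables", "File description",
]


def missing_sections(fields: dict) -> list:
    # Sections that are absent or blank.
    return [s for s in README_SECTIONS if not fields.get(s, "").strip()]


def build_readme(fields: dict) -> str:
    lines = []
    for section in README_SECTIONS:
        value = fields.get(section, "").strip() or "TODO"
        lines.append(f"{section}:")
        lines.append(f"  {value}")
        lines.append("")
    return "\n".join(lines)


fields = {
    "Title": "Stream Discharge from Harford, NY",
    "Author": "Barclay, Janet Rice",
}
print(build_readme(fields))
print("Still to write:", ", ".join(missing_sections(fields)))
```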
Example of Data Deposit
Selecting a License
● Since facts can’t be copyrighted, raw data generally can’t be, either.
● However, a database may have some intellectual property protections, for example in how the data is organized and what data is included.
● In any case, it’s best to be explicit about reuse rights.
● More open licenses tend to be preferred because they facilitate reuse.
Creative Commons Zero
● Releases data into the public domain, removing barriers to use.
● Advantages:
  - Human readable
  - Interoperable
  - Universal
● Used by Dryad, Dataverse (default license), and others
Open Data Commons Attribution License
● You are free:
  - To Share: To copy, distribute and use the database.
  - To Create: To produce works from the database.
  - To Adapt: To modify, transform and build upon the database.
● As long as you:
  - Attribute: You must attribute any public use of the database, or works produced from the database, in the manner specified in the license.
>> Use if you want to make sure that your dataset is cited.
Open Data Commons Open Database License
● You are free:
  - To Share: To copy, distribute and use the database.
  - To Create: To produce works from the database.
  - To Adapt: To modify, transform and build upon the database.
● As long as you:
  - Attribute
  - Share-Alike: “…offer that adapted database under the ODbL.”
  - Keep open
>> Use if you want to guarantee that derivative datasets also remain open.
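Being explicit about reuse rights also means recording the license in a machine-readable form alongside the data. The sketch below maps the three goals discussed (public domain, attribution, share-alike) to SPDX license identifiers and writes them into a small JSON record; the JSON keys and the decision helper are hypothetical, and the license URLs should be checked against the license publishers before use.

```python
import json

# Goal -> (SPDX identifier, license URL). Verify URLs before publishing.
LICENSES = {
    "public_domain": ("CC0-1.0",
                      "https://creativecommons.org/publicdomain/zero/1.0/"),
    "attribution": ("ODC-By-1.0",
                    "https://opendatacommons.org/licenses/by/1-0/"),
    "share_alike": ("ODbL-1.0",
                    "https://opendatacommons.org/licenses/odbl/1-0/"),
}


def choose_license(require_citation: bool, require_open_derivatives: bool) -> str:
    # ODbL covers both goals: it requires attribution and share-alike.
    if require_open_derivatives:
        return "share_alike"
    if require_citation:
        return "attribution"
    # CC0 waives rights; citation then rests on scholarly norms.
    return "public_domain"


def license_metadata(title: str, goal: str) -> str:
    # Hypothetical sidecar record to ship with the dataset.
    spdx_id, url = LICENSES[goal]
    return json.dumps({"title": title, "license": spdx_id,
                       "license_url": url}, indent=2)


goal = choose_license(require_citation=True, require_open_derivatives=False)
print(license_metadata("Stream Discharge from Harford, NY", goal))
```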
Case Study: Sharing Data with Zenodo
Activity
● Describe the process of sharing data through a particular repository.
● Is the process straightforward?
● How would you prepare to share your data using this repository?
Questions to Ask Before You Share Data
● Are there privacy or intellectual property concerns?
● Are your collaborators on board?
● When should you share?
● Is your data well-organized and well-described?
What Does Research Data Services Offer?
https://library.rice.edu/research-data-services
• Workshops on R, Python, Excel, etc. (including upcoming 2-day workshops from Software Carpentry)
• Consulting on finding, analyzing, managing, and visualizing data, including during office hours
• Publishing and preserving data through the Rice Digital Scholarship Archive; providing DOIs
• Reviewing data management plans
Evaluate This Workshop
● Please fill out the brief evaluation form at http://library.rice.edu/requests/course-evaluation-form
Resources
● Borer, Elizabeth T., et al. “Some Simple Guidelines for Effective Data Management.” Bulletin of the Ecological Society of America (2009): 205–14.
● Dataverse. “Data Management Plans.” http://best-practices.dataverse.org/data-management/
● ICPSR. Guide to Social Science Data Preparation and Archiving. http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
● Juul, Svend, et al. “Take good care of your data.” http://www.epidata.dk/downloads/takecare.pdf
● UK Data Archive. Managing and Sharing Data: Best Practices for Researchers. http://www.data-archive.ac.uk/media/2894/managingsharing.pdf