Iain Hrynaszkiewicz - Research integrity: integrity of the published record
Iain Hrynaszkiewicz, Journal Publisher, BioMed Central
Open data and the integrity of the published record – an open access publisher’s perspective
JISC research integrity conference, 13th September 2011
Iain Hrynaszkiewicz, Journal Publisher, BioMed Central
About BioMed Central
• Launched in 2000 and now the largest global publisher of peer-reviewed open access journals
• Publisher of over 210 open access journals
• >100,000 peer-reviewed open access articles published
• Part of Springer Science+Business Media
• All research articles published under Creative Commons attribution license
• Article processing charges levied for accepted research
• Established institutional membership scheme
The publisher as a service provider
• Help maximise research impact and pace
• Collect, organise and distribute knowledge
• Preservation and (rapid) dissemination
• Development of innovative content and tools
• Collaboration with the scientific community
BioMed Central and open data
• Increasing transparency in scientific research and scholarly communication is at the core of strategy
• Data are an increasingly integral part of scholarly communication, with many opportunities for increasing the pace of knowledge discovery
• Publishers, particularly open access publishers, are well-placed to share information across domain boundaries
http://blogs.openaccesscentral.com/blogs/bmcblog/resource/opendatastatementdraft.pdf
http://www.biomedcentral.com/info/about/openaccess
“We believe the concept of open data, analogous to our policy on open access, goes beyond making data freely accessible. Data should also be free to distribute, copy, re-format, and integrate into new research, without legal impediments.”
Problem: Lack of credit/recognition for data sharing and publication
• In science, credit is everything
• Data sets are not generally as discoverable as journal articles
• Data sets are not – yet – generally as citable as journal articles
• Requirements for data sharing are field/location-specific
• Some empirical evidence of the benefits still emerging
Solution #1: Innovative journals and article types enabling data publication
Data notes (papers)
Solution #2: Open Data Award
“We ... recognize researchers who have ... have demonstrated leadership in the sharing, standardization, publication, or re-use of biomedical research data.”
http://www.biomedcentral.com/researchawards/opendata
Problem: Where can data be stored – permanently?
• Publishers not best placed to run repositories for long term preservation of large datasets
• Mirrors of publisher content not able to accept arbitrary amounts of additional data
• How do you deal with the “data tsunami”?
• Many data repositories exist but most are domain/location specific and there are many different types of funding model, license agreement and persistent identifiers in use
Solution #1: Integrated (cloud-based) data repository and journal
http://www.gigasciencejournal.com
“GigaScience aims to revolutionize data dissemination, organization, understanding, and use. An online open-access open-data journal, we publish 'big-data' studies from the entire spectrum of life and biomedical sciences. To achieve our goals, the journal has a novel publication format: one that links standard manuscript publication with an extensive database that hosts all associated data and provides data analysis tools and cloud-computing resources.”
Solution #2: Comprehensive author information on available data repositories
http://datacite.org/repolist
http://www.biomedcentral.com/info/about/supportingdata
Problem: Data are not consistently linked to publications
• Data deposition policies are not established in all fields
• Even where they are, links/accession numbers tend to be inconsistently presented
• Researchers may, independently of journal requirements, deposit data in repositories
• A missed opportunity to enhance the literature
Solution #1: ‘Availability of supporting data’ article section
• A tool for editors, authors and scientific communities to, at the appropriate time, put data deposition policies into practice
• Provides links in a consistent place within an article to supporting data, regardless of the location or format of the data, and makes it clear to readers when they can access the data as well as the article
• Data must be permanently available (DOI or equivalent)
• Journals include GigaScience, BMC Research Notes
http://www.biomedcentral.com/info/about/supportingdata
http://blogs.openaccesscentral.com/blogs/bmcblog/entry/availability_of_supporting_data_crediting
Solution #2: Submission integration with the Dryad repository
Problem: Ambiguous and suboptimal licensing that restricts data (re)use
“The data should be released in standardized formats without intellectual property constraints.” Conway PH, VanLare JM: Improving Access to Health Care Data: The Open Government Strategy. JAMA 2010;304(9):1007-1008.
http://pantonprinciples.org/
http://www.isitopendata.org/
“[P]eople mis-use copyright licenses on uncopyrightable materials and data sets: the confusion of the legal right of attribution in copyright with the academic and professional norm of citation of one's efforts.” John Wilbanks, VP, Science, Creative Commons, http://bit.ly/djl5Fa August 11, 2010
Solution: Stakeholder engagement and community collaboration, leadership
http://bit.ly/n4U348 http://bit.ly/h6PVoO
Problem: Lack of practical guidance and exemplars, to help overcome barriers
• Online publishing makes data sharing possible, but sharing/publishing detailed human subjects data, in the absence of explicit consent, can potentially infringe privacy (ethically and legally)
• Data are more (re)usable if published in community endorsed, standard formats
• Standards and appropriate guidance do not yet exist in all domains
Solution #1: Work with journal editors to produce guidance where it is needed
BMJ 2010;340:c181. Co-published in: Trials 2010, 11:9
Solution #2: Publish exemplars
Solution #3: Incentivize, promote and share best practice and standards
http://www.biomedcentral.com/bmcresnotes/series/datasharing
http://biosharing.org/standards_view
Conclusions
• Rather than ‘why share data?’, the questions are ‘what’, ‘how’, ‘where’, and ‘when’?
• The future of scholarly communication depends on a commitment to data as well as papers
• Supporting and investing in open data is a service to the scientific community
• We can better serve funders and beneficiaries of scientific research with transparency