
WHAT CONSTITUTES PEER REVIEW OF RESEARCH DATA – A STUDY OF CURRENT PRACTICE
TODD A. CARPENTER
EXECUTIVE DIRECTOR, NISO
CO-CHAIR, RDA DATA PUBLISHING IG

IT DOESN’T MAKE SENSE WITHOUT THE METADATA!*
[Image: XKCD, “Convincing”]
* With apologies for the sexism to my many math-minded female friends!

WE ALL KNOW WHAT PEER REVIEW OF A PAPER IS

Initial editorial review includes things like:

• Does the article fit within the journal’s scope and aims?

• Will it be of interest to the readership?

• Does it meet minimum quality standards for writing and content?

• Did the author follow the instructions?

• Did it meet some type of deadline?

Journal editors reject between 6% and 60% of papers at this stage (per Schultz).

AND WHAT PEER REVIEWERS ASSESS IN PAPERS

• Study design and methodology

• Soundness of work and results

• Presentation and clarity of data

• Interpretation of results

• Whether research objectives have been met

• Whether a study is incomplete or too preliminary

• Novelty and significance

• Ethical issues

• Other journal-specific issues

BUT PEER REVIEW ISN’T PERFECT

• Causes delay in publication

• Bias against specific categories of papers/topics

• Social & cognitive biases

• Unreliability

• Inability to detect errors and fraud

• Lack of transparency—leading to unethical practices

• Lack of recognition for reviewers

(from Walker and Rocha da Silva)

PEER REVIEW OF DATA IS EVEN LESS PERFECT

• Overall complexity of data (Data, metadata, methodology, IP)

• Not only need data, but software to process data as well

• Provenance of data

• Post gathering, preprocessing of data

• Statistical/computational rigor

• Size and complexity of data sets

• Also need to review code/software for proper analysis

• How to handle different versions of the same data set?

• When does a new version require additional review? Does the earlier review still apply?

• More work on already burdened reviewers

JUST LIKE ALL OTHER SCHOLARLY CONTENT

“It was generally agreed that data should be peer reviewed.”
- 2014 study of 4,000 researchers by David Nicholas et al.

HECK, WHO PUBLISHES THEIR DATA ANYWAY?

A 2011 study of 500 papers published in 2009 in 50 top-ranked research journals showed that only 47 (9%) of the papers reviewed had deposited full primary raw data online.

(Source: Alsheikh-Ali et al)

OH, HOW DATA PUBLISHING HAS CHANGED

Over the past decade, the number of journals that accept data has increased.

The number of titles that explicitly require data sharing in some form is also increasing rapidly.

A culture of data sharing is developing: researchers are responding to data sharing requirements, to evidence of its efficacy, and to its growing acceptance as a scientific norm in many fields.

WHOLE LOT OF DATA PUBLISHING GOING ON

Source: Assante et al.

LOTS OF JOURNALS PUBLISH DATA TODAY

Data policies are growing in number:

• PLOS

• AGU

• SpringerNature

• American Economic Association

• COPDESS Statement of Commitment (43 Signatories)

• The International Committee of Medical Journal Editors has recommended that as a “condition of … publication” of a trial report, journals will “require authors to share with others the deidentified individual-patient data no later than 6 months after publication”.

• These policies result from the inclusion of data release requirements in the OSTP Memo, the Wellcome Trust policy, the Gates Foundation policy, and many others.

BUT WHO IS RESPONSIBLE FOR DATA PEER REVIEW?

• Repositories that accept data?
• Some repositories, such as Dryad, the NASA Planetary Data System, and the Qualitative Data Repository, perform data review upon deposit.

• Journals that require data submission as part of the publication process?
• Some journals that require data release with publication require at least a cursory peer review of the data associated with their papers.

• Data journal publishers?
• These explicitly provide peer review and “publication” of data.

WHAT IS A DATA PAPER?

• Data papers are “scholarly publication of a searchable metadata document describing a particular on-line accessible dataset, or a group of datasets, published in accordance to the standard academic practices” (Chavan & Penev, 2011)

• Phrased more succinctly: “Information on the what, where, why, how and who of the data” (Callaghan et al., 2012)

WHO PUBLISHES THEM?

• In a 2014 paper, Candela et al. identified 116 journals that publish data papers: 7 “pure” data journals (publishing only data papers) and 109 that had published at least one data paper, of which Springer/BioMedCentral contributed 94 titles.

• Subsequently, additional titles have launched and more detailed data access policies are being advanced.

• SpringerNature/BioMedCentral now have a set of four data policies that cover all of their titles. Only the 8 Type 4 titles, where data sharing is required, were included in this study.

Num of titles   Type
8               4
343             3
178             2
217             1

STUDY METHODOLOGY

• Starting from the Candela et al. list of titles, I added newly released data journals (13 new titles), as well as SpringerNature’s data policy types. In total, 39 specific titles were reviewed.

• Located the available peer review criteria, instructions for reviewers, or editorial policies for each title.

• Extracted key criteria from each title, then compiled and compared them to identify similarities and differences (a minimal tabulation sketch follows this list).
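
As an illustration of this compile-and-compare step only (this is not the author’s actual workflow, and the journal names and criteria below are hypothetical placeholders), here is a minimal Python sketch that tallies how many titles reference each criterion:

```python
from collections import Counter

# Hypothetical extract: for each reviewed title, the set of peer review
# criteria its public policy mentions. All names here are placeholders.
policies = {
    "Data Journal A": {"Editorial Review", "Metadata Quality", "Data Reuse"},
    "Data Journal B": {"Editorial Review", "Ethics of Experimentation"},
    "Data Journal C": {"Metadata Quality", "Link to Public Repository"},
}

# Count how many titles reference each criterion -- the kind of tally
# shown in the criteria slides that follow.
counts = Counter(criterion for criteria in policies.values() for criterion in criteria)

for criterion, n in counts.most_common():
    print(f"{criterion}: {n} of {len(policies)} titles")
```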

SO WHAT IS PEER REVIEW OF A DATA SET?

PEER REVIEW OF DATA IS SIMILAR TO PEER REVIEW OF A PAPER, BUT INCLUDES A LOT MORE

RECOMMENDED PEER REVIEW CRITERIA FROM LAWRENCE ET AL.

Suitability for Publication
Importance
Originality/Novelty - of Science
Metadata Quality
Metadata Presentation
Metadata Standards Conformance / Template Conformance
Dataset DOI Assignment
Metadata Rights Information
Provenance
Metadata Completeness

Data Logic & Consistency
Data Format Consistency
Data - Plausibility
Data - Worthy of Sub-selection / Broad Enough / Appropriate Selection
Data - Software Used to Process Described
Data Quality Methods
Data Anomalies Documented

FIVE BROAD CATEGORIES OF PEER REVIEW PROCESSES FOR DATA

Editorial Review (9 Criteria)

Metadata Quality (9 Criteria)

Data Quality (15 Criteria)

Methodology (12 Criteria)

Other – (11 Criteria)

EDITORIAL REVIEW OF THE DATASET

(Number of reviewed titles whose policies reference each criterion)

Editorial Review: 36
Open Peer Review: 3
Conflict of Interest Policy: 17
Topical Appropriateness in Title: 29
Suitability for Publication in Title: 28
Importance of the Subject: 12
Overall Quality of Research: 31
Unpublished: 13
Originality/Novelty - of Science: 27

METADATA REVIEW OF THE DATASET

(Number of reviewed titles whose policies reference each criterion)

Metadata Quality: 33
Metadata Presentation: 24
Metadata Standards Conformance / Template Conformance: 19
Title/Abstract/Writing Clarity: 25
Keyword Selection: 2
Dataset DOI Assignment: 10
Metadata Rights Information: 1
Provenance: 0
Metadata Completeness: 19
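
To make criteria such as “Metadata Completeness” and “Metadata Standards Conformance / Template Conformance” concrete, here is a minimal illustrative sketch (not part of the study; the required-field template is an assumption, not a published standard) of an automated completeness check a reviewer or repository could run against a dataset’s metadata record:

```python
# Illustrative only: a toy completeness check against an assumed field template.
REQUIRED_FIELDS = ["title", "creator", "description", "license", "doi", "methodology"]

def completeness_report(metadata: dict) -> list[str]:
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]

# Hypothetical metadata record for a deposited dataset.
record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "description": "",          # empty: flagged as incomplete
    "license": "CC-BY-4.0",
}

missing = completeness_report(record)
print("Missing or empty fields:", missing)  # ['description', 'doi', 'methodology']
```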

DATA REVIEW OF THE DATASET

(Number of reviewed titles whose policies reference each criterion)

Data Logic & Consistency: 17
Data Format Consistency: 17
Data - Non-Proprietary Formats: 6
Data - Plausibility: 1
Data - of High Quality: 2
Data - Worthy of Sub-selection / Broad Enough / Appropriate Selection: 16
Data Reuse: 23
Data - Software Used to Process Described: 15
Data Units of Measure: 22
Data Quality Methods: 17
Data Anonymization: 5
Data Anomalies Documented: 2
How Are Outliers Identified & Treated?: 1
Any Data Errors Introduced by Transmissions (Checksums)?: 3
Any Errors in Technique, Fact, Calculation, Interpretation, Completeness?: 18
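
As an aside on the checksum criterion above: a minimal, illustrative sketch (the filename and published checksum are hypothetical) of how a reviewer or repository could verify that a deposited file was not corrupted in transmission:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical: the checksum the depositor published alongside the data,
# and a placeholder filename for the deposited file.
published_checksum = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

if sha256_of("dataset.csv") == published_checksum:
    print("Checksum matches: no transmission errors detected.")
else:
    print("Checksum mismatch: the file may be corrupted or altered.")
```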

METHODOLOGY REVIEW OF THE DATASET

(Number of reviewed titles whose policies reference each criterion)

Methodology - Appropriate: 25
Methodology - Current: 9
Methodology - Data Collection Methods: 23
Methodology - High Technical Standard: 18
Methodology - Equipment Description: 2
Methodology - Replicable: 23
Methodology - Quality Control: 3
Study Design - Overall: 16
Methodology - Supporting Experiments: 16
Study Design - Replications Identified: 15
Study Design - Independencies or Co-Dependencies: 16
Study/Data Preprocessing Described: 16

OTHER REVIEW CRITERIA

(Number of reviewed titles whose policies reference each criterion)

Citations to Other Relevant Materials: 29
Fairness: 1
Anonymity of Reviewers (if desired): 15
Ethics of Experimentation: 28
Public Data Sharing, Open License Requirement: 32
Data Repository with Sustainability Model: 13
Data Sharing - Platform Agnostic: 1
Link to Public Repository: 31
Descriptions of How to Access Data: 30
Abbreviations Noted/Defined: 15
All Contributors/Authors Credited: 16

MOST-REFERENCED PEER REVIEW CRITERIA

Attribute: number of titles including a statement; “x” = also in the Lawrence et al. criteria

Editorial Review: 36
Metadata Quality: 33 x
Public Data Sharing, Open License Requirement: 33
Overall Quality: 31
Link to Public Repository: 31
Descriptions of How to Access Data: 30
Topical Appropriateness: 29
Citations to Other Relevant Materials: 29
Suitability for Publication: 28 x
Ethics of Experimentation: 28
Originality/Novelty - of Science: 27 x
Title/Abstract/Writing Clarity: 25
Methodology - Appropriate: 25 x
Metadata Presentation: 24 x
Data Reuse: 23

LEAST-REFERENCED PEER REVIEW CRITERIA

Attribute: number of titles including a statement; “x” = also in the Lawrence et al. criteria

Any data errors introduced by transmissions (checksums)?: 3 x
Methodology - Quality Control: 3
Keyword Selection: 2
Data - of High Quality: 2
Data Anomalies Documented: 2 x
Methodology - Equipment Description: 2
Metadata Rights Information: 1 x
Data - Plausibility: 1 x
How are outliers identified & treated?: 1 x
Fairness: 1
Data Sharing - Platform Agnostic: 1
Provenance: 0 x

WHAT CONSTITUTES ROBUST PEER REVIEW

• Editorial purpose and cohesion are important, even in data publication

• Significant push for openness of data & reuse, with recognition of anonymization or access control if appropriate

• Links to public repositories and details on gaining access

• Ethical concerns regarding data collection need to be documented

• Overall quality of metadata, with specifics depending on domain

• Focus on replicability

COMPARISON OF THE LAWRENCE ET AL. LIST WITH CURRENT PRACTICE

• Of the 23 criteria outlined by Lawrence et al., only 5 were in the top third of criteria being applied by journal publishers

• 10 of the 23 were in the middle third of criteria by number of titles applying them

• And 8 of their 23 criteria are in use by fewer than one-third of the reviewed policies.

This shows how much the process of data publishing has changed in the past six years.

SOME CAVEATS

WHAT IS “DATASET VALIDATION” IN REPOSITORIES?

Many journals request (or require) deposit of data in a trusted repository.

Few repositories conduct Dataset Validation upon deposit.

• “At the moment among the most debated and undefined data publishing phases”

• “There are no shared established criteria on how to conduct such review and on what data quality is.”

(Source: Assante et al.)

THE ENTIRE PEER REVIEW PROCESS ISN’T KNOWABLE

• Detailed peer review instructions are not always publicly available and additional detail/instructions may be provided to peer reviewers.

• One journal’s catch-all instruction was “ALL Aspects of the data set are reviewed”

• This is a study of policies, not of actual peer review behavior. (i.e., some peer reviewers might not follow the exact letter of the policy[!])

CONCLUSION

• Over the past five years, there has been a significant growth in the publication and sharing of scholarly research data.

• With this rise, there has been an increase in the demand for peer review of data sets.

• Each domain is different and its expectations of data review are distinct, just as with article peer review processes, but there are commonalities, and baseline standards can be propagated.

REFERENCES

• Alsheikh-Ali AA, Qureshi W, Al-Mallah MH, Ioannidis JP. Public availability of published research data in high-impact journals. PLoS One. 2011;6(9):e24357. doi:10.1371/journal.pone.0024357

• Assante M, et al. Are scientific data repositories coping with research data publishing? Data Science Journal. 2016;15:6. doi:10.5334/dsj-2016-006

• Candela L, et al. Data journals: A survey. Journal of the Association for Information Science and Technology. 2015;66(9):1747-1762.

• Chavan V, Penev L. The data paper: A mechanism to incentivize data publishing in biodiversity science. BMC Bioinformatics. 2011;12(Suppl 15):S2. doi:10.1186/1471-2105-12-S15-S2

• Lawrence B, Jones C, Matthews B, Pepler S, Callaghan S. Citation and peer review of data: Moving towards formal data publication. International Journal of Digital Curation. 2011;6(2):4-37. doi:10.2218/ijdc.v6i2.205

• Nicholas D, et al. Peer review: Still king in the digital age. Learned Publishing. 2015;28(1):15-21. See also CICS/CIBER. Trust and Authority in Scholarly Communications in the Digital Era (2013). http://ciber-research.eu/download/20140115-Trust_Final_Report.pdf

• Schultz DM. Rejection rates for journals publishing in the atmospheric sciences. Bulletin of the American Meteorological Society. 2010;91(2):231-243. doi:10.1175/2009BAMS2908.1

• SpringerNature Research Data Policy Types. http://www.springernature.com/gp/group/data-policy/policy-types

• Taichman DB, Backus J, Baethge C, Bauchner H, de Leeuw PW, Drazen JM, et al. Sharing clinical trial data: A proposal from the International Committee of Medical Journal Editors. N Engl J Med. 2016;374(4):384-386.

• Walker R, Rocha da Silva P. Emerging trends in peer review: A survey. Frontiers in Neuroscience. 27 May 2015. doi:10.3389/fnins.2015.00169

THANK YOU!
TODD A. CARPENTER
EXECUTIVE DIRECTOR, NISO
[email protected]
@TAC_NISO