Tim Osborn: Research Integrity: Integrity of the published record

Research Integrity Conference: The importance of good data management. Wellcome Collection Conference Centre, 13 September 2011.

DESCRIPTION

Tim Osborn, Reader, University of East Anglia

TRANSCRIPT

Page 1

Research Integrity Conference: The importance of good data management

Wellcome Collection Conference Centre, 13 September 2011

Page 2

Climate research data and research integrity

Dr Tim Osborn

Climatic Research Unit

School of Environmental Sciences

University of East Anglia

JISC Research Integrity Conference: the Importance of Good Data Management

13 September 2011

Page 3

Integrity of the published research record

Why is it important for climate research and why now?
(Of course it’s always been important and not just for this discipline)

The global warming issue:
Scientifically challenging
Politically, socially and economically contentious
High stakes (economic and non-economic)
Under intense scrutiny

Page 4

Climate change hacked emails controversy

The integrity of our research was severely questioned
What role did research data issues (management, sharing, etc.) play in this?

Need to distinguish research integrity from perceptions of research integrity

These issues probably played a rather small role
Our research data and the research record were preserved
We “created” very little raw data, and we have an excellent record of preserving our derived data and publishing it for re-use

Instead, the perception of doubt arose very much more from the contents of the hacked emails and their interpretation

Page 5

Climate change hacked emails controversy

Improved research data management and sharing would have made little difference to the attacks on our integrity

No difference to our critics; perhaps a small difference to the cross-over into the mainstream media

Nevertheless, there are areas where we can improve, and we received some criticism in these areas
The climate science community as a whole should improve

Data sharing for openness, for re-use
Improved data management for preserving workflows and linking articles to analysis to data (e.g. JISC ACRID)

Page 6

Managing and sharing research data: why should we improve?

Supports reproducibility (necessary) and repeatability (desirable)
Maintains (actual and perceived) integrity of research
Essential because high-stakes decisions must be informed by sound scientific assessment

Supports further exploration of scientific findings
Scientific findings that are not clear cut (e.g. close to the threshold of statistical significance) are more sensitive to variations in data, methodological choices, assumptions, etc.

Supports data re-use for other studies
We are data poor (despite > 10,000 TB) relative to the complexity of the climate system

Page 7

Sharing climate data: some challenges

Estimated numbers of climate change articles:
Total > 100,000
Just 2009 > 13,000
which is > 1 / hour

Grieneisen & Zhang (2011) doi: 10.1038/nclimate1093
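
The "> 1 / hour" figure for 2009 follows from simple arithmetic on the article count. The short sketch below reproduces it; the function name is made up for illustration, and the only inputs are the 13,000 figure from the slide and the number of hours in a non-leap year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year such as 2009


def articles_per_hour(articles_per_year: float) -> float:
    """Convert an annual article count into an average hourly rate."""
    return articles_per_year / HOURS_PER_YEAR


# 13,000 is the lower bound quoted on the slide for 2009.
rate_2009 = articles_per_hour(13_000)
print(f"{rate_2009:.2f} articles per hour")  # 1.48 articles per hour
```

Because 13,000 is a lower bound, the true rate would be at least this value.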

Page 8

Sharing climate data: some challenges

Data volume is already large (> 10,000 TB)
Projected to grow tenfold by end of this decade

Overpeck et al. (2011) doi: 10.1126/science.1197869

Page 9

Sharing climate data: some limitations

Data with non-disclosure agreements
Formal or informal agreements

Holding back for future exploitation
Controlling use, getting recognition

Time and resources
Costs may be obvious, benefits may be unrealised
Standards, meta-data and software increase the value in re-use, but can increase the time needed

Page 10

Non-disclosure agreements: real or excuse?

Example 1: UK climate data
Data sets must not be passed on to third parties under any circumstances... Once the project work using the data has been completed, copies of the datasets held by the end user should be deleted... The introduction of sanctions against individuals or Departments may be considered if breaches occur.
http://badc.nerc.ac.uk/conditions/ukmo_agreement.html

Page 11

Non-disclosure agreements: real or excuse?

Example 2: Global precipitation data
One of the most widely used analyses of variations in precipitation across the global land surface is “based on the complete GPCC monthly rainfall station data-base (the largest monthly precipitation station database of the world with data from ca. 85,000 different stations)... Corresponding to international agreement, station data provided by Third Parties are protected.”
http://gpcc.dwd.de

Page 12

Non-disclosure agreements: real or excuse?

Informal agreements exist too
Especially with newly collected data provided in advance of its formal publication
These agreements with colleagues, and the consequences of breaching them, are genuine (regardless of what the ICO might decide if tested under FOI/EIR legislation!)

Page 13

Holding back data for future exploitation

Traditionally, climate data themselves aren’t published
Instead, a journal article is published reporting findings arising from some analysis of the data
Provides a citable outcome for which the scientist gains credit

This could take many months to a few years
Because publishable findings may only arise from extensive analysis of the data or from a collection of multiple records, and the article has to go through the peer-review system

In the meantime, the data may have been shared and used under non-disclosure restrictions

Page 14

Ways forward…1

Providing data (and other materials) with a publication to allow it to be reproduced (or perhaps repeated)

E.g. supplementary online materials
Seen as a burden for all 13,000 climate change articles per year

Co-benefits must be evident to make this worthwhile
Citation and data re-use

Potential proliferation of identical (or perhaps not!) copies of datasets

Better to provide a unique identifier to existing data that have been used, rather than a copy of the data
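
The concern about copies that may or may not be identical can be made concrete with a checksum comparison. The sketch below is illustrative only: the function names and the sample records are hypothetical, and the use of SHA-256 is an assumption made here, not something the talk specifies.

```python
import hashlib


def dataset_fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a dataset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def copies_identical(copy_a: bytes, copy_b: bytes) -> bool:
    """True if both copies have the same digest, i.e. are for practical
    purposes byte-for-byte identical."""
    return dataset_fingerprint(copy_a) == dataset_fingerprint(copy_b)


# Hypothetical example data: three supposed copies of the same record.
original = b"station,year,temp_c\nA001,2009,9.8\n"
exact_copy = b"station,year,temp_c\nA001,2009,9.8\n"
edited_copy = b"station,year,temp_c\nA001,2009,9.9\n"

print(copies_identical(original, exact_copy))   # True
print(copies_identical(original, edited_copy))  # False
```

A check like this only detects that two copies differ; it says nothing about which one is authoritative, which is why citing a single identified dataset may be preferable to circulating copies.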

Page 15

Ways forward…2

Data publication
Newly collected (observed, simulated, derived) datasets published in their own right, not as part of a scientific paper
Meta-data and other accompanying information

But could shorten the lag from data collection to data publication, with much lighter-touch peer review

Citable (e.g. DOI) allows due credit
Identifiable (long-lasting URI) allows unique identification

Should be unique: updates or modifications to the data should have a separate unique identifier (how to link between versions is considered in our JISC ACRID project)
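
One simple way to picture separate identifiers that are still linked across versions is for each version to record the identifier it supersedes. The sketch below is a hypothetical illustration and does not represent the approach taken in the JISC ACRID project; the class, the function and the `example:` identifiers are all invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    identifier: str                   # hypothetical unique identifier for this version
    description: str
    supersedes: Optional[str] = None  # identifier of the previous version, if any


def version_history(registry: Dict[str, DatasetVersion], identifier: str) -> List[str]:
    """Follow the 'supersedes' links from one version back to the first."""
    history: List[str] = []
    current: Optional[str] = identifier
    while current is not None:
        history.append(current)
        current = registry[current].supersedes
    return history


# Hypothetical identifiers, for illustration only.
versions = [
    DatasetVersion("example:temperature-v1", "initial release"),
    DatasetVersion("example:temperature-v2", "added stations",
                   supersedes="example:temperature-v1"),
    DatasetVersion("example:temperature-v3", "corrected errors",
                   supersedes="example:temperature-v2"),
]
registry = {v.identifier: v for v in versions}

print(version_history(registry, "example:temperature-v3"))
# ['example:temperature-v3', 'example:temperature-v2', 'example:temperature-v1']
```

Each version keeps its own identifier, so an article can cite exactly the data it used, while the links let a reader find earlier or later versions.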

Page 16

Preferred data archives…1

Storing data with publisher, linked directly to article
Useful (not essential) for a strong link between article and data
Not ideal for long term preservation, large datasets, tools for exploring data, searches of databases etc.
Not ideal for re-use

University archiving possible, but similar disadvantages
Discipline-specific, dedicated data centres are preferable

E.g. World Data Center system (http://www.icsu-wds.org/)
WDC-Climate, WDC-Paleoclimate, BADC, BODC, ITRDB, CMIP5

Page 17

Preferred data archives…2

Sub-discipline specific archives superior to broader archives

More generalised approaches present a higher barrier to submission (e.g. describing all environmental data sets via one standard meta-data model: a very large model, much to learn, etc.)
Approaches tailored to sub-disciplines avoid irrelevant structures, formats, meta-data
Sometimes expertise is needed rather than extra meta-data

Page 18

Summary points

Improved data sharing and links to published findings are needed across the climate science community, to increase the pace of knowledge creation and to support the integrity of published work
New approaches to publishing newly constructed datasets should be encouraged and adopted where possible
Bringing benefits of citations, credit and unique identification
Published articles should identify data used, preferably via citation/identification of already published data rather than providing a further copy of the data
Subject-specific data archives are preferred, offering better support for data re-use
Other issues (non-disclosure agreements, time and resources) need to be considered; the benefits must be clear if these obstacles are to be overcome

Page 20

Global warming issue: high stakes

Easy contexts for decision making:
Cost of reducing GHGs low, adverse impact of not doing so is high
Cost of reducing GHGs high, adverse impact of not doing so is low

Decision making in the actual context is much harder:
Significantly reducing GHGs may prove difficult with moderate to high costs
Net effects of not reducing GHGs are very uncertain and could range from fairly moderate to very severe adverse impact

Page 26

Time and resources

Must not mistake reluctance to commit time and resources for a desire to avoid disclosure
There is a real cost involved

Standards, meta-data and software increase the value in re-use, but can increase the time needed

The answer is not simply to obtain funding
Even with specific funding, unless the benefits of sharing data and meta-data are clear, there will be pressure to do things with more obvious benefits
