force11: enabling transparency and efficiency in the research landscape

Melissa Haendel, PhD Oregon Health & Science University Future of Research Communications and E-Scholarship Enabling transparency and efficiency in the research landscape @force11rescomm @ontowonka

Upload: mhaendel

Post on 17-Jul-2015


TRANSCRIPT

Melissa Haendel, PhD

Oregon Health & Science University

Future of Research Communications and E-Scholarship

Enabling transparency and efficiency in the research landscape

@force11rescomm @ontowonka

Research pre-Web:

Do an experiment

Document in a lab notebook

Publish your results

The Research Life Cycle

TECHNIQUE

COLLABORATION

PUBLICATION

DATASET

GRANT

Impetus for change: Is our current method serving science?

47/53 major preclinical published cancer studies could not be replicated

“The scientific community assumes that the claims in a preclinical study can be taken at face value-that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”

Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012

Not all content is available for synthesis and discovery

Search PubMed: Spinal Muscular Atrophy

The scientific corpus is fragmented

~25 million articles total, each covering a fragment of the biomedical space

Each publisher owns a fragment of a particular field

The current process is inefficient and slow

Wiley

Elsevier

Macmillan

Oxford

Spinal Muscular Atrophy

Committee on Academic Promotions

What Counts

Money

Grants

Papers

Teaching

Service

What Does Not

Sharing data

Sharing software

Open access

Collaboration

Patents

Startups

Getting Ahead as a Computational Biologist in Academia, PLOS Comp Biol, doi:10.1371/journal.pcbi.1002001

Beyond the PDF

Conference/unconference where all stakeholders come together as equals to discuss issues

– Publishers

– Technologists

– Scholars

– Library scientists

– Humanists

– Policy makers

– Funders

Incubator for change

What would you do to change scholarly communication?

San Diego, Jan 2011 ... Amsterdam, March 2013 ... Oxford, 2015

http://www.force11.org/beyondthepdf2

FORCE11

Future of Research Communications and E-Scholarship: A grass roots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education and community

Why 11? We were born in 2011 in Dagstuhl, Germany

Principles laid out in the FORCE11 Manifesto

FORCE11 launched in July 2012

www.force11.org @

Promote community, cross-fertilization and interoperability

FORCE11 helps facilitate communications across disciplines and communities

Issues are not identical but we can learn from each other

Community platform

– Meetings

– Discussions

– Tools and resources

– Blogs

– Event calendar

– Community projects

Working groups

– Data Citation

– Resource identification initiative

– Attribution

– Data standards/Biosharing

Data Citation Working Group

FORCE11 provides a neutral space for bringing groups together

35 individuals representing > 20 organizations concerned with data citation

Conducted a review of current data citation recommendations from 4 different organizations

Arrived at consensus principles

http://www.force11.org/datacitation

Data Citation Principles

Consensus Data Citation principles ready for comment

Designed to be high level and easy to understand

1. Importance

2. Credit and Attribution

3. Evidence

4. Unique identifiers

5. Access

6. Persistence

7. Versioning

8. Interoperability and flexibility

Data Citation Implementation

https://www.force11.org/datacitationimplementation

https://peerj.com/preprints/697/

BioCADDIE Data Discovery Index

https://www.force11.org/group/biocaddie/cewg

Challenge: Working with Web Data

Often have inadequate descriptions so we don’t know what they are about or how they were constructed

Datasets change over time, but often don’t come with versioning information

May have been constructed using other data, but it’s not clear which version of data was used or whether these were modified

Data may be available in a variety of formats

There may be multiple copies of data from different providers, but it’s unclear if they are exact copies or derivatives

Version of standard or vocabulary used not indicated

Data registries are not synchronized and can contain conflicting information

W3C HCLS Dataset Description

Develop a guidance note for reusing existing vocabularies to describe datasets with RDF

– Mandatory, recommended, optional descriptors

– Identifiers

– Versioning

– Attribution

– Provenance

– Content summarization

Recommend vocabulary-linked attributes and value sets

Provide reference editor and validation

Metadata Model:

description – version – distribution

http://tiny.cc/hcls-datadesc
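The three-level model above (description, version, distribution) can be sketched as plain data classes. This is only an illustration: the class names, field names and the `missing_fields` check are hypothetical and are not taken from the HCLS note, which describes datasets in RDF using existing vocabularies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One concrete file or endpoint for a specific version (hypothetical fields)."""
    download_url: str
    file_format: str


@dataclass
class Version:
    """A dated release of the dataset (hypothetical fields)."""
    version_label: str
    issued: str  # ISO 8601 date string, so string order matches date order
    derived_from: List[str] = field(default_factory=list)  # identifiers of source data
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class DatasetDescription:
    """Version-independent summary of the dataset (hypothetical fields)."""
    identifier: str
    title: str
    publisher: str
    versions: List[Version] = field(default_factory=list)

    def latest(self) -> Optional[Version]:
        """Return the most recently issued version, or None if there are none."""
        return max(self.versions, key=lambda v: v.issued, default=None)


def missing_fields(desc: DatasetDescription) -> List[str]:
    """Report gaps of the kind listed under 'Working with Web Data'."""
    problems = []
    if not desc.versions:
        problems.append("no version information")
    for v in desc.versions:
        if not v.distributions:
            problems.append(f"version {v.version_label} has no distribution")
    return problems
```

Keeping the summary, the versions and the distributions as separate records is what lets a consumer say which version of a dataset was used and in which format it was obtained.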

On another planet the FORCE was strong…

Journal guidelines for methods are often poor and space is limited

“All companies from which materials were obtained should be listed.” - A well-known journal

Reproducibility depends, at a minimum, on using the same resources. But…

How identifiable are resources in the published literature?

Only ~50% of resources were identifiable

Vasilevsky et al, 2013, PeerJ

There is no correlation between impact factor and resource identification

[Figure: fraction of resources identified (0.0 to 1.0) plotted against journal impact factor (0 to 40), by resource type: antibodies, cell lines, constructs, knockdown reagents, organisms]

http://www.force11.org/Resource_Identification_Initiative

Numerous endorsers https://www.force11.org/RII/SignUp

Implementation of the new standard http://biosharing.org/bsg-000532

RRIDs should be:

Machine Readable

Consistent across publishers and journals

Free to generate and access

Sample citation:

Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
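To show what "machine readable" could mean in practice, here is a minimal sketch that pulls RRIDs out of methods text. The pattern is an assumption generalized from the sample citation above (a letter prefix, an underscore, then letters or digits); it is not the official RRID syntax, and `find_rrids` is a hypothetical helper.

```python
import re
from typing import List

# Assumed shape, based only on the sample "RRID:AB_2140114":
# "RRID:", optional whitespace, a letter prefix, "_", then letters or digits.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9]+)")


def find_rrids(text: str) -> List[str]:
    """Return every RRID-like identifier in the text, in order of appearance."""
    return [f"RRID:{match}" for match in RRID_PATTERN.findall(text)]


methods = (
    "Polyclonal rabbit anti-MAPK3 antibody, Abgent, "
    "Cat# AP7251E, RRID:AB_2140114."
)
print(find_rrids(methods))
```

Because the identifier has the same form in every journal, a single pattern like this would work across publishers, which is the point of the consistency requirement.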

Publishing Workflow

1. Researcher submits a manuscript for publication

2. Editor or Publisher asks for inclusion of RRID

3. Author goes to Resource Identification Portal to locate RRID

4. RRID is included in Methods section and as Keyword

What is the relationship of a person to a publication?

Example Scenario

Melissa creates mouse1

David creates mouse2

Layne performs RNAseq analysis on mouse1 and mouse2 to generate dataset3, which he subsequently curates and analyzes

Layne writes publication pmid:12345 about the results of his analysis

Layne explicitly credits Melissa as an author but not David.

Credit is connected

=> Credit to Melissa is asserted, but credit to David can be inferred
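The inference in this scenario can be sketched as a walk over a small provenance graph. The two dictionaries and the function names below are hypothetical illustrations of the example, not the data model of any attribution working group project.

```python
from typing import Dict, List, Set

# Who is explicitly credited for each product (asserted credit).
CREATORS: Dict[str, Set[str]] = {
    "mouse1": {"Melissa"},
    "mouse2": {"David"},
    "dataset3": {"Layne"},
    "pmid:12345": {"Layne", "Melissa"},
}

# Which products each product was built from.
DERIVED_FROM: Dict[str, List[str]] = {
    "dataset3": ["mouse1", "mouse2"],
    "pmid:12345": ["dataset3"],
}


def upstream(product: str) -> Set[str]:
    """Return every product that the given product depends on, directly or not."""
    seen: Set[str] = set()
    stack = list(DERIVED_FROM.get(product, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DERIVED_FROM.get(current, []))
    return seen


def inferred_credit(product: str) -> Set[str]:
    """People credited on upstream products but not asserted on this one."""
    asserted = CREATORS.get(product, set())
    inherited: Set[str] = set()
    for source in upstream(product):
        inherited |= CREATORS.get(source, set())
    return inherited - asserted


print(inferred_credit("pmid:12345"))
```

For pmid:12345 the walk reaches dataset3, mouse1 and mouse2, so David is found through mouse2 even though the publication never names him.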

Attribution Working Group

https://www.force11.org/group/attributionwg

Project CRediT

VIVO-ISF ontology

PROV

The Becker model

Transitive credit

The Scholarly Contributions and Roles ontology

The goal is to catalyze rapid convergence on requirements, approaches, and practical implementation of a system for tracking contributions to any scholarly product.

The 1K Challenge

What would you do with £1k today to make research communication better, anticipating the increasing scale of people and machines?

Starting at Ground Zero

CONSULTATIONS

Researcher + 2-3 from Data Stewardship Team

Researchers DO need assistance:

– Finding and choosing data standards

– File versioning

– Applying metadata to facilitate data sharing

“Gummi Bear” themed data management exercise resonated well with students

Lack of awareness of services and expertise offered by the Library

OHSU Library is developing data services for researchers

http://laughingsquid.com/the-anatomy-of-a-gummy-bear-by-jason-freeny/

Conclusions and new directions

DOI:10.6083/M4QC0273

https://www.force11.org/force2015/1k-challenge-vote

Join the Force11: https://www.force11.org/

“Meta Makes My Machine Marvellous (5M)”

“Crowdreviewing: the sharing economy at its finest”

“Science bots”

“scientific articles are too expensive to publish and to read”

FORCE11 Vision

• Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear

• We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge

• To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts

• To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute

• To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable

What is the 21st century equivalent of the library?

Acknowledgements

Maryann Martone

Phil Bourne

Michel Dumontier

Nicole Vasilevsky

Stephanie Hagstrom

And all 1000+ members of