big data and the future of publishing

Big Data and the Future of (Physics) Publishing Anita de Waard, VP Research Data Collaborations Elsevier RDM Services Columbia University, June 2, 2017 Present

Upload: anita-de-waard

Post on 28-Jan-2018


TRANSCRIPT

Page 1: Big Data and the Future of Publishing

Big Data and the Future of (Physics) Publishing

Anita de Waard, VP Research Data Collaborations

Elsevier RDM Services

Columbia University, June 2, 2017

Present

Page 2: Big Data and the Future of Publishing

Data is becoming distributed

Michael Tuts:

Ideas are becoming distributed

Kirk Borne:

Tools are becoming distributed

Mike Hildreth:

• Preserved workflows can be used to compare new models with a published analysis

• Reinterpretation possible with full detector simulation, analysis chain

• “Folding” rather than “Unfolding” like in HEPData
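As a rough illustration of the "folding" idea in the last bullet: instead of removing detector effects from the data (unfolding), a new model's truth-level prediction is pushed forward through the detector response and compared with what was observed. The sketch below is a toy under stated assumptions. The 3-bin response matrix, the event counts, and the function names `fold` and `pearson_chi_square` are all hypothetical, and a real preserved workflow would run the full detector simulation and analysis chain rather than a small matrix.

```python
# Toy forward-folding sketch. All numbers are illustrative, not from any experiment.

# RESPONSE[i][j]: assumed probability that an event generated in true bin j
# is reconstructed in bin i (a stand-in for the full detector simulation).
RESPONSE = [
    [0.8, 0.1, 0.0],
    [0.2, 0.8, 0.2],
    [0.0, 0.1, 0.8],
]


def fold(response, truth):
    """Apply the response matrix to a truth-level prediction (matrix-vector product)."""
    return [sum(r_ij * t_j for r_ij, t_j in zip(row, truth)) for row in response]


def pearson_chi_square(observed, expected):
    """Pearson chi-square between observed counts and a folded prediction."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical truth-level prediction from a new model, and hypothetical
# reconstructed-level counts taken from a published analysis.
new_model_truth = [100.0, 50.0, 10.0]
published_observed = [90.0, 60.0, 12.0]

folded_prediction = fold(RESPONSE, new_model_truth)
print("folded prediction:", folded_prediction)
print("chi-square vs. published data:",
      pearson_chi_square(published_observed, folded_prediction))
```

The comparison happens at the reconstructed level, so the published data never has to be corrected for detector effects; only the model prediction is transformed.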

Page 3: Big Data and the Future of Publishing

Data is becoming distributed

Many sources, formats, owners, types: global, interconnected

Ideas are becoming distributed

Computers make hypotheses, too*; citizen science/MOOCs enable ubiquitous access to knowledge

Tools are becoming distributed

Easy to create networks of tools to run anywhere (Docker, Jupyter Notebook collections etc.)

* http://ieeexplore.ieee.org/abstract/document/7515118/: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support

Page 4: Big Data and the Future of Publishing

Towards Networked Knowledge:

[Diagram with nodes labeled Data, Tool, Article, Researcher]

Page 5: Big Data and the Future of Publishing

Science Can Now Scale With the Network!


https://en.wikipedia.org/wiki/Metcalfe%27s_law

http://spectrum.ieee.org/computing/networks/metcalfes-law-is-wrong

• Metcalfe's Law: The value of a (telecommunications) network is proportional to the square of the number of connected users of the system (n^2).

• Reed’s Law: Proportional to 2^n (-1)
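To get a feel for how differently the two laws scale, here is a small sketch. It assumes the usual statement of Reed's Law in terms of the number of possible subgroups of two or more members, 2^n - n - 1, which the slide abbreviates; the function names are illustrative only, and "value" here is a unitless proxy.

```python
def metcalfe_value(n: int) -> int:
    """Metcalfe's Law: value proportional to the square of the number of users."""
    return n * n


def reed_value(n: int) -> int:
    """Reed's Law: value proportional to the number of possible subgroups
    with at least two members, 2^n - n - 1."""
    return 2 ** n - n - 1


# Compare the two proxies for a few network sizes.
for n in (2, 5, 10, 20):
    print(f"n={n:>2}  Metcalfe={metcalfe_value(n):>4}  Reed={reed_value(n):>8}")
```

For small networks the two are comparable, but the exponential term dominates quickly: at n = 20 the subgroup count is already in the millions while n^2 is 400.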

Page 6: Big Data and the Future of Publishing


Crisis # 1: Reproducibility/Scientific Integrity

Richard Feynman on Scientific Integrity:

• If you're doing an experiment, you should report everything that you think might make it invalid - not only what you think is right about it.

• Details that could throw doubt on your interpretation must be given, if you know them.

• If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it.

• When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition.

http://calteches.library.caltech.edu/51/2/CargoCult.htm

http://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198

Page 7: Big Data and the Future of Publishing


Crisis # 2: Not Enough Brains To Interpret All This!

https://www.aps.org/programs/education/statistics/

https://www.insidehighered.com/news/2013/10/03/departments-under-threat-few-majors-physicists-say-value-isnt-reflected-numbers

[Chart: % to Temporary Residents (0% to 60%), 1995 to 2013; series: Doctoral Degrees, Master's Degrees, Bachelor's Degrees]

[Chart: Physics (0 to 14,000) and STEM (0 to 350,000), 1965 to 2015]

To paraphrase Remy the Rat (Ratatouille):

‘Not everyone can be a great scientist, but a great scientist can come from anywhere’

Page 8: Big Data and the Future of Publishing


Crisis # 3: Diminishing Trust/Funding

Page 9: Big Data and the Future of Publishing


Networked Knowledge To The Rescue!

1. Reproducibility: Disconnect creation of data from interpretation to prevent confirmation bias

2. Lack of brains: Making data and tools available to the planet allows interested outsiders to help explore new interpretations; support tutoring

3. Diminishing trust/funding: Putting datasets in multiple places and allowing many different parties to participate helps make systems sustainable


Page 10: Big Data and the Future of Publishing

So How Do You Publish A Network?

[Diagram: Primary research data lifecycle]

Lifecycle stages:

• Find topic

• Identify gaps

• Plan & Fund

• Discover data, people, methods & protocols

• Collect, analyze & visualize

• Store, preserve & share

• Publish

• Prepare, reproduce, re-use & benchmark

Components shown:

• Data Management Plans

• Data search

• Lab ELN(s)

• Inst. Data Repositories

• Domain-specific Repositories

• Data Journal

• Journal

Annotations:

• Metadata, methods & protocols ready for preservation and publishing

• Publish data (under embargo)

• Link to article

• Secure discoverability in & outside the institution

• Integrate RDM and monitor outputs

Page 11: Big Data and the Future of Publishing


More About Our Collaborations And Tools:

• https://www.rd-alliance.org/: The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data.

• http://www.nationaldataservice.org/: Links existing data archiving and sharing efforts together with a common set of tools.

• http://www.scholix.org/: A framework for exchanging information about links between literature and data.

• https://www.force11.org/: A community of scholars, librarians, and others that helps facilitate the change toward improved knowledge creation and sharing.

• https://ec.europa.eu/research/openscience/index.cfm?pg=open-science-cloud: A blueprint for cloud-based services and data infrastructure to ensure science, business and public services reap the benefits of the big data revolution.

• https://www.hivebench.com/: An Electronic Lab Notebook that helps prepare, conduct and analyze experiments virtually.

• https://datasearch.elsevier.com/#/: Search for research data across domains and repositories.

• https://data.mendeley.com/: A secure cloud-based repository, making it easy to share, access and cite data.

• https://www.elsevier.com/authors/author-services/research-elements: Research Elements: Publish data, software, materials and methods in brief, citable articles.

• A service to support research librarians in tracking data sharing and use across campus.

Page 12: Big Data and the Future of Publishing

In Summary:

• As tools, software and data become distributed, science experiences the network effect

• This can solve three crises facing science:

  • Detaching observation from interpretation combats issues with reproducibility

  • Opening up data and tools can draw new minds to scientific reasoning

  • Redundant storage and delivery systems and new players in cyberinfrastructure relieve dependencies on (US) gov’t funds

• “Networked science publishing” involves:

  • Adapting to and being interoperable with many different platforms, technologies, and scholarly habits of practice

  • Collaborating with others (institutions, funders etc.) to develop knowledge ecosystems

  • Complying with/helping develop new standards, in multi-stakeholder platforms

Anita de Waard, [email protected], June 2, 2017