NIAID Workshop on Open Science 1
The Future of Open Science
Philip E. Bourne
http://www.slideshare.net/pebourne/
4/08/14
The future depends on who you ask
Here is my biased viewpoint
My Background/Bias
• RCSB PDB/IEDB Database Developer – Views on community, quality, sustainability …
• PLOS Journal Co-founder – Open science advocate
• Associate Vice Chancellor for Innovation – Business models, interaction with the private sector, sustainability
• Professor – Mentoring, reward system, value (or not) of research
• NIH Strategist/Transformer – ??
Perhaps the first question to ask is:
What is an endpoint?
What is an Endpoint?
What Does The Democratization of Science Imply?
• The obvious – participation by all
• Not so obvious
– More scrutiny
– New types of rewards
– More equal value placed on all participants
– The removal of artificial boundaries that corral knowledge (through power and resources) within silos that do not make sense as complexity increases
Consider some personal examples that illustrate these implications
More Scrutiny – Highlights Lack of Reproducibility
• I can’t immediately reproduce the research in my own laboratory:
• It took an estimated 280 hours for an average user to approximately reproduce the paper
• Workflows are maturing and becoming helpful
• Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE 8(11) e80278.
Why New Types of Rewards?
• I have a paper with 16,000 citations that no one has ever read
• I have papers in PLOS ONE that have more citations than ones in PNAS
• I have data sets I am proud of but few places to put them
• I edited a journal but it did not count for much
Equal Value Placed on Participants
• The UC System has Research Scientists (RS) & Project Scientists (PS) as well as tenured faculty
– RS/PS have no senate rights, yet:
– RS/PS frequently teach
– RS/PS frequently have more grant money
– RS/PS typically perform more service
– RS/PS are most of the data scientists you know
Are Increasingly Found on the Google Bus
Institutional Boundaries
• Academia – Departments of physics, math, biology, chemistry etc. persist but scholars rarely confine themselves to these disciplines
• NIH – 27 institutes and centers, many dedicated to specific diseases & conditions – yet a specific gene undoubtedly transcends ICs
The Era of Open Has The Potential to Deinstitutionalize
Daniel Hulshizer/Associated Press
An Example of That Potential: The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
The Era of Open Has The Potential to Deinstitutionalize
Daniel Hulshizer/Associated Press
I have argued that the democratization of science is compelling
and that much has happened around open literature, open software and now open data
I Would Also Argue That This Process is About to Accelerate
• Others provide a more compelling argument:
– Google car
– 3D printers
– Waze
– Robotics
From the Second Machine Age
From: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson & Andrew McAfee
So what will this look like for an institution?
Institutions will become digital enterprises
Components of The Academic Digital Enterprise
• Consists of digital assets
– E.g. datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
– E.g. publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
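One way to picture these components is as a small data model. The sketch below is illustrative only: the class name `DigitalAsset`, the `Access` levels, and the provenance log format are assumptions made for this example, not part of any existing institutional system.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; a real enterprise would define its own."""
    PRIVATE = "private"
    ENTERPRISE = "enterprise"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    kind: str        # e.g. "dataset", "paper", "software", "lab notes"
    title: str
    creator: str
    access: Access = Access.PRIVATE
    # Each asset gets its own unique identifier (a random UUID here).
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    # Provenance is kept as a simple append-only list of events.
    provenance: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.provenance.append(f"created by {self.creator}")

    def publish(self, actor: str) -> None:
        """Publishing only changes the access level and records who did it."""
        self.provenance.append(
            f"access {self.access.value} -> {Access.PUBLIC.value} by {actor}"
        )
        self.access = Access.PUBLIC


notes = DigitalAsset(kind="lab notes", title="Rotation notebook", creator="jane")
notes.publish(actor="jane")
```

In this sketch nothing is copied or moved when `publish` is called; the asset keeps its identifier and history, and only its access level changes.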
Life in the Academic Digital Enterprise
• Jane scores extremely well in parts of her graduate on-line neurology class. Neurology professors, whose research profiles are on-line and well described, are automatically notified of Jane’s potential based on a computer analysis of her scores against the background interests of the neuroscience professors. Consequently, professor Smith interviews Jane and offers her a research rotation. During the rotation she enters details of her experiments related to understanding a widespread neurodegenerative disease in an on-line laboratory notebook kept in a shared on-line research space – an institutional resource where stakeholders provide metadata, including access rights and provenance beyond that available in a commercial offering. According to Jane’s preferences, the underlying computer system may automatically bring to Jane’s attention Jack, a graduate student in the chemistry department whose notebook reveals he is working on using bacteria for purposes of toxic waste cleanup. Why the connection? They reference the same gene a number of times in their notes, which is of interest to two very different disciplines – neurology and environmental sciences. In the analog academic health center they would never have discovered each other, but thanks to the Digital Enterprise, pooled knowledge can lead to a distinct advantage. The collaboration results in the discovery of a homologous human gene product as a putative target in treating the neurodegenerative disorder. A new chemical entity is developed and patented. Accordingly, by automatically matching details of the innovation with biotech companies worldwide that might have potential interest, a licensee is found. The licensee hires Jack to continue working on the project. Jane joins Joe’s laboratory, and he hires another student using the revenue from the license. The research continues and leads to a federal grant award. 
The students are employed, further research is supported and in time societal benefit arises from the technology.
From What Big Data Means to Me JAMIA 2014 21:194
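The Jane and Jack connection in this scenario rests on both notebooks referencing the same gene a number of times. A minimal sketch of that matching step follows, assuming notebook text is available as plain strings and that a gene vocabulary exists; the gene symbols `GENEX` and `GENEY`, the function names, and the `min_mentions` cutoff are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gene vocabulary; a real system would draw on a curated resource.
KNOWN_GENES = {"GENEX", "GENEY"}


def gene_mentions(text: str) -> Counter:
    """Count how often each known gene symbol appears in a notebook."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return Counter(t for t in tokens if t in KNOWN_GENES)


def suggest_connections(notebooks: dict, min_mentions: int = 2) -> list:
    """Pair up notebook owners who each mention the same gene repeatedly."""
    counts = {owner: gene_mentions(text) for owner, text in notebooks.items()}
    suggestions = []
    for a, b in combinations(sorted(counts), 2):
        shared = sorted(
            gene for gene in counts[a]
            if counts[a][gene] >= min_mentions and counts[b][gene] >= min_mentions
        )
        if shared:
            suggestions.append((a, b, shared))
    return suggestions


notebooks = {
    "jane": "GENEX expression rose. Knockdown of GENEX reduced aggregation. GENEY unchanged.",
    "jack": "Strain lacking GENEX degrades less toluene; GENEX restored activity.",
}
matches = suggest_connections(notebooks)
```

With the sample notebooks, only `GENEX` is mentioned at least twice by both owners, so `matches` holds a single suggested pairing. Access rights and Jane's notification preferences, which the scenario also depends on, are left out of this sketch.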
Life in the NIH Digital Enterprise
• Researcher x is made aware of researcher y through commonalities in their data located in the data commons. Researcher x reviews the grants profile of researcher y and publication history and impact from those grants in the past 5 years and decides to contact her. A fruitful collaboration ensues and they generate papers, data sets and software. Metrics automatically pushed to company z for all relevant NIH data and software in a specific domain with utilization above a threshold indicate that their data and software are heavily utilized and respected by the community. An open source version remains, but the company adds services on top of the software for the novice user and revenue flows back to the labs of researchers x and y which is used to develop new innovative software for open distribution. Researchers x and y come to the NIH training center periodically to provide hands-on advice in the use of their new version and their course is offered as a MOOC.
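The metrics step in this scenario, reporting resources in a specific domain whose utilization is above a threshold, can be sketched as a simple filter. The record fields, the sample numbers, and the function name `heavily_used` are assumptions for illustration; the scenario does not define how utilization would be measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceUsage:
    name: str
    domain: str
    monthly_uses: int  # assumed utilization measure


def heavily_used(resources: list, domain: str, threshold: int) -> list:
    """Return resources in one domain with utilization above the threshold,
    most used first."""
    hits = [
        r for r in resources
        if r.domain == domain and r.monthly_uses > threshold
    ]
    return sorted(hits, key=lambda r: r.monthly_uses, reverse=True)


# Made-up sample records for the sketch.
catalog = [
    ResourceUsage("xy-toolkit", "structural biology", 5200),
    ResourceUsage("xy-dataset", "structural biology", 900),
    ResourceUsage("other-tool", "genomics", 8000),
]
report = heavily_used(catalog, domain="structural biology", threshold=1000)
```

Here `report` contains only `xy-toolkit`, since it is the one structural biology resource above the assumed threshold of 1000 uses.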
To get to that end point we have to consider the complete digital research lifecycle
The Digital Research Life Cycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Tools and Resources Will Be Better Coordinated
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Through Interconnection Around a Common Framework
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
New/Extended Support Structures Will Emerge
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training
We Have a Ways to Go
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
Authoring Tools, Lab Notebooks, Data Capture, Software, Analysis Tools, Visualization, Scholarly Communication
Commercial & Public Tools, Git-like Resources By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training
But Let's Not Forget NIH Has Contributed a Lot
• NLM/NCBI
• Individual IC support
• Open access policies – PubMed Central
• Emergent data sharing plans
• Big Data to Knowledge (BD2K)
• Office of the Associate Director for Data Science
• … And more to come …
Call Out to Eric Green, and the Team…
bd2k.nih.gov
Interesting Observations So Far
• We need to start by asking, how are we using the data now?
• We have the why for data sharing, but not the how
• Training is spotty
• Existing data resources need attention
• Sometimes it is enough for me to sit down
Office of Data Science
The Biomedical Research Digital Enterprise
Scientific Data Council, External Advisory Board
Data Commons, Training Center, BD2K, Review
Sustainability, Education, Innovation, Process
Communication
Collaboration
Programmatic Theme
Deliverable
Example Features
• Cloud – Data & Compute
• Search
• Security
• Reproducibility Standards
• App Store
• Hands-on
• MOOCs
• Community Engagement
• Data Science Centers
• Training Grants
• DDI
• Analysis
• Domain Support
• Data Resource Support
• Metrics
• Best Practices
• Evaluation
• Portfolio Analysis
• To ICs
• To Researchers
• To Federal Agencies
• To International Partners
• To Computer Scientists
One Possible End Point
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
Open Science Will:
• Lead to the democratization of science
• Change how institutions think and operate – they will become digital enterprises
• Impact all aspects of the scholarly research lifecycle
• Accelerate seek{ing} fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability
Thank You! Questions?
Acknowledgements
• Vivien Bonazzi
• Eric Green
• Mark Guyer
• Jennie Larkin
• David Lipman
• Peter Lyster
• Many more …