BD2K @ NIH – A Vision Through 2020
Philip E. Bourne, PhD, FACMI
Associate Director for Data Science
First and foremost you should see this meeting as a celebration of the hard work of the past two years
Yes these are uncertain times, but …
There is a commitment to the BD2K program through 2020
BD2K cannot be viewed in isolation, but rather as part of a broader view of data science @ NIH …
Particularly as funding increasingly comes from the ICs
A View Which Includes:
• A vibrant research program of:
  – Fundamental developments in data science
  – Application of those fundamental developments
  – Flagship projects to which developments are applied:
    • PMI, Brain, Moonshot, ECHO
• A sustainable data ecosystem
  – Commons and the FAIR Principles adoption
  – Cross-cutting activities
• Increased workforce training
• A changing governance model
A Strategic Response can be Modeled on Three Axes:
Research
Resources
Outcomes
A Strategic Response
Research
• Fundamental
  • Machine learning
  • Data mining
  • Indexing
  • Predictive modeling …
• Applied
  • Sustainability, governance, economics of data
  • Privacy and security
  • Effective use of clouds …
Resources
• Standards
• Commons
  • APIs
  • Reference data sets
  • Workflows
  • Access & Authentication
• Workforce
Outcomes
• Evaluated pilots
• FAIR data
• Trained workforce
• Best practices
• Policies
• Effective use of clouds
• On-ramps for all ICs
The Current Situation
• NIH Funded Data
  – Total data from NIH-funded research currently estimated at 650 PB*
  – 20 PB of that is in NCBI/NLM (3%) and it is expected to grow by 10 PB this year
• Dark Data
  – Only 12% of data described in published papers is in recognized archives; 88% is dark data^
• Cost
  – 2007-2014: NIH spent ~$1.2Bn extramurally on maintaining data archives
* In 2012 Library of Congress was 3 PB
^ http://www.ncbi.nlm.nih.gov/pubmed/26207759
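The percentages on this slide can be rechecked from the quoted figures. The sketch below is only that arithmetic; the variable and function names are illustrative and not from the talk, and the 12% figure is treated as applying to data described in published papers, not to the 650 PB total.

```python
# Figures as quoted on the slide (sizes in petabytes).
TOTAL_NIH_PB = 650          # estimated total data from NIH-funded research
NCBI_NLM_PB = 20            # portion held at NCBI/NLM
EXPECTED_GROWTH_PB = 10     # expected NCBI/NLM growth this year
ARCHIVED_PERCENT = 12       # data in published papers found in recognized archives


def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# NCBI/NLM holdings as a share of all NIH-funded data (slide rounds this to 3%).
ncbi_share = share_percent(NCBI_NLM_PB, TOTAL_NIH_PB)

# Expected one-year growth relative to current NCBI/NLM holdings.
ncbi_growth = share_percent(EXPECTED_GROWTH_PB, NCBI_NLM_PB)

# "Dark data" is simply the complement of the archived share.
dark_percent = 100 - ARCHIVED_PERCENT

print(f"NCBI/NLM share of NIH-funded data: {ncbi_share:.1f}%")
print(f"Expected NCBI/NLM growth this year: {ncbi_growth:.0f}%")
print(f"Dark data: {dark_percent}%")
```

Read this way, the 10 PB of expected growth is half of what NCBI/NLM currently holds, which is the scale of change the slide is pointing at.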
The Commons - Status
• Commons and FAIR principles* adopted across NIH
• Development and public release of a prototype Data Discovery Index
  – DataMed
    • Feb. v 1.0
    • Nov. v 1.5
• Cloud credits being issued for work in the Commons
• FOAs for Commons Framework being issued
• Commons pilots under way
* https://www.ncbi.nlm.nih.gov/pubmed/26978244
Sustainability – Sample Other Activities
• Request for Information: Metrics to Assess Value of Biomedical Digital Repositories (NOT-OD-16-133)
  – To be discussed at Sustainability Session, Wed 1pm
• RFA to support community-based standards work was released in the fall for May 2017 award; session today 1pm
• Funding opportunity announcement: (BD2K) Enhancing the Efficiency and Effectiveness of Digital Curation for Biomedical Big Data (RFA-LM-17-001)
  – Applications due Dec 15
Sustainability – Looking Forward
• International collaboration on business models for sustainable data repositories
  – Sustainable Business Models for Data Repositories (OECD Global Science Forum)
  – Future of Life Sciences and Biomedical Databases (International Human Frontier Science Program)
• NIH long-term data repository support
  – Federal interagency Workshop on Measuring the Impact of Data Repositories, 2017
  – Recommend mechanism(s), review criteria, implementation plan
Example Cross-cutting Activities
• International partnerships
• Count everything – Secure count query framework
• California centers regional meetings
• GA4GH – Beacon project
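For readers unfamiliar with the Beacon idea named above: a beacon answers only "yes" or "no" to whether a data set contains a given allele at a given position, without releasing the underlying records. The sketch below is a toy in-memory illustration of that idea, not the GA4GH API; `AlleleQuery`, `ToyBeacon` and `has_allele` are hypothetical names invented for this example, and the sample variants are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AlleleQuery:
    """A hypothetical query: is this allele observed at this position?"""
    chromosome: str
    position: int
    allele: str


class ToyBeacon:
    """Toy beacon: holds observed alleles and exposes only a boolean answer."""

    def __init__(self, observed: Iterable[AlleleQuery]) -> None:
        # Frozen dataclasses are hashable, so they can live in a set.
        self._observed = set(observed)

    def has_allele(self, query: AlleleQuery) -> bool:
        # The caller learns presence or absence and nothing else
        # (no sample identifiers, no counts, no neighbouring variants).
        return query in self._observed


# Made-up example data for illustration only.
beacon = ToyBeacon([
    AlleleQuery("1", 12345, "A"),
    AlleleQuery("7", 67890, "T"),
])

print(beacon.has_allele(AlleleQuery("1", 12345, "A")))
print(beacon.has_allele(AlleleQuery("1", 12345, "G")))
```

The design point is the narrow interface: because the only public method returns a boolean, a data holder can take part in discovery without exposing record-level data.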
NLM
• Working Group Report – http://acd.od.nih.gov/reports/Report-NLM-06112015-ACD.pdf
  – Recommendation – NLM should become the programmatic epicenter for data science at NIH …
• Patti Brennan – New NLM director
What We Hope to See in 2020
• New innovations brought about by large and complex data
• Evidence of translation, i.e. real application at the point of care
• Broad Commons adoption leading to
  – Improved sharing, reuse and hence cost effectiveness and reproducibility
• A balance between what is spent on data vs what is gained from that data
• Policies that are supportive of the above
… for your hard work, and to the NIH staff from the ADDS office and from across the ICs who have toiled to make BD2K a success