Big Data Analytics in Science and Research: New Drivers for
Growth and Global Challenges
Richard A. Johnson, CEO, Global Helix LLC and BLS, National Academy of Sciences
ICCP Foresight Forum – Big Data Analytics and Policies
22 October 2012
Session 3: 4 Questions for Discussion
Q1 – Importance of data openness and interoperability for science and research, especially in biomedicine and health?
Q2 – Are current IPR regimes well suited to data-intensive scientific discovery?
Q3 – Do we still need scientific methods (and traditional domain scientists) in an era of big data analytics?
Q4 – How, and why, does this matter for policy?
Convergence of Biology with Physical Sciences & Engineering through Data and Data Analytics = the “New Biology” or Third Revolution in the Life Sciences
Foundational trend in STI for the next 20 years – NAS (2010); MIT (2011)
Genomic Data is Increasing Faster than Computing Power –
Convergence of 3 key DATA DRIVERS with RESEARCH and ECONOMIC VALUE: (1) Sequencing + (2) Synthesis + (3) Reading AND Writing DNA
Data Tools in the Life Sciences: Moore’s Law on Steroids
Gene Expression Data Sets (Nature 2012)
Life Sciences and Biomedical Research as an Information Science: Quantitative, Data-driven,
Simulation-oriented, Predictive Science
Data and Convergence Driving the Future: Data Analytic Tools, Platforms, and Measurement for New Sources of Growth
• Technology Convergence, Data Analytics and Metrology as Interdependent Drivers (Agilent 2012)
Synthetic Biology
Energy and the Environment
Advancing High Growth Economies
Portable, Mobile and Out-of-Lab
Nanotechnology
Food Safety
Personalized Medicine
Single Cells and Microbiome
Beyond Interoperability, The Power of Interconvertibility: FROM PHYSICAL LIVING MATERIAL/DNA to DIGITAL DATA, and back
1’s and 0’s ↔ A, C, T, G’s
“IT from Bits” (Poste 2012)
• Programming: increasing ability to both Read and Write DNA
• DNA Construction (analog to Read/Write; 1’s and 0’s manipulation) - Genetic Expression Operating Systems; Scale DNA construction engineering
• Data enables Decoupling: (1) biological processes from evolution-based descent and replication + (2) design from fabrication
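The “1’s and 0’s ↔ A, C, T, G’s” idea can be illustrated with a minimal sketch. The two-bits-per-base mapping and the function names below are assumptions made for illustration only; the slides do not specify an encoding, and practical DNA data-storage schemes add error correction and avoid problematic sequences.

```python
# Hypothetical mapping: each base carries two bits. This table is an
# illustrative assumption, not an encoding taken from the slides.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode digital data as a string of bases (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence.upper())
    if len(bits) % 8 != 0:
        raise ValueError("sequence length must be a multiple of 4 bases")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    message = b"Big Data"
    encoded = bytes_to_dna(message)
    print(encoded)
    print(dna_to_bytes(encoded))
```

Under this assumed mapping the round trip is lossless: every byte becomes exactly four bases, and decoding the bases returns the original bytes.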
Tools to Edit and Write Genomes: MAGE + CAGE (Church/Isaacs 2011, 2012)
Big Data and Data Analytics Drive new 21st Century Infrastructures and KNMs, and Create Opportunities for New Research, Better Health Outcomes, and Value Creation
(Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease: NAS 2011)
The Creative Destruction of Medicine (Topol 2012)
Data Sharing, Disease Modeling and
Biomarkers to Accelerate the Development
Big Data and Engineering Biology as the Transformative “New Normal” in the Life Sciences Driving New Sources of Growth
Synthetic Biology - Standardization, Abstraction and Modularity
Predictive Platforms for Engineering Biology and Predictable Integration of new Genetic Designs built on Massive Data
• “an Engineering METHODOLOGY to construct complex systems and novel properties based on biological components” (EU-US Task Force, June 2010)
Data-driven and Engineering Biology “Value Proposition” Increasingly Drives Science, New Sources of Growth, and our
ability to meet societal Grand Challenges – NAS 2011
Neuroscience – a 21st Century Frontier for Human Understanding and Grand Challenges
Traversing the scales at all levels in understanding the brain from molecular and cellular to systems – neurons (100 Billion)/synapses (150 Trillion), and neural signaling
Human Connectome Project = mapping neural networks with >1 million times more connections than the genome has letters of DNA, and linking all this to other life experience data sets
ENCODE: the Encyclopedia of DNA Elements – Big Data, Data Analytics, and Big Science increasingly change how we do
science (Sept. 2012)
The Plasticity of IPR/Open Science Meanings – and lots of rethinking in different domains about IPR, Openness and
Scientific Research
• IPR and Competing Visions of Openness
Open Science (Public domain; BioBricks library/BBF) v. Open Source (IPR-driven; GPL, BSD, CC) v. Open Standards v. Open Development v. Open Access (including reuse and sharing public-funded data) v. Open Innovation (depends on strong, well-functioning IPR system)
• Innovative New Thinking – e.g., Semi-commons as a new lens to view Data – interacting common and private uses that are dynamic/scalable over the same resources and that can adjust through contracting and other mechanisms
• Knowledge Networks and Markets (KNMs) and Knowledge-based Capital (KBC) – major OECD initiatives ongoing
• Growing Counter-intuitive View that the Role of IPR is Increasingly Important as a Tool to Promote Openness, Transparency, and Diffusion, e.g., Algorithms, Data Exchanges, Tools and Re-use
Growing Linkage of Data-intensive Science, IPR, and New Models of Innovation: Big Data Analytics Intersect with
Open Innovation, Multi-directional S&T, University-Industry Partnering, New Business Models, Forward-looking IPR, and New Public-Private Collaborative Mechanisms to Enable Cutting-edge
Research and Innovation
The Fourth Paradigm, the Internet of Things, Automated Data Extraction Methods, and Big Data Analytics – the Need for a New
Generation of Scientific computing tools and platforms to manage, visualize and analyze Big Data for Research
(Gray 2009)
Wide Range of New Data Analytic Convergence Challenges with Policy Implications (Gray 2009)
Risks to Scientific Research from (Bad) Data Analytics?
- Jeopardize reproducibility
- Retard pace of research
- Produce poorly written code/bad algorithms on which science relies
- Create serious errors in scientific outcomes, and the interpretations of them
New Day-to-day Science Research Implications of Big Data: Data Analytics Challenges
• Which data to keep – in what format? for how long?
• What about “emergent properties”? – resulting from elaborate networks of interactions and data patterns
• How to deal with data distributed across many locations, formats, scales, etc., and merge them?
• How to model large complex data, and derive valuable knowledge from analytics/models?
• How to infuse data into complex computations to enable simulations of predictive value?
• How to deal with different kinds of big data (temporal, spatial, dimensional, heterogeneous)
- Massive data
- High-dimensional data
- Multi-modal data
- Real-time and Streaming data
In a data-driven science era, should we still fund, “incentivize” and value Empirical, Theoretical, Model-based Approaches to Scientific
Discovery? Is Popper’s scientific method paradigm outdated?
• “I believe that math is trumping science. What I mean by that is you don't really have to know why, you just have to know that if a and b happen, c will happen.” Vivek Ranadivé, entrepreneur and CEO, financial-data software company TIBCO (2011)
• “With enough data, the numbers speak for themselves” Chris Anderson, Editor-in-Chief, Wired, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” (2008)
• “All models are wrong, and increasingly you can succeed without them.” Peter Norvig, Director of Research, Google
• “The numbers have no way of speaking for themselves….Data-driven
predictions can succeed — and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.” Nate Silver, The Signal and the Noise: Why So Many Predictions Fail – but Some Don’t (2012)
• “The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.” Stephen Jay Gould, American evolutionary biologist (1981)
Thank you!
Contact Information -- Richard A. Johnson
CEO, Global Helix LLC
MIT