data- and compute-driven transformation of modern science how e-infrastructure & policy support...

Download Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President,

If you can't read please download the document

Upload: beryl-carr

Post on 23-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

  • Slide 1
  • Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology 1
  • Slide 2
  • 2 Part 1: We are in a period of unprecedented change in Science and Societythe crises & opportunities this creates
  • Slide 3
  • 3 3 Profound Transformation of Science Collision of Two Black Holes 1972: Hawking. 1 person, no computer 50 KB 1994: 10 people, NCSA Cray Y-MP, 50MB 1998: 15 people, NCSA Origin, 50GB
  • Slide 4
  • Community Einstein Toolkit Einstein Toolkit : open software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI. Consortium: 92 members, 46 sites, 15 countries Whole consortium engaged in directions, support, development Simulation: Luciano Rezzolla, Max Planck Institut fr Gravitationsphysik (AEI) 4 Community + software + algorithms + hardware + Many groups can do this: field explodes! Major triumph of Computational Science---solve EEs!
  • Slide 5
  • New Frontiers: Relativistic Matter Nuclear equations of state Collab. with astrophysicists General relativistic magnetohydrodynamics Some groups have ideal MHD Radiation Transport (neutrinos/photons) Expensive and complicated! Requires opacities/emissivities Chemical reactions (thermonuclear, chemical) SN community! Computation: Multiphysics!! GRMHD: petascale problem Radiation transport beyond this Schnetter et al, PetaScale Computing: Algorithms and Applications, 2007 Zoom in to just this part: post BH formation and evolution of jet Zoom in to just this part: post BH formation and evolution of jet
  • Slide 6
  • Schnetter, et al Post BH Formation/Evolution of Jet Multiphysics GR, Neutrinos, MHD, Nuclear EOS Computer Science 10 level AMR, optimized for Blue Waters Science 200s evolution, analyze it all Blue Waters 6M Pflop@1PF sustained = 70days! 6 Multiphyiscs framework needed for fluids in astrophysics and porous media Rezzolla, et al
  • Slide 7
  • 7 Theory and Observation of Universe Gravitational Waves! Complex problems in relativistic astrophysics Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab! Observe (PB), compute (PF) signals Gravity and general relativity are transformed 4 centuries of small science, small data culture 2-3 decades of radical change in both data (factors of 1000 per~5 years) and collaboration LIGO/VIRGO/GEO New era of science after a century! Data- and compute- dominated gravitational wave astronomy!
  • Slide 8
  • US Council on Competitiveness Ping Golf Moved from workstation to Cray, now make prototype only at last stage of design Too effective: had to simulate less effective design! Proctor & Gamble Pringles flying off manufacturing line causing significant lost product and revenue. Using CFD codes that Boeing uses, airflow over the Pringle modeled, design so Pringles did not lift off 8
  • Slide 9
  • 9 Part 1a: The Growth of Data Data Tsunami Im still here But Im your new baby big brother With millions of processors
  • Slide 10
  • Going Beyond a Community Transient & Multi-Messenger Astronomy 10 New era: seeing events as they occur Here now ALMA, EVLA in radio Ice Cube neutrinos On horizon 24-42m optical? LSST = SDSS (40TB) every night! SKA = exabytes Simulations integrate all physics Data-intensive = compute-intensive Astronomy 1500-2010 was passive. No longer! Communities need to share data, software, knowledge, in real time Will require integration across disciplines, end-to- end
  • Slide 11
  • Big Data vs The Long Tail of Science Many Big Data projects are special Tend to be highly organized, have singular sources of data, professionally curated, a lot attention paid to them What about the Long Tail (the other 99%)? Thousands of biologists sequencing communities of organisms Thousands of chemist and materials scientists developing a materials genome Millions of people Tweeting Characteristics: Heterogeneous, perhaps hand generated Not curated, reused, served, etc 11 How do we harness the power of this long tail? News Flash! NYT 6/3/13: Drug side effects discovered by mining web logs: paroxetine + pravastatin = high blood sugar!
  • Slide 12
  • 12 Grand Challenge Communities Combine it All... Where is it going to go? 12 Same CI useful for black holes, hurricanes
  • Slide 13
  • 13 Grand Challenge Communities for Complex Problems Require many disciplines, all scales of collaborations Individuals, groups, teams, communities Multiscale Collaborations: Beyond teams Are dynamic and highly multidisciplinary Time domain astronomy, emergency forecasting, metagenomincs, materials genome Drive sharing technologies and methodologies Researchers collaborate, work by sharing data. Places requirements on eInfrastructre: Software, networks, collaborative environments, data, sharing, computing, etc Scientific culture, reproducibility, access, university structures Publications. What is a modern publication? 13 Social, behavioral and economic sciences will be critical in helping us understand these issues
  • Slide 14
  • Scenarios like this in all fields 14 NEON+GIS
  • Slide 15
  • Framing the Challenge: Science and Society Transformed by Data Modern science Data- and compute- intensive Integrative, multiscale 4 centuries of constancy, 4 decades 10 9-12 change! Multi-disciplinary Collaborations Individuals (Galileo!) Groups, teams, Grand Challenge Communities Big Data + Long Tail Sea of Data Age of Observation 15 We still think like this But such radical change cannot be adequately addressed with (current) incremental approach! Students take note!
  • Slide 16
  • Part 2: Crises, Challenges, Opportunities Computing Data Software End-to-end Networks Organizational structures Education No, we are not Cybe r Instruments & Facilities
  • Slide 17
  • 17 Five Crises CDSE Community needs to address Computing Technology Multicore: processor is new transistor Programming model, fault tolerance, etc New models: clouds, grids, GPUs, Data, provenance, and visualization How do we create data scientists? What is an international data infrastructure? Software treated as e-Infrastructure Complex applications on coupled compute- data-networked environments, tools needed Modern apps: 10 6 + lines, many groups contribute, take decades
  • Slide 18
  • 18 Five Crises Organization for Multidisciplinary & Computational Science Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science Itself a discipline, computational science advances all scienceinadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field Education The CI environment is running away from us! How do we develop a workforce to work effectively in this world? How do universities transition?
  • Slide 19
  • Scientific Computing and Imaging Institute, University of Utah Data Crisis: Information Big Bang PCAST Digital Data NSF Experts Study Wired, Nature Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report there is a pending crisis in archiving we have to create long-term methods for preserving information, for making it available for analysis in the future. 80% respondents: >50 yrs; 68% > 100 yrs Industry
  • Slide 20
  • The Shift Towards a Sea of Data Implications Science & society are now data-dominated Experiment, computation, theory US mobile phone traffic exceeded 1 exabyte! Classes of data Collections, observations, experiments, simulations Software Publications Totally new methodologies Algorithms, mathematics, culture Data become the medium for Multidisciplinarity, communication, publication, science, economic development 20 How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world? What is a business model for OA? Fundamental questions become focused around data: What to curate, how to remove boundaries? How to incentivize sharing? IP?
  • Slide 21
  • 21 Part 2a: Recommendations
  • Slide 22
  • Software ACCI Task Force Reports Final recommendations presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010 More than 25 workshops and Birds of a Feather sessions, 1300 people involved Final reports on-line Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF. Grand Challenges Task Force NSF should establish processes to collect community requirements and plan long-term software roadmaps. Software Task Force Higher education should adopt criteria for tenure and promotion that rewardthe production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software. Campus Bridging Task Force 22 Campus Bridging Data & Viz Grand Challenge HPC Learning
  • Slide 23
  • Recommendation of NSF Advisory Committee on Cyberinfrastructure ACCI "The National Science Foundation should create a program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices." 23
  • Slide 24
  • 24 Part 3: Universities attempt to respond We have to do all this and revolutionize the state/national economy?
  • Slide 25
  • S koltech: Example of a 21 st Century University in the Making
  • Slide 26
  • Skoltech at a Glance A unique Russian institution in international context This decade: a community of 200 faculty, 300 post- docs, 1200 graduate students Focused on science, engineering and technology Addressing problems and issues in IT, Energy, Biomedicine, Space and Nuclear Interdisciplinary by design; no departments 15 centers organized around complex problems With strong programs in support of innovation and entrepreneurship Creating a culture of innovation in every student, professor, staff member Important part of the Skolkovo innovation ecosystem Integrated data, compute, instrumentation infrastructure and policy under development for Interdisciplinary research Accelerating discovery Economic development Integrated data, compute, instrumentation infrastructure and policy under development for Interdisciplinary research Accelerating discovery Economic development
  • Slide 27
  • 27 Part 3a: You can help lead this revolution Kathryn Gray
  • Slide 28
  • Modern Research & Education Ecosystem 28 SoftwareSoftware SoftwareSoftware Track 2 CampusCampusCampusCampusCampusCampus CampusCampus CampusCampusCampusCampusCampusCampusCampusCampus DataData DataData DataData DataData XSEDE Education Crisis: I need all of this to start to solve my problem! Blue Waters
  • Slide 29
  • The Opportunity (US picture)! Now have emerging national Integrated, High Performance Research Architecture Blue Waters and beyond towards exascale: high end Extraordinary science continues lead at cutting edge Traditional and novel large data applications Few places can house, field, or drive such a facility XSEDE architecture can connect Campus Bridging: campus to national CI Campus Assets: MRI, Instruments, DNA sequencers Facilities: Supercomputers, telescopes, accelerators, light sources, NEON More silicon than Steel Networks: end-to-end connectivity Where are those optical network apps? 29
  • Slide 30
  • Much to do to build CDSE on this Background: address the 5 Crises Education Many new opportunities and challenges CSE already has its struggles Now data: what is a data scientist? CDSE emerges Data opportunities for education and citizen science Faculty development, curriculum development Needed on every campus Talk to NSF, DOE, EC, your national agencies Recommendations of ACCI, MPSAC, etc New programs needed: See NSF CDS&E, CI TraCS, CAREER, LWD, etc You can help make this happen 30
  • Slide 31
  • Key Messages Astounding rate of change of the Triple Helix of Research, Education, and Innovation Computing and Data radically change methods Culture of collaboration around complex problems These create many crises and opportunities From technology to methodology to culture Deep integration required for science Emergence of Computational and Data- enabled Science and Engineering as a discipline and your role! A key part of the paradigm shift 31
  • Slide 32
  • 32 & Data