data- and compute-driven transformation of modern science how e-infrastructure & policy ...

32
Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology 1

Upload: osgood

Post on 25-Feb-2016

29 views

Category:

Documents


0 download

DESCRIPTION

Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research. Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Data- and Compute-Driven Transformation of Modern

ScienceHow e-Infrastructure & Policy

Support Paradigm Shifts in ResearchEdward SeidelSenior Vice President, Research and

InnovationSkolkovo Institute of Science and

Technology

1

Page 2: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

2

Part 1: We are in a period of unprecedented change in Science and Society…the crises & opportunities this creates

Page 3: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

33

Profound Transformation of Science

Collision of Two Black Holes

1972: Hawking. 1 person, no computer 50 KB

1994: 10 people, NCSA Cray Y-MP, 50MB

1998: 15 people, NCSA Origin, 50GB

Page 4: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Community Einstein Toolkit“Einstein Toolkit : open

software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI.”

Consortium: 92 members, 46 sites, 15 countries

Whole consortium engaged in directions, support, development

Simulation: Luciano Rezzolla, Max Planck Institut für Gravitationsphysik (AEI)

4

Community + software + algorithms + hardware + …

Many groups can do this: field

explodes! Major triumph of

Computational Science---solve

EEs!

Page 5: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

New Frontiers: Relativistic Matter Nuclear equations of state

Collab. with astrophysicists General relativistic

magnetohydrodynamics Some groups have ideal MHD

Radiation Transport (neutrinos/photons) Expensive and complicated! Requires opacities/emissivities

Chemical reactions (thermonuclear, chemical) SN community!

Computation: Multiphysics!! GRMHD: petascale problem Radiation transport beyond this

Schnetter et al, PetaScale Computing: Algorithms and Applications, 2007

Zoom in to just this part: post BH formation

and evolution of jet

Page 6: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Schnetter, et al

Post BH Formation/Evolution of Jet

MultiphysicsGR, Neutrinos, MHD, Nuclear EOS

Computer Science10 level AMR, optimized for Blue Waters

Science200s evolution, analyze it all

Blue Waters6M Pflop@1PF sustained = 70days!

6

Multiphyiscs framework

needed for fluids in astrophysics

and porous media…

Rezzolla, et al

Page 7: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

7

Theory and Observation of Universe Gravitational Waves!

Complex problems in relativistic astrophysics

Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab!

Observe (PB), compute (PF) signals Gravity and general relativity are transformed

4 centuries of small science, small data culture

2-3 decades of radical change in both data (factors of 1000 per~5 years) and collaboration

LIGO/VIRGO/GEO

New era of science after a century!

Data- and compute-dominated

gravitational wave astronomy!

Page 8: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

US Council on Competitiveness Ping Golf

Moved from workstation to Cray, now make prototype only at last stage of design

Too effective: had to simulate “less effective” design!

Proctor & GamblePringles “flying” off manufacturing

line causing significant lost product and revenue. Using CFD codes that Boeing uses, airflow over the Pringle modeled, design so Pringles did not “lift off” 8

Page 9: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

9

Part 1a: The Growth of Data

Data Tsunami

“I’m still here…”

But I’m your new baby big brother…

With millions of

processors…

Page 10: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Going Beyond a CommunityTransient & Multi-Messenger

Astronomy

10

New era: seeing events as they occur Here now

ALMA, EVLA in radio

Ice Cube neutrinos On horizon

24-42m optical? LSST = SDSS

(40TB) every night! SKA = exabytes

Simulations integrate all physics

Data-intensive = compute-intensive

Astronomy 1500-2010 was passive. No longer!

Communities need to share data, software,

knowledge, in real time

Will require integration

across disciplines, end-

to-end

Page 11: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Big Data vs The Long Tail of Science

Many “Big Data” projects are “special”Tend to be highly organized, have singular

sources of data, professionally curated, a lot attention paid to them

What about the “Long Tail” (the other 99%)?Thousands of biologists sequencing

communities of organismsThousands of chemist and materials

scientists developing a “materials genome”Millions of people “Tweeting”…Characteristics:

• Heterogeneous, perhaps hand generated• Not curated, reused, served, etc…

11

How do we harness the power of this

long tail?

News Flash! NYT 6/3/13: Drug side

effects discovered by mining web logs:

paroxetine + pravastatin = high

blood sugar!

Page 12: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

12

Grand Challenge Communities Combine it All...

Where is it going to go?

12

Same CI useful for black holes, hurricanes

Page 13: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

13

Grand Challenge Communities for Complex Problems

Require many disciplines, all scales of collaborations Individuals, groups, teams, communitiesMultiscale Collaborations: Beyond teams

Are dynamic and highly multidisciplinaryTime domain astronomy, emergency forecasting,

metagenomincs, materials genome… Drive sharing technologies and methodologies Researchers collaborate, work by sharing

data. Places requirements on eInfrastructre:Software, networks, collaborative environments,

data, sharing, computing, etcScientific culture, reproducibility, access, university

structures“Publications.” What is a modern publication?

13

Social, behavioral and economic sciences

will be critical in helping us

understand these issues…

Page 14: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Scenarios like this in all fields

14

NEON+GIS

Page 15: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Framing the Challenge:Science and Society Transformed by

Data Modern science

Data- and compute-intensive

Integrative, multiscale 4 centuries of constancy,

4 decades 109-12 change! Multi-disciplinary

Collaborations Individuals (Galileo!) Groups, teams, Grand

Challenge Communities

Big Data + Long Tail Sea of Data

Age of Observation15

We still think like

this…

…But such radical change cannot be

adequately addressed with

(current) incremental approach!

Students take note!

Page 16: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Part 2: Crises, Challenges, Opportunities

Computing

Data

Software End-to-end

Networks

Organizational structures

Education

No, we are

not…

Cyber

Instruments & Facilities

Page 17: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

17

Five Crises“CDSE” Community needs to address

Computing Technology Multicore: processor is new transistor Programming model, fault tolerance, etc New models: clouds, grids, GPUs, …

Data, provenance, and visualization How do we create “data scientists”? What is an international data

infrastructure? Software treated as e-Infrastructure

Complex applications on coupled compute-data-networked environments, tools needed

Modern apps: 106+ lines, many groups contribute, take decades

Page 18: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

18

Five Crises Organization for Multidisciplinary &

Computational Science “Universities must significantly change organizational

structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science”

“Itself a discipline, computational science advances all science…inadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field”

Education The CI environment is running away from us! How do we develop a workforce to work

effectively in this world? How do universities transition?

Page 19: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Scientific Computing and Imaging Institute, University of Utah

Data Crisis: Information Big BangPCAST Digital Data

NSF Experts Study

Wired, Nature

Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report“there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future.” 80% respondents: >50 yrs; 68% > 100 yrs

Industry

Page 20: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

The Shift Towards a “Sea of Data”

Implications Science & society are now data-dominated

Experiment, computation, theoryUS mobile phone traffic exceeded 1 exabyte!

Classes of dataCollections, observations, experiments,

simulationsSoftwarePublications

Totally new methodologiesAlgorithms, mathematics, culture

Data become the medium forMultidisciplinarity, communication, publication,

science, economic development…20

How do we attribute credit for this new

publication form? How are data peer reviewed? What is a publication in the modern data-rich

world? What is a business model for OA?

Fundamental questions become

focused around data: What to curate, how to

remove boundaries? How to incentivize

sharing? IP?

Page 21: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

21

Part 2a: Recommendations

Page 22: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Software

ACCI Task Force Reports Final recommendations

presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010

More than 25 workshops and Birds of a Feather sessions, 1300 people involved

Final reports on-line“Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF.” Grand Challenges Task Force

“NSF should establish processes to collect community requirements and plan long-term software roadmaps.” Software Task Force“Higher education should adopt criteria for tenure and promotion that reward…the production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software.” Campus Bridging Task Force

22

Campus Bridging

Data & Viz

Grand ChallengeHPC

Learning

Page 23: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Recommendation of NSF Advisory Committee on Cyberinfrastructure

ACCI"The National Science Foundation should create a

program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices."

23

Page 24: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

24

Part 3: Universities attempt to respond

We have to do all this and

revolutionize the state/national

economy?

Page 25: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Skoltech: Example of a 21st Century University in the Making

Page 26: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Skoltech at a Glance• A unique Russian institution in international context

– This decade: a community of 200 faculty, 300 post-docs, 1200 graduate students

• Focused on science, engineering and technology– Addressing problems and issues in IT, Energy, Biomedicine,

Space and Nuclear• Interdisciplinary by design; no departments

– 15 centers organized around complex problems• With strong programs in support of innovation and

entrepreneurship– Creating a culture of innovation in every student,

professor, staff member• Important part of the Skolkovo innovation

ecosystem

Integrated data, compute, instrumentation infrastructure and policy under development for• Interdisciplinary research• Accelerating discovery• Economic development

Page 27: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

27

Part 3a: You can help lead this revolution

Kathryn Gray

Page 28: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Modern Research & Education Ecosystem

28

Software

Software

Track 2

Track 2

Track 2

Campus Campus Campus Campus Campus Campus CampusCampus

Data

DataData

Data

XSEDE

Education Crisis: I need all of this to start to solve

my problem!

Blue Waters

Page 29: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

The Opportunity (US picture)! Now have emerging national Integrated,

High Performance Research ArchitectureBlue Waters and beyond towards exascale:

high end• Extraordinary science continues lead at cutting

edge• Traditional and novel large data applications• Few places can house, field, or drive such a

facilityXSEDE architecture can connect…

• Campus Bridging: campus to national CI…– Campus Assets: MRI, Instruments, DNA

sequencers…– Facilities: Supercomputers, telescopes,

accelerators, light sources, NEON …» ”More silicon than Steel”

– Networks: end-to-end connectivity» Where are those optical network apps?

29

Page 30: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Much to do to build CDSE on this Background: address the “5

Crises” EducationMany new opportunities and challenges

• CSE already has its struggles• Now data: what is a “data scientist”?• CDSE emerges

Data opportunities for education and citizen science

Faculty development, curriculum developmentNeeded on every campus

Talk to NSF, DOE, EC, your national agenciesRecommendations of ACCI, MPSAC, etcNew programs needed: See NSF CDS&E, CI TraCS,

CAREER, “LWD”, etc You can help make this happen

30

Page 31: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

Key Messages Astounding rate of change of the

“Triple Helix” of Research, Education, and InnovationComputing and Data radically change

methodsCulture of collaboration around complex

problems These create many crises and

opportunitiesFrom technology to methodology to

culture…Deep integration required for science

Emergence of Computational and Data-enabled Science and Engineering as a discipline and your role!A key part of the paradigm shift

31

Page 32: Data- and Compute-Driven  Transformation of Modern Science How e-Infrastructure & Policy  Support Paradigm  Shifts in Research

32

& Data