data- and compute-driven transformation of modern science how e-infrastructure & policy ...
DESCRIPTION
Data- and Compute-Driven Transformation of Modern Science How e-Infrastructure & Policy Support Paradigm Shifts in Research. Edward Seidel Senior Vice President, Research and Innovation Skolkovo Institute of Science and Technology. - PowerPoint PPT PresentationTRANSCRIPT
Data- and Compute-Driven Transformation of Modern
ScienceHow e-Infrastructure & Policy
Support Paradigm Shifts in ResearchEdward SeidelSenior Vice President, Research and
InnovationSkolkovo Institute of Science and
Technology
1
2
Part 1: We are in a period of unprecedented change in Science and Society…the crises & opportunities this creates
33
Profound Transformation of Science
Collision of Two Black Holes
1972: Hawking. 1 person, no computer 50 KB
1994: 10 people, NCSA Cray Y-MP, 50MB
1998: 15 people, NCSA Origin, 50GB
Community Einstein Toolkit“Einstein Toolkit : open
software for astrophysics to enable new science, facilitate interdisciplinary research and use emerging petascale computers and advanced CI.”
Consortium: 92 members, 46 sites, 15 countries
Whole consortium engaged in directions, support, development
Simulation: Luciano Rezzolla, Max Planck Institut für Gravitationsphysik (AEI)
4
Community + software + algorithms + hardware + …
Many groups can do this: field
explodes! Major triumph of
Computational Science---solve
EEs!
New Frontiers: Relativistic Matter Nuclear equations of state
Collab. with astrophysicists General relativistic
magnetohydrodynamics Some groups have ideal MHD
Radiation Transport (neutrinos/photons) Expensive and complicated! Requires opacities/emissivities
Chemical reactions (thermonuclear, chemical) SN community!
Computation: Multiphysics!! GRMHD: petascale problem Radiation transport beyond this
Schnetter et al, PetaScale Computing: Algorithms and Applications, 2007
Zoom in to just this part: post BH formation
and evolution of jet
Schnetter, et al
Post BH Formation/Evolution of Jet
MultiphysicsGR, Neutrinos, MHD, Nuclear EOS
Computer Science10 level AMR, optimized for Blue Waters
Science200s evolution, analyze it all
Blue Waters6M Pflop@1PF sustained = 70days!
6
Multiphyiscs framework
needed for fluids in astrophysics
and porous media…
Rezzolla, et al
7
Theory and Observation of Universe Gravitational Waves!
Complex problems in relativistic astrophysics
Relativity, hydrodynamics, nuclear physics, radiation, neutrinos, magnetic fields: globally distributed collab!
Observe (PB), compute (PF) signals Gravity and general relativity are transformed
4 centuries of small science, small data culture
2-3 decades of radical change in both data (factors of 1000 per~5 years) and collaboration
LIGO/VIRGO/GEO
New era of science after a century!
Data- and compute-dominated
gravitational wave astronomy!
US Council on Competitiveness Ping Golf
Moved from workstation to Cray, now make prototype only at last stage of design
Too effective: had to simulate “less effective” design!
Proctor & GamblePringles “flying” off manufacturing
line causing significant lost product and revenue. Using CFD codes that Boeing uses, airflow over the Pringle modeled, design so Pringles did not “lift off” 8
9
Part 1a: The Growth of Data
Data Tsunami
“I’m still here…”
But I’m your new baby big brother…
With millions of
processors…
Going Beyond a CommunityTransient & Multi-Messenger
Astronomy
10
New era: seeing events as they occur Here now
ALMA, EVLA in radio
Ice Cube neutrinos On horizon
24-42m optical? LSST = SDSS
(40TB) every night! SKA = exabytes
Simulations integrate all physics
Data-intensive = compute-intensive
Astronomy 1500-2010 was passive. No longer!
Communities need to share data, software,
knowledge, in real time
Will require integration
across disciplines, end-
to-end
Big Data vs The Long Tail of Science
Many “Big Data” projects are “special”Tend to be highly organized, have singular
sources of data, professionally curated, a lot attention paid to them
What about the “Long Tail” (the other 99%)?Thousands of biologists sequencing
communities of organismsThousands of chemist and materials
scientists developing a “materials genome”Millions of people “Tweeting”…Characteristics:
• Heterogeneous, perhaps hand generated• Not curated, reused, served, etc…
11
How do we harness the power of this
long tail?
News Flash! NYT 6/3/13: Drug side
effects discovered by mining web logs:
paroxetine + pravastatin = high
blood sugar!
12
Grand Challenge Communities Combine it All...
Where is it going to go?
12
Same CI useful for black holes, hurricanes
13
Grand Challenge Communities for Complex Problems
Require many disciplines, all scales of collaborations Individuals, groups, teams, communitiesMultiscale Collaborations: Beyond teams
Are dynamic and highly multidisciplinaryTime domain astronomy, emergency forecasting,
metagenomincs, materials genome… Drive sharing technologies and methodologies Researchers collaborate, work by sharing
data. Places requirements on eInfrastructre:Software, networks, collaborative environments,
data, sharing, computing, etcScientific culture, reproducibility, access, university
structures“Publications.” What is a modern publication?
13
Social, behavioral and economic sciences
will be critical in helping us
understand these issues…
Scenarios like this in all fields
14
NEON+GIS
Framing the Challenge:Science and Society Transformed by
Data Modern science
Data- and compute-intensive
Integrative, multiscale 4 centuries of constancy,
4 decades 109-12 change! Multi-disciplinary
Collaborations Individuals (Galileo!) Groups, teams, Grand
Challenge Communities
Big Data + Long Tail Sea of Data
Age of Observation15
We still think like
this…
…But such radical change cannot be
adequately addressed with
(current) incremental approach!
Students take note!
Part 2: Crises, Challenges, Opportunities
Computing
Data
Software End-to-end
Networks
Organizational structures
Education
No, we are
not…
Cyber
Instruments & Facilities
17
Five Crises“CDSE” Community needs to address
Computing Technology Multicore: processor is new transistor Programming model, fault tolerance, etc New models: clouds, grids, GPUs, …
Data, provenance, and visualization How do we create “data scientists”? What is an international data
infrastructure? Software treated as e-Infrastructure
Complex applications on coupled compute-data-networked environments, tools needed
Modern apps: 106+ lines, many groups contribute, take decades
18
Five Crises Organization for Multidisciplinary &
Computational Science “Universities must significantly change organizational
structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science”
“Itself a discipline, computational science advances all science…inadequate/outmoded structures within Federal government and the academy do not effectively support this critical multidisciplinary field”
Education The CI environment is running away from us! How do we develop a workforce to work
effectively in this world? How do universities transition?
Scientific Computing and Imaging Institute, University of Utah
Data Crisis: Information Big BangPCAST Digital Data
NSF Experts Study
Wired, Nature
Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report“there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future.” 80% respondents: >50 yrs; 68% > 100 yrs
Industry
The Shift Towards a “Sea of Data”
Implications Science & society are now data-dominated
Experiment, computation, theoryUS mobile phone traffic exceeded 1 exabyte!
Classes of dataCollections, observations, experiments,
simulationsSoftwarePublications
Totally new methodologiesAlgorithms, mathematics, culture
Data become the medium forMultidisciplinarity, communication, publication,
science, economic development…20
How do we attribute credit for this new
publication form? How are data peer reviewed? What is a publication in the modern data-rich
world? What is a business model for OA?
Fundamental questions become
focused around data: What to curate, how to
remove boundaries? How to incentivize
sharing? IP?
21
Part 2a: Recommendations
Software
ACCI Task Force Reports Final recommendations
presented to the NSF Advisory Committee on Cyberinfrastructure Dec 2010
More than 25 workshops and Birds of a Feather sessions, 1300 people involved
Final reports on-line“Permanent programmatic activities in Computational and Data-Enabled Science & Engineering (CDS&E) should be established within NSF.” Grand Challenges Task Force
“NSF should establish processes to collect community requirements and plan long-term software roadmaps.” Software Task Force“Higher education should adopt criteria for tenure and promotion that reward…the production of digital artifacts of scholarship. Such artifacts include widely used data sets, scholarly services delivered online, and software.” Campus Bridging Task Force
22
Campus Bridging
Data & Viz
Grand ChallengeHPC
Learning
Recommendation of NSF Advisory Committee on Cyberinfrastructure
ACCI"The National Science Foundation should create a
program in Computational and Data-Enabled Science and Engineering (CDS&E), based in and coordinated by the NSF Office of Cyberinfrastructure. The new program should be collaborative with relevant disciplinary programs in other NSF directorates and offices."
23
24
Part 3: Universities attempt to respond
We have to do all this and
revolutionize the state/national
economy?
Skoltech: Example of a 21st Century University in the Making
Skoltech at a Glance• A unique Russian institution in international context
– This decade: a community of 200 faculty, 300 post-docs, 1200 graduate students
• Focused on science, engineering and technology– Addressing problems and issues in IT, Energy, Biomedicine,
Space and Nuclear• Interdisciplinary by design; no departments
– 15 centers organized around complex problems• With strong programs in support of innovation and
entrepreneurship– Creating a culture of innovation in every student,
professor, staff member• Important part of the Skolkovo innovation
ecosystem
Integrated data, compute, instrumentation infrastructure and policy under development for• Interdisciplinary research• Accelerating discovery• Economic development
27
Part 3a: You can help lead this revolution
Kathryn Gray
Modern Research & Education Ecosystem
28
Software
Software
Track 2
Track 2
Track 2
Campus Campus Campus Campus Campus Campus CampusCampus
Data
DataData
Data
XSEDE
Education Crisis: I need all of this to start to solve
my problem!
Blue Waters
The Opportunity (US picture)! Now have emerging national Integrated,
High Performance Research ArchitectureBlue Waters and beyond towards exascale:
high end• Extraordinary science continues lead at cutting
edge• Traditional and novel large data applications• Few places can house, field, or drive such a
facilityXSEDE architecture can connect…
• Campus Bridging: campus to national CI…– Campus Assets: MRI, Instruments, DNA
sequencers…– Facilities: Supercomputers, telescopes,
accelerators, light sources, NEON …» ”More silicon than Steel”
– Networks: end-to-end connectivity» Where are those optical network apps?
29
Much to do to build CDSE on this Background: address the “5
Crises” EducationMany new opportunities and challenges
• CSE already has its struggles• Now data: what is a “data scientist”?• CDSE emerges
Data opportunities for education and citizen science
Faculty development, curriculum developmentNeeded on every campus
Talk to NSF, DOE, EC, your national agenciesRecommendations of ACCI, MPSAC, etcNew programs needed: See NSF CDS&E, CI TraCS,
CAREER, “LWD”, etc You can help make this happen
30
Key Messages Astounding rate of change of the
“Triple Helix” of Research, Education, and InnovationComputing and Data radically change
methodsCulture of collaboration around complex
problems These create many crises and
opportunitiesFrom technology to methodology to
culture…Deep integration required for science
Emergence of Computational and Data-enabled Science and Engineering as a discipline and your role!A key part of the paradigm shift
31
32
& Data