![Page 1: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/1.jpg)
Ian Foster
Accelerating data-driven discovery in energy science
Distinguished Fellow
![Page 2: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/2.jpg)
Life Sciences and Biology
Advanced Materials
Condensed Matter Physics
Chemistry and Catalysis
Soft Materials
Environmental and Geo Sciences
Can we determine pathways that lead to novel states and nonequilibrium assemblies?
Can we observe – and control – nanoscale chemical transformations in macroscopic systems?
Can we create new materials with extraordinary properties – by engineering defects at the atomic scale?
Can we map – and ultimately harness – dynamic heterogeneity in complex correlated systems?
Can we unravel the secrets of biological function – across length scales?
Can we understand physical and chemical processes in the most extreme environments?
New tools are needed to answer the most pressing scientific Qs
![Page 3: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/3.jpg)
The resulting data deluge
Spans biology, climate, cosmology, materials, physics, urban sciences, …
Simulation data
• Petascale to exascale simulations; simulation datasets as laboratories; high-throughput characterization; etc.
Experimental data
• Light sources, genome sequencing, next-gen ARM radar, sky surveys, high-throughput experiments, etc.
New research methods that depend on coupling
1) Of computation and experiment: inverse problems, computer control
2) Across data sources and types: knowledge integration, analysis
![Page 4: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/4.jpg)
Scientific progress requires collaborative discovery engines
informatics analysis
high-throughput experiments
problem specification
modeling and simulation
analysis & visualization
experimental design
analysis & visualization
Integrated databases
Rick Stevens
![Page 5: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/5.jpg)
Example: A discovery engine for disordered structures
Diffuse scattering images from Ray Osborn et al., Argonne
Sample
Experimental scattering
Material composition
Simulated structure
Simulated scattering
La 60%, Sr 40%
Detect errors (secs to mins)
Knowledge base: past experiments; simulations; literature; expert knowledge
Select experiments (mins to hours)
Contribute to knowledge base
Simulations driven by experiments (mins to days)
Knowledge-driven decision making
Evolutionary optimization
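The loop on this slide (simulate scattering for a candidate structure, compare with the experiment, keep the best candidates, vary them, repeat) can be illustrated with a toy example. This is only a sketch under loud assumptions: the two reference profiles, the linear mixing model, and every function name are hypothetical stand-ins, and a real discovery engine would call a structure and scattering simulation where `simulate_scattering` appears.

```python
import random

# Hypothetical stand-ins for real scattering simulations: two fixed reference
# profiles that are mixed linearly according to the Sr fraction.
PROFILE_LA = [1.0, 0.8, 0.3, 0.1, 0.0]
PROFILE_SR = [0.0, 0.2, 0.6, 0.9, 1.0]


def simulate_scattering(sr_fraction):
    """Toy forward model: linear mix of the two reference profiles."""
    return [(1.0 - sr_fraction) * a + sr_fraction * b
            for a, b in zip(PROFILE_LA, PROFILE_SR)]


def mismatch(sr_fraction, observed):
    """Sum of squared differences between simulated and observed profiles."""
    simulated = simulate_scattering(sr_fraction)
    return sum((s - o) ** 2 for s, o in zip(simulated, observed))


def evolve(observed, generations=60, population=20, survivors=5,
           sigma=0.05, seed=0):
    """Elitist evolutionary search for the Sr fraction that best fits `observed`."""
    rng = random.Random(seed)
    candidates = [rng.random() for _ in range(population)]
    for _ in range(generations):
        # Selection: keep the candidates whose simulation fits best.
        candidates.sort(key=lambda x: mismatch(x, observed))
        parents = candidates[:survivors]
        # Variation: mutate parents with Gaussian noise, clipped to [0, 1].
        children = [
            min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0.0, sigma)))
            for _ in range(population - survivors)
        ]
        candidates = parents + children
    return min(candidates, key=lambda x: mismatch(x, observed))


# Pretend measurement of a sample with 40% Sr, as in the slide's example.
observed = simulate_scattering(0.4)
best = evolve(observed)
print(f"estimated Sr fraction: {best:.3f}")
```

Because the best candidates always survive, the fit never gets worse from one generation to the next; each `mismatch` call is independent, which is what makes the many-simulation approach easy to run in parallel.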
![Page 6: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/6.jpg)
Accelerating data-driven discovery in energy science
(1) Eliminate data friction
![Page 7: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/7.jpg)
Eliminating data friction is essential to modern science
Civilization advances by extending the number of important operations which we can perform without thinking about them (Whitehead, 1912)
Obstacles to data access, movement, discovery, sharing, and analysis slow research, distort research directions, and waste time (DOE reports, 2005-2015)
![Page 8: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/8.jpg)
Software as a service (SaaS) as lubricant
Customer relationship management (CRM):
A knowledge-intensive process
• Historically, handled manually or via expensive, inflexible on-premise software
SaaS has revolutionized how CRM is consumed
• Outsource to provider who runs software on cloud
• Access via simple interfaces
[Chart: SaaS vs. on-premise, compared on ease of use, cost, flexibility, complexity]
![Page 9: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/9.jpg)
Globus: Research data management as a service
Essential research data management services
• File transfer
• Data sharing
• Data publication
• Identity and groups
Builds on 15 years of DOE research
Outsourced and automated
• High availability, reliability, performance, scalability
Convenient for
• Casual users: Web interfaces
• Power users: APIs
• Administrators: Install, manage
globus.org
![Page 10: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/10.jpg)
“I need to easily, quickly, & reliably move data to other locations.”
Research Computing HPC Cluster
Lab Server
Campus Home Filesystem
Desktop Workstation
Personal Laptop
DOE supercomputer
Public Cloud
![Page 11: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/11.jpg)
“I need to get data from a scientific instrument to my analysis system.”
Next Gen Sequencer
Light Sheet Microscope
MRI
Advanced Light Source
![Page 12: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/12.jpg)
“I need to easily and securely share my data with my colleagues.”
![Page 13: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/13.jpg)
Globus and the research data lifecycle
1: Researcher initiates transfer request, or transfer is requested automatically by a script or science gateway
2: Globus transfers files reliably, securely
3: Researcher selects files to share, selects user or group, and sets access permissions
4: Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5: Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
6: Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
7: Curator reviews and approves; data set published on campus or other system
8: Peers, collaborators search and discover datasets; transfer and share using Globus
Locations: Instrument, Compute Facility, Publication Repository, Personal Computer
Stages: Transfer, Share, Publish, Discover
• SaaS: only a web browser required
• Use storage system of your choice
• Access using your campus credentials
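The sharing steps in this lifecycle (the researcher picks files, a user or group, and permissions; access is then checked while the files stay on existing storage) can be sketched with a small in-memory model. Everything here is hypothetical and for illustration only: `AccessRule`, `is_allowed`, the paths, and the identities are assumptions, not the Globus API or its data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRule:
    """One sharing rule: who may do what under a path (hypothetical model)."""
    path_prefix: str   # folder on the researcher's existing storage
    principal: str     # a user identity or a group name
    permissions: str   # "r" for read-only, "rw" for read and write


def is_allowed(rules, user, groups, path, action):
    """Return True if any rule grants `action` ("r" or "w") on `path`."""
    for rule in rules:
        applies_to_user = rule.principal == user or rule.principal in groups
        covers_path = path.startswith(rule.path_prefix)
        if applies_to_user and covers_path and action in rule.permissions:
            return True
    return False


# The researcher shares one folder with a colleague (read) and a group (read/write).
rules = [
    AccessRule("/lab/run42/", "alice@example.edu", "r"),
    AccessRule("/lab/run42/", "beamline-team", "rw"),
]

# A collaborator's request is checked against the rules; no files are copied.
print(is_allowed(rules, "alice@example.edu", set(), "/lab/run42/scan.h5", "r"))
print(is_allowed(rules, "alice@example.edu", set(), "/lab/run42/scan.h5", "w"))
```

The point of the design is that a rule refers to a location on storage the researcher already uses, so sharing changes who may read a file, not where the file lives.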
![Page 14: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/14.jpg)
Globus at a glance
4 major services
13 national labs use Globus services
100 PB transferred
8,000 active endpoints
20 billion files processed
>300 users are active daily
25,000 registered users
99.95% uptime over the past two years
>30 subscribers
The biggest transfer to date is 1 petabyte
The longest-running transfer to date took 3 months
We’re eager to learn what you want to do with Globus services
![Page 15: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/15.jpg)
One APS node connects to 125 locations through mid-2014
![Page 16: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/16.jpg)
Same node (1 Gbps link)
![Page 17: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/17.jpg)
Globus and DOE: Terabytes per month
![Page 18: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/18.jpg)
Globus and DOE: Running total terabytes
![Page 19: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/19.jpg)
Globus and DOE: Active users per month
![Page 20: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/20.jpg)
Response has been gratifying
"Really great software." - Benjamin Mayer, Research Associate, Climate Change Science Institute, Oak Ridge National Laboratory
"Whoa! Transfer from NERSC to BNOC (data transfer node) using Globus is screaming!" - Gary Bates, Professional Research Assistant, NOAA
"…Now my users have a fast, easy way to get their data wherever it needs to go, and the setup process was trivial." - Brock Palen, Associate Director, University of Michigan Advanced Research Computing
"... we just had a 153TB transfer that got 20Gb/s and another with 144TB at 25Gb/s! That's pretty insane!" - Jason Alt, Systems Management and Development Lead at National Center for Supercomputing Applications
"We were thrilled by how well Globus worked. We've never seen such high transfer rates, and the service was trivial to install and use." - Dale Land, IT Chief Engineer, Los Alamos National Laboratory
"The system is reliable and secure - and also amazingly easy to use. …It just works." - David Skinner, NERSC user
"I moved 400 GB of files and didn’t even have to think about it." - Jeff Porter, STAR Experiment, Lawrence Berkeley National Lab
"We have been extremely impressed with Globus and how easy it is to use." - Pete Eby, Linux System Administrator, Oak Ridge National Laboratory
"Drag and drop archiving is an incredibly useful feature." - Shreyas Cholia, NERSC user
"The time before Globus now seems like the dark ages!" - Galen Arnold, Systems Engineer, NCSA and Blue Waters PRAC support team, NCSA
![Page 21: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/21.jpg)
Globus service APIs serve as a science platform
Identity, Group, and Profile Management
…
Globus Toolkit
Globus APIs
Globus Connect
Data Publication & Discovery
File Sharing
File Transfer & Replication
![Page 22: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/22.jpg)
Globus platform services enable new application capabilities
![Page 23: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/23.jpg)
Publication as service for ACME
![Page 24: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/24.jpg)
Globus platform accelerates development of new services
![Page 25: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/25.jpg)
Operating a sustainable service
Globus is a not-for-profit service for researchers
We adopt a subscription-supported freemium model
• Subscribers get extra features, rapid support
We’re engaged in crossing the chasm
Support from DOE will contribute to long-term success
![Page 26: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/26.jpg)
Accelerating data-driven discovery in energy science
(2) Liberate scientific data
![Page 27: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/27.jpg)
Q: What is the biggest obstacle to data sharing in science?
A: The vast majority of data is lost, or not online; if online, not described; if described, not indexed
• Not accessible
• Not discoverable
• Not used
Contrast with common practice for consumer photos (iPhoto)
• Automated capture
• Publish then curate
• Processing to add value
• Outsourced storage
![Page 28: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/28.jpg)
We must automate the capture, linking, and indexing of all data
Globus publication service encodes and automates data publication pipelines
Example application: Materials Data Facility for materials simulation and experiment data
Proposed distributed virtual collections index, organize, tag, & manage distributed data
Think iPhoto on steroids, backed by domain knowledge and supercomputing power
![Page 29: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/29.jpg)
We must automate the capture, linking, and indexing of all data
chiDB: Human-computer collaboration to extract Flory-Huggins (χ) parameters from polymers literature
R. Tchoua et al.
Plenario: Spatially and temporally integrated, linked, and searchable database of urban data
C. Catlett, B. Goldstein, T. Malik et al.
![Page 30: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/30.jpg)
“I need to publish my data so that others can find it and use it.”
Scholarly Publication
Reference Dataset
Research Community Collaboration
![Page 31: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/31.jpg)
Publish dashboard
![Page 32: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/32.jpg)
Start a new submission
![Page 33: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/33.jpg)
Describe submission: 1) Dublin Core
![Page 34: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/34.jpg)
Describe submission: 2) Science metadata
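The two description steps (Dublin Core first, then science metadata) can be pictured as one record whose fields fall into two groups. The sketch below uses the 15 element names of the Dublin Core Metadata Element Set; the example record, the choice of required fields, and the function names are assumptions made for illustration, not the schema used by the publication service.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def split_metadata(record):
    """Separate Dublin Core fields from domain-specific (science) fields."""
    core = {k: v for k, v in record.items() if k in DUBLIN_CORE_ELEMENTS}
    science = {k: v for k, v in record.items() if k not in DUBLIN_CORE_ELEMENTS}
    return core, science


def missing_required(core, required=("title", "creator", "date")):
    """List required Dublin Core fields that are absent or empty.

    Which fields are required is an assumption of this sketch.
    """
    return [name for name in required if not core.get(name)]


# A made-up submission record, for illustration only.
record = {
    "title": "Example diffuse scattering dataset",
    "creator": "A. Researcher",
    "date": "2015-01-01",
    "composition": "La 60%, Sr 40%",   # science metadata
    "technique": "diffuse scattering",  # science metadata
}

core, science = split_metadata(record)
print(sorted(core), sorted(science), missing_required(core))
```

Keeping the general fields separate from the domain-specific ones lets a search service index every dataset the same way while each collection defines its own science fields.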
![Page 35: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/35.jpg)
Assemble the dataset
![Page 36: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/36.jpg)
Transfer files to submission endpoint
![Page 37: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/37.jpg)
Check dataset is assembled correctly
![Page 38: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/38.jpg)
Submission now in curation workflow
![Page 39: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/39.jpg)
Search published datasets
![Page 40: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/40.jpg)
Search across collections
![Page 41: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/41.jpg)
Discover a published dataset
![Page 42: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/42.jpg)
Select a published dataset
![Page 43: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/43.jpg)
View downloaded dataset
![Page 44: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/44.jpg)
Configuring a publication pipeline: Publication “facets”
identifier: URL | Handle | DOI
description: none | standard | custom | domain-specific
curation: none | acceptance | machine-validated | human-validated
access: anonymous | public | collaborators | embargoed
preservation: transient | project lifetime | “forever” | archive
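One way to read the facets on this slide is as a small configuration: a pipeline picks one value per facet. The sketch below encodes the slide's facets and values and rejects anything else. The `PipelineConfig` class and the idea of validating at construction time are assumptions for illustration, not the service's actual configuration format.

```python
from dataclasses import dataclass

# Allowed values per facet, taken from the slide.
FACETS = {
    "identifier": ("URL", "Handle", "DOI"),
    "description": ("none", "standard", "custom", "domain-specific"),
    "curation": ("none", "acceptance", "machine-validated", "human-validated"),
    "access": ("anonymous", "public", "collaborators", "embargoed"),
    "preservation": ("transient", "project lifetime", "forever", "archive"),
}


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical publication pipeline: one choice per facet."""
    identifier: str
    description: str
    curation: str
    access: str
    preservation: str

    def __post_init__(self):
        # Reject any value that is not listed for its facet.
        for facet, allowed in FACETS.items():
            value = getattr(self, facet)
            if value not in allowed:
                raise ValueError(f"{facet}={value!r} is not one of {allowed}")


# Example: a reference dataset with a DOI, human curation, and archival storage.
config = PipelineConfig(
    identifier="DOI",
    description="domain-specific",
    curation="human-validated",
    access="public",
    preservation="archive",
)
print(config)
```

A lightweight collection could instead choose `identifier="URL"`, `curation="none"`, and `preservation="transient"` without any change to the pipeline code.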
![Page 45: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/45.jpg)
Accelerating data-driven discovery in energy science
(3) Create discovery engines at DOE facilities
![Page 46: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/46.jpg)
Recall: A discovery engine for disordered structures
Diffuse scattering images from Ray Osborn et al., Argonne
Sample
Experimental scattering
Material composition
Simulated structure
Simulated scattering
La 60%, Sr 40%
Detect errors (secs to mins)
Knowledge base: past experiments; simulations; literature; expert knowledge
Select experiments (mins to hours)
Contribute to knowledge base
Simulations driven by experiments (mins to days)
Knowledge-driven decision making
Evolutionary optimization
![Page 47: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/47.jpg)
Simulation: Characterize, Predict
Assimilate
Steer data acquisition
Data analysis: Reconstruct, detect features, auto-correlate, particle distributions, …
Science automation services: Scripting, security, storage, cataloging, transfer
Data rates: ~0.001-0.5 GB/s/flow; ~2 GB/s total burst; ~200 TB/month; ~10 concurrent flows (Today: x10 in 5 yrs)
Integration: Optimize, fit, …
Configure, Check, Guide
Batch
Immediate
0.001, 1, 100+ PFlops
Precompute material database
Reconstruct image
Auto-correlation
Feature detection
Scientific opportunities
• Probe material structure and function at unprecedented scales
Technical challenges
• Many experimental modalities
• Data rates and computation needs vary widely; increasing
• Knowledge management, integration, synthesis
Towards discovery engines for energy science (Argonne LDRD)
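The data-rate figures on this slide are easier to compare once the monthly volume is turned into an average rate. The arithmetic below is a quick sketch that assumes a 30-day month and decimal units (1 TB = 1000 GB); neither assumption comes from the slide.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def average_rate_gb_per_s(tb_per_month):
    """Average rate in GB/s implied by a monthly volume (1 TB = 1000 GB)."""
    return tb_per_month * 1000 / SECONDS_PER_MONTH


average = average_rate_gb_per_s(200)  # ~200 TB/month from the slide
burst = 2.0                           # ~2 GB/s total burst from the slide

print(f"average rate: {average:.3f} GB/s")
print(f"burst is about {burst / average:.0f}x the average")
```

Under these assumptions, ~200 TB/month averages about 0.077 GB/s, so the ~2 GB/s burst is roughly 26 times the average. That gap is why the infrastructure has to be sized for bursts rather than for the monthly mean.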
![Page 48: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/48.jpg)
Linking experiment and computation
Single-crystal diffuse scattering
Defect structure in disordered materials (Osborn, Wilde, Wozniak, et al.). Estimate structure via inverse modeling: many-simulation evolutionary optimization on 100K+ BG/Q cores (Swift+OpenMP).
Near-field high-energy X-ray diffraction microscopy
Microstructure in bulk materials (Almer, Sharma, et al.). Reconstruction on 10K+ BG/Q cores (Swift) takes ~10 minutes, vs. >5 hours on APS cluster or months if data taken home. Used to detect errors in one run that would have resulted in total waste of beamtime.
X-ray nano/microtomography
Bio, geo, and material science imaging (Bicer, Gursoy, Kettimuthu, De Carlo, et al.). Innovative in-slice parallelization method gives reconstruction of 360x2048x1024 dataset in ~1 minute, using 32K BG/Q cores, vs. many days on cluster: enables quasi-instant response.
2-BM
1-ID
6-ID
Populate
Sim Sim
Select
Sim
Microstructure of a copper wire, 0.2mm diameter
Advanced Photon Source
Experimental and simulated scattering from manganite
![Page 49: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/49.jpg)
0: Develop or reuse script
1: Run script (EL1.layer)
2: Lookup file (name=EL1.layer, user=Anton, type=reconstruction)
3: Transfer inputs
4: Run app
5: Transfer results
6: Update catalogs
Diagram labels: Researchers, Script libraries, Storage locations, Compute facilities, Collaboration catalogs, Provenance, Files & Metadata, External collaborators
Tying it all together: An energy sciences infrastructure
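The numbered steps in this diagram (look up a file in a catalog, transfer inputs, run the application, transfer results, update catalogs) can be traced with a toy pipeline. This is a sketch with in-memory stand-ins: the catalog layout, the storage and compute location strings, and all function names are hypothetical, and the transfer and run steps are placeholders for what a real transfer service and compute facility would do.

```python
def lookup(catalog, **criteria):
    """Step 2: find catalog entries whose fields match all criteria."""
    return [entry for entry in catalog
            if all(entry.get(k) == v for k, v in criteria.items())]


def transfer(source, destination):
    """Steps 3 and 5: placeholder for a managed file transfer."""
    filename = source.rsplit("/", 1)[-1]
    return f"{destination}/{filename}"


def run_app(app, input_path):
    """Step 4: placeholder for launching the analysis at a compute facility."""
    return f"{input_path}.{app}.out"


def run_script(catalog, provenance, name, user, kind, app="reconstruct"):
    """Steps 1-6: locate the input, stage it, compute, store, and record."""
    matches = lookup(catalog, name=name, user=user, type=kind)
    if not matches:
        raise LookupError(f"no catalog entry for {name!r}")
    entry = matches[0]
    staged = transfer(entry["location"], "compute:/scratch")   # step 3
    result = run_app(app, staged)                              # step 4
    stored = transfer(result, "storage-a:/results")            # step 5
    # Step 6: record the new file and how it was produced.
    catalog.append({"name": stored.rsplit("/", 1)[-1], "user": user,
                    "type": "result", "location": stored})
    provenance.append({"input": entry["location"], "app": app,
                       "output": stored})
    return stored


# Hypothetical catalog holding the file named on the slide.
catalog = [{"name": "EL1.layer", "user": "Anton", "type": "reconstruction",
            "location": "storage-a:/data/EL1.layer"}]
provenance = []

stored = run_script(catalog, provenance, "EL1.layer", "Anton", "reconstruction")
print(stored)
```

Recording provenance in the same step that updates the catalog means every derived file can be traced back to its input and the application that produced it.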
![Page 50: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/50.jpg)
informatics analysis
high-throughput experiments
problem specification
modeling and simulation
analysis & visualization
experimental design
analysis & visualization
Integrated databases
Summary: Big opportunities and challenges for energy data
Immediate opportunities
• Reduce data friction and accelerate discovery by deploying Globus services across all DOE facilities
• Develop new services to capture, link energy data
Important research agenda
• Discovery engines to answer major scientific questions
• New research modalities linking computation and data
• Organization and analysis of massive science data
![Page 51: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/51.jpg)
Thank you to our sponsors!
U.S. DEPARTMENT OF ENERGY
![Page 52: Accelerating Data-driven Discovery in Energy Science](https://reader036.vdocument.in/reader036/viewer/2022062522/58aa44a61a28ab4c348b5db1/html5/thumbnails/52.jpg)
For more information: [email protected]
Thanks to co-authors and Globus team
Globus services (globus.org)
• Foster, I. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing (May/June):70-73, 2011.
• Chard, K., Tuecke, S. and Foster, I. Efficient and Secure Transfer, Synchronization, and Sharing of Big Data. Cloud Computing, IEEE, 1(3):46-55, 2014.
• Chard, K., Foster, I. and Tuecke, S. Globus Platform-as-a-Service for Collaborative Science Applications. Concurrency - Practice and Experience, 27(2):290-305, 2014.
Publication (globus.org/data-publication)
• Chard, K., Pruyne, J., Blaiszik, B., Ananthakrishnan, R., Tuecke, S. and Foster, I. Globus Data Publication as a Service: Lowering Barriers to Reproducible Science. 11th IEEE International Conference on eScience, Munich, Germany, 2015.
Discovery engines
• Foster, I., Ananthakrishnan, R., Blaiszik, B., Chard, K., Osborn, R., Tuecke, S., Wilde, M. and Wozniak, J. Networking materials data: Accelerating discovery at an experimental facility. Big Data and High Performance Computing, 2015.