rebooting infrastructure for more productive climate ... › sites › default › files › ben...
TRANSCRIPT
![Page 1: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/1.jpg)
nci.org.au
Ben EvansICAS Workshop 2019
The New Innovation Game:Rebooting Infrastructure for more productive Climate, Weather, Environmental Systems & Geophysics Research
![Page 2: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/2.jpg)
nci.org.au
NCI’s national peak research system• Collaboration with major partners - Aus Bureau of Met, CSIRO, Geoscience Australia and ANU
• As such, major priority is climate, weather, environmental and geoscience research
• NCI lead’s the national merit process: 10+% NCI, 10+% Pawsey, plus other specialized but minor systems
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 3: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/3.jpg)
nci.org.au
• Over 89,000 cores (Intel Xeon Sandy Bridge/Broadwell) in 4519 compute nodes• 330 Terabytes of main memory• Infiniband FDR/EDR interconnect• 7 Petabytes of dedicated scratch high-performance storage (150 GB/sec)• 54 Petabytes of high-performance Lustre collection and project storage
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
Last days of Raijin
![Page 4: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/4.jpg)
nci.org.au
• Completed procurement• Fujitsu is the primary contractor, mostly Cascade Lake, and increased in GPU, Centos 8• https://opus.nci.org.au/display/Help/Gadi%3A+NCI%27s+New+Supercomputer
http://nci.org.au/media/gadi-installation-progress/
Gadi Phase 1 User Testing Access for users to Gadi, Raijin running in parallel
11th November 2019
Raijin Decommission Decommission of Sandy Bridge nodesRemoval of Raijin’s Broadwell, Skylake and GPU nodes
25 November 2019
Gadi Broadwell and Skylake Broadwell, Skylake and GPU nodes installed 28 November 2019
Gadi GPU Phase 2 Install More GPUs 11 December 2019
Gadi full production User access to the full system 6 January 2020
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
Gadi – installation of next supercomputer
![Page 5: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/5.jpg)
nci.org.au
Integrated HPC, internal cloud and storage
NCI persistent storage of over 50 PB high-speed (disk) filesystems, ~30 PB, duplicated on project (self-managed) archival (tape) storage.
10 Gigabit Ethernet
/g/data 56 Gigabit FDR Infiniband Fabrics
Cache 1.0PB, 2x LTO5 Tape Libr. - 7PB RAW
2x TS1150 Tape Libr. - 22PB RAW
Raijin 56 Gigabit FDR Infiniband Fabric
100 Gigabit Ethernet InternetAARNet / ICON
OpenStack Cloud
Raijin Login &Data movers
VMWareAspera + GridFTP
NCI Data Services VDI
MassdataArchival Data / HSM
/g/dataNCI Global Persistent Filesystems + HSM
Raijin FSHPC Filesystems
CephCloud Persistent Stores
Lustre HSM
40/56 GigabitEthernet
g/data122.2 PB
g/data26.75 PB
g/data38.2 PB
short7.6 PB
/home/system
/apps/images
Cache1.0 PB
CephTenjin0.5PB
CephNectar0.5PB
Raijin Compute1.67PF, 59000 Xeon Cores
g/data415 PB
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 6: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/6.jpg)
nci.org.au
• Modelling Extreme & High Impact events• Monitoring the Environment & Ocean• Natural Hazards • Agriculture/food security• Natural Hazard and Risk models: Tsunami, Ash-cloud• 3D/4D Geophysics: Magnetotellurics, AEM, Forward/Inverse Seismic models and analysis• Hydrology, Groundwater, Carbon Sequestration
Examples of priority research at NCI
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 7: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/7.jpg)
nci.org.au
For Climate, Weather, Environment and Geophysics, the recent activities have been• CMIP6 – model and analysis prep at the tail-end of phase X, start of phase X+1• Regional Copernicus Satellite Hub, and Australian region Landsat datasets and products• Increasing geophysics datasets ready for HPC and interoperability• Preparing for next generation needs for data, and data analysis
1. Where/when do we internal cloud infrastructure – managed SaaS2. Data repository for trusted, managed, high-used dataset access3. Data analysis environments – eg., Jupyter, server-side processing
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 8: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/8.jpg)
nci.org.au
Challenge 1: Technology changes within HPC centres
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 9: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/9.jpg)
nci.org.au
NCI
Growth in HPC and data analysis systems
Top500: Changes - Vector->Micro->Accelerators->??
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
Most major compute centres are generally not holding back future compute capacity for legacy software:• Componentry moves on, more attractive to purchase• Models need to adjust• Those that adapt go forward • Legacy code that doesn’t adapt will get squeezed out• This is the same as the previous vector -> micro change.
Eg 1. Oak Ridge - but most of HPC systems are in the same boat – with some exceptions for now
• Summit supercomputer system. Resource is GPUs and can’t access if not GPU-enabled
• Frontier – first Exascale (1.5 Eflops) in 2021. AMD EPYC CPU and AMD Radeon Instinct GPU technology
“As a second-generation AI system, Frontier will provide new capabilities for deep learning, machine learning and data analytics for applications ranging from manufacturing to human health. eg 2. NERSC is the same -> (2020) GPU accelerators -> (2024) Exa
-> (2028) post Moore/non-von Neumann
![Page 10: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/10.jpg)
nci.org.au
Breaking the Law
Law Description Status
Moore’s Law Transistors density doubles every 2 years Broken (2012)
Top500 HPL Increase in performance of ~1.85x per year Broken (2013)
Dennard Scaling Transistors get smaller, circuitry gets faster, and their power density stays constant. So power use is proportional to area
Broken (2005)
Koomey’s Law Performance per watt doubles every 1.57 years Broken (2005)
Kryder’s Law Areal density of disks doubles every thirteen months Broken (2002)
Wirth’s Law Existing software is getting slower more rapidly than hardware becomes faster. Existing software not adapting easily to new hardware. New techniques &algorithms are needed.
Parkinson’s Law Analogue of the ideal gas law:Work expands so as to fill time available for its completion.
Always true and unbeatable. So set a goal, then do it to highest quality
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 11: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/11.jpg)
nci.org.au
• DSLs – Domain Specific Languages to flexibly address architecture changes PSyclone being used by UK Met Office. Others using gridtools.
• Use of lower precision arithmetic, and mixed precision, replacing some standard algorithms with elements such as low precision steepest gradient, and high precision final result
• Replacing some algorithms in certain circumstance with known inference models based rather than brute-force for all calculations
• Interprocesssor communication changes with MPI+X, where X is alternative communication protocols for better scalability
• Computing on-the-fly rather than storing results – both for models and for dynamic data products through data services.
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
What are the options?
![Page 12: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/12.jpg)
nci.org.au
Challenge 2: Transparency in methods and outputs understanding scientific tolerances
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 13: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/13.jpg)
nci.org.au
• If someone gave you their digital science output, what is needed to convince you that it was correct?
• If this required a complex system that you couldn’t run yourself, what information do you need to trust it? Do you know what the assumptions and dependencies are?
• How much could and should you rely on results persisting over a longer time?
How Trustworthy is your Digital Science?
![Page 14: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/14.jpg)
nci.org.au
What do we mean?
One set of Definitions: the Association for Computing Machinery (ACM):https://www.acm.org/publications/policies/artifact-review-badging (based on International Vocab of Metrology - BIPM)
Repeatability (Same team, same experimental setup)The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.
Replicability (Different team, same experimental setup) The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.
Reproducibility (Different team, different experimental setup) The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 15: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/15.jpg)
nci.org.au
At large scale, it not feasible/practical to repeat or completely verify the work at the large scale, then What do we need to gain trust?
For large computational digital science, the challenge is to be transparent. But how?• Pretesting with visible unit tests, benchmark/scenario tests and procedures?• (How) do we record and share the level of variability/precision? • Systems maintainers to understand the underlying system state
Does anyone know the “versions” of their big and unique systems? What about months ago?
• Transparency is supported by FAIR principles, but must apply more broadly than “output datasets”
• And we need a good scientific rationale of what is trying to be done and the level of precision required
What can we do to trust large-scale digital science?
“You must not fool yourself — and you are the easiest person to fool”, Richard Feynman.
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 16: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/16.jpg)
nci.org.au
To what extent should we be providing information about uncertainty/sensitivity analysis?• What needs to be put in place to Trust Deep Learning/ Machine Learning and AI?
How do we address next generation of workflow systems (the micro-services)• What is a PID on the chain? • Does a certified Blockchain play a role?
To what extent do we expect Provenance records to be available• Is PROV ready to handle the range of questions and return in a usable form?• Can these be available to provide information about the underlying platforms?
More emerging issues
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 17: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/17.jpg)
nci.org.au
Challenge 3: Data for HPC, analysis & interoperability
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 18: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/18.jpg)
nci.org.au
Major Challenges for current users
• Datasets are too large/cumbersome to move, costly to download, hard to maintain coherency• Number of files are very difficult to wrangle as a single dataset or to interoperate• Data not generally fit for purpose - need to be converted/transformed• Both HPC, data and coding skills required for manipulating the data are barriers to analyzing data• Geospatial users would prefer to use standard tools and known protocols for important reference datasets• Most users find it more convenient to analyse “on the desktop” – most compute and data not in central systems
Number of files growing rapidly:3x every 2 years
For managed Datasets, number of files:2018 - 330M 2020 – 1Btoday
© National ComputationalInfrastructure 2019
![Page 19: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/19.jpg)
nci.org.au
Time Sharing Data Centre/
Cloud Computing
Preparing for the next generation of computing & data
The Innovation Game iCAS, Sept 2019 [email protected]© National ComputationalInfrastructure 2019
![Page 20: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/20.jpg)
nci.org.au
Time Sharing Data Centre/
Cloud Computing Fog Computing
Preparing for the next generation of computing & data
The Innovation Game iCAS, Sept 2019 [email protected]© National ComputationalInfrastructure 2019
![Page 21: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/21.jpg)
nci.org.au
NIST defines Fog computing:“residing between smart end-devices and traditional cloud or data centers. This paradigm supports vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable, layered, federated, and distributed computing, storage, and network connectivity.”https://csrc.nist.gov/csrc/media/publications/sp/800-191/draft/documents/sp800-191-draft.pdf
Fog = data+compute at edge interacts with Big Data central data center/cloud
In reality• most data (80+%) is at the edge • most data requires authorization
• We need to plan for more intelligent interaction with better, high-speed data access
• Future of data and information doesn’t look like:• Login to this system• “show me your files, put in shopping cart, click for download”
What is Fog Computing?
The Innovation Game iCAS, Sept 2019 [email protected]© National ComputationalInfrastructure 2019
![Page 22: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/22.jpg)
nci.org.au
Enabling access to data for the future/(current!) BAU use cases
Access patterns are increasing in complexity, and now part of BAU dependencyo Digital environments and analysis/visualisation portalso Virtual Research Environments that span administrative domains and continentso Programmable workflows, Jupyter/iPython notebooks, mobile apps, interconnected systemso Other machine access types:
• sensor-to-machine• device-to-machine• AI-to-machine
The Innovation Game iCAS, Sept 2019 [email protected]© National ComputationalInfrastructure 2019
![Page 23: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/23.jpg)
nci.org.au
Challenge for Data Accesso Performant (real-time) interoperable access to large (PB+), highly curated and interoperable datasetso Can’t afford to store the data in the way needed by special use-cases o Can’t afford to recopy all datao Can’t afford to coordinate to have in one location o More demand for APIs and micro-services (e.g., data streams and data channels) o Data that is needed might not even pre-exist. That is:
• Dynamically generated datasets: not precomputed & stored• Intelligent services
* Server-side algorithms* Deep Learning
The Innovation Game iCAS, Sept 2019 [email protected]© National ComputationalInfrastructure 2019
![Page 24: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/24.jpg)
nci.org.au
High Performance for both Big Compute and Big Data Analysis
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 25: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/25.jpg)
nci.org.au
Interoperable
Make longtail data easily analyzable with the large reference data
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 26: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/26.jpg)
nci.org.au
https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
• Cleaning and organizing data• 12% preparing for analysis • Just 9% devoted to mining data for patterns
A problem we all know so well …
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 27: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/27.jpg)
nci.org.au
Some of our progress at NCI to address
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 28: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/28.jpg)
nci.org.au
So, what is driving the current large data deluge?
Climate & Weather model data
Reanalysis data Satellite Observations
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
Next domain with large growth is potentially genomics
![Page 29: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/29.jpg)
nci.org.au
Various modes for analyzing data
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 30: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/30.jpg)
nci.org.au
High PerformanceEnsure that the data is organized to allow effective access for High Performance applications
FAIRFindable, Accessible, Interoperable, Re-usable
Data IntegrityManaged services to capture a workflow’s process as a comparable, traceable output.Repeatable science which can be conducted with less effort or an acceleration of outputs.
TransdisciplinaryTo publish, catalogue and access self-documented data and software for enhancing trans-disciplinary, big data science within interoperable data services and protocols.
Some focus area for these Datasets
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 31: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/31.jpg)
nci.org.au
Services-based approach with programmatic access
Geophysics ElevationBathymetryHimawariMODISLandsat
Virtual Geophysics Laboratory
Marine Science Cloud
Eco Science CloudNational Map Climate and
Weather Sci. LAB
OGC Web Feature Service
OGC Web Map
Service
OGC Web Coverage Service
OPeNDAPOGC Web Processing
Service
NCI NERDIP DATA SERVICES
10 PB NCI NERDIP EARTH SYSTEMS, ENVIROMENTAL AND SOLID EARTH DATA COLLECTIONS
Serv
ices
Tech
nolo
gy
Numerical Weather
PredictionCMIP 5
eReefs
GeoServer GSKYTHREDDS
Open Data
AccessUsers
GeoNetwork Catalogue
NCI IndexDatabase
CS/WOpenSearch
EarthServer
HazardsModels GPS
![Page 32: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/32.jpg)
nci.org.au
“Mischief Managed” – Ongoing improvements to managing data
• Managing the ongoing agreement with data providers and usage for others analysis/use• update frequency, licensing / access, provenance• Multi-domain use-cases, improving for data analysis, tracking success, training examples
• Using technology-assisted process to improve ability to manage data at scale• automated conformance checks, data quality and SLA• information exchanges of data / informatics• automated exchanges in DMP - tracking growth, usage, hot-spots• impact measures, citation
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 33: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/33.jpg)
nci.org.au
GSKY: A high performance, scalable, OGC Data-server
Geospatial request
INPU
T
OU
TPU
T
Response
User’s client
FEATURES• OGC Standards (WMS,WCS,WPS)• Scalable• Concurrent• Multi-Cloud
GSio
GSKY Userguide: https://gsky.readthedocs.io/en/latest/GSKY service: http://gsky.nci.org.au/Open Source Project: https://github.com/nci/gsky
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 34: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/34.jpg)
nci.org.au
GSKY responds to Open Geospatial Consortium (OGC) API over http protocol:• Web Map Service (for displaying the images on the map server)• Web Coverage Service (for delivering the actual data as “coverages” - independent of the underlying storage
format or files)• Web Processing service
GSKY allows:• Performant aggregations, subsetting, subsampling, polygon/pencil/pixel drills• Execution of on-the-fly data transformations, re-projections and other algorithms
GSKY is implemented using• Rich metadata server for data query e.g., spatial, temporal, other physical variables• Clustered backend workers – high performance I/O and scale-out server-side compute
GSKY in a nutshell: Geospatial Big Data Service
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 35: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/35.jpg)
nci.org.au
OGCRequest
OGCResponse
GSKY + MAS + Data + ...
GSKY High Level Architecture
• Based on “flow-based programming” and “stream processing”.• Data transformed by connected processes, forming a Directed Acyclic Graph (DAG).
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 36: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/36.jpg)
nci.org.au
GSKY High Level Architecture
Worker 1
Worker 2
Worker n
OWS
Data storeGeospatial
Index
GeoCrawl
...
MAS
GSKY
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 37: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/37.jpg)
nci.org.au
How do we find all that hidden data in millions of files? 2018: 300+M. 2020: 1B
ls –lr xxx-rw-rw-r-- 1 ljm548 er8 11193548 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B06-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 29180280 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B07-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 20540305 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B08-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 22968014 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B09-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 25414956 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B10-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 30401489 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B11-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 27393561 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B12-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 30147486 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B13-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 30334542 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B14-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 30133906 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B15-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 29398087 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B16-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
./01/02/2220:total 1522512-rw-rw-r-- 1 ljm548 er8 87016573 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B01-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 23483702 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B01-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 87466936 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B02-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 23517336 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B02-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 89658176 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B03-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 23859525 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B03-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 334819710 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B03-PRJ_GEOS141_500-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 93638229 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B04-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 24969643 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B04-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 22318069 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B05-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 21506635 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B06-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 87700416 Jan 4 2018 20180102222000-P1S-ABOM_GEOM_SOLAR-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 55131422 Jan 4 2018 20180102222000-P1S-ABOM_GEOM_SOLAR-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 184768211 Jan 4 2018 20180102222000-P1S-ABOM_GEOM_SOLAR-PRJ_GEOS141_500-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 23929641 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B01-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 24104844 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B02-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 25203398 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B03-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 26471627 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B04-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 11839223 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B05-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 11440462 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B06-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 29170917 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B07-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 20535000 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B08-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 22963746 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B09-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc-rw-rw-r-- 1 ljm548 er8 25410204 Jan 4 2018 20180102222000-P1S-ABOM_OBS_B10-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
data stores with poor latency -> geospatial-enabled databases
MAS: Metadata attribute serviceIntegrated with GSKY and now other tools and servicesProvides database backend abstraction over the data
• It identifies individual data objects (datasets, variables, spatial and temporal extents).
• Can be sharded by different geospatial collections or by splitting collections into non-overlapping geographical extents.
As a good database, is able to process complex queries in milliseconds Geospatially, temporally and variable-aware over X millions of filesPOSIX is no longer in the right game, league, ...
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 38: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/38.jpg)
nci.org.au
Ongoing improvements to seamlessly transforming more data on the fly
GSKY-MAS file handling of common large Earth Observation datasetsAmount of data is scaling up with more modern satellite instrumentsGSKY flexible to use other data formats (e.g., CoG and zarr)
MODISMODIS data from 2001 – present, with 4 timestamps per month and 1k files per timestamp.GSKY is handling approx. 900k MODIS files as a coherent dataset.
Landsat-8 datasetLandsat 8 dataset has data from 2013 – present, approximately 3k files per timestamp. GSKY is handling approx. 7 million files.
Sentinel 2 ARDApprox 1400 available days (2015-present) and about 12k files per day for this dataset. GSKY is handling approx. 17 million files.
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 39: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/39.jpg)
nci.org.au
Managing Sparse very large satellite data via GSKY services
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 40: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/40.jpg)
nci.org.au
Realtime playback with server-side data processing
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 41: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/41.jpg)
nci.org.au
Comparing Observation and models
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 42: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/42.jpg)
nci.org.au
First demonstrations of CMIP6 data comparisons via GSKY
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 43: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/43.jpg)
nci.org.au
GSKY providing server-side analysis capability
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 44: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/44.jpg)
nci.org.au
Addressing known performance limitations in well-known services: THREDDS
ASTER geophysics dataProblem handing large variables- 62 GBytes
Using:• default TDS -> times out• Tuned TDS -> 90s • GSKY -> 1s
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 45: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/45.jpg)
nci.org.au
GSKY PoC: Preparing to applying Deep Learning inference models
Case study: Landsat 8 / OSM road detection
1. Change detection / Feature extraction / Classification
2. Derive observed precipitation Case study: Geopotential height for three different levels of the atmosphere from ERA-Interim climate model using Python Tensor objects that train neural networks
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 46: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/46.jpg)
nci.org.au
Challenge 4: Tracking (research) Impact and use
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 47: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/47.jpg)
nci.org.au© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 48: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/48.jpg)
nci.org.au
Interoperability with following services
● ORCID● DataCite● CrossRef Event
Data● Scholix (Publishers)● NCI
● GRID (Digital Science)● SpringerNature● GESIS ● Research Data Australia● ARC, NHMRC, NIH ● Dryad,● CERN,● figshare
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]
![Page 49: Rebooting Infrastructure for more productive Climate ... › sites › default › files › Ben Evans_0.pdf · • Summit supercomputer system. Resource is GPUs and can’t access](https://reader033.vdocument.in/reader033/viewer/2022060321/5f0d432f7e708231d439796e/html5/thumbnails/49.jpg)
nci.org.au
Summary: innovating to meet the challenges ahead
HPC innovation challenges:• Technology Changes in HPC centres• Transparency in methods and outputs, understanding scientific tolerances• Data for HPC, analysis and interoperability• Better tracking (research) impact
© National ComputationalInfrastructure 2019 The Innovation Game iCAS, Sept 2019 [email protected]