nci.org.au
Ben Evans, ICAS Workshop 2019
The New Innovation Game: Rebooting Infrastructure for more productive Climate, Weather, Environmental Systems & Geophysics Research

TRANSCRIPT

Page 1:

Ben Evans, ICAS Workshop 2019

The New Innovation Game: Rebooting Infrastructure for more productive Climate, Weather, Environmental Systems & Geophysics Research

Page 2:

NCI’s national peak research system
• Collaboration with major partners: the Australian Bureau of Meteorology, CSIRO, Geoscience Australia and ANU

• As such, the major priority is climate, weather, environmental and geoscience research

• NCI leads the national merit process: 10+% NCI, 10+% Pawsey, plus other specialized but minor systems

© National Computational Infrastructure 2019. The Innovation Game, iCAS, Sept 2019. [email protected]

Page 3:

• Over 89,000 cores (Intel Xeon Sandy Bridge/Broadwell) in 4519 compute nodes
• 330 Terabytes of main memory
• InfiniBand FDR/EDR interconnect
• 7 Petabytes of dedicated scratch high-performance storage (150 GB/sec)
• 54 Petabytes of high-performance Lustre collection and project storage


Last days of Raijin

Page 4:

• Completed procurement
• Fujitsu is the primary contractor; mostly Cascade Lake, with increased GPU capacity, CentOS 8
• https://opus.nci.org.au/display/Help/Gadi%3A+NCI%27s+New+Supercomputer

http://nci.org.au/media/gadi-installation-progress/

Gadi Phase 1 User Testing: access for users to Gadi, with Raijin running in parallel (11 November 2019)

Raijin Decommission: decommission of Sandy Bridge nodes; removal of Raijin’s Broadwell, Skylake and GPU nodes (25 November 2019)

Gadi Broadwell and Skylake: Broadwell, Skylake and GPU nodes installed (28 November 2019)

Gadi GPU Phase 2: install more GPUs (11 December 2019)

Gadi full production: user access to the full system (6 January 2020)


Gadi – installation of next supercomputer

Page 5:

Integrated HPC, internal cloud and storage

NCI persistent storage: over 50 PB of high-speed (disk) filesystems, with ~30 PB duplicated on project (self-managed) archival (tape) storage.

[Diagram: Raijin compute (1.67 PF, ~59,000 Xeon cores) on a 56 Gigabit FDR InfiniBand fabric, plus an OpenStack cloud (Ceph: Tenjin 0.5 PB, Nectar 0.5 PB), VMware, Aspera + GridFTP data movers, NCI Data Services and VDI, connected over 10/40/56 Gigabit Ethernet and 56 Gigabit FDR InfiniBand fabrics to the /g/data global persistent filesystems + HSM (g/data1 22.2 PB, g/data2 6.75 PB, g/data3 8.2 PB, g/data4 15 PB), the Raijin HPC filesystems (short 7.6 PB, /home, /system, /apps, /images), Lustre HSM with a 1.0 PB cache, and Massdata archival HSM (2x LTO5 tape libraries, 7 PB raw; 2x TS1150 tape libraries, 22 PB raw), with 100 Gigabit Ethernet to the Internet via AARNet / ICON.]

Page 6:

• Modelling Extreme & High Impact events
• Monitoring the Environment & Ocean
• Natural Hazards
• Agriculture/food security
• Natural Hazard and Risk models: Tsunami, Ash-cloud
• 3D/4D Geophysics: Magnetotellurics, AEM, Forward/Inverse Seismic models and analysis
• Hydrology, Groundwater, Carbon Sequestration

Examples of priority research at NCI

Page 7:

For Climate, Weather, Environment and Geophysics, the recent activities have been:
• CMIP6 – model and analysis prep at the tail-end of phase X, start of phase X+1
• Regional Copernicus Satellite Hub, and Australian-region Landsat datasets and products
• Increasing geophysics datasets ready for HPC and interoperability
• Preparing for next-generation needs for data, and data analysis

1. Where/when do we use internal cloud infrastructure – managed SaaS
2. Data repository for trusted, managed, highly used dataset access
3. Data analysis environments – e.g., Jupyter, server-side processing

Page 8:

Challenge 1: Technology changes within HPC centres

Page 9:

NCI

Growth in HPC and data analysis systems

Top500: Changes - Vector->Micro->Accelerators->??


Most major compute centres are generally not holding back future compute capacity for legacy software:
• Componentry moves on, more attractive to purchase
• Models need to adjust
• Those that adapt go forward
• Legacy code that doesn’t adapt will get squeezed out
• This is the same as the previous vector -> micro change.

E.g. 1: Oak Ridge – but most HPC systems are in the same boat, with some exceptions for now

• Summit supercomputer system. The resource is GPUs, and codes that are not GPU-enabled can’t access it

• Frontier – first exascale system (1.5 EFlops) in 2021. AMD EPYC CPU and AMD Radeon Instinct GPU technology

“As a second-generation AI system, Frontier will provide new capabilities for deep learning, machine learning and data analytics for applications ranging from manufacturing to human health.”

E.g. 2: NERSC is the same -> (2020) GPU accelerators -> (2024) exascale -> (2028) post-Moore/non-von Neumann

Page 10:

Breaking the Law

Law: Description. Status.

Moore’s Law: transistor density doubles every 2 years. Broken (2012).

Top500 HPL: increase in performance of ~1.85x per year. Broken (2013).

Dennard Scaling: transistors get smaller, circuitry gets faster, and their power density stays constant, so power use is proportional to area. Broken (2005).

Koomey’s Law: performance per watt doubles every 1.57 years. Broken (2005).

Kryder’s Law: areal density of disks doubles every thirteen months. Broken (2002).

Wirth’s Law: existing software is getting slower more rapidly than hardware becomes faster. Existing software is not adapting easily to new hardware; new techniques & algorithms are needed.

Parkinson’s Law (analogue of the ideal gas law): work expands so as to fill the time available for its completion. Always true and unbeatable, so set a goal, then do it to the highest quality.

Page 11:

• DSLs – Domain-Specific Languages to flexibly address architecture changes. PSyclone is being used by the UK Met Office; others are using GridTools.

• Use of lower-precision arithmetic, and mixed precision, replacing some standard algorithms with elements such as a low-precision steepest-gradient step and a high-precision final result

• Replacing some algorithms in certain circumstances with known inference models rather than brute-force for all calculations

• Interprocessor communication changes with MPI+X, where X is an alternative communication protocol for better scalability

• Computing on-the-fly rather than storing results – both for models and for dynamic data products through data services.
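The mixed-precision idea above can be sketched as iterative refinement: do the expensive solve cheaply in low precision, then polish the answer with a few high-precision correction steps. A minimal illustration (an assumed workflow on a toy system, not NCI's or any model's actual code):

```python
import numpy as np

# Sketch of mixed-precision iterative refinement: a cheap float32 solve,
# then a few float64 correction steps using the true residual.
rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test system
b = rng.standard_normal(n)

# Low-precision solve (float32): fast, but only ~7 digits of accuracy.
x = np.linalg.solve(A.astype(np.float32), b.astype(np.float32)).astype(np.float64)

# High-precision refinement (float64): correct x using the residual.
for _ in range(3):
    r = b - A @ x
    x = x + np.linalg.solve(A, r)

residual = np.linalg.norm(b - A @ x)
print(residual)  # near float64 machine precision
```

In production codes the refinement step would typically reuse the low-precision factorisation rather than re-solve in full precision; the sketch keeps only the precision-splitting idea.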


What are the options?

Page 12:

Challenge 2: Transparency in methods and outputs; understanding scientific tolerances

Page 13:

• If someone gave you their digital science output, what is needed to convince you that it was correct?

• If this required a complex system that you couldn’t run yourself, what information do you need to trust it? Do you know what the assumptions and dependencies are?

• How much could and should you rely on results persisting over a longer time?

How Trustworthy is your Digital Science?

Page 14:

What do we mean?

One set of definitions, from the Association for Computing Machinery (ACM): https://www.acm.org/publications/policies/artifact-review-badging (based on the International Vocabulary of Metrology, BIPM)

Repeatability (Same team, same experimental setup)The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation.

Replicability (Different team, same experimental setup) The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts.

Reproducibility (Different team, different experimental setup) The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently.

Page 15:

At large scale it is not feasible or practical to repeat or completely verify the work, so what do we need to gain trust?

For large computational digital science, the challenge is to be transparent. But how?
• Pretesting with visible unit tests, benchmark/scenario tests and procedures?
• (How) do we record and share the level of variability/precision?
• Systems maintainers need to understand the underlying system state

Does anyone know the “versions” of their big and unique systems? What about months ago?

• Transparency is supported by FAIR principles, but must apply more broadly than “output datasets”

• And we need a good scientific rationale for what is being attempted and the level of precision required

What can we do to trust large-scale digital science?

“You must not fool yourself — and you are the easiest person to fool”, Richard Feynman.
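One practical way to make the "what version was the system months ago?" question answerable is to hash a manifest of code versions, inputs and settings, and attach the digest to every output. A hedged sketch, in which every component name and version is invented for illustration (this is not an NCI tool):

```python
import hashlib
import json

# Hypothetical provenance fingerprint: a canonical (sorted-keys) JSON dump
# of the run configuration, hashed so any change to code version, inputs
# or settings yields a different digest.
manifest = {
    "model": "example-model",            # hypothetical component
    "model_version": "1.5.3",            # hypothetical version
    "compiler": "example-fortran 19.0",  # hypothetical toolchain
    "inputs": ["forcing_v2.nc"],         # hypothetical input file
    "precision": "float64",
}
digest = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest()
print(digest[:12])  # short, stable fingerprint of the whole configuration
```

Storing such a digest alongside each output makes later comparisons possible even when the full system cannot be re-run.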

Page 16:

To what extent should we be providing information about uncertainty/sensitivity analysis?
• What needs to be put in place to trust Deep Learning / Machine Learning and AI?

How do we address the next generation of workflow systems (the micro-services)?
• What is a PID on the chain?
• Does a certified blockchain play a role?

To what extent do we expect provenance records to be available?
• Is PROV ready to handle the range of questions and return them in a usable form?
• Can these be available to provide information about the underlying platforms?

More emerging issues

Page 17:

Challenge 3: Data for HPC, analysis & interoperability

Page 18:

Major Challenges for current users

• Datasets are too large/cumbersome to move, costly to download, hard to maintain coherency
• Numbers of files are very difficult to wrangle as a single dataset or to interoperate
• Data is not generally fit for purpose - it needs to be converted/transformed
• The HPC, data and coding skills required for manipulating the data are barriers to analysing it
• Geospatial users would prefer to use standard tools and known protocols for important reference datasets
• Most users find it more convenient to analyse “on the desktop” - most compute and data are not in central systems

Number of files growing rapidly: 3x every 2 years

For managed datasets, number of files: 2018 - 330M; 2020 (today) - 1B
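The two figures above are consistent with the quoted growth rate; a quick check:

```python
# Growth of 3x every 2 years from 330M managed files in 2018
# predicts roughly 1B files by 2020, matching the slide.
files_2018 = 330e6
years = 2020 - 2018
files_2020 = files_2018 * 3 ** (years / 2)
print(files_2020)  # 990000000.0, i.e. ~1 billion
```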

Page 19:

Time Sharing -> Data Centre / Cloud Computing

Preparing for the next generation of computing & data

Page 20:

Time Sharing -> Data Centre / Cloud Computing -> Fog Computing

Preparing for the next generation of computing & data

Page 21:

NIST defines Fog computing:“residing between smart end-devices and traditional cloud or data centers. This paradigm supports vertically-isolated, latency-sensitive applications by providing ubiquitous, scalable, layered, federated, and distributed computing, storage, and network connectivity.”https://csrc.nist.gov/csrc/media/publications/sp/800-191/draft/documents/sp800-191-draft.pdf

Fog = data + compute at the edge, interacting with the big-data central data centre/cloud

In reality:
• most data (80+%) is at the edge
• most data requires authorization

• We need to plan for more intelligent interaction with better, high-speed data access

• The future of data and information doesn’t look like:
  • Login to this system
  • “show me your files, put in shopping cart, click for download”

What is Fog Computing?

Page 22:

Enabling access to data for the future/(current!) BAU use cases

Access patterns are increasing in complexity, and are now part of BAU dependency:
o Digital environments and analysis/visualisation portals
o Virtual Research Environments that span administrative domains and continents
o Programmable workflows, Jupyter/iPython notebooks, mobile apps, interconnected systems
o Other machine access types:
  • sensor-to-machine
  • device-to-machine
  • AI-to-machine

Page 23:

Challenge for Data Access:
o Performant (real-time) interoperable access to large (PB+), highly curated and interoperable datasets
o Can’t afford to store the data in the way needed by special use-cases
o Can’t afford to recopy all data
o Can’t afford to coordinate to have it in one location
o More demand for APIs and micro-services (e.g., data streams and data channels)
o Data that is needed might not even pre-exist. That is:
  • Dynamically generated datasets: not precomputed & stored
  • Intelligent services
    * Server-side algorithms
    * Deep Learning

Page 24:

High Performance for both Big Compute and Big Data Analysis

Page 25:

Interoperable

Make longtail data easily analyzable with the large reference data

Page 26:

https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf

• Cleaning and organizing data
• 12% preparing for analysis
• Just 9% devoted to mining data for patterns

A problem we all know so well …

Page 27:

Some of our progress at NCI to address

Page 28:

So, what is driving the current large data deluge?

Climate & Weather model data
Reanalysis data
Satellite Observations


Next domain with large growth is potentially genomics

Page 29:

Various modes for analyzing data

Page 30:

High Performance
Ensure that the data is organized to allow effective access for High Performance applications

FAIR
Findable, Accessible, Interoperable, Re-usable

Data Integrity
Managed services to capture a workflow’s process as a comparable, traceable output. Repeatable science which can be conducted with less effort or an acceleration of outputs.

Transdisciplinary
To publish, catalogue and access self-documented data and software for enhancing trans-disciplinary, big data science within interoperable data services and protocols.

Some focus areas for these datasets

Page 31:

Services-based approach with programmatic access

[Diagram: NCI NERDIP data services. The 10 PB NCI NERDIP Earth systems, environmental and solid-Earth data collections (geophysics, elevation, bathymetry, Himawari, MODIS, Landsat, Numerical Weather Prediction, CMIP5, eReefs, hazards models, GPS) are exposed through services and technology including the OGC Web Feature Service, Web Map Service, Web Coverage Service and Web Processing Service, OPeNDAP, GeoServer, GSKY, THREDDS, EarthServer, a GeoNetwork catalogue, the NCI index database and CSW/OpenSearch, providing open data access to users and portals such as the Virtual Geophysics Laboratory, Marine Science Cloud, Eco Science Cloud, National Map, and Climate and Weather Science Lab.]

Page 32:

“Mischief Managed” – Ongoing improvements to managing data

• Managing the ongoing agreement with data providers and usage for others’ analysis/use
  • update frequency, licensing/access, provenance
  • multi-domain use-cases, improving for data analysis, tracking success, training examples

• Using technology-assisted processes to improve the ability to manage data at scale
  • automated conformance checks, data quality and SLA
  • information exchanges of data/informatics
  • automated exchanges in DMP - tracking growth, usage, hot-spots
  • impact measures, citation

Page 33:

GSKY: A high performance, scalable, OGC Data-server

[Diagram: a geospatial request (input) from the user’s client; GSKY (GSio) returns the response (output).]

FEATURES
• OGC Standards (WMS, WCS, WPS)
• Scalable
• Concurrent
• Multi-Cloud

GSKY User guide: https://gsky.readthedocs.io/en/latest/
GSKY service: http://gsky.nci.org.au/
Open Source Project: https://github.com/nci/gsky

Page 34:

GSKY responds to Open Geospatial Consortium (OGC) APIs over the HTTP protocol:
• Web Map Service (for displaying images on the map server)
• Web Coverage Service (for delivering the actual data as “coverages”, independent of the underlying storage format or files)
• Web Processing Service

GSKY allows:
• Performant aggregations, subsetting, subsampling, polygon/pencil/pixel drills
• Execution of on-the-fly data transformations, re-projections and other algorithms

GSKY is implemented using:
• A rich metadata server for data query, e.g., spatial, temporal and other physical variables
• Clustered backend workers for high-performance I/O and scale-out server-side compute

GSKY in a nutshell: Geospatial Big Data Service
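A request to the Web Map Service described above is just an HTTP query string. A hedged sketch of building one: the endpoint path, layer name and timestamp below are invented for illustration (only the host gsky.nci.org.au comes from the slide), so a real request would need names from the actual GSKY catalogue.

```python
from urllib.parse import urlencode

# Hypothetical OGC WMS 1.3.0 GetMap request of the kind GSKY serves.
params = {
    "service": "WMS",
    "version": "1.3.0",
    "request": "GetMap",
    "layers": "example:landsat8_nbar",   # hypothetical layer name
    "crs": "EPSG:4326",
    "bbox": "-44,112,-10,154",           # Australian region (lat/lon axis order in WMS 1.3.0 EPSG:4326)
    "width": "512",
    "height": "512",
    "format": "image/png",
    "time": "2019-01-01T00:00:00Z",      # hypothetical timestamp
}
url = "http://gsky.nci.org.au/ows?" + urlencode(params)  # "/ows" path is an assumption
print(url)
```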

Page 35:

[Diagram: OGC request in, OGC response out; GSKY + MAS + Data + ...]

GSKY High Level Architecture

• Based on “flow-based programming” and “stream processing”.
• Data is transformed by connected processes, forming a Directed Acyclic Graph (DAG).
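The flow-based style above can be sketched with generators: each stage is a small process consuming one stream and emitting another, and wiring the stages together forms a DAG. This is a toy illustration of the paradigm, not GSKY source code.

```python
# Each function is a "process"; connecting them forms a small DAG with
# two branches that meet at a join node.
def source(tiles):
    for t in tiles:
        yield t

def scale(stream, factor):
    for t in stream:
        yield [v * factor for v in t]

def merge(a, b):
    # join node: combines two incoming streams element-wise
    for x, y in zip(a, b):
        yield [p + q for p, q in zip(x, y)]

tiles = [[1, 2], [3, 4]]
left = scale(source(tiles), 10)   # one branch of the DAG
right = scale(source(tiles), 1)   # a second branch
out = list(merge(left, right))    # join
print(out)  # [[11, 22], [33, 44]]
```

Because every stage is a lazy stream, tiles flow through the graph one at a time, which is what makes the style attractive for large on-the-fly transformations.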

Page 36:

GSKY High Level Architecture

[Diagram: an OWS endpoint in front of GSKY dispatches work to Worker 1 ... Worker n; the MAS (Metadata Attribute Service) sits over a geospatial index, populated by GeoCrawl from the data store.]

Page 37:

How do we find all that hidden data in millions of files? 2018: 300+M. 2020: 1B

ls -lR xxx
-rw-rw-r-- 1 ljm548 er8 11193548 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B06-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
-rw-rw-r-- 1 ljm548 er8 29180280 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B07-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
-rw-rw-r-- 1 ljm548 er8 20540305 Jan 4 2018 20180102221000-P1S-ABOM_OBS_B08-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
... (B09 to B16 at the same timestamp) ...

./01/02/2220: total 1522512
-rw-rw-r-- 1 ljm548 er8 87016573 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B01-PRJ_GEOS141_1000-HIMAWARI8-AHI.nc
-rw-rw-r-- 1 ljm548 er8 23483702 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B01-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc
-rw-rw-r-- 1 ljm548 er8 334819710 Jan 4 2018 20180102222000-P1S-ABOM_BRF_B03-PRJ_GEOS141_500-HIMAWARI8-AHI.nc
... (dozens more BRF/GEOM_SOLAR/OBS files per ten-minute timestamp) ...
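The attributes hidden in those filenames (time, product, band, resolution) are exactly what a metadata service indexes so queries stop depending on directory walks. A hypothetical sketch of recovering them; the regular expression is inferred from the listing above, not taken from any NCI tool:

```python
import re

# Parse one Himawari-8 filename from the listing into structured metadata.
PATTERN = re.compile(
    r"(?P<time>\d{14})-P1S-ABOM_(?P<product>[A-Z]+)_(?P<band>B\d{2}|SOLAR)"
    r"-PRJ_GEOS141_(?P<res_m>\d+)-HIMAWARI8-AHI\.nc"
)

name = "20180102221000-P1S-ABOM_OBS_B06-PRJ_GEOS141_2000-HIMAWARI8-AHI.nc"
meta = PATTERN.match(name).groupdict()
print(meta)  # {'time': '20180102221000', 'product': 'OBS', 'band': 'B06', 'res_m': '2000'}
```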

data stores with poor latency -> geospatial-enabled databases

MAS: Metadata Attribute Service
Integrated with GSKY and now other tools and services
Provides a database backend abstraction over the data

• It identifies individual data objects (datasets, variables, spatial and temporal extents).

• Can be sharded by different geospatial collections or by splitting collections into non-overlapping geographical extents.

As a good database, it is able to process complex queries in milliseconds: geospatially, temporally and variable-aware over X million files. POSIX is no longer in the right game, league, ...
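The gain over POSIX walks can be shown with a minimal sketch: index each file's variable, time range and bounding box once, then answer extent queries with an indexed SELECT. The schema below is invented for illustration and is not the real MAS.

```python
import sqlite3

# Toy file-metadata index: one row per file, with time and bounding box.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files (
    path TEXT, variable TEXT, t_start TEXT, t_end TEXT,
    lon_min REAL, lon_max REAL, lat_min REAL, lat_max REAL)""")
db.execute("CREATE INDEX files_idx ON files (variable, t_start)")
db.executemany("INSERT INTO files VALUES (?,?,?,?,?,?,?,?)", [
    ("a.nc", "BRF", "2018-01-02T22:10", "2018-01-02T22:20", 110, 155, -45, -9),
    ("b.nc", "OBS", "2018-01-02T22:10", "2018-01-02T22:20", 110, 155, -45, -9),
    ("c.nc", "OBS", "2018-01-03T00:00", "2018-01-03T00:10", 60, 100, 0, 40),
])

# "OBS files overlapping the Australian region on 2 January 2018"
rows = db.execute("""SELECT path FROM files
    WHERE variable = 'OBS'
      AND t_start BETWEEN '2018-01-02' AND '2018-01-03'
      AND lon_max >= 112 AND lon_min <= 154
      AND lat_max >= -44 AND lat_min <= -10""").fetchall()
print(rows)  # [('b.nc',)]
```

The same query against millions of rows stays fast because it touches the index, not the filesystem.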

Page 38:

Ongoing improvements to seamlessly transforming more data on the fly

GSKY-MAS file handling of common large Earth Observation datasets. The amount of data is scaling up with more modern satellite instruments. GSKY is flexible enough to use other data formats (e.g., COG and Zarr).

MODIS
MODIS data from 2001 – present, with 4 timestamps per month and 1k files per timestamp. GSKY is handling approx. 900k MODIS files as a coherent dataset.

Landsat-8 dataset
The Landsat 8 dataset has data from 2013 – present, approximately 3k files per timestamp. GSKY is handling approx. 7 million files.

Sentinel 2 ARD
Approx. 1400 available days (2015 – present) and about 12k files per day for this dataset. GSKY is handling approx. 17 million files.

Page 39:

Managing sparse, very large satellite data via GSKY services


Realtime playback with server-side data processing


Comparing observations and models


First demonstrations of CMIP6 data comparisons via GSKY


GSKY providing server-side analysis capability


Addressing known performance limitations in well-known services: THREDDS

ASTER geophysics data
Problem handling large variables: 62 GBytes

Using:
• Default TDS -> times out
• Tuned TDS -> 90 s
• GSKY -> 1 s
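The gap between those timings is mostly I/O arithmetic: a small map tile from a 62 GB variable should cost milliseconds of reading, not a whole-variable scan. An illustrative calculation, with the tile size and effective bandwidth assumed (not measured NCI figures):

```python
# Illustrative arithmetic: why serving a small tile from a 62 GB variable
# needs tiled reads rather than whole-variable reads. Numbers are assumed.
var_bytes = 62 * 1024**3       # full ASTER variable
tile_bytes = 512 * 512 * 4     # one 512x512 float32 tile requested
disk_bw = 1 * 1024**3          # assume 1 GiB/s effective read bandwidth

whole_read_s = var_bytes / disk_bw
tile_read_s = tile_bytes / disk_bw
print(f"whole variable: ~{whole_read_s:.0f}s, one tile: ~{tile_read_s*1000:.1f}ms")
# → whole variable: ~62s, one tile: ~1.0ms
```

Under these assumptions, a server that reads only the requested tile sits in the GSKY regime, while one that touches the whole variable sits in the tuned-TDS regime.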


GSKY PoC: Preparing to apply Deep Learning inference models

Case study: Landsat 8 / OSM road detection

1. Change detection / Feature extraction / Classification

2. Derive observed precipitation
Case study: geopotential height at three different levels of the atmosphere from the ERA-Interim reanalysis, using Python tensor objects to train neural networks
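Feeding gridded fields like geopotential height to a neural network typically starts by cutting the grid into fixed-size training patches. A minimal sketch of that preparation step in pure Python (shapes and patch size are assumptions; a real pipeline would use array libraries on the actual data):

```python
# Sketch of turning a gridded field into fixed-size training patches,
# as a preparation step for neural-network training. Shapes are assumed.
def extract_patches(grid, size):
    """Split a 2-D grid (list of rows) into non-overlapping size x size patches."""
    rows, cols = len(grid), len(grid[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patches.append([row[c:c + size] for row in grid[r:r + size]])
    return patches

grid = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 field
patches = extract_patches(grid, 4)
print(len(patches), patches[0][0])  # 4 [0, 1, 2, 3]
```

Serving this step server-side, as the PoC proposes, means the patches rather than the raw multi-gigabyte grids travel to the training job.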


Challenge 4: Tracking (research) impact and use


Interoperability with the following services:

● ORCID
● DataCite
● CrossRef Event Data
● Scholix (publishers)
● NCI
● GRID (Digital Science)
● SpringerNature
● GESIS
● Research Data Australia
● ARC, NHMRC, NIH
● Dryad
● CERN
● figshare


Summary: innovating to meet the challenges ahead

HPC innovation challenges:
• Technology changes in HPC centres
• Transparency in methods and outputs, understanding scientific tolerances
• Data for HPC, analysis and interoperability
• Better tracking of (research) impact
