
The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

Ben Evans(1), Lesley Wyborn(1), Tim Pugh(2), Chris Allen(1), Joseph Antony(1), Kashif Gohar(1), David Porter(1), Jon Smillie(1), Claire Trenham(1), Jingbo Wang(1), Irina Bastrakova(3), Alex Ip(3), Gavin Bell(4)

(1) ANU, (2) Bureau of Meteorology, (3) Geoscience Australia, (4) The 6th Column Project
(The second part of this talk is in the next ESSI session.)

EGU2015-8273 (ESSI session), 17 April 2015. Presented by Ben Evans (@BenJKEvans). © National Computational Infrastructure 2015.

• High Performance Data (HPD): data that is carefully prepared, standardised and structured so that it can be used in data-intensive science on HPC (Evans, ISESS 2015, Springer)
  – HPC turns compute-bound problems into I/O-bound problems
  – HPD turns I/O-bound problems into ontology + semantics problems
• What are the HPC and HPD drivers?
• How do you build environments on this infrastructure that make it easy for users to do science?

Top 500 Supercomputer list since 1990

[Figure: Top500 performance development over time (http://www.top500.org/statistics/perfdevel/), with markers for the current and next NCI systems.]

• Fast and flexible access to structured data is required
• There needs to be a balance between processing power and the ability to access data (data scaling)
• The focus is on on-demand, direct access to large data sources
• enabling high-performance analytics and analysis tools directly on that content

Elephant Flows Place Great Demands on Networks

• A physical pipe that leaks water at a rate of 0.0046% by volume still delivers 99.9954% of the water.
• A network "pipe" that drops packets at a rate of 0.0046% still transfers 100% of the data, but slowly: at <<5% of the optimal speed.
• Round-trip time is essentially fixed, determined by the speed of light; with proper engineering, what we can minimise is packet loss.

Assumptions: 10 Gbps TCP flow, 80 ms RTT. See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski, "The Science DMZ: A Network Design Pattern for Data-Intensive Science", Proceedings of the IEEE/ACM Annual Supercomputing Conference (SC13), Denver CO, 2013.
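To make "<<5% of optimal speed" concrete, the steady-state TCP throughput bound of Mathis et al. (throughput ≤ MSS / (RTT · √loss)) gives a quick estimate. A minimal sketch in Python, where the 10 Gbps link, 80 ms RTT and 0.0046% loss come from the slide, and the Mathis model plus the 1460-byte MSS are our assumptions:

    import math

    mss_bytes = 1460        # typical TCP segment payload (our assumption)
    rtt_s = 0.080           # 80 ms round-trip time (from the slide)
    loss = 0.000046         # 0.0046% packet loss (from the slide)
    link_bps = 10e9         # 10 Gbps link (from the slide)

    # Mathis et al. steady-state bound: throughput <= MSS / (RTT * sqrt(loss))
    throughput_bps = (mss_bytes * 8) / (rtt_s * math.sqrt(loss))

    print(f"{throughput_bps / 1e6:.0f} Mbps, "
          f"{100 * throughput_bps / link_bps:.2f}% of the link")

This works out to roughly 21 Mbps, about 0.2% of the 10 Gbps link, which is why minimising packet loss through proper engineering matters so much for elephant flows.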

Computational and Cloud Platforms

Raijin:
• 57,472 cores (Intel Xeon Sandy Bridge, 2.6 GHz) in 3,592 compute nodes
• approx. 160 TBytes of main memory
• Infiniband FDR interconnect
• approx. 7 PBytes of usable fast filesystem (for short-term scratch space)
• 1.5 MW power; 100 tonnes of water in cooling

Partner Cloud:
• Same generation of technology as Raijin (Intel Xeon Sandy Bridge, 2.6 GHz), but only 1,500 cores
• Infiniband FDR interconnect
• Collaborative platform for services, and the platform for hosting non-batch services

NCI Nectar Cloud:
• Same generation as the Partner Cloud
• Non-managed environment
• Weak integration

NCI Cloud

[Architecture diagram: per-tenant public IP assignments (CIDR boundaries, typically /29); an OpenStack private IP flat network, quota managed; FDR InfiniBand interconnect throughout; NFS and Lustre filesystems; SSD-backed nodes.]

NCI's integrated high-performance environment

[Architecture diagram: Raijin HPC compute with login and data-mover nodes on a 56 Gb FDR IB fabric; a second 56 Gb FDR IB fabric carrying the persistent global parallel filesystems /g/data1 (7.4 PB), /g/data2 (6.75 PB) and /g/data3 (9 PB, serving the cloud); Raijin's high-speed filesystems /short (7.6 PB) and /home, /system, /images, /apps; Massdata tape archive (1.0 PB cache, 20 PB tape); NCI data movers linking via 10 GigE to the Internet and to a second data centre.]

10+ PB of Data for Interdisciplinary Science

[Chart: data holdings by domain and partner (BoM, GA, CSIRO, ANU, international, other national): CMIP5 3 PB; Atmosphere 2.4 PB; Earth Observation 2 PB; Water/Ocean 1.5 PB; Weather 340 TB; Geophysics 300 TB; Astronomy (optical) 200 TB; Bathymetry/DEM 100 TB; Marine videos 10 TB.]

Data Collections                                    Approx. Capacity
CMIP5, CORDEX                                       ~3 PBytes
ACCESS products                                     2.4 PBytes
LANDSAT, MODIS, VIIRS, AVHRR, INSAR, MERIS          1.5 PBytes
Digital Elevation, Bathymetry, Onshore Geophysics   700 TBytes
Seasonal Climate                                    700 TBytes
Bureau of Meteorology Observations                  350 TBytes
Bureau of Meteorology Ocean-Marine                  350 TBytes
Terrestrial Ecosystem                               290 TBytes
Reanalysis products                                 100 TBytes

National Environment Research Data Collections (NERDC):
1. Climate/ESS model assets and data products
2. Earth and marine observations and data products
3. Geoscience collections
4. Terrestrial ecosystems collections
5. Water management and hydrology collections

Internationally sourced data:
• Satellite data (USGS, NASA, JAXA, ESA, …)
• Reanalysis (ECMWF, NCEP, NCAR, …)
• Climate data (CMIP5, AMIP, GeoMIP, CORDEX, …)
• Ocean modelling (Earth Simulator, NOAA, GFDL, …)
These will only increase as we depend on more data, and some will be replicated.
How can we better keep this data in sync, versioned, and back-referenced for the supplier?

• Organise "long-tail" data that calibrates and integrates with the big data.
How should we manage this data, keep it versioned, and easily attribute the supplier (researcher? collaboration? university? agency?)

Some Data Challenges

• Data formats
  – Standardise data formats – time to convert legacy and proprietary ones
  – Appropriately normalise the data models and conventions
  – Adopt HPC-enabled libraries that abstract the storage
• Expose all attributes for search (see the sketch after this list)
  – not just collection-level search, not just datasets, but all data attributes
  – What are the handles we need to access the data?
• Provide more programmatic interfaces and link up data and compute resources
  – More server-side processing
• Add semantic meaning to the data
  – Create useful datasets (in the programming context) from data collections
  – Is it scientifically appropriate for a data service to aggregate/interpolate?
• What unique/persistent identifiers do we need?
  – DOI is only part of the story
  – Versioning is important
  – Born-linked data and maintaining graph infrastructure
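As a sketch of what exposing every attribute looks like programmatically, the netCDF4-python library can enumerate both global and per-variable attributes; the file and variable contents below are hypothetical:

    from netCDF4 import Dataset

    # Hypothetical CF-compliant netCDF-4 file
    with Dataset("tasmax_example.nc") as ds:
        # Global (dataset-level) attributes
        for name in ds.ncattrs():
            print(f"global: {name} = {ds.getncattr(name)}")

        # Per-variable attributes: the fine-grained handles a
        # whole-of-library search would index, beyond collection metadata
        for varname, var in ds.variables.items():
            attrs = {a: var.getncattr(a) for a in var.ncattrs()}
            print(varname, var.dimensions, var.shape, attrs)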

Regularising High Performance Data using HDF5

[Layer diagram:
• Storage: Lustre and other storage options
• Library layer 1: HDF5 (MPI-enabled and serial)
• Library layer 2: NetCDF-4 library, libgdal
• Metadata layer: netCDF-CF, HDF-EOS5; ISO 19115, RIF-CS, DCAT, etc.
• Domain formats feeding in: [SEG-Y] airborne geophysics line data, [FITS], BAG, …
• Compilers & tools on top: Fortran, C, C++; Python, R, MATLAB, IDL; Ferret, CDO, NCL, NCO, GDL, GDAL, GrADS, GRASS, QGIS; Globe Claritas; Open NavSurface]

Presenter notes: The NCO toolkit manipulates and analyzes data stored in netCDF-accessible formats, including DAP, HDF4, and HDF5.
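The layering above is visible directly from code: a netCDF-4 file is an HDF5 file, so the same bytes can be read through either library layer. A minimal sketch, with a hypothetical file name:

    import h5py
    from netCDF4 import Dataset

    FILENAME = "example.nc"  # hypothetical netCDF-4/HDF5 file

    # Library layer 1: the raw HDF5 object tree
    with h5py.File(FILENAME, "r") as f:
        f.visit(print)  # prints every group/dataset path

    # Library layer 2: the same file through the netCDF-4 data model,
    # where dimensions, variables and CF conventions are first-class
    with Dataset(FILENAME) as ds:
        print(list(ds.dimensions), list(ds.variables))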

Regularising High Performance Data using HDF5 – including Data Services

[The same layer diagram as the previous slide, with a services layer added on top: a fast "whole-of-library" catalogue, and services that expose the data model + semantics via OGC WFS, SOS, WPS, WCS and WMS, plus OPeNDAP. A usage sketch follows the presenter notes below.]

Presenter notes: The data licences and T&Cs need to be preserved, along with the authorised-access model that has been developed.
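As a usage sketch of what exposing the data model through a service buys, netCDF4-python (when built with OPeNDAP support) can open a remote THREDDS/DAP endpoint as if it were a local file; the URL and variable name below are hypothetical stand-ins:

    from netCDF4 import Dataset

    # Hypothetical OPeNDAP endpoint; a real THREDDS URL would go here
    URL = "http://example.nci.org.au/thredds/dodsC/demo/tasmax_day_2000.nc"

    ds = Dataset(URL)                  # opens remotely; no file download
    tasmax = ds.variables["tasmax"]    # hypothetical variable name
    print(tasmax.shape, tasmax.units)  # metadata arrives first

    # Slicing sends a DAP constraint; only this subset crosses the network
    slab = tasmax[0, :100, :100]
    print(slab.mean())
    ds.close()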

Finding data and services

[Diagram: a GeoNetwork catalogue backed by a Lucene database (trialling Elastic Search) indexes the /g/data1 and /g/data2 holdings; discovery feeds DAP, OGC and other services, supercomputer access and the virtual labs. A query sketch follows.]
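GeoNetwork exposes its catalogue over CSW, so the discovery step can be scripted; a hedged sketch using the OWSLib library, with a hypothetical endpoint URL:

    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike

    # Hypothetical GeoNetwork CSW endpoint
    CSW_URL = "http://example.nci.org.au/geonetwork/srv/eng/csw"

    csw = CatalogueServiceWeb(CSW_URL)
    # Full-text search across all indexed metadata fields
    query = PropertyIsLike("csw:AnyText", "%CMIP5%")
    csw.getrecords2(constraints=[query], maxrecords=10)

    for rec in csw.records.values():
        print(rec.title, "-", rec.identifier)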

Prototype to Production – anti-"Mine" craft

Virtual Labs:
• Separate researchers from software builders
• Cloud is an enabler, but:
  – don't make researchers become full system admins
  – save developers from being operational

Project lifecycle – and preparing for success
[Chart: productivity vs. perspiration across project lifecycles, from Project 1 start/end through Projects 2–4 start/end.]

Prototype to Production – anti-"Mine" craft

[Chart: headspace hours of VL managers and developers across the development phase of a project. Poorly executed projects consume large VL-manager and developer hours; reasonably executed projects mostly developer hours; well executed projects the least – with a question mark over what "well executed" costs.]

Prototype to Production – anti-"Mine" craft

[The same headspace-hours chart as the previous slide, annotated "Changed scope – adopted broadly": the effect on VL-manager and developer hours when a well executed prototype is adopted beyond its original scope.]

Virtual Laboratory driven software patterns

[Diagram: stacks are taken from upstream and used as bundles. Each super software stack layers basic OS functions, common modules, bespoke services and special config choices; from these come the NCI stacks (NCI Stack 1, NCI Env Stack), a WorkflowX stack, an analytics stack, a visualisation stack, GridFTP, and variants (2x Stack 1, modified Stack 1 and Stack 2, P2P).]

Transition from developer, to prototype, to DevOps

Step 1: Development
• Get a template for development
• Identify what is special; separate out what is common
• Reuse other software stacks where possible

Step 2: Prototype
• Deploy in an isolated tenant of a cloud
• Determine dependencies
• Write test cases to demonstrate correct functioning (see the sketch below)

Step 3: Sustainability
• Pull the repo into the operational tenant
• Prepare the bundle for integration with the rest of the framework
• Hand back a cleaned bundle
• Establish a DevOps process
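For Step 2's test cases, even a smoke test that proves the stack's components import cleanly is worth having before handover; a minimal sketch using pytest, with the module list standing in for a hypothetical bundle manifest:

    # test_stack_smoke.py - run with: pytest test_stack_smoke.py
    import importlib

    import pytest

    # Hypothetical contents of this bundle's manifest
    REQUIRED_MODULES = ["numpy", "netCDF4", "osgeo.gdal"]

    @pytest.mark.parametrize("modname", REQUIRED_MODULES)
    def test_module_imports(modname):
        # A failure here flags a missing dependency before the
        # bundle is handed back to the operational tenant
        importlib.import_module(modname)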

DevOps approach to building and operating environments

[Diagram: NCI core bundles and community repos (Community 1, Community 2) feed a virtual laboratory operational bundle – Git controlled, pull model, continuous-integration testing. A sketch follows.]
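A minimal sketch of that Git-controlled pull model (the repo path and test command are assumptions, not NCI's actual tooling):

    import subprocess
    import sys

    # Hypothetical checkout of an operational bundle
    BUNDLE_REPO = "/opt/vl-bundles/community1"

    def pull_and_test(repo: str) -> bool:
        """Fast-forward the bundle to upstream, then run its test suite."""
        subprocess.run(["git", "-C", repo, "pull", "--ff-only"], check=True)
        result = subprocess.run([sys.executable, "-m", "pytest", repo])
        return result.returncode == 0

    if __name__ == "__main__":
        # Only a bundle that passes its tests stays in the operational tenant
        sys.exit(0 if pull_and_test(BUNDLE_REPO) else 1)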

Advantages

• Separates roles and responsibilities – from gatekeeper to DevOps management:
  – specialist on a package
  – VL managers
  – system admins
• From "architecture" to "platform":
  – flexible with technology change
  – makes handover/maintenance easier
• Both test/dev/ops and patches/rollbacks become business as usual
• Sharable bundles:
  – can tag releases of software stacks
  – a precondition for trusted software stacks
  – provenance – stands up to scientific and government policy scrutiny

Presenter notes: Running the trunk means pulling in the upstream for production use; "running the trunk" and "stable" can become mutually exclusive, so software collaboration is needed in the DevOps cycle.

A snapshot of layered bundles to build complex VLs

Easy analysis environments

• Increasing use of iPython Notebooks (see the sketch below)
• VDI: an easy in-situ environment using virtual analysis desktops
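The point of in-situ desktops and notebooks is that analysis runs next to the data instead of after a download; a sketch of a typical notebook cell, with a hypothetical /g/data path and variable name:

    import numpy as np
    from netCDF4 import Dataset

    # Hypothetical dataset on the shared /g/data filesystems,
    # mounted directly into the VDI session - nothing is copied
    PATH = "/g/data/example/tasmax_day_2000.nc"

    with Dataset(PATH) as ds:
        tasmax = ds.variables["tasmax"]              # (time, lat, lon)
        january_mean = np.mean(tasmax[0:31], axis=0)

    print(january_mean.shape, float(january_mean.max()))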

VDI – cont …

NCI Petascale Data-Intensive Science Platform

• 10 PB+ of research data
• Server-side analysis and visualisation
• Data services (THREDDS)
• VDI: cloud-scale user desktops on the data
• Web-time analytics software

Presenter notes: This computational environment supports a catalogue of integrated, reusable software and workflows from earth-system and ecosystem modelling, weather research, and satellite and other observational data processing and analysis. To enable transdisciplinary research at this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of available data, unconstrained by artificial disciplinary boundaries. Future challenges involve further integration and analysis of this data across the social sciences to facilitate impact in the societal domain, including timely analysis to more accurately predict and forecast future climate and environmental state.

Summary: Progress toward Major Milestones

• Interdisciplinary science: publish, catalogue and access data and software to enhance interdisciplinary, big-data-intensive (HPD) science, with interoperable data services and protocols.
• Integrity of science: managed services capture a workflow's process as a comparable, traceable output; ease of access to data and software enables enhanced workflow development and repeatable science, conducted with less effort and accelerated outputs.
• Integrity of data: data repository services ensure data integrity, provenance records, universal identifiers, and repeatable data discovery and access from workflows or interactive users.